Wang et al. [4] used pelvic CT images to detect and segment ovarian cancer tumours simultaneously, i.e., in a multi-task deep learning model. They proposed YOLO-OCv2, an enhancement of their previously published algorithm, and applied mosaic augmentation to enrich the background information of each object. The multi-task YOLO-OCv2 outperformed algorithms such as Faster R-CNN, SSD and RetinaNet that had been trained on the COCO dataset. Mahmood et al. [5] created a nuclei segmentation model applicable to multiple organs of the body. The authors used conditional generative adversarial networks (cGANs), since the conditioning signal allows the GAN output to be controlled by class. The model was trained on a mixture of synthetically generated and real data to ensure that sufficient input was available; it was trained on data from nine organs, tested on four, and outperformed peers such as FCN, U-Net and Mask R-CNN. Guan et al. [6] used mammographic images to detect breast cancer with CNN models, focusing on affine transformations and synthetic data generation using GANs for augmentation.
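As an illustration of the first of those two strategies, the snippet below sketches an affine-transformation augmentation pipeline in torchvision; the parameter ranges are assumptions for demonstration, not the settings used by Guan et al. [6]:

```python
import torchvision.transforms as T

# Purely illustrative affine-augmentation pipeline; the ranges below are
# assumptions for demonstration, not the settings reported by Guan et al. [6].
affine_augment = T.Compose([
    T.RandomAffine(
        degrees=15,             # rotate within [-15, 15] degrees
        translate=(0.1, 0.1),   # shift up to 10% along each axis
        scale=(0.9, 1.1),       # random zoom in or out
        shear=10,               # shear within [-10, 10] degrees
    ),
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),               # PIL image -> float tensor in [0, 1]
])

# Usage: augmented = affine_augment(pil_mammogram), one new sample per call.
```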
Karimi et al. [7] proposed a Vision Transformer (ViT) algorithm that divides an image into patches and uses no convolution operations to segment the brain cortical plate and the hippocampus in brain MRI images. The results were compared with FCN architectures such as 3D UNet++, Attention UNet, SE-FCN and DSRNet: the proposed network segmented more accurately than the others while requiring a significantly smaller number of labelled training images.
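The patch-splitting idea at the core of such ViT models can be sketched as follows; this is a generic 2D patch-embedding layer for illustration, not the exact architecture of [7]:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and linearly embed them, as in
    ViT; a Conv2d with kernel = stride = patch size is equivalent to
    flattening each patch and applying a shared linear projection."""

    def __init__(self, img_size=224, patch_size=16, in_ch=1, dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, dim))

    def forward(self, x):                     # x: (B, C, H, W)
        x = self.proj(x)                      # (B, dim, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)      # (B, num_patches, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)        # prepend class token
        return x + self.pos_embed             # add positional embeddings
```

The resulting token sequence is then processed purely by transformer blocks, which is what lets such models avoid convolutional feature extraction.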
Xu et al. [8] worked on histopathological whole-slide images (WSIs) to detect ovarian cancer using CNNs trained on images at multiple resolutions. The authors proposed a modified version of ResNet50, the Heatmap ResNet50 algorithm, for CNN-based patch selection, and used ResNet18 together with MR-ViT for ViT-based slide classification. Li et al. [9] introduced a U-Net variant known as CR-UNet to simultaneously segment ovaries and follicles from transvaginal ultrasound (TVUS) images; compared against DeepLabV3+, PSPNet-1, PSPNet-2 and U-Net, the proposed model outperformed them all. Goodfellow et al. [10] proposed the adversarial network framework, in which training resembles a minimax two-player game between a generator and a discriminator.
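Formally, the two-player game in [10] optimizes the value function

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big] +
  \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

where the discriminator $D$ is trained to distinguish real samples from generated ones, while the generator $G$ is trained to produce samples that fool $D$.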
Nagarajan et al. [11] and Zhao et al. [12] provided three approaches to classifying ovarian cancer types from CT images. The first used a deep convolutional neural network (DCNN) based on AlexNet, which did not provide satisfactory results; the second suffered from overfitting. To overcome this, the third approach combined the DCNN with a GAN to augment the image samples, and it delivered the best results of the three on metrics such as precision, recall, F-measure and accuracy. Saha et al. [13] presented a novel 2D segmentation network called MU-net, a combination of MobileNetV2 and U-Net, to segment follicles in ovarian ultrasound images. Using the USOVA3D Training Set 1 dataset, the proposed model was evaluated against several models from the literature and shown to be more accurate, with an accuracy of 98.4%. Jin et al. [14] used four U-Net models, namely U-Net, U-Net++, U-Net with ResNet and CE-Net, to perform automatic segmentation. Thangamma et al. [15] applied the k-means and fuzzy c-means algorithms to ultrasound images of ovaries and concluded that fuzzy c-means provided better results than k-means.
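The distinction driving that result is that k-means commits each pixel to a single cluster, whereas fuzzy c-means, in its standard formulation shown below, assigns soft memberships $u_{ij} \in [0, 1]$, which better tolerates the diffuse intensity boundaries typical of ultrasound:

```latex
J_m = \sum_{i=1}^{N}\sum_{j=1}^{C} u_{ij}^{\,m}\,\lVert x_i - c_j \rVert^2,
\qquad
u_{ij} = \left[\sum_{k=1}^{C}
  \left(\frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert}\right)^{\!\frac{2}{m-1}}
\right]^{-1},
\qquad
c_j = \frac{\sum_{i=1}^{N} u_{ij}^{\,m}\, x_i}{\sum_{i=1}^{N} u_{ij}^{\,m}}
```

Here $m > 1$ is the fuzzifier controlling the softness of the clustering; as $m \to 1$ the memberships become crisp and the method reduces to k-means.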
The work by Hema et al. [16] involved FaRe-ConvNN, which applied annotations to an image dataset with three categories: epithelial, germ and stroma cells. To avoid overfitting and other issues caused by the small dataset, image augmentation was undertaken using enhancement and transformation techniques such as resizing, masking, segmentation, normalization, vertical and horizontal flips, and rotation. FaRe-ConvNN was used to compensate for manual annotation, and after its region-based training a combination of SVC and Gaussian NB classifiers was used to classify the images, resulting in impressive precision and recall values [17]. In the works carried out by Ashwini et al. [18,19,20], various deep learning models were used to segment CT images and classify them using CNN variants. In [18,19], Otsu's method was used to segment the tumour, yielding a Dice score of 0.82 and a Jaccard score of 0.8356.
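As a minimal sketch of that kind of pipeline, the snippet below applies Otsu's global threshold and computes the two reported overlap metrics; the use of scikit-image is an assumed library choice for illustration, not the authors' actual code:

```python
import numpy as np
from skimage.filters import threshold_otsu

def otsu_mask(image: np.ndarray) -> np.ndarray:
    """Binarize a CT slice with Otsu's global threshold."""
    return image > threshold_otsu(image)

def dice_and_jaccard(pred: np.ndarray, truth: np.ndarray):
    """Overlap metrics between two boolean masks (assumed non-empty)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    dice = 2.0 * inter / (pred.sum() + truth.sum())
    jaccard = inter / np.logical_or(pred, truth).sum()
    return dice, jaccard
```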
Further, to perform segmentation, cGAN was used [20]; in this study, the segmentation and classification of tumours were carried out in a single pipeline, which obtained a Dice score of 0.91 and a Jaccard score of 0.89. Similarly, in the works carried out by Fernandes et al. [21,22], the authors proposed in [21] the segmentation of brain MRI images using entropy-based techniques, while in [22] the detection and classification of brain tumours was carried out by parallel processing using big data tools such as Kafka and PySpark.
Koonce et al. [23] shed light on EfficientNet, which combines the inverted residual blocks of MobileNetV2 with the MnasNet architecture to form a robust model for image recognition.
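The building block in question can be sketched as follows: a generic MobileNetV2-style inverted residual (EfficientNet additionally adds squeeze-and-excitation and compound scaling, omitted here for brevity):

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2-style inverted residual: 1x1 expansion, 3x3 depthwise
    convolution, then 1x1 linear projection, with a skip connection when
    the input and output shapes match."""

    def __init__(self, in_ch, out_ch, stride=1, expand=6):
        super().__init__()
        hidden = in_ch * expand
        self.use_skip = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),          # expand
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride=stride,
                      padding=1, groups=hidden, bias=False),  # depthwise
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),         # project (linear)
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_skip else out
```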
Rehman et al. [24] used a Residual Extended Skip (RES) block and a Wide Context (WC) block in a U-Net architecture to implement their proposed model, BU-Net, for segmenting brain tumour cells in MRI images. In a further work, Rehman et al. [25] proposed a model named BrainSeg-Net for tumour segmentation; it includes a Feature Enhancer (FE) block at every encoder stage to protect critical information that could otherwise be corrupted during the convolution and transformation processes. Jalali et al. [26] proposed ResBCDU-Net for lung segmentation in CT images, with applications such as lung cancer detection. To form ResBCDU-Net, a pre-trained ResNet-34 network was used in place of the encoder of a typical U-Net. The proposed method performed better than models such as U-Net, RU-Net, ResNet34-UNet and BCDU-Net on several evaluation metrics.
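The encoder swap at the heart of this design can be sketched with the third-party segmentation_models_pytorch library; this is only an approximation of the idea, and the BCDU-Net-specific components of ResBCDU-Net are not reproduced here:

```python
import segmentation_models_pytorch as smp

# Sketch of the encoder swap only: a U-Net whose encoder is a pre-trained
# ResNet-34. Channel and class counts are illustrative assumptions.
model = smp.Unet(
    encoder_name="resnet34",     # ResNet-34 replaces the plain U-Net encoder
    encoder_weights="imagenet",  # start from pre-trained weights
    in_channels=1,               # single-channel CT slices
    classes=1,                   # binary lung mask
)
```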
Maureen et al. [27] and Neelima et al. [28] carried out an extensive review of bone image segmentation, considering the methods used in medical additive manufacturing. According to this review, global thresholding is the most commonly used segmentation method and has achieved accuracies of under 0.6 mm; the authors further suggested that more advanced thresholding methods may improve the accuracy to 0.38 mm. In the work carried out by Minnema et al. [29], a CNN-based STL method was applied for bone segmentation in CT images; it was able to segment the skull accurately, obtaining a mean Dice value of 0.92 ± 0.4. In [30], a residual spatial pyramid pooling (RASPP) module was proposed to minimize the loss of location information across modules. Along similar lines, the work in [31] optimized a CNN U-Net model by applying it to a CT dataset generated from MRI images; the results showed that the model performed better on the CT images than on the MRI images.