1. Introduction
In recent years, the global incidence of cancer has remained high: tens of millions of people are newly diagnosed with cancer each year, and millions more worldwide die of the disease
[1]. According to the 2020 global cancer burden data released by the International Agency for Research on Cancer (IARC) of the World Health Organization, the latest incidence and mortality trends of 36 cancer types in 185 countries remain grim. Based on the latest incidence rates, the ten most common cancers worldwide are cancers of the female breast, lung, skin, prostate, colon, stomach, liver, rectum, esophagus, and cervix uteri
[2].
At present, the diagnosis of cancer mainly depends on imaging diagnosis and pathological diagnosis
[3,4]. In either case, early detection is key to improving the survival rate of cancer patients
[5]; non-invasive and efficient early screening has become an essential research topic. Imaging techniques include B-ultrasound, X-ray, computed tomography (CT), magnetic resonance imaging (MRI), etc.
[6]. Through these imaging techniques, some cancerous manifestations in the body can be observed. For example, a shadow in the lungs detected by CT can help determine whether it is a sign of lung cancer
[7,8]. MRI is not only used to assist in the diagnosis and differentiation of nasopharyngeal carcinoma but also can be used to evaluate the extent of the cancer lesion: whether it involves the surrounding soft tissue and bone and whether there is metastasis to nearby lymph nodes
[9]. Nodules or masses of different sizes in the thyroid can be found through a B-ultrasound examination, which can also directly show the size, shape, location, and boundary of a thyroid tumor
[10]. Faced with a large amount of complex medical imaging information and the growing demand for medical imaging diagnosis, manual image interpretation has many shortcomings, such as a heavy workload, susceptibility to subjective bias, low efficiency, and a high misdiagnosis rate.
Deep learning (DL), a branch of machine learning, is a class of algorithms based on artificial neural networks that learn features from data
[11]. Deep learning enables computers to learn pattern features automatically and integrates feature learning into the process of model building, thus reducing the incompleteness caused by hand-crafted features and enabling end-to-end prediction models
[12].
Algorithms based on deep learning have advantages over humans in processing large-scale and complex, non-deterministic data, as well as in mining latent information in data in depth
[13]. Using deep learning to interpret medical images can help doctors locate lesions, assist in diagnosis, reduce the burden on doctors, reduce medical misjudgments, and improve the accuracy and reliability of diagnosis and prediction results. Deep learning techniques have been successfully applied in various fields using medical images and physiological signals. Deep models have demonstrated excellent performance in many areas, such as medical image classification, segmentation, lesion detection, and registration
[14,15,16]. Various types of medical images, such as X-ray, MRI, CT, etc., have been used to develop accurate and reliable DL models to help clinicians diagnose lung cancer, rectal cancer, pancreatic cancer, gastric cancer, prostate cancer, brain tumors, breast cancer, etc.
[17,18,19].
2. Application of Deep Learning in Cancer Diagnoses
2.1. Image Classification
A cancer diagnosis is a classification problem that by its nature requires very high classification accuracy. In recent years, deep learning theory has broken through the bottleneck of manual feature selection and has made breakthrough progress in image classification, improving the accuracy of medical image classification and recognition as well as the generalization ability of its applications. CNN is the most successful architecture among deep learning methods. In 1995, Lo et al.
[20] applied CNN to medical image analysis.
Fu’adah et al.
[21] designed a model that can automatically identify skin cancer and benign tumor lesions using a CNN. The model was evaluated with several optimizers, including RMSprop, Adam, SGD, and Nadam, with a learning rate of 0.001. Among them, the Adam optimizer classified skin lesions in the ISIC dataset into four categories with an accuracy of 99%. DBN was also introduced into the diagnosis of cancer. The detection performance of the DBN-based CAD system for breast cancer is significantly better than that of the traditional CAD system
[22]. Anand et al.
[23] developed a deep convolutional extreme learning machine (DC-ELM) algorithm for the assessment of knee bone cancer based on histopathological images. The test results showed an accuracy of 97.27%. The multi-classifier system based on a deep belief network designed by Beevi et al.
[24] can accurately detect mitotic cells, providing better performance for the important tasks of breast cancer grading and prognosis.
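As a rough, hypothetical illustration of the kind of CNN classification pipeline used in studies such as Fu'adah et al. [21], the sketch below builds a small four-class lesion classifier in PyTorch and selects among the optimizers compared above with a learning rate of 0.001; the architecture, input size, and placeholder data are assumptions, not the authors' exact model.

```python
# Minimal sketch (PyTorch): a small CNN for 4-class lesion classification,
# with the optimizer chosen from those compared above (Adam, lr = 0.001).
# The architecture and input size (3 x 128 x 128) are illustrative assumptions.
import torch
import torch.nn as nn

class SmallLesionCNN(nn.Module):
    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = SmallLesionCNN()
criterion = nn.CrossEntropyLoss()

# Any of the optimizers compared in the study can be plugged in here.
optimizers = {
    "adam": torch.optim.Adam(model.parameters(), lr=1e-3),
    "sgd": torch.optim.SGD(model.parameters(), lr=1e-3),
    "rmsprop": torch.optim.RMSprop(model.parameters(), lr=1e-3),
    "nadam": torch.optim.NAdam(model.parameters(), lr=1e-3),
}
optimizer = optimizers["adam"]

# One illustrative training step on a random batch standing in for skin-lesion images.
images, labels = torch.randn(8, 3, 128, 128), torch.randint(0, 4, (8,))
loss = criterion(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```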
Shahweli
[25] applied an enhancer deep belief network constructed with two restricted Boltzmann machines to identify lung cancer with an accuracy of 96%. For brain tumor detection based on MRI, Kumar et al.
[26] used group search-based multi-verse optimization (GS-MVO) to reduce the feature length and passed the optimally selected features to a DBN to achieve higher classification accuracy. Abdel-Zaher et al.
[27] proposed an automatic diagnostic system for detecting breast cancer. The system is based on a DBN unsupervised pre-training stage followed by a supervised back-propagation neural network stage. A back-propagation neural network pretrained with an unsupervised DBN stage achieves higher classification accuracy than a classifier with only a single supervised stage. In breast cancer cases, the overall network accuracy increased to 99.68%, with a sensitivity of 100% and a specificity of 99.47%.
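To make the two-stage DBN idea concrete, the following simplified sketch pretrains a single restricted Boltzmann machine layer without labels (one-step contrastive divergence) and then fine-tunes a classifier on top with supervised back-propagation; the layer sizes and placeholder data are assumptions, and this is not Abdel-Zaher et al.'s implementation.

```python
# Simplified sketch of DBN-style training: unsupervised RBM pretraining (CD-1)
# followed by supervised back-propagation fine-tuning. Layer sizes are assumptions.
import torch
import torch.nn as nn

n_visible, n_hidden, n_classes = 30, 64, 2  # e.g., tabular breast-cancer features

W = torch.zeros(n_visible, n_hidden)
b_v = torch.zeros(n_visible)
b_h = torch.zeros(n_hidden)

# --- Stage 1: unsupervised pretraining with one-step contrastive divergence ---
lr = 0.05
data = torch.rand(256, n_visible).round()        # placeholder binary training data
for epoch in range(10):
    v0 = data
    h0_p = torch.sigmoid(v0 @ W + b_h)
    h0 = torch.bernoulli(h0_p)                   # sample binary hidden units
    v1_p = torch.sigmoid(h0 @ W.t() + b_v)       # reconstruction
    h1_p = torch.sigmoid(v1_p @ W + b_h)
    W += lr * (v0.t() @ h0_p - v1_p.t() @ h1_p) / len(v0)
    b_v += lr * (v0 - v1_p).mean(0)
    b_h += lr * (h0_p - h1_p).mean(0)

# --- Stage 2: supervised fine-tuning with back-propagation ---
hidden = nn.Linear(n_visible, n_hidden)
hidden.weight.data = W.t().clone()               # initialize from pretrained RBM weights
hidden.bias.data = b_h.clone()
net = nn.Sequential(hidden, nn.Sigmoid(), nn.Linear(n_hidden, n_classes))

labels = torch.randint(0, n_classes, (256,))     # placeholder labels
opt = torch.optim.SGD(net.parameters(), lr=0.1)
for epoch in range(10):
    loss = nn.functional.cross_entropy(net(data), labels)
    opt.zero_grad(); loss.backward(); opt.step()
```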
Jeyaraj et al.
[28] developed an unsupervised generative model based on the DBM for classifying patterns of regions of interest in complex hyperspectral medical images; its classification accuracy and success rate are superior to those of a traditional convolutional network. Nawaz et al.
[29] designed a multi-class breast cancer classification method based on the CNN model, which can not only classify breast tumors into benign or malignant but can also predict the subclasses of tumors, such as fibroadenoma, lobular carcinoma, etc. Jabeen et al.
[30] proposed a breast cancer classification framework for ultrasound images using deep learning and fusion of the best selected features. The experiments were performed on the augmented Breast Ultrasound Images (BUSI) dataset, with a best accuracy of 99.1%.
El-Ghany et al.
[31] proposed a fine-tuned learning model based on a pretrained ResNet101 network for the diagnosis of multiple types of cancer lesions. Trained with transfer learning on a benchmark cancer lesion dataset of over 25,000 histopathology images, the model can classify colon and lung images into five categories: lung adenocarcinoma, lung squamous cell carcinoma, benign lung, benign colon, and colon adenocarcinoma.
2.2. Image Detection
The goal of image detection is to determine where objects are located in a given image (usually with a bounding box) and which class each object belongs to
[32]. That is, it includes the two tasks of correct localization and accurate classification. In recent years, object detection algorithms based on deep learning have mainly formed two categories: candidate region-based algorithms and regression-based algorithms. The candidate region-based object detection algorithm, also called the two-stage method, divides object detection into two stages: the first generates candidate regions and the second feeds the candidate regions into a classifier for classification and detection.
The object detection algorithm based on candidate regions can obtain richer features and a higher accuracy but the detection speed is relatively slow. Typical algorithms based on candidate regions include R-CNN series, such as R-CNN, Fast R-CNN, Faster R-CNN, Mask R-CNN, etc. In 2014, R-CNN was proposed by Girshick et al.
[33]. Although R-CNN achieved great success in the field of object detection, it has problems such as the need to train multiple models and a large amount of repeated computation, which lead to low detection efficiency. In 2015, the team further improved the R-CNN method and proposed the Fast R-CNN model
[34]. Fast R-CNN extracts features from the input image only once during detection, thereby avoiding wasted computation in feature extraction. However, it still uses selective search to generate proposals, which also limits the detection speed. In 2015, Ren et al.
[35] proposed the Region Proposal Network (RPN) and combined it with Fast R-CNN, forming what is called the Faster R-CNN network.
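As a usage-level sketch of this two-stage pipeline (RPN plus Fast R-CNN head), the example below runs torchvision's pretrained Faster R-CNN on a placeholder image; it assumes a recent torchvision (>= 0.13 for the `weights` API) and is not the configuration used in the studies cited below.

```python
# Minimal sketch: two-stage detection (RPN + Fast R-CNN head) with torchvision's
# pretrained Faster R-CNN. Assumes torchvision >= 0.13 for the `weights` argument.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 512, 512)          # placeholder for a preprocessed scan slice
with torch.no_grad():
    outputs = model([image])             # list of dicts: boxes, labels, scores

for box, label, score in zip(outputs[0]["boxes"], outputs[0]["labels"], outputs[0]["scores"]):
    if score > 0.5:                      # keep confident detections only
        print(label.item(), score.item(), box.tolist())
```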
In the research on automatic detection and classification of oral cancer, Welikala et al.
[36] used ResNet-101 for object classification and Faster R-CNN for object detection. Image classification achieved an F1 score of 87.07% in identifying images containing lesions and 78.30% in identifying images requiring referral. Mask R-CNN
[37] is one of the most popular object detection algorithms at present. It does away with the separate step of pre-determining regions of interest used in traditional two-stage detection algorithms and can process the image directly and then perform network detection on the target regions. The algorithm can therefore greatly reduce detection time while also substantially improving accuracy.
In
[38], Mask R-CNN was used to detect gastric cancer pathological sections, segment the cancer foci, and optimize the results by adjusting parameters. This method ultimately achieved an AP of 61.2 when detecting medical images. In
[39], a Mask R-CNN was used to search the entire set of images and detect suspicious lesions, which allows the entire image to be searched without prior breast segmentation and achieves a per-slice accuracy of 0.86 on the training dataset.
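A minimal, generic sketch of instance-level lesion detection with a pretrained Mask R-CNN from torchvision is shown below; it is not the gastric or breast models of [38,39], and the confidence threshold is an arbitrary assumption.

```python
# Minimal sketch: instance detection + masks with torchvision's pretrained Mask R-CNN.
# This is a generic usage example, not the gastric- or breast-specific models cited above.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 512, 512)               # placeholder image tensor in [0, 1]
with torch.no_grad():
    pred = model([image])[0]                  # dict with boxes, labels, scores, masks

keep = pred["scores"] > 0.5
boxes = pred["boxes"][keep]                   # (N, 4) suspicious-region boxes
masks = pred["masks"][keep] > 0.5             # (N, 1, H, W) binary instance masks
print(boxes.shape, masks.shape)
```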
The regression-based object detection algorithm has only one stage and directly regresses the category and coordinates of the target on the input image, so the calculation speed is relatively fast. Typical regression-based algorithms mainly include the YOLO series
[40], the SSD series
[41], etc.
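For contrast with the two-stage examples above, the short sketch below runs a pretrained one-stage SSD300 detector from torchvision, in which boxes and classes are regressed in a single forward pass; it is purely illustrative and again assumes torchvision >= 0.13.

```python
# Minimal sketch: one-stage (regression-based) detection with torchvision's SSD300.
# Categories and boxes are predicted in a single forward pass, without region proposals.
import torch
from torchvision.models.detection import ssd300_vgg16

model = ssd300_vgg16(weights="DEFAULT")
model.eval()

image = torch.rand(3, 300, 300)     # SSD300 expects roughly 300x300 inputs
with torch.no_grad():
    pred = model([image])[0]        # dict with boxes, labels, scores
print(pred["boxes"].shape, pred["scores"][:5])
```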
Due to the large number of medical image modalities and large variations in lesion scale, researchers often adapt object detection algorithms from the natural image domain to make them suitable for lesion detection in medical images. Gao et al.
[42] used an LSTM model and a long-distance LSTM (DLSTM) to capture temporal changes in longitudinal data for CT detection of lung cancer. The temporal emphasis model (TEM) in DLSTM supports learning at regular and irregular sampling intervals. A three-dimensional CNN was used to diagnose breast cancer in a weakly supervised manner and locate lesions in dynamic contrast-enhanced (DCE) MRI data, showing high accuracy with an overall Dice distance of 0.501 ± 0.274. Studies have shown that the weakly supervised learning method can locate lesions in volumetric radiological images with image-level labels only
[43].
Asuntha et al.
[44] proposed a CNN method based on the Fuzzy Particle Swarm Optimization (FPSO) algorithm (FPSOCNN) to detect and classify lung cancer. This method greatly reduces the computational complexity of CNN. Shen et al.
[45] proposed a globally-aware multiple instance classifier (GMIC) to localize malignant lesions in a weakly supervised manner. The model is capable of processing medical images at the original resolution and can generate pixel-level saliency maps, which provide additional interpretability. Applied to screening mammography classification and localization, the model requires less memory and trains faster than ResNet-34 and runs faster than R-CNN.
Ranjbarzadeh et al.
[46] proposed C-ConvNet/C-CNN to simultaneously mine local and global features through two different paths and introduced a new distance-wise attention (DWA) mechanism to take the influence of the tumor center location and the brain into account within the model. In
[47], a center-point matching detection network (SCPM-Net) based on a 3D sphere representation was proposed, which consists of two parts: sphere representation and center-point matching. The model automatically predicts nodule radius, position, and offsets without manually designed nodule/anchor parameters. In
[48], a two-stage model for breast cancer detection using thermographic images was proposed. In the first stage, VGG16 is used to extract features from the image. In the second stage, the Dragonfly Algorithm (DA) based on the Grunwald–Letnikov (GL) method is used to select the optimal feature subset. The evaluation results show that the model reduces the number of features by 82% compared with the VGG16 model.
In
[49], a dense dual-task network (DDTNet) was used to simultaneously perform automatic detection and segmentation of tumor-infiltrating lymphocytes (TILs) in histopathological images. A semi-automatic method (TILAnno) was used to generate high-quality boundary annotations for TILs in H&E-stained histopathology images. Maqsood et al.
[50] proposed the transferable texture convolutional neural network (TTCNN), which only includes three layers of convolution and one energy layer. The network integrates the energy layer to extract texture features from the convolutional layer. The convolutional sparse image decomposition method is used to fuse all the extracted feature vectors, and finally, the entropy-controlled firefly method is used to select the best features.
2.3. Image Segmentation
Image segmentation refers to dividing an entire image into a series of regions
[51], which can be considered a problem of pixel-level classification. Accurate medical image segmentation can assist doctors in judging the condition, quantitatively analyzing the lesion area, and providing a reliable basis for correct disease diagnosis. Image segmentation tasks can be divided into two categories: semantic segmentation and instance segmentation
[52]. Semantic segmentation assigns a class to each pixel in an image, but objects within the same class are not differentiated. Instance segmentation, in contrast, classifies specific individual objects. This is similar to object detection; the difference is that object detection outputs the bounding box and category of each target, whereas instance segmentation outputs the mask and category of each target.
The early traditional medical image segmentation methods mainly include boundary extraction, region-based segmentation, threshold-based segmentation, etc.
[51]. Among them, NormalizedCut
[53], GraphCut
[54], and GrabCut
[55] are the most commonly used segmentation techniques. With the progress of deep learning networks, a new generation of image segmentation models, such as FCN, U-Net, and their variants, were produced and their segmentation performance was significantly improved.
Deep learning-based image segmentation methods learn, from known sample data, a mapping from pixels and their surrounding context to instances or categories. These methods exploit the powerful nonlinear fitting ability of deep learning and train on large amounts of sample data, so the resulting mapping models achieve high accuracy.
FCN has been applied to cancer region segmentation systems to fully extract feature information at different scales from the input image, achieving a better segmentation effect than networks that do not incorporate multi-scale information. Recurrent Fully Convolutional Networks (RFCNs)
[56] are used to directly solve the segmentation problem in multi-slice MR images.
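The basic FCN usage pattern, producing a per-pixel class score map from a fully convolutional backbone, can be sketched with torchvision's pretrained FCN-ResNet50 as below; this is a generic semantic segmentation example, not the RFCN or the prostate and liver models discussed next.

```python
# Minimal sketch: fully convolutional semantic segmentation with torchvision's FCN.
# The model outputs a per-pixel class score map the same size as the input.
import torch
from torchvision.models.segmentation import fcn_resnet50

model = fcn_resnet50(weights="DEFAULT")
model.eval()

image = torch.rand(1, 3, 256, 256)            # placeholder batch of one image
with torch.no_grad():
    logits = model(image)["out"]              # (1, num_classes, 256, 256)
mask = logits.argmax(dim=1)                   # per-pixel predicted class
print(mask.shape)
```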
Wang et al.
[57] designed a fully convolutional network that uses multi-parametric magnetic resonance images to detect prostate cancer, comprising a prostate segmentation stage and a tumor detection stage. Dong et al.
[58] applied a hybrid fully convolutional neural network (HFCNN) to segment the liver and detect liver metastases. In this system, the loss function is evaluated on the whole-image segmentation target. The network processes full images rather than patches, merging different scales by adding links that blend the final detection layer with finer lower layers, producing lesion heat maps. Shukla et al.
[59] proposed a classification and detection method for liver cancer, which trains cascaded fully convolutional neural networks (CFCNs) on input of segmented tumor regions. On 3DIRCAD datasets of different volumes, the total accuracy of the training and testing process is 93.85%, which minimizes the error rate.
The U-Net model
[60] is an improvement and extension of FCN. It follows the FCN idea for image semantic segmentation, that is, using convolutional and pooling layers for feature extraction and then using deconvolution layers to restore the image size. Experiments have shown that the U-Net model can obtain more accurate classification results with fewer training samples. U-Net was developed specifically for medical image data and does not require many annotated images. In addition, because of high-performance GPU computing, networks with more layers can be trained
[61], and U-Net has appeared increasingly often in the field of medical image segmentation in recent years.
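A minimal U-Net-style sketch in PyTorch is shown below: a convolution-and-pooling encoder, a transposed-convolution decoder, and skip connections that concatenate encoder features with decoder features; the depth and channel counts are illustrative assumptions, much smaller than the original U-Net.

```python
# Minimal U-Net sketch (PyTorch): conv + pooling encoder, transposed-conv decoder,
# and skip connections that concatenate encoder features with decoder features.
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, num_classes=2):
        super().__init__()
        self.enc1 = double_conv(in_ch, 16)
        self.enc2 = double_conv(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = double_conv(32, 64)
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec2 = double_conv(64, 32)
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = double_conv(32, 16)
        self.head = nn.Conv2d(16, num_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)

model = TinyUNet()
out = model(torch.rand(1, 1, 128, 128))   # -> (1, 2, 128, 128) per-pixel logits
print(out.shape)
```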
Ayalew et al.
[62] designed a method based on the U-Net architecture to segment the liver and tumors from abdominal CT scan images. The number of filters and network layers of the original U-Net model was modified to reduce network complexity and improve segmentation performance. Dice scores using this algorithm were 0.96 for liver segmentation, 0.74 for segmentation of tumors from the liver, and 0.63 for segmentation of tumors from abdominal CT scan images. However, it still faces great challenges in segmenting small and irregular tumors.
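Dice scores such as those reported above compare a predicted mask with the ground-truth mask; a common formulation is sketched below, where the small epsilon (an assumption) guards against empty masks.

```python
# Minimal sketch: Dice coefficient between a predicted and a ground-truth binary mask.
# Dice = 2|A ∩ B| / (|A| + |B|); a small epsilon guards against empty masks.
import torch

def dice_score(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> float:
    pred = (pred > 0.5).float()
    target = (target > 0.5).float()
    intersection = (pred * target).sum()
    return float((2 * intersection + eps) / (pred.sum() + target.sum() + eps))

pred = torch.rand(1, 256, 256)                    # e.g., sigmoid output of a tumor model
target = (torch.rand(1, 256, 256) > 0.7).float()  # placeholder ground-truth mask
print(dice_score(pred, target))
```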
2.4. Image Registration
Medical image registration refers to seeking a kind of (or a series of) spatial transformations for a medical image to make it consistent with the corresponding points on another medical image
[63]. This consistency means that the same anatomical point on the human body has the same spatial position in the two matched images. Using the correct image registration method can accurately fuse a variety of information into the same image, making it easier and more accurate for doctors to observe lesions and structures from various angles. At the same time, through the registration of dynamic images collected at different times, changes in lesions and organs can be quantitatively analyzed, making medical diagnosis, surgical planning, and radiotherapy planning more accurate and reliable.
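To illustrate what a learned deformable registration typically optimizes, the schematic sketch below warps a moving image with a displacement field via grid sampling and combines an image-similarity term with a smoothness penalty on the field, in the spirit of the unsupervised DIR methods discussed below; the loss weights and the zero-initialized field standing in for a network output are assumptions, and this is not any of the cited models.

```python
# Schematic sketch of unsupervised deformable registration: a network (not shown)
# predicts a displacement field, the moving image is warped with grid_sample, and the
# loss combines image similarity (MSE here) with a smoothness penalty on the field.
import torch
import torch.nn.functional as F

def warp(moving: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    # moving: (N, 1, H, W); flow: (N, 2, H, W) displacements in pixels.
    n, _, h, w = moving.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0)   # identity grid (1, 2, H, W)
    coords = base + flow
    # Normalize to [-1, 1] as required by grid_sample, ordered as (x, y).
    x = 2 * coords[:, 0] / (w - 1) - 1
    y = 2 * coords[:, 1] / (h - 1) - 1
    grid = torch.stack((x, y), dim=-1)                          # (N, H, W, 2)
    return F.grid_sample(moving, grid, align_corners=True)

def registration_loss(fixed, moving, flow, smooth_weight=0.1):
    warped = warp(moving, flow)
    similarity = F.mse_loss(warped, fixed)
    # Smoothness: penalize spatial gradients of the displacement field.
    dx = (flow[:, :, :, 1:] - flow[:, :, :, :-1]).abs().mean()
    dy = (flow[:, :, 1:, :] - flow[:, :, :-1, :]).abs().mean()
    return similarity + smooth_weight * (dx + dy)

fixed = torch.rand(1, 1, 64, 64)
moving = torch.rand(1, 1, 64, 64)
flow = torch.zeros(1, 2, 64, 64, requires_grad=True)   # stand-in for a network output
loss = registration_loss(fixed, moving, flow)
loss.backward()
print(float(loss))
```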
To solve the problem of missing data caused by tumor resection, Wodzinski et al.
[64] proposed a nonrigid image registration method based on an improved U-Net architecture for breast tumor bed localization. The algorithm works simultaneously at several image resolutions to handle large deformations, and a specialized volume penalty is proposed to incorporate medical knowledge of tumor resection into the registration process.
In order to realize the prediction of organ-at-risk (OAR) segmentation on cone-beam CT (CBCT) from the segmentation on planning CT, Han et al.
[65] proposed a CT-to-CBCT deformable registration model, which enables accurate deformable registration between CT and CBCT images of pancreatic cancer patients treated with high biological radiation doses. The model uses a regularity loss, an image similarity loss, and an OAR segmentation similarity loss to penalize the mismatch between the warped CT segmentation and the manually drawn CBCT segmentation. Compared with intensity-based algorithms, this registration model not only improves segmentation accuracy but also reduces the processing time by an order of magnitude. In
[66], a novel pipeline was proposed to achieve accurate registration from 2D US to 3D CT/MR. This registration pipeline starts with a classification network for coarse orientation estimation, followed by a segmentation network for predicting ground-truth planes in 3D volumes, enabling fully automated slice-to-volume registration in one shot. In
[67], a method for simulating inter-fraction deformation in high-dose-rate (HDR) brachytherapy was proposed and applied to a deep learning-based deformable image registration (DIR) algorithm, which can directly achieve inter-fraction image alignment across HDR sessions for cervical cancer.
In
[68], a CBCT–CBCT deformable image registration was proposed for radiotherapy of abdominal cancer patients. It is based on unsupervised deep learning, and its registration workflow includes training and inference stages that share the same feedforward path through a spatial transformation-based network (STN). The STN consists of a global generative adversarial network (GlobalGAN) and a local GAN (LocalGAN), which predict coarse- and fine-scale motion, respectively. Xie et al.
[69] introduced point metric and masking techniques and proposed an improved B-spline-based DIR method to address the large deformation and non-correspondence caused by tumor resection and clip insertion and thus improve registration accuracy. The point metric minimizes the distance between two point sets with known correspondences to regularize intensity-based B-spline registration. The masking techniques reduce the impact of non-corresponding regions in breast computed tomography (CT) images. This method can be applied to the determination of the target volume in radiotherapy treatment planning after breast cancer surgery. In
[70], LungRegNet was proposed for unsupervised deep learning-based deformable image registration (DIR) of 4D-CT lung images. It consists of two subnetworks, CoarseNet and FineNet, which predict lung motion on coarse-scale images and local lung motion on fine-scale images, respectively. The method showed excellent robustness, registration accuracy, and computational speed, with a mean and standard deviation of Target Registration Error (TRE) of 1.00 ± 0.53 mm and 1.59 ± 1.58 mm, respectively. In order to maintain the original topology during deformation and enhance image registration performance, a cycle-consistent deformable image registration method called CycleMorph was proposed by
[71]. The model can be applied to both 2D and 3D registration and can be easily extended to multi-scale settings to address the memory problem in large-volume registration.
2.5. Image Reconstruction
Image reconstruction is the process of reconstructing an image from the measurement data acquired by the imaging system. Due to the limitations of the physical imaging system, it is difficult for some medical equipment to obtain high-quality medical images, and doctors cannot clearly see the specific condition of lesions in the images, which can lead to misdiagnosis and is not conducive to accurate diagnosis and treatment. Therefore, under existing hardware conditions, image reconstruction technology can not only break through the inherent limitations of hardware, improve the quality of medical images, and reduce operating costs, but also provide medical personnel with clear images and further improve the accuracy of disease diagnosis. Technology based on deep learning improves the speed, accuracy, and robustness of medical image reconstruction
[72,73].
Kim et al.
[72] used conventional reconstruction and deep learning-based image reconstruction (DLR) with two different noise reduction factors (MRIDLR30 and MRIDLR50) to reconstruct axial T2WI in patients who underwent long-course chemoradiotherapy (CRT) for rectal cancer followed by high-resolution rectal MRI, and measured the tumor signal-to-noise ratio (SNR). The results showed that the MR images produced by DLR had a higher resolution and signal-to-noise ratio and that the specificity for identifying pathological complete response (pCR) was significantly higher than that of conventional MRI. In
[74], the results confirmed that the DLIR algorithm for pancreatic-protocol dual-energy computed tomography (DECT) significantly improved image quality and reduced the variability of iodine concentration (IC) values compared with hybrid iterative reconstruction (IR). Kuanar et al.
[75] proposed a GAN-based autoencoder network capable of denoising low-dose CT images, which first maps CT images to a low-dimensional manifold and then recovers the images from their corresponding manifold representations.
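A much-simplified version of this idea, mapping a noisy slice to a compact representation and back, is a convolutional denoising autoencoder; the sketch below omits the adversarial component of [75] and uses simulated noise and placeholder data as assumptions.

```python
# Minimal sketch: convolutional denoising autoencoder for low-dose CT slices.
# The encoder maps a noisy slice to a compact representation; the decoder recovers it.
# (The GAN component of the cited work is omitted; this is only the autoencoder idea.)
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 2, stride=2), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

clean = torch.rand(4, 1, 128, 128)                  # placeholder "normal-dose" slices
noisy = clean + 0.1 * torch.randn_like(clean)       # simulated low-dose noise
loss = nn.functional.mse_loss(model(noisy), clean)  # train to recover the clean slice
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```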
A major disadvantage of MRI is the long examination time. The deep learning image reconstruction (DLR) method was introduced and achieved good results in scanning acceleration. In the diagnosis of prostate cancer using multiparametric magnetic resonance imaging (mpMRI), Gassenmaier et al.
[76] proposed an accelerated deep learning image reconstruction T2-weighted turbo spin echo (TSE) sequence, which reduced the acquisition time by more than 60%. In order to solve the problem that traditional optical image reconstruction methods based on the finite element method (FEM) are time-consuming and cannot fully restore the lesion contrast, Deng et al.
[77] proposed FDU-Net, which consists of a fully connected subnetwork, a convolutional encoder–decoder subnetwork, and a U-Net, where the U-Net is used for fast, end-to-end reconstruction of 3D diffuse optical tomography (DOT) images. Training is performed on digital phantoms containing randomly located spherical inclusions of various sizes and contrasts. The results show that, after training, the FDU-Net's ability to recover the true inclusion contrast and location without using any inclusion information during reconstruction is greatly improved, and an FDU-Net trained on simulated data successfully reconstructed breast tumors from measurements of real patients.
Feng et al.
[78] developed a deep learning-based algorithm (Z-Net) for MRI-guided non-invasive near-infrared spectral tomography (NIRST) reconstruction, which was the first algorithm to use DL for combined multimodality image reconstruction and contributed to better detection of breast cancer. The method avoids MRI image segmentation and light propagation modeling and can simultaneously recover the chromophore concentrations of oxy-hemoglobin (HbO), deoxy-hemoglobin (Hb), and water through end-to-end training. Neural networks trained with only simulated datasets can be used directly to distinguish between malignant and benign breast tumors. Wei et al.
[79] proposed a real-time 3D MRI reconstruction method from cine-MRI based on an unsupervised network for radiation therapy of thoracic and abdominal tumors in MRI-guided radiotherapy for liver cancer. In this method, a reference 3D-MRI and cine-MRI are used to generate the training data.
2.6. Image Synthesis
Medical image synthesis can be divided into intra-modality and inter-modality image synthesis. Inter-modality image synthesis refers to the synthesis of images between two different imaging modalities, such as MR-to-CT, CT-to-MR, PET-to-CT, etc. On the other hand, intra-modality synthesis refers to converting images between two different protocols of the same imaging modality, such as between MRI sequences, or restoring images from low-quality protocols to high-quality ones
[80]. Deep learning-based methods achieve higher accuracy in medical image synthesis than traditional methods. Deep learning models can more effectively map the correlation between input and output in the design of image transformation and synthesis rules
[81] and construct high-level features (such as shapes and objects) by learning low-level features (such as edges and textures)
[82] for medical image synthesis with supervised or unsupervised network architectures. Among these architectures, the representative methods include GAN and U-Net.
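To make the GAN-plus-U-Net recipe concrete, the schematic sketch below shows the loss structure of a paired, pix2pix-style conditional GAN for cross-modality synthesis (e.g., MR-to-CT): the generator is trained with an adversarial term plus an L1 term, and the discriminator sees (source, target) pairs; the tiny networks and placeholder data are assumptions, not any specific cited model.

```python
# Schematic sketch of a paired (pix2pix-style) conditional GAN for cross-modality
# synthesis, e.g., MR-to-CT. The generator and discriminator are deliberately tiny
# placeholders; the point is the loss structure: adversarial term + L1 term.
import torch
import torch.nn as nn

G = nn.Sequential(                       # generator: source modality -> target modality
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1), nn.Tanh(),
)
D = nn.Sequential(                       # discriminator sees (source, target) pairs
    nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 1, 3, stride=2, padding=1),  # patch-level real/fake logits
)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

mr = torch.rand(4, 1, 64, 64)            # placeholder source images (e.g., MR)
ct = torch.rand(4, 1, 64, 64)            # placeholder paired targets (e.g., CT)

# --- Discriminator step: real pairs vs. generated pairs ---
fake = G(mr).detach()
d_real = D(torch.cat([mr, ct], dim=1))
d_fake = D(torch.cat([mr, fake], dim=1))
loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# --- Generator step: fool the discriminator and stay close to the target (L1) ---
fake = G(mr)
d_fake = D(torch.cat([mr, fake], dim=1))
loss_g = bce(d_fake, torch.ones_like(d_fake)) + 100.0 * nn.functional.l1_loss(fake, ct)
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
print(float(loss_d), float(loss_g))
```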
In
[83], a generative model was used to generate new images to reduce dataset imbalance and improve the performance of an automatic gastric cancer detection system. The synthesis network can generate realistic images even when the lesion image dataset is small. This method allows lesion patches to be attached to various parts of normal images. The experimental results show that the dataset bias is reduced, but the model's performance varies with the number of synthetic images added to the training dataset: when 20,000 synthetic images were added, the model achieved the highest AP score, and when more images were added, performance decreased. Yu et al.
[84] proposed a three-dimensional conditional generative adversarial network (cGAN) and a locally adaptive synthesis scheme to synthesize fluid-attenuated inversion recovery (FLAIR) images from T1 so as to effectively handle T1-based single-modality brain tumor segmentation.
Saha et al.
[85] proposed TilGAN, an efficient generative adversarial network, to generate high-quality synthetic pathological images of tumor-infiltrating lymphocytes (TILs) and then classify TIL and non-TIL regions. The TilGAN-generated images obtained higher Inception scores and lower kernel Inception distances as well as Fréchet Inception distances than the real images. In the classification and diagnosis of skin cancer, Abhishek et al.
[86] proposed a conditional GAN-based Mask2Lesion model, which is trained with the segmentation masks available in the training dataset and is used to generate new lesion images from any arbitrary mask, augmenting the original training dataset.
Qin et al.
[87] proposed a style-based generative adversarial network (GAN) model to generate skin lesion images in order to address the lack of labeled data and the class imbalance in the dataset. This model applies GAN-based data augmentation to the classification of skin lesions and improves classification performance. Baydoun et al.
[88] proposed the sU-cGAN model for MR–CT image translation for cervical cancer diagnosis and treatment. In this model, a shallow U-Net (sU-Net) with an encoder/decoder depth of 2 is used as the generator of a conditional GAN (cGAN) network. The trainable parameters of sU-cGAN are fewer than those of ordinary cGAN networks.
Zhang et al.
[89] used a conditional generative adversarial network to generate synthetic computed tomography images of head and neck cancer patients from CBCT, while maintaining the same cone-beam computed tomography anatomy with accurate computed tomography numbers. The proposed method overcomes the limitations of cone-beam computed tomography (CBCT) imaging artifacts and Hounsfield unit imprecision. Sun et al.
[90] proposed a double U-Net CycleGAN for 3D MR-to-CT image synthesis. This method addresses the problem that GANs and their variants often cause spatial inconsistencies across contiguous slices when applied to medical image synthesis. Experimental results show that this method can convert MRI images to CT images using unordered paired data and synthesize better 3D CT images with less computation and memory. Chen et al.
[91] used U-Net to generate synthetic CT images from conventional MR images in seconds for intensity-modulated radiation therapy treatment planning for the prostate.
Bahrami et al.
[92] designed an efficient convolutional neural network (eCNN) for generating accurate synthetic computed tomography (sCT) images from MRI. The network is based on the encoder–decoder structure of the U-Net model without softmax layers, together with a residual network, and the eCNN model shows effective learning ability using only 12 training subjects. Synthetic MRI (SyMRI) technology can generate various image contrasts (including fluid-attenuated inversion recovery images, T1WIs, T2-weighted images, and double inversion recovery sequences) and adjust subtle parameter values using information from a single scan to improve the visualization of lesions; it can also quantify T1, T2, and proton density (PD)
[93]. In
[93], studies have shown that synthetic MRI variables can be used to quantitatively assess bone lesions in the lower trunk of prostate cancer patients and that PD values can be used to determine the viability of prostate cancer bone metastases. Trans-rectal ultrasound (TRUS) images are one of the effective methods to visualize prostate structures. Pang et al.
[94] proposed a method to generate TRUS-style images from prostate MR images for prostate cancer brachytherapy.