Semantic Segmentation of Medical Images: History
Please note this is an old version of this entry, which may differ significantly from the current revision.

There have been major developments in deep learning in computer vision since the 2010s. Deep learning has contributed to a wealth of data in medical image processing, and semantic segmentation is a salient technique in this field. This study retrospectively reviews recent studies on the application of deep learning for segmentation tasks in medical imaging and proposes potential directions for future development, including model development, data augmentation processing, and dataset creation. The strengths and deficiencies of studies on models and data augmentation, as well as their application to medical image segmentation, were analyzed. Fully convolutional network developments have led to the creation of the U-Net and its derivatives. Another noteworthy image segmentation model is DeepLab.

  • semantic segmentation
  • medical image processing
  • deep learning
  • fully-convolutional network

1. Introduction

Medical imaging has long functioned as an assistive means for diagnosis and treatment. Advancements in technology have increased the types and qualities of medical images. Lesion detection is one of the primary objectives of medical imaging, as the size and location of lesions are often directly associated with a patient’s diagnosis, treatment, and prognosis. Previously, the size and location of lesions were determined by radiologists through medical image examination. At best, the instruments and software used were only able to enhance the image quality by adjusting the brightness and contrast features to facilitate better observation. Since the development of computer vision algorithms, however, researchers have begun to utilize these algorithms in the field of medical imaging [1].
As the core of deep learning, convolutional neural networks (CNNs) (Figure 1) have a long development history [2]. However, due to hardware-related limitations, breakthroughs in the effectiveness of CNNs came only in the 2010s [3]. Since then, deep learning models aimed at specific targets have gradually been proposed, progressing from classification to object detection to object segmentation. Consequently, applications such as the detection of lung diseases in X-ray images [4], the detection of lesion locations [5], and lesion segmentation have been introduced in medical imaging. Owing to improvements in model performance, deep learning models have exhibited diagnostic capabilities approximating those of clinical physicians on specific datasets. However, traditional CNN models have limitations in segmentation. One concerns the features extracted by the models. With a smaller kernel, the features become more local to the original image, and global information, such as location, may be lost. With a larger kernel, however, the context of features may decrease [6]. Another limitation is data availability for biomedical segmentation tasks. Because of privacy requirements in the medical profession, the volume of medical data is often small compared with that in other fields [6]. New models have been developed to solve these problems, and data augmentation offers another solution.
Figure 1. CNN kernel. Before input, the image is transformed into a numeric signal (0/1, for example), as shown on the left. The arrays in the middle are the convolutional kernels. A kernel must not be larger than the input. The neural network applies several kernels with different weights to the input to obtain feature maps, whose entries are usually computed as dot products, as shown on the right. From these feature maps, the network extracts the features that allow it to accomplish its task.
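The kernel operation described in the Figure 1 caption can be illustrated with a minimal sketch. The function name `conv2d_valid`, the toy 0/1 image, and the hand-picked edge kernel are assumptions made for illustration; in a real CNN the kernel weights are learned, and libraries add padding, strides, and multiple channels. As in most deep learning libraries, the kernel is not flipped, so the operation is strictly a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    if kh > ih or kw > iw:
        raise ValueError("kernel must not be larger than the input")
    feature_map = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Dot product between the kernel and the image patch under it.
            row.append(sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            ))
        feature_map.append(row)
    return feature_map


# Toy 0/1 input: dark left half, bright right half (illustrative data only).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked kernel that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, vertical_edge)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is nonzero only where the kernel straddles the edge, which is the sense in which a kernel "extracts" a feature from the input.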

2. Clinical Datasets and Relevant Studies

2.1. Lung Lesions

Some chest lesions have been studied, including lung lesions such as pneumothorax, lung nodules, and pneumonia; cardiac lesions such as ventricular hypertrophy; and bony lesions such as rib fractures. Except for rib fractures, these lesions present obvious targets for segmentation, and open datasets [44,45,46] are available for constructing pretrained models, so training for those tasks is more likely to succeed.
Table 2 shows research on lung lesion segmentation. Singadkar et al. [47] applied an FCN-based neural network, combined with residual blocks in the decoder section and long skip connections between the encoder and decoder, for lung nodule segmentation in CT scan images. They reached an average Dice score of 0.95 and a Jaccard index of 0.887. Abedalla et al. [8] trained multiple U-Net models, each with a different backbone, and combined them with a method similar to ensemble learning: during the inference phase, the outputs of four models are first summed according to fixed weights and then subjected to a threshold to produce the pneumothorax segmentation. The weights and thresholds are manually adjusted. The network achieved a DSC of 0.86 on the 2019 Pneumothorax Challenge dataset.
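The weighted-sum-then-threshold step described for the pneumothorax ensemble can be sketched as follows. This is an illustration only, not the implementation of Abedalla et al. [8]: the function name `ensemble_mask`, the probability maps, the weights (chosen here to sum to 1), and the threshold of 0.5 are all assumed values.

```python
def ensemble_mask(prob_maps, weights, threshold):
    """Combine per-model probability maps with fixed weights, then binarize."""
    if len(prob_maps) != len(weights):
        raise ValueError("one weight per model is required")
    height, width = len(prob_maps[0]), len(prob_maps[0][0])
    mask = []
    for r in range(height):
        row = []
        for c in range(width):
            # Weighted sum of the models' probabilities for this pixel.
            score = sum(w * p[r][c] for w, p in zip(weights, prob_maps))
            row.append(1 if score >= threshold else 0)
        mask.append(row)
    return mask


# Hypothetical outputs of four models on a 2x2 image (illustrative data only).
model_outputs = [
    [[0.9, 0.2], [0.6, 0.1]],
    [[0.8, 0.3], [0.4, 0.0]],
    [[0.7, 0.1], [0.7, 0.2]],
    [[0.9, 0.4], [0.3, 0.1]],
]
weights = [0.4, 0.3, 0.2, 0.1]  # assumed, manually chosen weights

mask = ensemble_mask(model_outputs, weights, threshold=0.5)
print(mask)  # [[1, 0], [1, 0]]
```

Because the weights and the threshold are set by hand rather than learned, they would normally be tuned on a validation set.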
Table 2. Research on lung lesion segmentation.
| Authors (Year) | Method | Medical Image | Performance | Notes |
| --- | --- | --- | --- | --- |
| Wang et al. [48] (2017) | CF-CNN | Computed tomography | DC: 0.82 | Central-focused CNN: extracts features from 3D and 2D simultaneously |
| Wang et al. [49] (2017) | MV-CNN | Computed tomography | DC: 0.77 | Multi-scaled CNN |
| Maqsood et al. [50] (2021) | DA-Net | Computed tomography | DC: 0.81; IoU: 0.76 | U-Net-based, with atrous convolution and dense connection |
| Meraj et al. [51] (2020) | CNN | Computed tomography | Accuracy: 0.83 | For nodule detection using PCA and other machine learning techniques |
| Singadkar et al. [47] (2020) | DDRN | Computed tomography | DSC: 0.95 | ResNet-based, with deep deconvolution (residual block at the decoder) |
| Zhao et al. [36] (2020) | 3D U-Net | Computed tomography | | 3D U-Net combined with GAN for segmentation; another CNN for classifying nodules |
| Usman et al. [52] (2020) | 3D U-Net | Computed tomography | DSC: 0.88 | 3D voxel feature, ResUNet, with semi-automated ROI selection |
| Keetha et al. [53] (2020) | U-Det | Computed tomography | DSC: 0.83 | U-Net combined with a bidirectional feature pyramid network (Bi-FPN) |
| Ozdemir et al. [54] (2020) | 3D Vnet | Computed tomography | Sensitivity: 0.97 | Combined segmentation and classification for lung nodule diagnosis |
| Hesamian et al. [55] (2019) | FCN | Computed tomography | DSC: 0.81 | Atrous convolution and residual block in FCN combined with conditional random field (CRF) |

2.2. Brain Lesions

Brain lesion detection covers brain tumors, strokes, traumatic brain injuries, and brain metastases. BraTS [56] (Figure 7a) is a brain tumor dataset whose labels cover not only the location and size but also the cell type of tumors, primarily low-grade and high-grade gliomas. Magnetic resonance imaging (MRI) scans are divided into pretreatment and posttreatment images. In addition, each patient is scanned with instruments of varying magnetic field strengths (1.5 and 3 T) and protocols (2D and 3D). There are four major types of MR images: T1, T1c (contrast-enhanced T1), T2, and FLAIR. The tumor edge is difficult to identify in segmentation tasks because of infiltration, particularly in high-grade gliomas, and because the degree of contrast enhancement varies across MRI scans.
Figure 7. Clinical datasets for training segmentation models. (a) BraTS is a dataset for glioblastoma multiforme. The figures show part of a case series with different MRI weightings (from the left: T1, T1ce, T2, FLAIR) and annotations (rightmost: white, perifocal edema; yellow, tumor; red, necrosis). (b) LiTS is a dataset for liver and liver tumor segmentation. The figures show part of a case series with annotations (upper: red, liver; white, tumor) and CT images (lower). Reference: (a) from BRATS (Menze et al. [25,56,57]); (b) from LiTS (Bilic et al. [58]). The figures were rendered from the datasets using Python 3.6.
Table 3 shows research on segmenting brain lesions. Isensee et al. [16] modified the U-Net architecture by using batch normalization and short skip connections, such as the residual block in ResNet, instead of a traditional convolutional block. They then summed the outputs of each layer in the ascending part before entering the output part. The resulting Dice coefficient was superior to that of the traditional U-Net architecture. In summary, most of the leading models on the BraTS dataset over the years have been based on the U-Net architecture. Some modify the convolutional blocks, while others adjust the ascending part.
The intensity of stroke lesions in CT images can change over time after examination, especially in infarction strokes [59]. In addition to CT, MRI datasets such as ISLES [60] have been established in recent years. Models trained with those datasets not only determine the location and region of stroke lesions but also help physicians determine the severity of brain damage and may predict the prognosis and potential for recovery. Zhang et al. [61] developed a multi-plane neural network structure to segment stroke lesions from diffusion-weighted magnetic resonance images. Instead of directly using a 3D neural network, they applied three neural networks corresponding to images on three different planes (axial, coronal, and sagittal) and then integrated them into a multi-angle neural network, which is called a multi-plane fusion network. This neural network offers both segmentation and detection functions and can retain the original information from the input. Based on images from three different planes, the edges of lesions can be identified more accurately. The authors achieved a Dice coefficient of 62.2% and a sensitivity of 71.7% on the ISLES dataset.
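The data flow behind a multi-plane approach can be sketched as follows. This is not the multi-plane fusion network of Zhang et al. [61], which fuses the planes with a learned network; here the three per-plane predictions are simply averaged and thresholded. The `volume[z][y][x]` layout, the function names, and `identity_model` (a placeholder standing in for a trained 2D network) are all assumptions.

```python
def predict_along(volume, model, plane):
    """Run a 2D model on every slice of one plane and restack to [z][y][x]."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(depth)]
    if plane == "axial":        # fixed z, each slice indexed [y][x]
        for z in range(depth):
            pred = model(volume[z])
            for y in range(height):
                for x in range(width):
                    out[z][y][x] = pred[y][x]
    elif plane == "coronal":    # fixed y, each slice indexed [z][x]
        for y in range(height):
            pred = model([[volume[z][y][x] for x in range(width)]
                          for z in range(depth)])
            for z in range(depth):
                for x in range(width):
                    out[z][y][x] = pred[z][x]
    elif plane == "sagittal":   # fixed x, each slice indexed [z][y]
        for x in range(width):
            pred = model([[volume[z][y][x] for y in range(height)]
                          for z in range(depth)])
            for z in range(depth):
                for y in range(height):
                    out[z][y][x] = pred[z][y]
    else:
        raise ValueError(f"unknown plane: {plane}")
    return out


def fuse_planes(volume, models, threshold=0.5):
    """Average the three per-plane probability volumes, then binarize."""
    planes = ("axial", "coronal", "sagittal")
    preds = [predict_along(volume, models[p], p) for p in planes]
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    return [[[1 if sum(p[z][y][x] for p in preds) / 3 >= threshold else 0
              for x in range(width)]
             for y in range(height)]
            for z in range(depth)]


def identity_model(slice_2d):
    """Placeholder for a trained 2D network: returns the input as probabilities."""
    return [row[:] for row in slice_2d]


# Tiny 2x2x2 volume of lesion probabilities (illustrative data only).
volume = [[[0.9, 0.1], [0.8, 0.2]],
          [[0.7, 0.3], [0.1, 0.6]]]
models = {p: identity_model for p in ("axial", "coronal", "sagittal")}
mask = fuse_planes(volume, models)
print(mask)  # [[[1, 0], [1, 0]], [[1, 0], [0, 1]]]
```

With real networks, each plane's model would see different in-plane context for the same voxel, which is why combining the three views can sharpen lesion edges.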
Table 3. Research on brain lesion segmentation.
| Authors (Year) | Method | Medical Image | Performance | Notes |
| --- | --- | --- | --- | --- |
| Brain tumor | | | | |
| Havaei et al. [62] (2016) | Deep CNN | Magnetic resonance images | DC¹: 0.88 | Cascade architecture using pre-output concatenation |
| Pereira et al. [63] (2016) | CNN-based | Magnetic resonance images | DC: 0.88 | Patch extraction from an image before entering the CNN |
| Isensee et al. [16] (2018) | 3D U-Net | Magnetic resonance images | DC: 0.85 | Modified from U-Net; summation for multi-level features |
| Xu et al. [33] (2020) | U-Net | Magnetic resonance images | DC: 0.87 | Attention-U-Net |
| McKinley et al. [64] (2018) | deepSCAN | Magnetic resonance images | Mean DC: ET² 0.7; WT³ 0.86; TC⁴ 0.71 | Bottleneck CNN design; dilated convolution |
| Wang et al. [65] (2016) | Deep Lesion Symmetry ConvNet | Magnetic resonance images | Mean DSC⁵: 0.63 | Combined unilateral (local) and bilateral (global) voxel descriptor |
| Monteiro et al. [66] (2020) | DeepMedic | Computed tomography | Differs according to size | Three parallel 3D CNNs for different resolutions |
| Zhang et al. [61] (2020) | U-Net | Magnetic resonance images | DSC: 0.62; IoU⁶: 0.45 | FPN for extraction first |

¹ Dice coefficient, ² enhanced tumor, ³ whole tumor, ⁴ tumor core, ⁵ Dice similarity coefficient, ⁶ intersection over union.
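The two overlap metrics reported in the tables can be computed as in the sketch below. For a predicted mask A and a ground-truth mask B, the Dice coefficient is 2|A∩B| / (|A| + |B|) and the intersection over union (IoU, also called the Jaccard index) is |A∩B| / |A∪B|. The function name, the toy masks, and the handling of two empty masks are assumptions made for illustration.

```python
def dice_and_iou(pred, truth):
    """Return (Dice coefficient, IoU) for two binary masks of equal shape."""
    pred_flat = [v for row in pred for v in row]
    truth_flat = [v for row in truth for v in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred_flat, truth_flat) if p and t)
    pred_area = sum(1 for p in pred_flat if p)
    truth_area = sum(1 for t in truth_flat if t)
    union = pred_area + truth_area - intersection
    if union == 0:
        # Both masks empty: treated as perfect agreement (an assumed convention).
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + truth_area)
    iou = intersection / union
    return dice, iou


# Toy masks (illustrative data only): 3 predicted pixels, 3 true pixels, 2 shared.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]

dice, iou = dice_and_iou(pred, truth)
print(round(dice, 3), iou)  # 0.667 0.5
```

For a single pair of masks the two metrics are tied by IoU = Dice / (2 - Dice), so IoU is never higher than Dice. The relation need not hold for values averaged over many cases.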

2.3. Abdomen

Abdominal Organ Segmentation

The solid organs in the abdomen, such as the liver, kidneys, spleen, and pancreas, as well as lower abdominal organs such as the prostate, have more prominent edges and more distinct intensity values than the background, which is usually fat or peritoneum. Thus, they are obvious targets for segmentation, and convincing results could be achieved with traditional computer vision techniques [67,68]. The urinary bladder, despite being a hollow organ, also has prominent edges, so its segmentation could still be accomplished with trained models, particularly when the bladder is distended [69]. There is a wealth of data focusing on abdominal organ segmentation [70,71,72]. In recent years, the application of deep learning to these segmentation tasks has been considerably robust [73,74].

This entry is adapted from the peer-reviewed paper 10.3390/diagnostics12112765
