Deep learning in computer vision has developed substantially since the 2010s. It has produced a large body of work in medical image processing, where semantic segmentation is a salient technique. Lesion detection is one of the primary objectives of medical imaging, as the size and location of lesions are often directly associated with a patient’s diagnosis, treatment, and prognosis. Since the development of computer vision algorithms, researchers have begun to apply these algorithms to this task in the field of medical imaging.
1. Introduction
Medical imaging has long functioned as an assistive means for diagnosis and treatment. Advancements in technology have increased the types and qualities of medical images. Lesion detection is one of the primary objectives of medical imaging, as the size and location of lesions are often directly associated with a patient’s diagnosis, treatment, and prognosis. Previously, the size and location of lesions were determined by radiologists through medical image examination. At best, the instruments and software used were only able to enhance the image quality by adjusting the brightness and contrast features to facilitate better observation. Since the development of computer vision algorithms, however, researchers have begun to utilize these algorithms in the field of medical imaging
[1].
As the core of deep learning, convolutional neural networks (CNNs) (Figure 1) have a considerably long development history [2]. However, due to hardware-related limitations, it was only in the 2010s that breakthroughs in the effectiveness of CNNs were made [3]. Meanwhile, deep learning models that meet specific targets have gradually been proposed, progressing from classification to object detection to object segmentation. Consequently, advancements such as the detection of lung diseases through X-ray [4], the detection of lesion locations [5], and segmentation have been applied in medical imaging. Owing to improvements in model performance, deep learning models have exhibited diagnostic capabilities approximating those of clinical physicians on specific datasets. However, traditional CNN models have limitations in segmentation. One concerns the features that the models extract. With a smaller kernel, the features become more local to the original image, and global information, such as location, may be lost. Conversely, with a larger kernel, the context of the features may decrease [6]. Another limitation is data availability for biomedical segmentation tasks. Because of privacy requirements in the medical profession, the volume of medical data is often small compared with that in other fields [6]. New models have been developed to solve these problems. Another solution is data augmentation.
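As an illustration of data augmentation for segmentation, the following minimal sketch applies the same random flip and 90-degree rotation to an image and its label mask so that the annotation stays aligned with the pixels. The function name `augment_pair` and the choice of transformations are assumptions made for this example; they are not taken from any of the cited studies.

```python
import numpy as np

def augment_pair(image, mask, rng):
    """Apply one random geometric transform identically to image and mask.

    Hypothetical helper: the transforms (flips, 90-degree rotations) are an
    assumed, simple choice; real pipelines often add scaling, elastic
    deformation, or intensity changes.
    """
    if rng.random() < 0.5:                      # random left-right flip
        image, mask = np.fliplr(image), np.fliplr(mask)
    if rng.random() < 0.5:                      # random up-down flip
        image, mask = np.flipud(image), np.flipud(mask)
    k = int(rng.integers(0, 4))                 # 0 to 3 quarter turns
    return np.rot90(image, k).copy(), np.rot90(mask, k).copy()

rng = np.random.default_rng(0)
image = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 "scan"
mask = (image > 9).astype(np.uint8)                # toy lesion annotation
aug_image, aug_mask = augment_pair(image, mask, rng)
```

Because both arrays receive the same transform, the augmented mask still marks exactly the pixels of the augmented image that were labeled in the original.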
Figure 1. CNN kernel. Before input, the image is transformed into a numerical signal (0/1, for example), as shown on the left. The arrays in the middle are called convolutional kernels. The size of a kernel must not be larger than the input. The neural network applies several kernels with different weight compositions to the input to obtain feature maps, which are usually computed as dot products, as shown on the right. From these kernels, the neural network extracts the features that enable it to accomplish its task.
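The kernel operation described in the caption can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a single-channel input, stride 1, and no padding; the function name `convolve2d_valid` and the example kernel are hypothetical. As in most deep learning libraries, the kernel is not flipped, so the operation is strictly a cross-correlation.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide a kernel over a 2D input and take the dot product at each position."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    ih, iw = image.shape
    if kh > ih or kw > iw:
        raise ValueError("kernel must not be larger than the input")
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            # Element-wise product of the window and the kernel, then sum.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Toy binary input with a vertical stripe, as in a 0/1 signal.
image = np.array([[0, 1, 1, 0],
                  [0, 1, 1, 0],
                  [0, 1, 1, 0],
                  [0, 1, 1, 0]])
# Assumed example kernel that responds to vertical edges.
vertical_edge = np.array([[1, -1],
                          [1, -1]])
feature_map = convolve2d_valid(image, vertical_edge)
# Each row of feature_map is [-2, 0, 2]: negative at the left edge of the
# stripe, zero inside it, positive at the right edge.
```

A trained network learns the kernel weights rather than using hand-picked ones, and it applies many kernels in parallel to produce one feature map per kernel.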
2. Clinical Datasets and Relevant Studies
2.1. Lung Lesions
Several types of chest lesions have been studied, including lung lesions such as pneumothorax, lung nodules, and pneumonia; cardiac lesions such as ventricular hypertrophy; and bony lesions such as rib fractures. Except for rib fractures, these lesions offer obvious targets for segmentation, and open datasets [7][8][9] are available for constructing pretrained models, so training for these tasks is more likely to succeed.
Table 1 shows research on lung lesion segmentation. Singadkar et al. [10] applied an FCN-based neural network, combined with residual blocks in the decoder section and long skip connections between the encoder and decoder, for lung nodule segmentation in CT scan images. They reached an average Dice score of 0.95 and a Jaccard index of 0.887. Abedalla et al. [11] trained multiple U-Net models, each with a different backbone, and used a method similar to ensemble learning for pneumothorax segmentation: during the inference phase, the outputs of four models are first summed according to fixed weights and then subjected to a threshold. The weights and thresholds are manually adjusted. The network achieved a DSC of 0.86 on the 2019 Pneumothorax Challenge dataset.
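The weighted-sum-then-threshold step of such an ensemble can be sketched as follows. This is only an illustration of the general idea: the function name `weighted_ensemble_mask`, the equal weights, the 0.5 threshold, and the toy probability maps are all assumptions, not the values or code used by Abedalla et al.

```python
import numpy as np

def weighted_ensemble_mask(prob_maps, weights, threshold):
    """Combine per-model probability maps with fixed weights, then threshold."""
    probs = np.stack([np.asarray(p, dtype=float) for p in prob_maps])  # (n_models, H, W)
    w = np.asarray(weights, dtype=float)
    if w.shape[0] != probs.shape[0]:
        raise ValueError("one weight is required per model")
    combined = np.tensordot(w, probs, axes=1)        # weighted sum over models
    return (combined >= threshold).astype(np.uint8)  # binary segmentation mask

# Toy outputs of four hypothetical models on a 2x2 image.
model_outputs = [
    [[0.9, 0.2], [0.6, 0.1]],
    [[0.8, 0.4], [0.4, 0.0]],
    [[0.7, 0.1], [0.7, 0.2]],
    [[0.6, 0.3], [0.5, 0.1]],
]
# Assumed values; in the study the weights and threshold were tuned by hand.
mask = weighted_ensemble_mask(model_outputs, weights=[0.25] * 4, threshold=0.5)
```

With these toy values the combined map is 0.75, 0.25, 0.55, and 0.10, so only the two left-hand pixels are labeled as lesion.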
Table 1. Research on lung lesion segmentation.
| Authors (Year) | Method | Medical Image | Performance | Notes |
| --- | --- | --- | --- | --- |
| Wang et al. [12] (2017) | CF-CNN | Computed tomography | DC: 0.82 | Central-focused CNN: extract features from 3D and 2D simultaneously |
| Wang et al. [13] (2017) | MV-CNN | Computed tomography | DC: 0.77 | Multi-scaled CNN |
| Maqsood et al. [14] (2021) | DA-Net | Computed tomography | DC: 0.81 IoU: 0.76 | U-Net-based, with atrous convolution and dense connection |
| Meraj et al. [15] (2020) | CNN | Computed tomography | Accuracy: 0.83 | For nodule detection using PCA and other machine learning techniques |
| Singadkar et al. [10] (2020) | DDRN | Computed tomography | DSC: 0.95 | ResNet-based, with deep deconvolution (residual block at the decoder) |
| Zhao et al. [16] (2020) | 3D U-Net | Computed tomography | | 3D U-Net combined with GAN for segmentation; another CNN for classifying nodule |
| Usman et al. [17] (2020) | 3D U-Net | Computed tomography | DSC: 0.88 (consensus) | 3D voxel feature, ResUNet, with semi-automated ROI selection |
| Keetha et al. [18] (2020) | U-Det | Computed tomography | DSC: 0.83 | U-Net cooperates with a bidirectional feature network (Bi-FPN) |
| Ozdemir et al. [19] (2020) | 3D Vnet | Computed tomography | Sensitivity: 0.97 | Combined segmentation and classification for lung nodule diagnosis |
| Hesamian et al. [20] (2019) | FCN | Computed tomography | DSC: 0.81 | Atrous convolution and residual block in FCN combined with conditioned random field (CRF) |
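Most results in Table 1 are reported as a Dice coefficient (DC or DSC) or a Jaccard index (IoU). For binary masks these are commonly defined as 2|A∩B| / (|A| + |B|) and |A∩B| / |A∪B|, respectively. The sketch below computes both with NumPy; the function names and the convention of returning 1.0 for two empty masks are assumptions of this example, and individual studies may average or smooth the scores differently.

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(pred, target).sum() / total

def jaccard_index(pred, target):
    """Jaccard index (intersection over union) between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # assumed convention, as above
    return np.logical_and(pred, target).sum() / union

# Toy masks: 3 predicted pixels, 3 annotated pixels, 2 in common.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
dice = dice_coefficient(pred, target)   # 2*2 / (3+3) = 0.667
iou = jaccard_index(pred, target)       # 2 / 4 = 0.5
```

For a single pair of masks the two scores are related by IoU = Dice / (2 − Dice), so the Dice coefficient is never lower than the IoU for the same prediction.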
2.2. Brain Lesions
Brain lesion detection covers brain tumors, strokes, traumatic brain injuries, and brain metastases. BraTS [21] (Figure 2a) is a brain tumor dataset whose labels record not only the location and size but also the cell type of tumors, primarily low-grade and high-grade gliomas. Magnetic resonance imaging (MRI) scans are divided into pretreatment and posttreatment images. In addition, the scans were acquired with instruments of varying magnetic field strengths (1.5 and 3 T) and protocols (2D and 3D). There are four major types of MR images: T1, T1c, T2, and FLAIR. The tumor edge is difficult to identify in segmentation tasks because of infiltration, particularly in high-grade gliomas, and because the degree of contrast enhancement varies across MRI scans.
Figure 2. Clinical datasets for training segmentation models. (a) BraTS is a dataset for glioblastoma multiforme. The images show part of a case series with different MRI weightings (from the left: T1, T1ce, T2, FLAIR) and annotations (rightmost: white, perifocal edema; yellow, tumor; red, necrosis). (b) LiTS is a dataset for liver and liver tumor segmentation. The images show part of a case series with annotations (upper: red, liver; white, tumor) and CT images (lower). Reference: (a) from BraTS (Menze et al. [21][22][23]); (b) from LiTS (Bilic et al. [24]). The figures were rendered from the datasets using Python 3.6.
Table 2 shows research on segmenting brain lesions. Isensee et al. [25] modified the U-Net architecture by using batch normalization and short skip connections, such as the residual block in ResNet, instead of a traditional convolutional block. Finally, they summed the outputs of each layer in the ascending part before the output part. The resulting Dice coefficient was superior to that of the traditional U-Net architecture. In summary, most of the leading models on the BraTS dataset over the years have been based on the U-Net architecture. Some modify the convolutional blocks, while others adjust the ascending part.
The intensity of stroke lesions in CT images can change over time after examination, especially for infarction strokes [26]. In addition to CT, MRI datasets such as ISLES [27] have been established in recent years. Models trained with those datasets not only determine the location and region of stroke lesions but also help physicians assess the severity of brain damage and may predict the prognosis and potential for recovery. Zhang et al. [28] developed a multi-plane neural network structure to segment stroke lesions from diffusion-weighted magnetic resonance images. Rather than using 3D neural networks directly, they applied three neural networks corresponding to images on three different planes (axial, coronal, and sagittal) and then integrated them into a multi-angle neural network, which is called a multi-plane fusion network. This neural network offers both segmentation and detection functions and can retain the original information from the input. Based on images from three different planes, the edges of lesions can be identified more accurately. The authors achieved a Dice coefficient of 62.2% and a sensitivity of 71.7% on the ISLES dataset.
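The idea of viewing one 3D volume along three planes and combining the per-plane predictions can be sketched as below. This is a deliberately simplified illustration, not the authors' architecture: the fusion here is a plain average, whereas the study describes a learned multi-plane fusion network. The function name, the assumed axis order (axial, coronal, sagittal as axes 0, 1, 2), and the thresholding stand-ins for the three networks are all hypothetical.

```python
import numpy as np

def fuse_plane_predictions(volume, predictors):
    """Run one 2D predictor per plane and average the realigned 3D outputs.

    `predictors` holds three callables, one per axis; each maps a 2D slice
    to a 2D probability map of the same shape.
    """
    fused = np.zeros(volume.shape, dtype=float)
    for axis, predict_slice in enumerate(predictors):
        slices = np.moveaxis(volume, axis, 0)              # slicing axis first
        preds = np.stack([predict_slice(s) for s in slices])
        fused += np.moveaxis(preds, 0, axis)               # restore axis order
    return fused / len(predictors)

def toy_predictor(slice_2d):
    # Stand-in for a trained 2D network: a simple intensity threshold.
    return (slice_2d > 0.5).astype(float)

rng = np.random.default_rng(0)
volume = rng.random((4, 5, 6))                             # toy 3D scan
fused = fuse_plane_predictions(volume, [toy_predictor] * 3)
```

Because the toy predictor acts on each voxel independently, all three planes agree and the fused map equals the thresholded volume; with real networks, each plane contributes different contextual information about the lesion edges.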
Table 2. Research on brain lesion segmentation.
| Authors (Year) | Method | Medical Image | Performance | Notes |
| --- | --- | --- | --- | --- |
| Brain tumor | | | | |
| Havaei et al. [29] (2016) | Deep CNN | Magnetic resonance images | DC: 0.88 | Cascade architecture using pre-output concatenation |
| Pereira et al. [30] (2016) | CNN-based | Magnetic resonance images | DC: 0.88 | Patch extraction from an image before entering the CNN |
| Isensee et al. [25] (2018) | 3D U-Net | Magnetic resonance images | DC: 0.85 | Modified from U-Net; summation for multi-level features |
| Xu et al. [31] (2020) | U-Net | Magnetic resonance images | DC: 0.87 | Attention-U-Net |
| McKinley et al. [32] (2018) | deepSCAN | Magnetic resonance images | Mean DC, ET: 0.7; WT: 0.86; TC: 0.71 | Bottleneck CNN design; dilated convolution |
| Stroke | | | | |
| Wang et al. [33] (2016) | Deep Lesion Symmetry ConvNet | Magnetic resonance images | Mean DSC: 0.63 | Combined unilateral (local) and bilateral (global) voxel descriptor |
| Monteiro et al. [34] (2020) | DeepMedic | Computed tomography | Differs according to size | Three parallel 3D CNNs for different resolutions |
| Zhang et al. [28] (2020) | U-Net | Magnetic resonance images | DSC: 0.62 IoU: 0.45 | FPN for extraction first |
2.3. Abdomen
Abdominal Organ Segmentation
The solid organs in the abdomen, such as the liver, kidneys, spleen, and pancreas, as well as lower abdominal organs such as the prostate, have more prominent edges and distinct intensity values compared with the background, which is usually fat or peritoneum. Thus, they are obvious targets for segmentation. Convincing results could be achieved with traditional computer vision techniques [35][36]. Although the urinary bladder is a hollow organ, its prominent edges mean that segmentation can still be accomplished with trained models, particularly when the bladder is distended [37]. There is a wealth of data focusing on abdominal organ segmentation [38][39][40]. In recent years, the application of deep learning to these segmentation tasks has become considerably robust [41][42].
This entry is adapted from the peer-reviewed paper 10.3390/diagnostics12112765