Lung Cancer: Lung Segmentation in Computer-Aided Decision Systems: Comparison
Please note this is a comparison between Version 3 by Conner Chen and Version 2 by Conner Chen.

Lung segmentation is a critical task in the majority of lung imaging computer-aided decision (CAD) system studies. Although lung masks are not provided to radiologists in real scenarios, an accurate lung mask is crucial in the development of clinical support tools: it avoids the inclusion of noise and non-relevant background information, and it makes more efficient use of computational resources. However, the main challenge remains the lack of robustness of the developed tools when analyzing lung images with very different properties. The large diversity of lung pathological status and of biological phenomena associated with severe imaging manifestations often results in extremely difficult segmentation cases, and models tend to fail in these scenarios.

  • Lung Segmentation
  • Learning Methods
  • Computer-Aided Decision Systems

1. Lung Segmentation

Lung segmentation is a critical task in the majority of lung imaging CAD studies. Although lung masks are not provided to radiologists in real scenarios, an accurate lung mask is crucial in the development of clinical support tools: it avoids the inclusion of noise and non-relevant background information, and it makes more efficient use of computational resources. However, the main challenge remains the lack of robustness of the developed tools when analyzing lung images with very different properties. The large diversity of lung pathological status and of biological phenomena associated with severe imaging manifestations often results in extremely difficult segmentation cases, and models tend to fail in these scenarios.
In the work by Shaziya et al. [145][1], a comprehensive review of state-of-the-art conventional, machine learning, and deep learning solutions was made, collecting several works from 2001 to 2018. El-Baz et al. [146][2] also reviewed the most relevant challenges associated with the lung cancer diagnosis research field, including several works regarding the lung segmentation task. Since these were the only published review articles found on this subject, the works included in this section were carefully compared to ensure the absence of overlap. The search query was (“Lung segmentation”) AND (“CT”), run on the IEEE Xplore, Science Direct, and PubMed databases, which resulted in a total of 26 selected articles. The following content is divided into conventional and learning methods. The first includes a wide group of fundamental computer vision-based methodologies from 2014 to 2021. The second comprises a selection of machine and deep learning solutions from 2019 to 2021, considering a large number of recent approaches and the articles already discussed in [145][1].

2. Conventional Methods

Approaching lung segmentation through conventional computer vision methods often requires manual intervention to initialize the algorithm [147][3]. Filtering operations, such as histogram-based thresholding [148,149][4][5], may be susceptible to abnormalities in lung tissue whose density values are higher or lower than those of the rest of the lung. To overcome this, Shi et al. [149][5] proposed combining the “weak” and the “strong” from multiple methods, with the intuition that this would yield better segmentation than single-method approaches. Morphological operations were also used as a post-processing step to fine-tune the predicted masks by eliminating common mistakes, such as holes inside the lung tissue [148][4].
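As an illustration of the thresholding-plus-post-processing idea described above, the following is a minimal sketch in pure Python, not the method of any cited work. It assumes a toy 2D slice of Hounsfield-unit values, a hypothetical threshold of -400 HU to separate air-like pixels from denser tissue, and a flood fill standing in for the morphological region-filling step; all function names are illustrative.

```python
from collections import deque

AIR_THRESHOLD_HU = -400  # hypothetical cut-off; real pipelines tune this per dataset


def threshold_mask(slice_hu, threshold=AIR_THRESHOLD_HU):
    """Mark air-like pixels (below the threshold) as 1 and everything else as 0."""
    return [[1 if value < threshold else 0 for value in row] for row in slice_hu]


def border_connected(mask, target):
    """Return the pixels equal to `target` that are 4-connected to the image border."""
    rows, cols = len(mask), len(mask[0])
    seen = set()
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and mask[r][c] == target:
                seen.add((r, c))
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in seen and mask[nr][nc] == target:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen


def segment_lung(slice_hu):
    """Threshold, drop air outside the body, then fill holes inside the lung."""
    mask = threshold_mask(slice_hu)
    # Air-like pixels connected to the border are outside the body, not lung.
    for r, c in border_connected(mask, target=1):
        mask[r][c] = 0
    # Background pixels that cannot be reached from the border are enclosed
    # holes (e.g. dense abnormalities inside the lung), so they are filled.
    outside_background = border_connected(mask, target=0)
    for r in range(len(mask)):
        for c in range(len(mask[0])):
            if mask[r][c] == 0 and (r, c) not in outside_background:
                mask[r][c] = 1
    return mask
```

The sketch also shows why such pipelines are data-dependent: the fixed threshold decides which abnormal tissue ends up as a "hole", and the fill step can only recover abnormalities that are fully enclosed by correctly thresholded lung.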
More complex methodologies based on active contour models [150,151][6][7] and modifications to the random walker method [152][8] were also recently proposed. These methodologies were more robust, even in the presence of tissue abnormalities, while also enabling more automatic pipelines. Oliveira et al. [153][9] also proposed a multi-atlas segmentation approach for thoracic organs at risk (OAR), which considers the spatial relationships between the different thoracic organs to produce a single spatially coherent mask.
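One simple way to combine several candidate segmentations, whether they come from different methods or from different registered atlases, is per-pixel majority voting. The sketch below is a generic illustration of that idea and not the implementation of any cited work; the function name and the assumption of equally sized binary masks are illustrative.

```python
def majority_vote(masks):
    """Fuse equally sized binary masks: a pixel is foreground if more than
    half of the candidate masks mark it as foreground."""
    if not masks:
        raise ValueError("at least one mask is required")
    rows, cols = len(masks[0]), len(masks[0][0])
    fused = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            votes = sum(mask[r][c] for mask in masks)
            fused[r][c] = 1 if 2 * votes > len(masks) else 0
    return fused
```

With an even number of candidates, ties resolve to background in this sketch; a real pipeline would need an explicit tie-breaking or weighting rule.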
Table 1 summarizes the reviewed conventional methodologies for lung segmentation in chronological order.
Table 1. Overview of published works regarding conventional methodologies for the segmentation of lung CT images (2014–2021).
| Authors | Year | Dataset | Methods | Performance Results (%) |
| --- | --- | --- | --- | --- |
| Lai and Wei [148][4] | 2014 | Private (10 patients) | Filtering process + morphological operations (threshold, region filling, closing) | TPR = 97.0; TNR = 99.0; AAE = 1.58 |
| Li et al. [147][3] | 2015 | Private (15 patients) | Edge-based recursive geometric active contour (GAC) model | OV = 98.0 |
| Shi et al. [149][5] | 2016 | Private (23 patients) | Histogram thresholding + region growing and random walk | OR = 1.87; UR = 2.36; ABD = 0.620 mm |
| Zhang et al. [150][6] | 2017 | LIDC-IDRI | Region- and edge-based GAC (REGAC) method | DSC = 97.7; HD-95 = 2.50 mm |
| Rebouças Filho et al. [151][7] | 2017 | Private (40 patients) | 3D ACACM | F-score = 99.2 (ACACM), 97.6 (RG), 97.4 (OsiriX), 97.2 (LSCPM) |
| Oliveira et al. [153][9] | 2018 | VISCERAL Anatomy3 | Multi-atlas alignment + label fusion (voting and statistical selection) | DSC = 97.4 (LL), 97.9 (RL); HD-95 = 4.65 mm (LL), 2.81 mm (RL) |
| Chen et al. [152][8] | 2021 | LOLA11; Private (65 patients) | Random walker | Private: DSC = 98.6 (LL), 98.5 (RL). LOLA11: DSC = 97.4 |

AAE: average area error; ABD: absolute border distance; ACM: active contour method; DSC: Sørensen–Dice coefficient; LL: left lung; LSCPM: level-set based on coherent propagation method; HD: Hausdorff distance; OR: over-segmentation rate; OV: overlap volume; RG: region growing; RL: right lung; TPR: true positive rate; TNR: true negative rate; UR: under-segmentation rate.

3. Learning Methods

The most recent approaches for CT lung segmentation show a clear predominance of learning algorithms capable of directly learning the distribution of the data used for training. Methodologies inspired by U-net [154][10] cover the majority of deep learning-based attempts [155,156,157,158,159,160,161,162,163,164,165][11][12][13][14][15][16][17][18][19][20][21]. To increase the complexity of the feature extraction task, the encoder module could reuse transferred weights from pre-trained networks, as in the works by Vu et al. [163][19] and Jalali et al. [166][22], where the VGG-16 and ResNet-34 models were adopted as encoder blocks, respectively. Further improvements to typical convolutional blocks have also been investigated, integrating residual blocks [164,167][20][23], inception modules with dense connections [162][18], and squeeze-and-excitation blocks to target specific thoracic organs at risk [165][21]. Still on feature extraction enhancement, adversarial training approaches were explored in [155,168,169][11][24][25], pushing the predicted masks towards the ground truth by training a discriminator to distinguish between the two. More meaningful features can also be extracted by adding an auxiliary classification branch, enriching the information used for backpropagation [160,170][16][26]. Liu et al. [171][27] integrated different feature extraction branches by combining deep, texture, and intensity features, which were then classified as part of the lung mask or background.
In two-stage pipelines, approaches based on lung detection followed by proper segmentation of the cropped input have been proposed [156,172,173][12][28][29]. The Mask R-CNN [174][30] architecture was also employed, with predictions refined by combining different supervised and unsupervised methods. These regularisation techniques confirmed that, as expected, less noisy inputs lead to better predictions.
The lack of training data diversity has been recognized as a major barrier to achieving robust segmentation models, with better results obtained with larger and more heterogeneous private data, even with simple networks [158][14].
Table 2 summarizes the reviewed machine/deep learning methodologies for lung segmentation in chronological order.
Table 2. Overview of published works regarding learning-based methodologies for the segmentation of lung CT images (2019–2021).

| Authors | Year | Dataset | Methods | Performance Results (%) |
| --- | --- | --- | --- | --- |
| Dong et al. [155][11] | 2019 | LCTSC | U-net generator with a FCN discriminator | DSC = 97.0 |
| Feng et al. [156][12] | 2019 | LCTSC | Two-stage segmentation process with 3D U-net | DSC = 97.2 (RL), 97.9 (LL) |
| Park et al. [157][13] | 2019 | LCTSC; Private (30 patients) | U-net | DSC = 98.8; JSC = 97.7; MSD = 0.270 mm; HSD = 25.5 mm |
| Hofmanninger et al. [158][14] | 2020 | LCTSC, LTRC, VISCERAL, VESSEL12; Private (5300 patients) | U-net, ResUNet, Dilated residual network-D-22, DeepLab v3+ | Merged dataset: DSC = 98.0; HD95 = 3.14 mm; MSD = 0.620 mm |
| Yoo et al. [159][15] | 2020 | HUG-ILD; Private (203 patients) | 2D and 3D U-net | Private: DSC = 99.6 (2D), 99.4 (3D); TPR = 99.5 (2D), 99.1 (3D); PPV = 99.6 (2D), 99.7 (3D); HD = 17.7 px (2D), 18.7 px (3D). HUG-ILD: DSC = 98.4 (2D), 95.3 (3D); TPR = 98.7 (2D), 98.0 (3D); PPV = 98.1 (2D), 92.8 (3D); HD = 7.66 px (2D), 15.6 px (3D) |
| Khanna et al. [167][23] | 2020 | LUNA16; VESSEL12; HUG-ILD | ResUNet + false positive removal algorithm | LUNA16: DSC = 96.6; JI = 93.4; TPR = 97.5. VESSEL12: DSC = 98.3; JI = 97.9; TPR = 98.8. HUG-ILD: DSC = 98.1; JI = 96.3; TPR = 98.3 |
| Shi et al. [160][16] | 2020 | StructSeg 2019 | TA-Net | DSC = 96.8 (LL), 97.1 (RL); HD = 0.188 mm (LL), 0.171 mm (RL) |
| Nemoto et al. [161][17] | 2020 | NSCLC-Radiomics | 2D and 3D U-net | DSC = 99.0 (2D/3D U-net) |
| Zhang et al. [162][18] | 2020 | Lung dataset (Kaggle “Finding and Measuring Lungs in CT Data” competition) | Dense-Inception U-net (DIU-net) | DSC = 98.6; JI = 98.7; ACC = 99.4; TPR = 98.5; TNR = 99.8; F-score = 98.5; AUC = 99.0 |
| Vu et al. [163][19] | 2020 | Private (168 patients) | U-net with pre-trained VGG16 | DSC = 97.0 (RL and LL); HD-95 = 5.10 mm (RL), 4.09 mm (LL) |
| Liu et al. [171][27] | 2020 | HUG-ILD | Random forest fusion classification of deep, texture and intensity features | DSC = 96.4; JI = 91.1; OR = 5.04; UR = 4.76 |
| Hu et al. [172][28] | 2020 | Private (39 patients) | Mask R-CNN + supervised and unsupervised classifiers | DSC = 97.3; ACC = 97.7; TPR = 96.6; TNR = 97.1 |
| Han et al. [173][29] | 2020 | Private | Xception + VGG with SVM-RBF; Detectron2 + contour fine-tuning | DSC = 97.0; ACC = 99.0; TPR = 96.5; TNR = 99.4 |
| Xu et al. [170][26] | 2021 | Private (217 patients); COVID-19-CT-Seg; HUG-ILD; VESSEL12 | Boundary-Guided Network (BG-Net) | DSC = 98.6 (Private), 96.5 (StructSeg), 98.9 (HUG-ILD), 99.5 (VESSEL12); HD = 2.77 mm (Private), 1.39 mm (StructSeg), 0.665 mm (HUG-ILD), 1.40 mm (VESSEL12) |
| Jalali et al. [166][22] | 2021 | LIDC-IDRI | ResBCDU-Net | DSC = 97.1 |
| Wang et al. [164][20] | 2021 | Lung dataset (Kaggle “Finding and Measuring Lungs in CT Data” competition) | HDA-ResUNet | DSC = 97.9; JI = 96.0; ACC = 99.3 |
| Tan et al. [168][24] | 2021 | LIDC-IDRI; QIN lung CT dataset | LGAN | LIDC-IDRI: IOU = 92.3; HD = 3.38 mm. QIN: IOU = 93.8; HD = 2.68 mm |
| Pawar and Talbar [169][25] | 2021 | HUG-ILD | LungSeg-Net | DSC = 96.3 (Fibrosis), 96.5 (Ground glass), 91.4 (Reticulation), 97.6 (Consolidation), 97.8 (Emphysema), 99.0 (Nodules); JI = 93.7 (Fibrosis), 93.9 (Ground glass), 86.9 (Reticulation), 95.3 (Consolidation), 96.2 (Emphysema), 98.0 (Nodules) |
| Cao et al. [165][21] | 2021 | StructSeg 2019 | C-SE-ResUNet | DSC = 97.0 (LL), 96.6 (RL) |

ACC: accuracy; AUC: area under the ROC curve; DSC: Sørensen–Dice coefficient; HD: Hausdorff distance; IOU: intersection over union; JI: Jaccard index; LL: left lung; OR: over-segmentation rate; PPV: positive predictive value; RL: right lung; TNR: true negative rate; TPR: true positive rate; UR: under-segmentation rate.
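The overlap metrics abbreviated above follow standard definitions based on true/false positive and negative pixel counts. The sketch below computes DSC, JI (equivalently IOU), TPR, TNR, PPV and ACC for two flattened binary masks. The function name is illustrative, and values are returned as fractions rather than the percentages used in the tables.

```python
def overlap_metrics(pred, truth):
    """Compute common overlap metrics for two equally long binary sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominators) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "DSC": ratio(2 * tp, 2 * tp + fp + fn),   # Sørensen-Dice coefficient
        "JI": ratio(tp, tp + fp + fn),            # Jaccard index / IOU
        "TPR": ratio(tp, tp + fn),                # sensitivity
        "TNR": ratio(tn, tn + fp),                # specificity
        "PPV": ratio(tp, tp + fp),                # precision
        "ACC": ratio(tp + tn, tp + tn + fp + fn),
    }
```

DSC and JI are monotonically related (JI = DSC / (2 − DSC)), so works reporting one can be compared with works reporting the other, whereas distance-based metrics such as HD capture boundary errors that overlap metrics do not.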

 

3.1 Discussion and Future Work: Lung Segmentation

The use of conventional methods for lung segmentation has, to some extent, achieved satisfactory results for certain data distributions. Image threshold-based algorithms often lack robustness, as they cannot cope with the higher variance in density values of more heterogeneous lung structures. To achieve decent results, these algorithms require extensive post-processing with highly data-dependent fine-tuning, which improves performance only by tailoring the method tightly to the properties of a specific dataset. In more dynamic algorithms, such as active contour models (ACM) and their variations, initial contours are often necessary for method initialization, and the energy functions used for mask propagation can be susceptible to heterogeneous imaging variations in shape or intensity. Such methods must therefore be extensively tested over distinct sources of data to be considered clinically reliable.
Most of the recent segmentation approaches incorporate deep learning mechanisms, allowing the development of completely automatic solutions without the need to design and apply specific algorithms to solve specific problems. Since the publication of U-net [154][10] as a general biomedical image segmentation network, multiple approaches have been proposed to improve segmentation capabilities by increasing the complexity of the network. To enrich the knowledge captured in the extracted feature maps, technical innovations such as the inclusion of handcrafted imaging features and auxiliary guided classification branches were proposed, motivated by the chance of increasing the information available to deal with more heterogeneous tissue patterns.
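To make the idea of an auxiliary classification branch concrete, the sketch below combines a soft Dice segmentation loss with a binary cross-entropy term from a hypothetical image-level classifier, so that both terms contribute to the quantity being minimized. The weighting factor `aux_weight`, the function names, and the choice of these two particular losses are assumptions made for illustration, not the formulation of any specific reviewed work.

```python
import math


def soft_dice_loss(probs, truth, eps=1e-6):
    """1 - soft Dice between predicted foreground probabilities and a binary mask."""
    intersection = sum(p * t for p, t in zip(probs, truth))
    total = sum(probs) + sum(truth)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def binary_cross_entropy(prob, label, eps=1e-7):
    """Cross-entropy for a single image-level prediction with a 0/1 label."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def multitask_loss(seg_probs, seg_truth, cls_prob, cls_label, aux_weight=0.5):
    """Segmentation loss plus a weighted auxiliary classification loss."""
    segmentation_term = soft_dice_loss(seg_probs, seg_truth)
    classification_term = binary_cross_entropy(cls_prob, cls_label)
    return segmentation_term + aux_weight * classification_term
```

In a real network both terms would be computed on tensors inside an automatic differentiation framework; the point of the sketch is only that the auxiliary term adds a second source of error signal to the shared feature extractor.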
However, several issues still hinder the development of universal segmentation systems capable of being adopted in clinical routines. The differences in contouring guidelines between databases are a crucial discussion point when evaluating lung segmentation approaches. Several models are developed with ground-truth labels that may not be adequate for every context of analysis. In the LOLA11 data description section, the statement “… lung segmentation images are not intended to be used as the reference standard for any segmentation study.” alerted the authors to this issue when selecting the data sources for their segmentation experiments. In several databases with available lung masks, these were often obtained using automatic segmentation tools or previously developed algorithms. In the cases where the main purpose of the database publication was not related to segmentation tasks, the inclusion criteria for patients were often biased towards specific pathological diagnoses, which made it more difficult to obtain the desired diversity of patients. Moreover, since lung masks are made available as a supplement, the processes for contour quality assurance, the agreement rates of the annotators, and the contouring guidelines are not disclosed in most cases. This problem is emphasized by the fact that discrepancies in the inclusion or exclusion of certain regions in training data, such as the trachea, main/secondary bronchi, and tumor regions, may have a substantial impact on the quantitative evaluation and performance comparison of different segmentation models. From the articles reviewed, it is possible to see that, in general, the models developed using privately collected data achieved higher generalization abilities than the ones trained using only public data sources. However, the generalization achieved was still limited because the data came from a single healthcare institution or a single country, which created a significant bias in the data collected.
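The effect of contouring guidelines on reported scores can be illustrated with simple arithmetic. In the sketch below, a prediction that matches the lungs exactly is scored against a reference mask that additionally includes another region, such as the airways. The function name is illustrative and any voxel counts passed to it are hypothetical.

```python
def dice_with_extra_reference_region(lung_voxels, extra_voxels):
    """DSC of a prediction covering exactly the lungs when the reference mask
    contains the lungs plus an extra region of `extra_voxels` voxels."""
    if lung_voxels <= 0 or extra_voxels < 0:
        raise ValueError("voxel counts must be positive (lungs) and non-negative (extra)")
    # Intersection = lungs; prediction size = lungs; reference size = lungs + extra.
    return 2 * lung_voxels / (2 * lung_voxels + extra_voxels)
```

For example, with hypothetical counts of 1000 lung voxels and 50 extra voxels, the score is capped at about 0.976 even though the lungs themselves are segmented perfectly, a gap of the same order as the differences between many of the results reported in the tables.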
Considering these facts, universal segmentation tools will be needed in a future where CAD systems are implemented in clinical routines. Innovation in the modeling fundamentals should continue to be investigated to increase generalization and cope with the large heterogeneity of tissues caused by the pathological phenomena occurring in the lung structures. Moreover, measures that encourage the sharing of biomedical data for research purposes would help researchers overcome the challenges they face in addressing such tasks, which would substantially improve the utility of their outcomes for clinical practice.