1. Nodule Detection and Segmentation
The automated detection and segmentation of lung nodules are of great significance for lung cancer treatment and for increasing patient survival [1]. In clinical settings, radiologists must identify suspicious lung nodules across numerous images, a demanding task in which destabilizing factors such as distraction and fatigue, together with the limits of professional experience, can lead to misinterpretation of the available data. Several studies have therefore focused on overcoming these difficulties, helping radiologists make more accurate diagnoses by proposing CAD systems that perform automatic detection and segmentation of lung nodules.
1.1 Nodule Detection
The heterogeneity and high variability of nodule imaging characteristics make this task significantly complex, so lung nodule detection is naturally separated into two sub-modules: (1) candidate proposal, where multiple suspicious regions are first put forward, and (2) false positive reduction, where the nodule/non-nodule distinction is refined. Among DL-based approaches, encoder-decoder architectures are widely used as the base methods for initial nodule detection [2][3][4][5][6][7][8][9]. The extraction of hand-crafted statistical, shape, and texture features has also provided valuable information for candidate detection, with the resulting features classified by SVMs [10][11] or by ensemble strategies that combine the learning abilities of different classifiers [12]. Other traditional vision algorithms have achieved good results in detecting juxtapleural nodules [13]. In this problem, missing a true nodule should be penalized more heavily than raising false suspicions; nevertheless, there is a clear effort in the literature to reduce false positives, most commonly by combining different classification networks [2][14], using multi-scale patches to capture features at different expression levels [4][5][15][16], employing other classification algorithms such as SVMs [6][10][11][17][18][19], Bayesian networks, and neuro-fuzzy classifiers [19], or proposing a graph-based image representation with deep point cloud models [20].
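To make this two-stage organization concrete, the sketch below outlines a minimal candidate proposal plus false positive reduction pipeline in PyTorch. It is an illustrative skeleton rather than any cited method: the probability map is assumed to come from some encoder-decoder network, and `PatchClassifier`, the thresholds, and the patch size are all hypothetical choices.

```python
import torch
import torch.nn as nn
from scipy import ndimage

class PatchClassifier(nn.Module):
    """Hypothetical small 3D CNN for the false positive reduction stage."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.head = nn.Linear(32, 2)  # nodule vs. non-nodule

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def propose_candidates(prob_map, threshold=0.5):
    """Stage 1: threshold the encoder-decoder probability map and
    return the centroid of each connected component as a candidate."""
    labeled, n = ndimage.label(prob_map > threshold)
    return ndimage.center_of_mass(prob_map, labeled, list(range(1, n + 1)))

def reduce_false_positives(volume, centroids, classifier, size=32):
    """Stage 2: classify a cube around each candidate; keep only nodules."""
    kept, half = [], size // 2
    for z, y, x in (map(int, c) for c in centroids):
        patch = volume[z - half:z + half, y - half:y + half, x - half:x + half]
        if patch.shape != (size, size, size):
            continue  # skip border candidates in this simplified sketch
        inp = torch.from_numpy(patch).float()[None, None]
        if classifier(inp).softmax(1)[0, 1] > 0.5:
            kept.append((z, y, x))
    return kept
```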
On the other hand, single-stage methods have also been explored. Harsono et al. [21], inspired by the success of RetinaNet [22], employed transfer learning to make use of ImageNet [23] pre-trained architectures, building a modified feature pyramid network (FPN) that combines the feature maps obtained at specific dimension levels and outperforms the previous state-of-the-art. A different single-stage approach can be found in [24], where the YOLO v3 architecture was first adapted for lung nodule detection, showing its capability to detect these small imaging elements.
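As a rough illustration of this single-stage, transfer learning recipe (not Harsono et al.'s exact I3DR-Net), torchvision's RetinaNet with an ImageNet pre-trained ResNet-50 + FPN backbone can be repurposed as a two-class nodule detector; the input size and the dummy box below are placeholders.

```python
import torch
from torchvision.models import ResNet50_Weights
from torchvision.models.detection import retinanet_resnet50_fpn

# The detection head is re-initialized for 2 classes; the ResNet-50 + FPN
# backbone starts from ImageNet pre-trained weights (transfer learning).
model = retinanet_resnet50_fpn(
    weights=None,                                    # no COCO detection head
    weights_backbone=ResNet50_Weights.IMAGENET1K_V1,
    num_classes=2,                                   # background + nodule
)

# One training step on a dummy 512x512 slice with one hypothetical box.
images = [torch.rand(3, 512, 512)]
targets = [{"boxes": torch.tensor([[100., 120., 130., 150.]]),
            "labels": torch.tensor([1])}]           # class 1 = nodule
model.train()
losses = model(images, targets)   # dict of classification/regression losses
sum(losses.values()).backward()
```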
Table 1 summarizes the different nodule detection systems mentioned above in chronological order.
Table 1. Overview of published works regarding nodule detection approaches in lung CT images (2020–2021).
| Authors | Year | Dataset | Methods | Performance Results (%) |
|---|---|---|---|---|
| Tan et al. [2] | 2020 | LIDC-IDRI | 3D CNNs, based on FCN, DenseNet, and U-Net | TPR = 97.5 |
| Mukherjee et al. [12] | 2020 | LIDC-IDRI | Ensemble stacking | ACC = 99.5, TPR = 99.2, TNR = 98.8, FPR = 1.09, FNR = 0.85 |
| Shi et al. [3] | 2020 | LUNA16 | 3D Res-I and U-Net network | TPR = 96.4, FROC = 83.7 |
| Khehrah et al. [10] | 2020 | LIDC-IDRI | SVM | ACC = 92.0, TPR = 93.7, TNR = 91.2, PPV = 83.3, MCC = 83.8 |
| Kuo et al. [11] | 2020 | LIDC-IDRI; Private (320 patients) | SVM | TPR = 92.1 |
| Zheng et al. [4] | 2020 | LIDC-IDRI | 3D multiscale dense CNNs | TPR = 94.2 (1.0 FP/scan), 96.0 (2.0 FPs/scan) |
| Paing et al. [13] | 2020 | LIDC-IDRI | Optimized random forest | ACC = 93.1, TPR = 94.9, TNR = 91.4 |
| Liu et al. [24] | 2020 | LIDC-IDRI | CNN algorithm: You Only Look Once v3 | TPR = 87.3 |
| Harsono et al. [21] | 2020 | LIDC-IDRI; Private (546 patients) | I3DR-Net | mAP = 49.6 (LIDC), 22.9 (private); AUC = 81.8 (LIDC), 70.4 (private) |
| Xu et al. [5] | 2020 | LUNA16 | 3D CNN networks: V-Net and multi-level contextual 3D CNNs | TPR = 93.1 (1.64 FPs/scan), CPM = 75.7 |
| Drokin and Ericheva [20] | 2020 | LIDC-IDRI | Algorithm for sampling points from a point cloud | FROC = 85.9 |
| El-Regaily et al. [14] | 2020 | LIDC-IDRI | Multi-view CNN | ACC = 91.0, TPR = 96.0, TNR = 87.3, F-score = 78.7 |
| Ye et al. [6] | 2020 | LUNA16 | Three modified V-Nets with multilevel receptive fields | ACC = 66.7, TPR = 81.1, PPV = 78.1, F-score = 78.7 |
| Baker and Ghadi [17] | 2020 | LIDC-IDRI | SVM | NRR = 94.5, FPR = 7 clusters/image |
| Halder et al. [18] | 2020 | LIDC-IDRI | SVM | ACC = 88.2, TPR = 86.9, TNR = 86.9 |
| Jain et al. [7] | 2020 | LUNA16 | SumNet | ACC = 94.1, TNR = 94.0, DSC = 93.0 |
| Mahersia et al. [19] | 2020 | LIDC-IDRI | SVM, Bayesian back-propagation neural classifier, and neuro-fuzzy classifier | NRR = 97.9 (neural classifier), 97.3 (SVM), 94.2 (neuro-fuzzy classifier) |
| Mittapalli and Thanikaiselvan [15] | 2021 | LUNA16 | Multiscale CNN with compound fusions | CPM = 94.8 |
| Vipparla et al. [16] | 2021 | LUNA16 | 3D attention-based CNN architectures: MP-ACNN1, MP-ACNN2, and MP-ACNN3 | CPM = 93.1 |
| Luo et al. [8] | 2021 | LUNA16 | SCPM-Net | TPR = 92.2 (1 FP/image), 93.9 (2 FPs/image), 96.4 (8 FPs/image) |
| Bhaskar and Ganashree [9] | 2021 | DSB-2017 | Gaussian mixture convolutional autoencoder + 3D deep CNN | ACC = 74.0 |
ACC: accuracy; AUC: area under the ROC curve; CPM: competition performance metric; DSC: Sørensen–Dice coefficient; FDR: false discovery rate; FNR: false negative rate; FP: false positive; FPR: false positive rate; FROC: free-response receiver operating characteristic; mAP: mean average precision; MCC: Matthews correlation coefficient; NPV: negative predictive value; NRR: nodule recognition rate; PPV: positive predictive value; TNR: true negative rate; TPR: true positive rate.
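Several entries above report FROC and CPM scores. For reference, the CPM used in challenges such as LUNA16 is the sensitivity averaged over seven operating points (1/8, 1/4, 1/2, 1, 2, 4, and 8 false positives per scan); a minimal computation, assuming a pre-computed FROC curve, might look like this:

```python
import numpy as np

def competition_performance_metric(fps_per_scan, sensitivity):
    """CPM: mean sensitivity at 1/8, 1/4, 1/2, 1, 2, 4, 8 FPs/scan,
    read off a pre-computed FROC curve by linear interpolation."""
    operating_points = [0.125, 0.25, 0.5, 1, 2, 4, 8]
    return float(np.mean(np.interp(operating_points, fps_per_scan, sensitivity)))

# Example with a made-up FROC curve (FP axis must be increasing).
fps = np.array([0.1, 0.5, 1.0, 2.0, 4.0, 8.0])
sens = np.array([0.70, 0.85, 0.90, 0.93, 0.95, 0.96])
print(round(competition_performance_metric(fps, sens), 3))
```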
1.2 Nodule Segmentation
Although deep learning approaches have taken over the majority of nodule segmentation tasks, other learning algorithms are still in use. These include hybrid models combining ML classifiers [25], standard level set image segmentation [26], and region growing, which merges regions with similar features [27]. DL methods have shown the capability of outperforming the results of these earlier works, with U-Net, 3D U-Net, and V-Net being the most commonly applied architectures [28][29][30]. A deep deconvolutional residual network was also proposed for nodule segmentation, using a summation-based long skip connection from the convolutional to the deconvolutional part of the network [31].
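The summation-based long skip connection of [31] differs from the concatenation used in the original U-Net only in how encoder features re-enter the decoder; the schematic PyTorch fragment below (with assumed channel sizes) makes the contrast explicit.

```python
import torch

# Encoder feature map and upsampled decoder feature map at the same scale.
enc = torch.rand(1, 64, 32, 32)
dec = torch.rand(1, 64, 32, 32)

# U-Net style: concatenate along channels, doubling them (64 -> 128);
# the next convolution must be sized accordingly.
unet_merge = torch.cat([enc, dec], dim=1)

# Summation-based long skip (as in residual designs): element-wise add,
# keeping the channel count fixed and easing gradient flow.
residual_merge = enc + dec

print(unet_merge.shape, residual_merge.shape)  # (1,128,32,32) vs (1,64,32,32)
```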
All of these methodologies are summarized in Table 2 in chronological order.
Table 2. Overview of published works regarding nodule segmentation approaches in lung CT images (2020–2021).

| Authors | Year | Dataset | Methods | Performance Results (%) |
|---|---|---|---|---|
| Sharma et al. [25] | 2020 | SPIE-AAPM Lung CT Challenge | SVM + k-NN | ACC = 93.9, TPR = 94.5, GM = 94.2 |
| Xiao et al. [28] | 2020 | LUNA16 | 3D-UNet + Res2Net neural network | TPR = 99.1, DSC = 95.3 |
| Singadkar et al. [31] | 2020 | LIDC-IDRI | Deep deconvolutional residual network | DSC = 95.0, JI = 88.7 |
| Kumar and Raman [29] | 2020 | LUNA16 | V-Net (3D CNN) | DSC = 96.1 |
| Rocha et al. [30] | 2020 | LIDC-IDRI | Sliding band filter + U-Net + SegU-Net | DSC = 66.3 (SBF), 83.0 (U-Net), 82.3 (SegU-Net) |
| Hancock and Magnan [26] | 2021 | LIDC-IDRI | Level set machine learning method | DSC = 83.6, JI = 71.8 |
| Savic et al. [27] | 2021 | LIDC-IDRI; Private phantom (108 patients) | Algorithm based on the fast marching method | DSC = 93.3 (solid round nodules), 90.1 (solid irregular nodules), 79.9 (non-solid nodules), 61.4 (cavity nodules) |
ACC: accuracy; DSC: Sørensen–Dice coefficient; GM: Geometric mean; JI: Jaccard index; TPR: true positive rate.
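Since Table 2 reports segmentation quality mainly through DSC and JI, the short helper below (a straightforward sketch over binary NumPy masks, not code from any cited work) shows how the two overlap metrics are computed and how they relate.

```python
import numpy as np

def dice_and_jaccard(pred, target, eps=1e-8):
    """DSC = 2|A∩B| / (|A|+|B|);  JI = |A∩B| / |A∪B| = DSC / (2 - DSC)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    dsc = 2 * inter / (pred.sum() + target.sum() + eps)
    ji = inter / (np.logical_or(pred, target).sum() + eps)
    return dsc, ji

# Two overlapping square masks stand in for predicted and ground-truth nodules.
pred = np.zeros((64, 64), dtype=bool); pred[20:40, 20:40] = True
gt   = np.zeros((64, 64), dtype=bool); gt[25:45, 25:45] = True
print(dice_and_jaccard(pred, gt))
```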
2. Nodule Classification
Identifying lung nodule malignancy at an early stage has a positive impact on lung cancer prognosis. There is therefore a need for CAD systems that classify lung nodules as benign or malignant with maximum accuracy, avoiding delays in diagnosis. This section provides an overview of current technology for lung nodule classification, a subject heavily explored by researchers in the face of a steadily rising mortality rate.
Science Direct, IEEE Xplore, Web of Science, and PubMed were the databases used during the search for articles pertaining to the classification of pulmonary nodules. The keywords used were “lung”, “nodule”, “classification”, “malignant”, “benign”, “pulmonary”, “tumor”, “cancer”, “CAD”, and “CADe”, with various combinations of logical expressions containing “AND” and “OR.” The articles were filtered according to their relevance, performance results, year of publication, and presence/absence in other reviews. Of the 33 articles selected, one was published in 2021 and the rest in 2020.
In recent years, many deep learning techniques have been applied to lung nodule classification, showing promising results compared with other state-of-the-art machine learning methods. Thus, not surprisingly, the most recent review article found, published in 2021 by Naik and Edla, covered 108 research papers published up to 2019 that proposed novel deep learning methodologies for nodule classification in lung CT scans [32].
The development of CAD systems for lung nodule malignancy assessment has focused on a binary analysis, which essentially amounts to finding imaging characteristics with value for distinguishing benign from malignant nodules. Although more complexity could be extracted from this problem (e.g., the assessment of more detailed malignancy levels), the literature has still focused on the two-class version of the problem.
Regarding nodule feature extraction, CNNs have become the standard approach in this field, either with single-network approaches [33][34][35][36][37][38][39][40][41][42][43] or with ensemble strategies that combine multiple models [44][45][46][47][48][49]. A combination of local, nodule-specific features with more global information is captured by processing the input image at different dimensions [50][51][52][53][54][55], bringing together features from different levels of analysis. Regarding training techniques, making use of ImageNet [23] pre-trained architectures, as in [35][37][56][57], was shown to improve predictive ability. Multi-task learning strategies, which exploit related tasks to enhance the extraction of relevant information, have also proven valuable: features captured by generative models while discriminating between real and fake lung nodules [58][59], as well as knowledge obtained by learning to reconstruct nodule images [44][48][60], have both been used during training.
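As a generic illustration of the ImageNet transfer learning strategy referenced above (not the architecture of any single cited paper), a pre-trained backbone can be frozen as a feature extractor while a new binary head is trained for the benign/malignant decision; the backbone choice, freezing policy, and patch size are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():        # freeze the ImageNet features
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)  # new benign/malignant head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One dummy batch of 2D nodule patches replicated to 3 channels.
patches = torch.rand(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))
loss = criterion(model(patches), labels)
loss.backward()
optimizer.step()
```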
Although the majority of the literature on this topic relies on end-to-end neural network-based methodologies, algorithms such as SVM, XGBoost, and KNN have also been employed, serving as classifiers for previously extracted deep features [41][44], combined multimodal features [36], or hand-crafted features such as nodule texture, intensity, and shape [61].
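The hybrid pattern of [41][44], in which a neural network supplies features to a classical classifier, can be sketched as follows; the backbone and SVM hyperparameters are placeholders, and the random tensors stand in for real nodule patches and labels.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18
from sklearn.svm import SVC

cnn = resnet18(weights=None)     # stand-in for a trained feature extractor
cnn.fc = nn.Identity()           # expose the 512-D penultimate features
cnn.eval()

with torch.no_grad():
    X_train = cnn(torch.rand(32, 3, 224, 224)).numpy()  # deep features
y_train = torch.randint(0, 2, (32,)).numpy()            # dummy labels

svm = SVC(kernel="rbf", C=1.0)   # classical classifier on deep features
svm.fit(X_train, y_train)
```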
Table 3 summarizes the nodule classification works detailed above, with their best reported performance, in chronological order.
Table 3. Overview of published works regarding nodule classification approaches in lung CT images (2020–2021).
| Authors | Year | Dataset | Methods | Performance Results (%) |
|---|---|---|---|---|
| Wang et al. [33] | 2020 | Private (1478 patients) | Adaptive-boost deep learning strategy with multiple 3D CNN-based weak classifiers | ACC = 73.4, TPR = 70.5, TNR = 76.2, PPV = 83.8, AUC = 82.0, F-score = 71.6 |
| Xiao et al. [44] | 2020 | LIDC-IDRI | ResNet-18 + denoising autoencoder classifier + handcrafted features | ACC = 93.1, TPR = 81.7, PPV = 83.8, AUC = 82.0 |
| Wang et al. [51] | 2020 | LUNGx | ConvNet | ACC = 90.4, TPR = 88.7, TNR = 92.4, AUC = 94.8 |
| Lin et al. [34] | 2020 | LUNA16 | GVGG + ResCon network | TPR = 92.5, TNR = 96.8, PPV = 93.6, F-score = 93.0 |
| Onishi et al. [58] | 2020 | Private (60 patients) | M-Scale 3D CNN | TPR = 90.9, TNR = 74.1 |
| Zhao et al. [50] | 2020 | LIDC-IDRI | Multi-stream multi-task network | ACC = 93.9, TPR = 92.6, TNR = 96.2, AUC = 97.9 |
| Zia et al. [56] | 2020 | LIDC-IDRI | Multi-deep model | ACC = 90.7, TPR = 90.7, TNR = 90.8 |
| Jiang et al. [45] | 2020 | LUNA16 | Ensemble of 3D dual path networks | ACC = 90.2, TPR = 92.0, FPR = 11.1, F-score = 90.4 |
| Bao et al. [55] | 2020 | LIDC-IDRI | Global-local residual network | ACC = 90.4, TPR = 90.1, PPV = 89.9, AUC = 96.1 |
| Shah et al. [35] | 2020 | LUNA16 | NoduleNet (transfer learning from VGG16 and VGG19 models) | ACC = 95.0, TPR = 84.0, TNR = 97.0 |
| Tong et al. [36] | 2020 | LIDC-IDRI | 3D-ResNet + SVM with RBF and polynomial kernels | ACC = 90.6, TPR = 87.5, TNR = 94.1 |
| Xu et al. [52] | 2020 | LIDC-IDRI | Multi-scale cost-sensitive methods | ACC = 92.6, TPR = 85.6, TNR = 95.9, PPV = 90.4, AUC = 94.0, F-score = 87.9 |
| Huang et al. [37] | 2020 | LIDC-IDRI | Deep transfer convolutional neural network + extreme learning machine | ACC = 94.6, TPR = 93.7, TNR = 95.1, AUC = 94.9 |
| Naik et al. [46] | 2020 | LUNA16 | FractalNet + CNN | ACC = 94.1, TPR = 97.5, TNR = 86.8, AUC = 98.0 |
| Zhang et al. [42] | 2020 | LUNA16 | 3D squeeze-and-excitation network and aggregated residual transformations | ACC = 91.7, AUC = 95.6 |
| Liu et al. [47] | 2020 | LIDC-IDRI | Multi-model ensemble learning architecture based on 3D CNNs: VggNet, ResNet, and InceptionNet | ACC = 90.6, TPR = 83.7, TNR = 93.9, AUC = 93.0 |
| Afshar et al. [53] | 2020 | LIDC-IDRI | 3D multi-scale capsule network | ACC = 93.1, TPR = 94.9, TNR = 90.0, AUC = 96.4 |
| Lyu et al. [38] | 2020 | LIDC-IDRI | Multi-level cross ResNet | ACC = 92.2, TPR = 92.1, TNR = 91.5, AUC = 97.1 |
| Wu et al. [39] | 2020 | LIDC-IDRI | Deep residual network (ResNet + residual learning + migration learning) | ACC = 98.2, TPR = 97.7, TNR = 98.3, PPV = 98.5, F-score = 98.1, FPR = 1.60 |
| Lin and Li [40] | 2020 | LIDC-IDRI | Taguchi-based AlexNet CNN | ACC = 99.6 |
| Kuang et al. [59] | 2020 | LIDC-IDRI | Combination of a multi-discriminator generative adversarial network and an encoder | ACC = 95.3, TPR = 94.1, TNR = 90.8, AUC = 94.3 |
| Lima et al. [61] | 2020 | LIDC-IDRI | SVM with Gaussian kernel + Relief + evolutionary genetic algorithm | AUC = 85.6 |
| Veasey et al. [57] | 2020 | NLST | Recurrent neural network with 2D CNN | PPV = 55.9 (t0), 66.9 (t1); AUC = 80.6 (t0), 83.5 (t1) |
| Bansal et al. [41] | 2020 | LUNA16 | Deep3DSCan | TPR = 87.1, TNR = 89.7, AUC = 88.3, F-score = 88.5 |
| Zhai et al. [48] | 2020 | LUNA16; LIDC-IDRI | Multi-task learning CNN | TPR = 84.0 (LUNA16), 95.6 (LIDC-IDRI); TNR = 96.8 (LUNA16), 88.9 (LIDC-IDRI); AUC = 97.3 (LUNA16), 95.6 (LIDC-IDRI) |
| Paul et al. [49] | 2020 | NLST | Ensemble of CNNs | ACC = 90.3, AUC = 96.0, TPR = 73.0, FNR = 27.0 |
| Ali et al. [43] | 2020 | LIDC-IDRI; LUNGx | Transferable texture CNN | ACC = 96.6 (LIDC-IDRI), 90.9 (LUNGx); TPR = 96.1 (LIDC-IDRI), 91.4 (LUNGx); TNR = 97.4 (LIDC-IDRI), 90.5 (LUNGx); AUC = 99.1 (LIDC-IDRI), 94.1 (LUNGx) |
| Silva et al. [60] | 2020 | LIDC-IDRI | Transfer learning (convolutional autoencoder) | AUC = 93.6, PPV = 79.4, TPR = 84.8, F-score = 81.7 |
| Xia et al. [54] | 2021 | LIDC-IDRI | Gradient boosting machine algorithm | ACC = 91.9, TPR = 91.3, F-score = 91.0, FPR = 8.00 |
ACC: accuracy; AUC: area under the ROC curve; FNR: false negative rate; FPR: false positive rate; PPV: positive predictive value; TNR: true negative rate; TPR: true positive rate.
3. Interpretability Methods for Nodule-Focused CADs
As mentioned in the previous section, nodule classification models based on deep learning (DL) algorithms achieve the highest performances. However, DL models are considered the least interpretable machine learning models due to their inherent mathematical complexity: they provide no reasoning for a prediction, which decreases trust in them [62]. When black-box models are used in the medical domain, it is critical that the systems be trustworthy and reliable for clinicians, raising the need to make these approaches more transparent and understandable to humans [63].
Explainable AI (XAI) refers to techniques or methods that aim to find a connection between the input features and the prediction of the black-box model, thereby seeking to justify the decision and assess its reliability. Perceptive interpretability covers XAI methods that generate interpretations easily perceived by humans, despite not actually 'unblackboxing' the algorithm [64]. Visual explanations are the most commonly used XAI methodologies in deep learning image analysis [65], namely in radiology image-based predictive models, where trust in a CAD system can increase substantially when the areas of a medical image that contributed most to the prediction are presented along with the prediction itself [62].
Many of the most used XAI methods in the medical domain are post-hoc: methods external to the already trained predictive model that evaluate its predictions without altering the model itself. These are off-the-shelf, model-agnostic methods available in libraries such as PyTorch Captum [66]. This post-model approach was followed by Knapič et al. [63], who compared two popular post-hoc methods, local interpretable model-agnostic explanations (LIME) and SHapley Additive exPlanations (SHAP), in terms of human understandability on the same medical image dataset.
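To illustrate this off-the-shelf, post-hoc workflow, the fragment below applies Captum's Integrated Gradients to an arbitrary classifier; the untrained ResNet-18 and the random patch are stand-ins for a trained nodule classifier and a real CT patch, and LIME or SHAP would follow the same attach-after-training pattern.

```python
import torch
from torchvision.models import resnet18
from captum.attr import IntegratedGradients

model = resnet18(weights=None)   # stand-in for a trained nodule classifier
model.eval()

ig = IntegratedGradients(model)
nodule_patch = torch.rand(1, 3, 224, 224, requires_grad=True)

# Pixel-wise attributions for an assumed "malignant" output index 1;
# positive values mark regions pushing the prediction toward that class.
attributions = ig.attribute(nodule_patch, target=1, n_steps=50)
print(attributions.shape)        # same shape as the input patch
```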
Furthermore, in-model XAI methods for lung nodule classification were also implemented by Li et al. [67], where an importance estimation network returns a diagnostic visual interpretation that the classifier uses to destroy irrelevant features at each pooling layer. In the developed model, only the essential features are preserved in the visual interpretation, and the model is optimized through a trade-off between accuracy and the amount of information used in the classification. Jiang et al. [68] implemented a convolutional block attention module (CBAM) to build a partially explainable classification model for pulmonary nodules, establishing a relationship between the features of the input images and the symptom descriptions, and showing that the network's rationale correlates to some extent with physicians' diagnoses.
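A convolutional block attention module of the kind used in [68] chains channel attention and spatial attention; the compact PyTorch version below follows the standard CBAM formulation, with the usual reduction ratio and kernel size assumed as defaults.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention (shared MLP over avg/max pooling) followed by
    spatial attention (7x7 conv over channel-wise avg/max maps)."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)    # channel attention
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))           # spatial attention

feat = torch.rand(1, 64, 32, 32)
print(CBAM(64)(feat).shape)  # attention-weighted features, same shape
```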
The concern for interpretability is increasing, especially in the medical field, where deployed CAD systems carry higher stakes and responsibilities. Research on interpretable models is still in progress, despite recent growth in this area; the increase in research effort on interpretable CAD systems is already noticeable, mainly in verifying and explaining the predicted diagnosis rather than in unravelling the black-box itself [64]. These methods may show future potential, not only in providing trustworthy explanations to physicians, but also in assuring the reliability and consistency of the developed models.
4. Discussion and Future Work: Nodule Detection, Segmentation, and Classification
In the current methods, direct comparison of research results is hampered by heterogeneity in the selection of included scans, different algorithm parameters, and inconsistent use of performance metrics and evaluation protocols. Overall, the selected works show good capabilities in the detection, segmentation, and classification of pulmonary nodules in CT images. Machine learning techniques achieved satisfactory performance, while deep learning, and CNNs in particular, outperformed conventional models and emerged as the most promising approach. The main advantages of CNNs lie in their ability to learn directly from a variety of data sources and to automatically generate relevant, and possibly previously unknown, features, allowing prompt and efficient development of CAD systems.

The major challenge is achieving robustness to diverse clinical data of varying quality. Although the availability of heterogeneous private datasets has been shown to improve model performance, it limits the comparability and generalization of results. Furthermore, to ensure robustness, the proposed methods need to be validated on sufficiently large datasets that include all nodule types and sizes; methods evaluated with fewer nodules will likely lose accuracy under clinical conditions, where nodule types are more varied. Another challenge is the discrepancy between manual annotations: for image-based tasks such as detection, segmentation, and classification, this variability may impose a ceiling on the performance of AI-based methods.

In addition, feature extraction is an important step in differentiating nodules from other anatomic structures in the lung lobes, yet the optimal set of features for nodule detection remains a subject of debate. Although deep learning avoids handcrafting and selecting image features, it instead requires the selection of a loss function, a network architecture, and an efficient optimization method, all of which influence the learning process. Moreover, the images used for training and testing nodule analysis algorithms may have excluded pathological conditions beyond lung nodule screening. Incorporating day-to-day chest CT images from multiple centers and handling such real-life situations remain open challenges, and are the reason manual correction and interaction are still necessary to help physicians read the images.
4.1 Improvements Needed
To improve CADe and further develop its contribution to lung cancer treatment, some areas need to be explored:
- Large and diverse public lung nodule databases for algorithm evaluation, providing replicable results and more stringent testing, so that lung nodule analysis tools can be validated under conditions that mimic real clinical scenarios.
- The ability to deal with pulmonary nodules based on location (isolated, juxtapleural, or juxta-vascular) and internal texture (solid, semi-solid, ground-glass opacity, and non-solid). In particular, the detection of ground-glass opacity and non-solid nodules is difficult and has been explored by very few researchers.
- The ability to deal with pulmonary nodules with extremely small diameters. Most early-stage malignant tumors are smaller in size, and if these tumors are detected at an early stage, the survival chance of the individual can be increased.
- The ability to classify nodules not only as benign or malignant, but as benign, early-stage cancerous, primary malignant, or metastatic malignant, decreasing the level of abstraction around clinical phenomena that must be considered.
- Develop a system capable of segmenting out large solid nodules attached to the pleural wall, which is quite challenging.
- Build a set of useful and efficient features based mainly on shape or geometry, intensity, and texture for better false-positive reduction.
- Develop new CAD systems based on powerful feature map visualization techniques to better analyze a CNN's decisions and communicate them to radiologists.
- Fine-tune a pre-trained CNN model instead of training it from scratch to increase its robustness and surpass the limitation of annotated medical data.
- Develop in-depth research on GAN models, which can solve the problem of the lack of medical databases.
- Design new CAD systems, including two or more CNN architectures to address the problem of overfitting that occurs during the training process due to imbalance in the datasets.
- Develop new deep learning techniques, or optimize existing ones, to improve CADe performance, for example by using a contracting path (to capture context) with a symmetric expanding path (to enable precise localization) to strengthen the use of available annotated samples, or by training very deep networks efficiently through residual learning to gain accuracy from considerably increased depth.
- Promote cooperation and communication between academic institutions and medical organizations to combine real clinical requirements and the latest scientific achievements.