2. ML-Based Analytical Pipelines and Their Use in Neuro-Oncology
Ultimately, the whole point of using ML methods for data-based problems in neuro-oncology is to provide radiologists with evidence-based medical tools at the point of care that can assist them in decision-making, especially in ambiguous or borderline cases. This is why it makes sense to embed these methods in Clinical Decision Support Systems (CDSS). A thorough and systematic review of intelligent-systems-based CDSS for brain tumor analysis based on magnetic resonance data (spectra or images) is presented in this same Special Issue of Cancers [8]. It reports their increasing use over the last decade, addressing both general problems, such as tumor detection, type classification, and grading, and more specific ones, such as alerting physicians to treatment plan changes.
At the core of ML-based CDSS, researchers need not just ML methods, models, and techniques but, more formally, ML pipelines. An ML pipeline goes beyond the use of a collection of methods to encompass all stages of the data mining process, including data pre-processing (data cleaning and data transformations, potentially including feature selection and extraction, but also other aspects of data curation such as data extraction and standardization, missing data imputation, and clinical validation of the data [9]) and model post-processing, potentially including evaluation, implementation, and the definition of interpretability and explainability processes [10]. Pipelines can also accommodate specific needs, such as those related to the analysis of “big data”, with their corresponding challenges of standardization and scalability.
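As an illustration, the chaining of pre-processing stages described above can be sketched as a minimal, hypothetical pipeline; mean imputation and standardization are simple stand-ins for the curation steps discussed, and all function names are illustrative:

```python
import numpy as np

def impute_missing(X):
    """Replace NaNs with per-feature (column) means: a simple stand-in
    for the missing-data-imputation stage."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def standardize(X):
    """Zero-mean, unit-variance scaling per feature (standardization stage)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def run_pipeline(X, steps):
    """Apply the pre-processing stages in sequence, as a pipeline would."""
    for step in steps:
        X = step(X)
    return X

X_raw = np.array([[1.0, np.nan],
                  [2.0, 4.0],
                  [3.0, 8.0]])
X_clean = run_pipeline(X_raw, [impute_missing, standardize])
```

A real pipeline would add model fitting, evaluation, and post-processing stages after the transformations, but the composition pattern is the same.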
An example of an ML pipeline for the specific problem of differentiating glioblastomas from single brain metastases based on MR spectroscopy (MRS) data can be found in [11]. In this same issue of Cancers, Pitarch and co-workers [12] describe an ML pipeline for glioma grading from MRI data with a focus on the trustworthiness of the predictions generated by the ML models. This entails robustly quantifying the uncertainty of the models regarding their predictions, as well as implementing procedures to avoid data leakage and hence the risk of unreliable conclusions.
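One simple uncertainty measure of this kind, assuming predicted class probabilities are available, is the Shannon entropy of the predictive distribution, which a CDSS could use to flag borderline cases for human review (this is a generic sketch, not the specific method of the cited work; names and thresholds are illustrative):

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution.
    Higher values indicate a less certain prediction."""
    p = np.clip(probs, 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

confident = predictive_entropy(np.array([0.95, 0.05]))  # near-certain prediction
ambiguous = predictive_entropy(np.array([0.55, 0.45]))  # borderline case
# A CDSS could defer cases to the radiologist when entropy exceeds a threshold.
```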
Radiomics is an image transformation approach that aims to extract either hard-coded statistical or textural features based on expert domain knowledge, or feature representations learned from data, often using DL methods. The former may include first-order statistics, size- and shape-based features, image intensity histogram descriptors, image textural information, etc. The use of this method for the pre-processing of brain tumor images prior to the use of ML has recently been exhaustively reviewed in [13]. That review makes clear that the predominant problem under analysis is diagnosis, with only a limited number of studies addressing prognosis, survival, and progression, and that the types of brain tumors under investigation are dominated by the most frequent classes.
The use of radiomics as a data transformation strategy in pre-processing is facilitated by the existence of off-the-shelf software such as the open-source PyRadiomics package [14].
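As a rough, numpy-only sketch of what first-order radiomic features look like when computed over a tumor region of interest (PyRadiomics computes many more, including shape and texture descriptors; the feature set and function name here are illustrative):

```python
import numpy as np

def first_order_features(roi):
    """A handful of first-order radiomic descriptors of voxel intensities
    inside a region of interest (ROI)."""
    x = np.asarray(roi, dtype=float).ravel()
    mu, sigma = x.mean(), x.std()
    hist, _ = np.histogram(x, bins=16)
    p = hist[hist > 0] / hist.sum()
    return {
        "mean": float(mu),
        "variance": float(sigma ** 2),
        "skewness": float(np.mean((x - mu) ** 3) / sigma ** 3),
        "energy": float(np.sum(x ** 2)),
        "hist_entropy": float(-np.sum(p * np.log2(p))),  # histogram entropy (bits)
    }

# Toy ROI: simulated voxel intensities
roi = np.random.default_rng(0).normal(100, 15, size=(8, 8, 8))
feats = first_order_features(roi)
```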
Source extraction methods have a very different analytical rationale for data dimensionality reduction as a pre-processing step. They do not achieve it through plain feature transformation, as in radiomics. Instead, they aim to find the underlying and unobserved sources of observed radiological data. In doing so, they achieve dimensionality reduction as a byproduct of a process that may provide insight into the generation of the images themselves.
The ICA technique [15] has a long history in medical applications, most notably the analysis of electroencephalographic (EEG) signals. Source extraction is natural in this context as a tool for spatially locating the sources of the EEG from electric potentials measured on the scalp surface. ICA assumes that the observed data can be expressed as a linear combination of sources that are estimated to be statistically independent, or as independent as possible. This technique has mostly been applied to brain tumor segmentation, but some recent studies have extended its possibilities to dynamic settings, such as [16], where dynamic contrast-enhanced MRI is analyzed using temporal ICA (tICA), and [17], where probabilistic ICA is used for the analysis of dynamic susceptibility contrast (DSC) perfusion MRI.
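To illustrate the linear-mixing assumption behind ICA, here is a self-contained numpy sketch that mixes two synthetic independent sources and recovers them with a basic symmetric FastICA (not the tICA or probabilistic variants cited above; all names and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
# Two statistically independent, non-Gaussian sources
s1 = np.sign(np.sin(np.linspace(0, 20, n)))   # square-like wave (sub-Gaussian)
s2 = rng.laplace(size=n)                      # heavy-tailed noise (super-Gaussian)
S = np.vstack([s1, s2])
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])                    # unknown mixing matrix
X = A @ S                                     # the "observed" data

# Center and whiten the observations
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
Z = (E @ np.diag(d ** -0.5) @ E.T) @ X

# Symmetric FastICA with the tanh nonlinearity
W = rng.standard_normal((2, 2))
for _ in range(200):
    G = np.tanh(W @ Z)
    W_new = (G @ Z.T) / n - np.diag((1 - G ** 2).mean(axis=1)) @ W
    U, _, Vt = np.linalg.svd(W_new)
    W = U @ Vt                                # symmetric decorrelation

S_hat = W @ Z                                 # recovered sources (up to sign/order)
```

The recovered components match the true sources only up to permutation and sign, which is the usual indeterminacy of ICA.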
The NMF technique [18], on the other hand, was originally devised for the extraction of sources from images; it assumes data non-negativity but does not assume statistical independence. Data are still approximated by linear combinations of factors. Although NMF and variants within this family of methods have been used extensively for the pre-processing and analysis of MRS and MRS imaging (MRSI) signals [19,20], they have only rarely been used for the pre-processing of MRI.
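The NMF model V ≈ WH can be sketched with the classic Lee–Seung multiplicative updates on synthetic non-negative data (a toy stand-in for MRSI spectra; dimensions and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic non-negative data: 40 "spectra" built from 3 non-negative sources
W_true = rng.random((40, 3))      # mixing coefficients
H_true = rng.random((3, 100))     # source "spectra"
V = W_true @ H_true               # observed non-negative data matrix

# Lee-Seung multiplicative updates minimizing ||V - WH||_F;
# they preserve non-negativity by construction.
k = 3
W = rng.random((40, k)) + 1e-3
H = rng.random((k, 100)) + 1e-3
eps = 1e-12
for _ in range(500):
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Unlike ICA, the recovered factors are non-negative, which often makes them directly interpretable as constituent spectra or tissue signatures.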
3. Deep Learning in Neuro-Oncology Data Analysis
3.1. Overview of the Main DL Methods of Interest
Recent advances in the DL field have brought about new possibilities in medical imaging analysis and diagnosis. Arguably its most successful model is the Convolutional Neural Network (CNN), a widely used type of DL algorithm well known for its ability to capture spatial correlations within image pixel data hierarchically. CNNs have shown promise in medical imaging tasks [21,22,23], enabling improved tumor detection, classification, and prognosis assessment. The input data of a CNN are represented as a tensor with dimensions in the format (channels, depth, height, width). Notably, the “depth” dimension is specific to 3D images and not applicable to 2D data, while “height” and “width” correspond to the image’s spatial dimensions. In practical terms, color images have three channels, representing the Red, Green, and Blue (RGB) components, while gray-scale images consist of a single channel. The most characteristic operation in a CNN is the convolution, which gives its name to the convolutional layers. These layers capture spatial correlations by applying a set of filters, or kernels, across all areas of the input image data and computing a weighted sum at each position, generating a feature map as output. This feature map contains the essential characteristics extracted by the current layer and serves as the input for subsequent layers of processing.
CNNs often consist of multiple layers that work together to learn hierarchical high-level image features. These layers progressively extract more abstract and complex information from the input image data. In the final step, the last feature map is passed through a fully connected layer, resulting in a one-dimensional vector. To obtain the class probabilities, a Sigmoid or SoftMax function is applied to this vector.
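The convolution-then-classification logic described above can be sketched in a few lines of numpy (a single hand-set edge-detecting kernel stands in for the filters a CNN would learn; all names are illustrative):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D convolution as used in CNN layers (technically
    cross-correlation): slide the kernel over the image and take the
    weighted sum at each position, producing a feature map."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def softmax(logits):
    """Map a vector of scores to class probabilities that sum to 1."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

# A vertical-edge detector applied to a toy image with a step edge
image = np.zeros((5, 5))
image[:, 3:] = 1.0
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0],
                   [-1.0, 1.0]])
fmap = conv2d(image, kernel)         # responds strongly at the edge location
probs = softmax(np.array([2.0, 0.5, 0.1]))
```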
Several networks have made significant contributions to the world of DL. AlexNet [24], GoogLeNet [25], InceptionNet, VGGNet [26], ResNet [27], DenseNet [28], and EfficientNet [29] are among the most widely used CNNs to extract patterns from medical imaging.
DL models are considered data-hungry since they require substantial amounts of data for effective training. In the realm of medical data analysis, a primary challenge, as previously mentioned, is the inherent data scarcity and class imbalance. Common solutions to address this challenge include the application of data-augmentation (DA) methods and transfer-learning (TL) techniques.
Data augmentation techniques are a crucial strategy to mitigate the challenge of limited annotated data in medical image analysis. These methods encompass a range of transformations applied to existing images, effectively expanding the dataset in both size and diversity. Traditional approaches involve a wide range of geometric modifications such as rotation, scaling, flipping, cropping, and zooming, as well as color changes. Beyond these, advanced methods such as Generative Adversarial Networks (GANs) [30] are used to generate new synthetic yet realistic examples.
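A minimal sketch of such geometric DA transforms on a 2D slice, assuming the transforms preserve the class label (parameters and names are illustrative; GAN-based augmentation is not shown):

```python
import numpy as np

def augment(image, rng):
    """Random geometric and intensity augmentations of a 2D slice:
    rotation by a multiple of 90 degrees, horizontal/vertical flips,
    and mild brightness jitter. Each call yields a new training example
    with the same label as the original."""
    img = np.rot90(image, k=int(rng.integers(0, 4)))
    if rng.random() < 0.5:
        img = np.fliplr(img)
    if rng.random() < 0.5:
        img = np.flipud(img)
    img = img * rng.uniform(0.9, 1.1)   # small intensity scaling
    return img

rng = np.random.default_rng(0)
slice2d = rng.random((64, 64))          # toy MRI slice
augmented = [augment(slice2d, rng) for _ in range(8)]
```

In practice, the choice of transforms must respect the imaging modality; for instance, left-right flips are only meaningful when anatomy is approximately symmetric.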
The idea behind TL is to leverage pre-trained models, typically trained on large and diverse datasets, and adapt them to the specific task at hand, for which such a representative sample might not be available. Widely used pre-trained CNNs were originally developed from large-scale 2D datasets such as ImageNet [31] or MS-COCO [32]. However, a notable challenge when dealing with medical image data is the limited availability of large and diverse 3D datasets for universal pre-training [33]. Transferring the knowledge acquired in the 2D domain to the 3D domain proves to be a non-trivial task, primarily due to the fundamental differences in data structure and representation between these two contexts. To tackle this challenge and address the limited data, a broadly used strategy is to decompose 3D volumes into individual 2D slices along a given anatomical plane. However, this decomposition introduces a potential data leakage concern: 2D slices from the same individual may inadvertently end up in both the training and testing datasets of an analytical pipeline. Such data leakage can lead to overestimation of model performance and affect the validity of experimental results. In addition, it is important to note that this approach comes with the trade-off of losing the 3D context present in the original data.
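The leakage concern can be avoided by splitting at the patient level before slicing, as in this illustrative sketch (the data structures and names are assumptions for the example, not taken from the cited works):

```python
import numpy as np

def patient_level_split(volumes, test_fraction=0.25, seed=0):
    """Decompose 3D volumes into 2D axial slices, assigning ALL slices
    of a given patient to either the training or the test set -- never
    both -- to avoid patient-level data leakage.
    `volumes` maps patient IDs to 3D arrays of shape (depth, height, width)."""
    rng = np.random.default_rng(seed)
    ids = sorted(volumes)
    rng.shuffle(ids)
    n_test = max(1, int(len(ids) * test_fraction))
    test_ids, train_ids = set(ids[:n_test]), set(ids[n_test:])
    train = [(pid, s) for pid in train_ids for s in volumes[pid]]
    test = [(pid, s) for pid in test_ids for s in volumes[pid]]
    return train, test

# Toy cohort: 8 patients, 16 axial slices each
volumes = {f"patient{i}": np.zeros((16, 8, 8)) for i in range(8)}
train, test = patient_level_split(volumes)
train_pats = {pid for pid, _ in train}
test_pats = {pid for pid, _ in test}
```

The key point is that the random split happens over patient IDs, not over slices; splitting after slicing is exactly what produces the leakage described above.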
Recent efforts have aimed at overcoming these challenges. Banerjee et al. [34] classified low-grade glioma (LGG) and high-grade glioma (HGG) multi-sequence brain MRIs from TCGA and BraTS2017 data using multiple slice-based approaches. They compared the performance of CNNs trained from scratch on 2D image patches (PatchNet), entire 2D slices (SliceNet), and multi-planar slices combined through a final ensemble that averages the classifications obtained from each anatomical view (VolumeNet), as well as VGGNet and ResNet pre-trained on ImageNet. The multi-planar method outperformed the other approaches with an accuracy of 94.74%, while the lowest accuracy (68.07%) was obtained with the pre-trained VGGNet. Unfortunately, TCGA and BraTS share some patients, which could entail an overlap between training and testing samples and hence a risk of data leakage. Ding et al. [35] combined radiomics and DL features using 2D pre-trained CNNs on single-plane images with a subsequent multi-planar fusion. VGG16, in combination with radiomics and a Random Forest (RF), achieved the highest accuracy of 80% when combining the information obtained from the three views. Even though the multi-planar approach processes information gathered from the axial, coronal, and sagittal views, it is still essentially a 2.5D approach, weak at fully capturing 3D context. Zhuge et al. [36] presented a native 3D CNN for tumor segmentation and subsequent binary glioma grade classification and compared it with a 2D ResNet50 pre-trained on ImageNet with prior tumor detection by a Mask R-CNN. The 3D approach performed slightly better than the 2D one, with accuracies of 97.10% and 96.30%, respectively. Chatterjee et al. [37] explored the role of (2+1)D, mixed 2D–3D, and native 3D convolutions based on ResNet. Their study highlights the effectiveness of mixed 2D–3D convolutions, which achieved an accuracy of 96.98%, surpassing both the (2+1)D and the pure 3D approaches. Furthermore, the use of pre-trained networks enhanced performance in the spatial models, yet, intriguingly, the pure 3D model performed better when trained from scratch. Yang et al. [33] introduced ACS convolutions, a novel approach that facilitates TL from models pre-trained on large, publicly accessible 2D datasets: 2D convolution kernels are split by channel into three parts and applied separately along the three anatomical views (axial, coronal, and sagittal). The effectiveness of this approach was demonstrated on a publicly available nodule dataset. Subsequently, Baheti et al. [38] further advanced the application of ACS convolutions by demonstrating their enhanced performance on 3D MRI brain tumor data, with notable improvements in both segmentation and radiogenomic classification tasks.
3.2. Publicly Available Datasets
Access to large and high-quality datasets plays a crucial role in the development and evaluation of robust DL classification algorithms. These datasets encompass diverse tumor types, imaging modalities, and annotated labels, facilitating the advancement of computational methods for accurate tumor classification (Table 1).
Table 1. An overview of publicly available MRI datasets for brain tumor classification benchmarking.
| Dataset | Categories | Dim. | Sample Size | MRI Modalities |
| --- | --- | --- | --- | --- |
| BraTS 2020 [39] | Low-Grade Glioma (LGG), High-Grade Glioma (HGG) | 3D | 369 (LGG: 76, HGG: 293) | T1, T1c, T2, FLAIR |
| BraTS 2019 [39] | LGG, HGG | 3D | 335 (LGG: 76, HGG: 259) | T1, T1c, T2, FLAIR |
| BraTS 2018 [39] | LGG, HGG | 3D | 284 (LGG: 75, HGG: 209) | T1, T1c, T2, FLAIR |
| BraTS 2017 [39] | LGG, HGG | 3D | 285 (LGG: 75, HGG: 210) | T1, T1c, T2, FLAIR |
| BraTS 2015 [39] | LGG, HGG | 3D | 274 (LGG: 54, HGG: 220) | T1, T1c, T2, FLAIR |
| BraTS 2013 [39] | LGG, HGG | 3D | 30 (LGG: 10, HGG: 20) | T1, T1c, T2, FLAIR |
| BraTS 2012 [39] | LGG, HGG | 3D | 30 (LGG: 10, HGG: 20) | T1, T1c, T2, FLAIR |
| CPM-RadPath [40] | Astrocytoma (AS), IDH-mutant; Oligodendroglioma (OG), IDH-mutant 1p/19q-codeleted; Glioblastoma (GB), IDH-wildtype | 3D | Training: 221 (AS: 54, OG: 34, GB: 133); unseen sets: Val: 35, Test: 73 | T1, T1c, T2, FLAIR |
| Figshare [41] | Meningioma (MN), Glioma (GL), Pituitary (PT) | 2D | 233 (MN: 82, GL: 89, PT: 62) | T1c |
| IXI [42] | Healthy | 3D | 600 | T1, T2, PD, DW |
| Kaggle-I [43] | Healthy (H), Tumor (T) | 2D | 3000 (H: 1500, T: 1500) | - |
| Kaggle-II [44] | Healthy (H), Meningioma (MN), Glioma (GL), Pituitary (PT) | 2D | 3264 (H: 500, MN: 937, GL: 926, PT: 901) | - |
| Kaggle-III [45] | Healthy (H), Tumor (T) | 2D | 253 (H: 98, T: 155) | - |
| Radiopaedia [46] | - | - | - | - |
| REMBRANDT [47] | Oligodendroglioma (OG), Astrocytoma (AS), Glioblastoma (GB) | 3D | 111 (OG: 21, AS: 47, GB: 44) | T1, T1c, T2, FLAIR |
| REMBRANDT [47] | Grade II (G.II), Grade III (G.III), Grade IV (G.IV) | 3D | 109 (G.II: 44, G.III: 24, G.IV: 44) | T1, T1c, T2, FLAIR |
| TCGA-GBM [48] | Glioblastoma | 3D | 262 | T1, T1c, T2, FLAIR |
| TCGA-LGG [49] | Grade II (G.II), Grade III (G.III) | 3D | 197 (G.II: 100, G.III: 96, discrepancy: 1) | T1, T1c, T2, FLAIR |
| TCGA-LGG [49] | Astrocytoma (AS), Oligodendroglioma (OG), Oligoastrocytoma (OAS) | 3D | 197 (AS: 64, OG: 86, OAS: 47) | T1, T1c, T2, FLAIR |