Brain tumors (BT) present a considerable global health concern because of their high mortality rates across diverse age groups, and a delayed diagnosis can lead to death. A timely and accurate diagnosis through magnetic resonance imaging (MRI) is therefore crucial. A radiologist makes the final decision when identifying a tumor on MRI; however, manual assessments are error-prone, time-consuming, and rely on experienced radiologists or neurologists to identify and diagnose a BT. Computer-aided classification models often lack the performance and explainability required for clinical translation, particularly in neuroscience research, so physicians perceive their results as inadequate because of their black-box nature. Explainable deep learning (XDL) can advance neuroscientific research and healthcare tasks.
1. Introduction
A brain tumor (BT) develops due to the abnormal growth of brain tissue, which can harm brain cells [1][2]. It is a severe neurological disorder affecting people of all ages and genders [3][4][5]. The brain controls the functionality of the entire body, and tumors can alter both its behavior and structure; therefore, brain damage can be harmful to the whole body [6]. According to projections by the American Cancer Society, there will be 1,958,310 new cancer cases and 609,820 cancer-related deaths in the United States in 2023 [7]. Thus, early and accurate diagnosis through magnetic resonance imaging (MRI) can enhance the evaluation and prognosis of BT. Brain cancer can be treated in various ways, mainly surgery, radiotherapy, and chemotherapy [8]. However, visually differentiating a BT from the surrounding brain parenchyma is difficult, and physically locating and removing pathological targets is nearly impossible [9].
In practice, MRI is often used to detect BT because it provides soft-tissue images that assist physicians in localizing and defining tumor boundaries [10]. BT-MRI images vary in shape, location, and image contrast, making it challenging for radiologists and neurologists to interpret them in multi-class (glioma, meningioma, pituitary, and no tumor) and binary-class (tumor and no tumor) classification. However, early diagnosis is crucial for patients, and failure to provide one within a short time period can cause physical and financial hardship [8]. To minimize these inconveniences, computer-aided diagnosis (CAD) can be used to detect BTs in multi-class and binary-class BT-MRI images [11]. A CAD system assists radiologists and neurologists in comprehensively interpreting, analyzing, and evaluating BT-MRI data within a short time period [12][13].
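As a concrete illustration of such a CAD pipeline, the following is a minimal sketch (not any specific published implementation) of fine-tuning an ImageNet-pre-trained CNN for the four-class BT-MRI problem; the directory layout, class names, and hyperparameters are illustrative assumptions only.

```python
# Minimal sketch (not the authors' implementation): fine-tuning a pre-trained CNN
# for four-class BT-MRI classification. The dataset path, class names, and
# hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

CLASSES = ["glioma", "meningioma", "pituitary", "no_tumor"]  # assumed folder names

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),                 # match the backbone's expected input size
    transforms.Grayscale(num_output_channels=3),   # MRI slices are often single-channel
    transforms.ToTensor(),
])

# Hypothetical directory layout: bt_mri/train/<class_name>/*.png
train_set = datasets.ImageFolder("bt_mri/train", transform=preprocess)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

# Start from an ImageNet-pre-trained backbone and replace the classifier head.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(CLASSES))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):                             # small number of epochs for illustration
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```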
With the tremendous advances in artificial intelligence (AI) and deep learning (DL) for brain imaging, image-processing algorithms are helping physicians detect disorders earlier than human-led examinations alone [14]. Technological advancements in the medical field require reliable and efficient solutions because they are intimately connected to human life, and mistakes can endanger patients. An automated method is therefore required to support medical diagnosis; however, despite their potential, DL techniques have limitations in clinical settings [15]. In traditional methods, features are hand-crafted and rely on human interaction. In contrast, DL automatically extracts salient features to improve performance, despite trade-offs in computational resources and training time, and has shown much better results than traditional computer vision techniques in addressing these challenges [16]. A major problem of DL is that it only accepts input images and outputs results without providing a clear understanding of how information flows within the internal layers of the network. In sensitive applications, such as brain imaging, understanding the reasons behind a DL network’s prediction is crucial for obtaining an accurate correction estimate. In [17], Huang et al. proposed an end-to-end ViT-AMC network that combines a vision transformer (ViT) with attention mechanism-integrated convolution (AMC) blocks using adaptive model fusion and multi-objective optimization; ViT-AMC performed well in laryngeal cancer grading. The study in [18] presented another approach for recognizing laryngeal cancer at an early stage by developing a CNN-based analysis model; in a series of testing and validation trials, its accuracy improved by 25% over the previous method. However, this model is inefficient in the modern technological age, in which large volumes of data are generated daily for medical diagnosis. Recently, explainable deep learning (XDL) has gained significant interest for studying the “black box” nature of DL networks in healthcare [15][19][20]. Using XDL methods, researchers, developers, and end users can build transparent DL models that clearly explain their decisions. Medical end users increasingly demand such transparency so that they can feel more confident about DL techniques and be encouraged to use these systems to support clinical procedures. Several DL-based solutions exist for the binary classification of tumors; however, almost all of them are black boxes and are consequently less intelligible to humans. Regardless of human explainability, most existing methods aim only to increase accuracy [21][22][23][24][25][26][27]; in addition, the model should be understandable to medical professionals.
2. The Method for Classification and Localization of BT
Several studies on the classification of BT-MRI images using CNNs [28][29][30][31][32], pre-trained CNN models using transfer learning (TL) [33][34][35], and tumor, polyp, and ulcer detection using a cascade approach [36] have reported remarkable results. However, these models lack explainability [21][22][37][38]. Although many XDL methods have been proposed for natural image problems [39][40][41], relatively little attention has been paid to model explainability in the context of brain imaging applications [19][42]. Consequently, the lack of interpretability in these models has been a concern for radiologists and healthcare professionals, who find their black-box nature inadequate for their needs. However, the development of XDL frameworks can advance neuroscientific research and healthcare by providing transparent and interpretable models. For this purpose, a fast and efficient multi-class BT classification and localization framework using an XDL model has to be developed, and an explainable framework is required to explain why particular predictions were made [43]. Many researchers have applied attribution-based explainability approaches to interpret DL [44]. In attribution-based techniques for medical images, multiple methods are used, such as saliency maps [45], activation maps [46], class activation mapping (CAM) [47], Grad-CAM [48], gradients [49], and Shapley additive explanations (SHAP) [50]. The adoption of CAMs in diverse applications has recently led to the emergence of CNN-based algorithms [47][48][51][52][53][54][55][56].
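As an illustration of the gradient-based attribution family listed above, the following is a minimal saliency-map sketch, assuming a trained PyTorch classifier and a preprocessed MRI tensor; the `model` and `image` objects are assumptions rather than any specific published configuration.

```python
# Minimal sketch of a gradient-based saliency map: the gradient of the predicted
# class score with respect to the input highlights the pixels that most influence
# the prediction. `model` is an assumed trained classifier; `image` is an assumed
# preprocessed MRI tensor of shape (1, C, H, W).
import torch

def gradient_saliency(model, image):
    model.eval()
    image = image.clone().requires_grad_(True)   # track gradients w.r.t. the input
    scores = model(image)                        # (1, num_classes) logits
    top_class = scores.argmax(dim=1).item()
    scores[0, top_class].backward()              # back-propagate the top class score
    # Take the maximum absolute gradient over the channel dimension -> (H, W) map.
    saliency = image.grad.abs().max(dim=1)[0].squeeze(0)
    # Normalize to [0, 1] for visualization.
    saliency = (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-8)
    return saliency.detach()
```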
The Grad-CAM technique [48] was recently proposed to visualize the essential features of the input image preserved by the CNN layers for classification. Grad-CAM has been used in various disciplines but is particularly favored in the health sector. An extension of Grad-CAM, segmented Grad-CAM, has been proposed for semantic segmentation; it generates heatmaps that indicate the relevance of specific pixels or regions within the input images to the segmentation result [55]. In [56], class-selective relevance mapping (CRM), CAM, and Grad-CAM were presented for the visual interpretation of different medical imaging modalities (i.e., abdominal CT, brain MRI, and chest X-ray) to clarify the predictions of CNN-based DL models. Yang and Ranka enhanced the Grad-CAM approach to provide a 3D heatmap that visually explains and categorizes cases of Alzheimer’s disease [57]. These techniques have seldom been employed for binary tumor localization [58] and have not been used for multi-class BT-MRI localization for model explainability; however, they are often used to interpret classification judgments [44].
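The core Grad-CAM computation can be sketched as follows for a PyTorch CNN; the choice of `target_layer` (typically the last convolutional block) and the rest of the setup are illustrative assumptions, not the exact configuration of the cited works.

```python
# Minimal Grad-CAM sketch: activations and gradients of a chosen convolutional
# layer are captured with hooks; channel weights are the spatially averaged
# gradients, and the ReLU of the weighted activation sum gives the
# class-discriminative heatmap. `model`, `target_layer`, and `image` are assumed.
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image, class_idx=None):
    activations, gradients = {}, {}

    def save_activation(module, inp, out):
        activations["value"] = out

    def save_gradient(module, grad_in, grad_out):
        gradients["value"] = grad_out[0]

    h1 = target_layer.register_forward_hook(save_activation)
    h2 = target_layer.register_full_backward_hook(save_gradient)

    model.eval()
    scores = model(image)                              # (1, num_classes)
    if class_idx is None:
        class_idx = scores.argmax(dim=1).item()
    model.zero_grad()
    scores[0, class_idx].backward()

    acts, grads = activations["value"], gradients["value"]    # (1, K, h, w)
    weights = grads.mean(dim=(2, 3), keepdim=True)            # global-average-pooled gradients
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))   # weighted sum over channels
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]

    h1.remove(); h2.remove()
    return cam.squeeze().detach()
```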
In [59], a modified version of the ResNet-152 model was introduced to identify cutaneous tumors. The performance of the model was comparable to that of 16 dermatologists, and Grad-CAM was used to enhance the interpretability of the model. The success of an algorithm is significant; however, to improve the performance of explainable models, a method is needed for evaluating the effectiveness of an explanation [60]. In [61], deep neural networks (particularly InceptionV3 and DenseNet121) were used to generate saliency maps for chest X-ray images, and the effectiveness of these maps was evaluated by measuring their degree of overlap with human-annotated ground truths. The maps generated by these models showed a high degree of overlap with the human annotations, indicating their potential usefulness for explainable AI in medical imaging.
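A simple way to quantify such overlap, under the assumption that the heatmap and the annotation can be compared as pixel masks, is an intersection-over-union (IoU) score, as in the following sketch; the threshold value is an illustrative choice.

```python
# Minimal sketch of an overlap check between an attribution heatmap and a
# human-annotated ground-truth mask using intersection-over-union (IoU).
import numpy as np

def heatmap_iou(heatmap, gt_mask, threshold=0.5):
    """heatmap: float array in [0, 1]; gt_mask: binary array of the same shape."""
    pred = heatmap >= threshold                 # binarize the heatmap (assumed threshold)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return intersection / union if union > 0 else 0.0
```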
Interestingly, the study reported in [62] identified improved indicators via regions (XRAI) as an effective method for generating explanations for DL models. The various DL and XDL methods proposed for the automatic classification and localization of tumors are summarized in Table 1.
Table 1. Summarized related works on the classification and localization of BT.

| Refs. | Method | Classification | Mode of Explanation |
|-------|--------|----------------|---------------------|
| [63] | Feedforward neural network and DWT | Binary-class classification | Not used |
| [64] | CNN | Three-class BT classification | Not used |
| [65] | Multiscale CNN (MSCNN) | Four-class BT classification | Not used |
| [66] | Multi-pathway CNN | Three-class BT classification | Not used |
| [67] | CNN | Multi-class brain tumor classification | Not used |
| [68] | CNN with Grad-CAM | X-ray breast cancer mammogram images | Heatmap |
| [69] | CNN | Chest X-ray images | Heatmap |
| [70] | CNN | Multiple sclerosis MRI images | Heatmap |
Table 1 shows the related approaches discussed in [68][69][70]. These studies evaluated the performance of various CNN-based classifiers on medical images and compared their characteristics by generating heatmaps. Based on these studies, Grad-CAM exhibits the most accurate localization, which is desirable for heatmaps. A localized heatmap makes it easier to identify the features that contribute most significantly to the CNN classification results. Unlike the feature maps of convolutional layers, these heatmaps show the hierarchy of importance of the locations in the feature maps that contribute to the classification.
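For visual inspection, such a heatmap is typically upsampled to the image resolution and blended with the MRI slice; the following is a minimal sketch of this overlay step, with the colormap, blending weight, and file name as illustrative assumptions.

```python
# Minimal sketch of overlaying a localized heatmap (e.g., from Grad-CAM) on an
# MRI slice for visual inspection.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

def overlay_heatmap(mri_slice, heatmap, alpha=0.4):
    """mri_slice: 2D grayscale array; heatmap: 2D array in [0, 1] of the same shape."""
    gray = (mri_slice - mri_slice.min()) / (mri_slice.max() - mri_slice.min() + 1e-8)
    base = np.stack([gray] * 3, axis=-1)            # grayscale -> RGB
    colored = cm.jet(heatmap)[..., :3]              # map heatmap values to colors
    return (1 - alpha) * base + alpha * colored     # blend image and heatmap

# Example usage (hypothetical arrays):
# overlay = overlay_heatmap(mri_slice, cam_heatmap)
# plt.imshow(overlay); plt.axis("off"); plt.savefig("cam_overlay.png")
```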