Skull stripping removes non-brain tissues from magnetic resonance (MR) images, but it is hard because of brain variability, noise, artifacts, and pathologies. The existing methods are slower and limited to a single orientation, mostly axial. Researchers' proposed and experimented method uses the modern and robust architecture of deep learning neural networks, viz., Mask–region convolutional neural network (RCNN) to learn, detect, segment, and to apply the mask on brain features and patterns from many thousands of brain MR images.
1. Traditional Techniques
Traditional techniques of skull stripping, include Brain Extraction Tools, Brain Sur-face Extraction, and Robust Brain Extraction (ROBEX)
[1][2]. B
rain E
T xtraction Tools (BET) is a widely used method for skull stripping based on intensity thresholding and morphological operations
[3]. BSE is a semi-automated method that uses a deformable surface model to segment the brain from non-brain tissue
[4]. The R
obust Brain Extraction (ROBEX
) algorithm uses intensity thresholding, mathematical morphology, and a multi-scale watershed transform to extract the brain tissue from non-brain tissue
[1]. Otsu’s method is a non-parametric approach for image segmentation and is an attractive alternative to the Bayes decision rule
[5].
2. Artificial Intelligence-Based Techniques
The first quarter of the 21st century has been the era of artificial intelligence (AI), and more is expected in the forthcoming decades. Contemporary techniques are more effective and efficient for skull stripping, especially in large lesions or with significant anatomical abnormalities
[6][7]. Such methods can robustly deal with variations in image contrast and noise, but may require enormous amounts of training data and therefore greater computational cost
[8]. Machine learning (ML)-based AI models of skull stripping have evolved through Deep Learning Neural Networks (DLNN) such as Convolutional Neural Networks (CNN). AI experts have utilized the cutting-edge technology of DLNN to train models for medical image processing using chest X-ray images for the detection of COVID-19
[8]. CNN has revolutionized the field of image processing based on the principle of feature extraction. The CNN-based models progress through Region-based Neural Networks (RCNN) followed by Faster RCNN to today’s state-of-the-art neural networks, viz., Mask–RCNN
[9][10].
3. Machine Learning
Machine learning has contributed significantly to the processing of brain magnetic resonance (MR) images by segmenting and contouring different brain structures and regions of interest
[11]. There are at least four types of ML-based brain MR image processing techniques, viz., supervised, unsupervised, semi-supervised, and deep learning. Supervised ML algorithms establish learning from annotated images, where MRI experts manually delineate the brain regions
[12]. Support Vector Machines (SVMs) and Convolutional Neural Networks (CNNs) are popular ML-based supervised techniques. Unsupervised ML algorithms discover hidden patterns within brain MR images by identifying similarities and differences in intensity, texture, or shape, and grouping similar voxels or regions
[13]. K-means and Gaussian Mixture Models (GMMs) are popular ML-based unsupervised techniques. Semi-supervised ML algorithms use a small amount of annotated data to guide the segmentation process while taking unlabeled data to learn more generalizable features. Deep learning models such as CNNs can automatically learn hierarchical features from raw image data
[14][15]. Popular deep learning architectures include U-Net, Fully Convolutional Networks (FCNs), and U-Net.
4. Deep Learning
Deep learning has revolutionized the processing, segmentation, and identification of regions in brain MR images
[11]. ResNet and U-Net are the most popular models of deep learning. ResNet is known for its residual connections and achieves high accuracy in tasks such as brain tumor segmentation
[16]. U-Net precise localization has been extensively applied to segment brain MR images
[17]. Many researchers have combined ResNet and U-Net to leverage their respective strengths for enhanced accuracy and precision
[18]. In contrast, the challenges related to such architectures include class imbalance and limited training data
[19]. Their application has advanced brain MR image analysis including anatomical structure identification
[20].
5. Convolutional Neural Networks and Region-Based Neural Networks
Convolutional Neural Network (CNN) consists of convolutional layers that apply filters to input images and generate feature maps
[21]. Non-linear activation functions and pooling layers follow the CNN for down sampling. RCNN combines CNN with recurrent layers such as LSTM or GRU to capture temporal dependencies in sequential data
[22]. RCNN effectively models both spatial and temporal information, making them valuable for tasks involving sequential or spatiotemporal understanding.
6. Mask–Region-Based Neural Network
Region-based Neural Network (RCNN) uses selective searching to identify regions of interest (ROIs) and applies a convolutional neural network to classify ROIs as foreground or background
[23]. Faster RCNN, an extended version of RCNN, uses a Region Proposal Network (RPN) to generate ROIs by adding more CNNs
[24][25].
Nonetheless, Mask–RCNN is the most modern type of DLNN, an extended version of Faster RCNN by adding a mask branch, which generates binary masks for each ROI. Mask–RCNN masks the detected object
[22][26][27]. Mask–RCNN applies Residual Neural Network (ResNet)
[28] architecture for feature extraction from the input image, which comprises cerebral cortex pyramidal cells and uses jumps over applied layers
[29]. ResNet works as the backbone architecture of the Mask–RCNN
[30]. It also applies Region Proposal Network (RPN) to predict the presence or absence of the targeted object
[22][31]. It forwards feature maps having target objects. Pooling layers make uniform shapes of received featured maps to connect layers for further processing
[32][33]. Eventually, the fully connected convolutional layers receive feature maps to predict class labels and coordinates of bounding boxes
[26][29]. Mask–RCNN computes Regions of Interest (RoIs) with a benchmark value = 0.5
[34] for declaring the regions as RoIs, followed by the addition of the mask branch to apply a mask on established RoI
[35].
The application of Mask–RCNN contours object detection, segmentation, image captioning, and instance segmentation
[22][36]. It has achieved state-of-the-art performance in various benchmark datasets
[37]. In object detection, Mask–RCNN has shown excellent performance in detecting multiple objects with high accuracy. Instance segmentation uses individual instances of objects in an image and provides a more detailed understanding of the scene. In image captioning, Mask–RCNN generates captions to describe target objects and their locations.
Mask–RCNN offers more precise and accurate results
[38]; however, in some cases, it takes around 48 h to train the system. Numerous public databases such as Common Objects in Context (COCO) are available with training weights to train systems using the transfer learning approach
[39][40]. Overall, Mask–RCNN is an effective tool for image analyses and has the potential for further advancements in computer vision.
Figure 1 depicts the structure of Mask–RCNN.
Figure 1. Mask–RCNN architecture comprising three major segments, viz., input, regions of interest and application of mask.
In Mask–RCNN architecture, the input image passes through convolutional layers (C1, C2, …, Cn) while extracting feature maps (P1, P2, …, Pn) at each layer. RPN is applied on each ex-traced feature map.