Segmentation of gastrointestinal (GI) organs is crucial in radiation therapy for treating GI cancer. It allows for developing a targeted radiation therapy plan while minimizing radiation exposure to healthy tissue, improving treatment success, and decreasing side effects. In medical diagnostics, GI tract organ segmentation is essential for accurate disease detection, precise differential diagnosis, optimal treatment planning, and efficient disease monitoring. This research proposes a hybrid encoder-decoder model for semantic segmentation of the GI tract. EfficientNet B0 is used as a bottom-up encoder architecture for downsampling, capturing contextual information by extracting meaningful and discriminative features from input images. The performance of the EfficientNet B0 encoder is compared with three other encoders: ResNet 50, MobileNet V2, and Timm Gernet. A Feature Pyramid Network (FPN) is used as a top-down decoder architecture for upsampling to recover spatial information. The performance of the FPN decoder is compared with three other decoders: PAN, Linknet, and MAnet. Furthermore, the proposed hybrid model is analyzed using the Adam, Adadelta, SGD, and RMSprop optimizers.
3. Input Dataset
This research employs magnetic resonance imaging (MRI) data collected from patients who underwent MRI-guided radiotherapy at the University of Wisconsin-Madison Carbone Cancer Centre. The dataset comprises 85 patients and 38,496 scans of various GI parts. The scans are stored as 16-bit grayscale Portable Network Graphics (PNG) files, while the annotations are provided as run-length-encoded (RLE) strings in a comma-separated values (CSV) file. The ground truth masks are generated by decoding these RLE annotations. In total, there are 14,085 masks for the large bowel, 11,201 for the small bowel, and 8,627 for the stomach, while 33,913 masks contain no GI tract organ and are therefore blank. The dataset is available on the Kaggle website [24]. Slice dimensions vary from 234 × 234 to 384 × 384 pixels. Figure 1 shows an image from the dataset with its ground truth masks: Figure 1(a) shows the input image of case32_day19_slice_0089, Figure 1(b) the large bowel mask, Figure 1(c) the small bowel mask, Figure 1(d) the stomach mask, and Figure 1(e) the three masks concatenated.
Figure 1. UW Madison GI Tract Dataset: (a) Input Image, (b) Large Bowel Mask, (c) Small Bowel Mask, (d) Stomach Mask, and (e) Concatenated Mask [24]
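Because the dataset's annotations are RLE strings, the masks have to be decoded before training. A minimal NumPy sketch of decoding and re-encoding follows. It assumes the common Kaggle convention of 1-indexed "start length" pairs over the image flattened in row-major order (worth checking against the competition's data description), and the channel order in the hypothetical `CLASSES` tuple is an illustrative choice.

```python
import numpy as np

# Hypothetical channel order for the stacked multi-class mask.
CLASSES = ("large_bowel", "small_bowel", "stomach")


def rle_decode(mask_rle, shape):
    """Decode a 'start length start length ...' string into a binary HxW mask.

    Assumes 1-indexed starts over the image flattened in row-major (C) order.
    Missing or empty annotations (NaN, None, "") decode to a blank mask.
    """
    mask = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    if not isinstance(mask_rle, str) or not mask_rle.strip():
        return mask.reshape(shape)
    values = np.array([int(v) for v in mask_rle.split()], dtype=np.int64)
    starts, lengths = values[0::2] - 1, values[1::2]  # convert to 0-indexed starts
    for start, length in zip(starts, lengths):
        mask[start:start + length] = 1
    return mask.reshape(shape)


def rle_encode(mask):
    """Inverse of rle_decode: binary mask -> RLE string ('' for a blank mask)."""
    pixels = np.concatenate([[0], mask.flatten(), [0]])  # pad so runs at the edges are closed
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1    # 1-indexed run boundaries
    runs[1::2] -= runs[0::2]                             # turn end positions into lengths
    return " ".join(str(x) for x in runs)


def build_multiclass_mask(rles, shape):
    """Stack per-class masks into an HxWx3 array; classes absent from `rles` stay blank."""
    return np.stack([rle_decode(rles.get(c), shape) for c in CLASSES], axis=-1)
```

Blank annotations, which account for 33,913 of the masks, decode to all-zero arrays, so every slice yields a mask of the same shape.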
This research presents a model for segmenting GI tract organs: the stomach and the small and large bowel. Figure 2 depicts the proposed technique. Its first block is the input dataset, the UW Madison GI tract dataset, and its second block is a downsampling encoder. In semantic segmentation, the encoder downsamples the input to derive meaningful, hierarchical representations from it. To find the best encoder for this segmentation task, four encoders were implemented: ResNet 50 [25], EfficientNet B0 [26], MobileNet V2 [27], and Timm Gernet [28]. These encoders are transfer learning models pre-trained on the ImageNet dataset, on which they performed well. By downsampling the input data, they allow the decoder network to construct accurate and complete semantic segmentation maps of the GI tract. The encoders were assessed using the Dice coefficient, the Jaccard coefficient, loss, and processing time. The best encoder is then selected based on these results and used as the encoder of the final optimized model.
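Because the encoders are pre-trained on ImageNet RGB images while the scans are single-channel 16-bit PNGs, each slice has to be mapped to a three-channel input on a comparable scale. The paper does not describe its preprocessing; the sketch below assumes per-slice min-max scaling, channel replication, and the standard ImageNet mean/std normalization, all of which are illustrative choices, and `prepare_for_imagenet_encoder` is a hypothetical helper name.

```python
import numpy as np

# Standard ImageNet channel statistics (RGB order, for inputs scaled to [0, 1]).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])


def prepare_for_imagenet_encoder(slice_16bit):
    """Map a single-channel 16-bit MRI slice to a normalized HxWx3 float array."""
    img = slice_16bit.astype(np.float64)
    lo, hi = img.min(), img.max()
    # Per-slice min-max scaling to [0, 1]; a constant slice becomes all zeros.
    img = (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)
    rgb = np.repeat(img[..., None], 3, axis=-1)  # grayscale -> 3 identical channels
    return (rgb - IMAGENET_MEAN) / IMAGENET_STD  # broadcasts over the last axis
```

Percentile clipping before scaling is a common alternative when a few very bright pixels would otherwise compress the useful intensity range.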
In semantic segmentation, the decoder upsamples the encoded features to regain spatial resolution and construct high-resolution segmentation maps. Upsampling is required because it restores the fine-grained details lost during downsampling. Dilated convolution-based decoders maintain spatial resolution while increasing the receptive field; by varying the dilation rates, such decoders capture fine features and contextual information at several scales. The choice of decoder depends on the application's specific requirements and the nature of the target objects: some decoders are better at capturing fine details, while others are better at preserving spatial context. Four decoders were evaluated to determine the best one for GI tract segmentation: Feature Pyramid Network (FPN) [29], Pyramid Attention Network (PAN) [30], Linknet [31], and MAnet [32]. These segmentation models were chosen for their strong performance in earlier medical imaging research and their ability to handle structures of various sizes. The best decoder is selected based on the results of these four models.
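The receptive-field effect of dilation can be seen in a small NumPy sketch. It is a generic illustration of dilated convolution, not a component of FPN, PAN, Linknet, or MAnet as evaluated here: a kernel of size k with dilation d spans (k − 1)·d + 1 inputs, so stacking 3-tap layers with dilations 1, 2, and 4 covers 15 inputs while the output length stays equal to the input length.

```python
import numpy as np


def dilated_conv1d(x, kernel, dilation=1):
    """'Same'-padded 1D dilated cross-correlation (what DL frameworks call convolution)."""
    k = len(kernel)
    span = (k - 1) * dilation  # distance between the first and last tap
    pad_left = span // 2
    xp = np.pad(x, (pad_left, span - pad_left))  # zero padding keeps the length unchanged
    out = np.empty(len(x), dtype=np.float64)
    for i in range(len(x)):
        taps = xp[i : i + span + 1 : dilation]  # k samples spaced `dilation` apart
        out[i] = np.dot(taps, kernel)
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 convolutions with the given dilation rates."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Push an impulse through three layers with growing dilation and measure its spread.
signal = np.zeros(31)
signal[15] = 1.0
for d in (1, 2, 4):
    signal = dilated_conv1d(signal, np.ones(3), dilation=d)
```

After the loop, the impulse has spread to 15 non-zero positions, matching `receptive_field(3, [1, 2, 4])`, whereas three undilated layers would reach only 7.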
Figure 2. Proposed Methodology
The next component of the proposed technique is the selection of the optimizer, which is treated as a hyperparameter. In semantic segmentation, the choice of optimizer affects both training efficiency and the final model's performance. This choice depends on several factors, including the dataset, the model design, the available computational resources, and the demands of the segmentation task. Four optimizers were evaluated: Adam [33], Adadelta [34], RMSprop [35], and SGD [36], and the best one was chosen based on their results. After the encoder, decoder, and optimizer selection experiments, the most optimized model is finalized. The final model partitions the input image into three classes: large bowel, small bowel, and stomach. In both the mask and the segmented image, yellow represents the large bowel, green the small bowel, and red the stomach.
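Selecting the encoder first, then the decoder, then the optimizer amounts to a greedy, one-factor-at-a-time search. The sketch below assumes that the components not under test are held at fixed starting choices, which the text does not specify; `evaluate` is a hypothetical callable that would train one configuration and return its validation Dice.

```python
from typing import Callable, Sequence, Tuple


def stagewise_search(
    encoders: Sequence[str],
    decoders: Sequence[str],
    optimizers: Sequence[str],
    evaluate: Callable[[str, str, str], float],
    start_decoder: str,
    start_optimizer: str,
) -> Tuple[str, str, str]:
    """Greedy selection: best encoder, then best decoder for it, then best optimizer.

    `evaluate(encoder, decoder, optimizer)` is a hypothetical training-and-validation
    routine returning a score where higher is better (e.g. validation Dice).
    """
    best_encoder = max(encoders, key=lambda e: evaluate(e, start_decoder, start_optimizer))
    best_decoder = max(decoders, key=lambda d: evaluate(best_encoder, d, start_optimizer))
    best_optimizer = max(optimizers, key=lambda o: evaluate(best_encoder, best_decoder, o))
    return best_encoder, best_decoder, best_optimizer
```

With four options per stage, this needs at most 12 training runs, versus 64 for a full grid over all combinations. The trade-off is that interactions between components (for example, an encoder that only does best with a particular decoder) can be missed.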
This section displays the results of the different encoder, decoder, and optimizer evaluations. The implementation used the Google Colab platform, Keras and TensorFlow environments, and the Python programming language.
Figure 3 evaluates the four encoders, EfficientNet B0, MobileNet V2, Timm_Gernet_S, and ResNet 50, for segmenting GI organs using the Dice coefficient, Jaccard coefficient, and loss. Figure 4 compares the processing time required by each encoder model. EfficientNet B0 achieved the highest Dice coefficient (0.8975) and Jaccard coefficient (0.8832), with a loss of 0.1251 and the shortest processing time of 2 hours and 25 minutes. MobileNet V2 also performed well, with a Dice coefficient of 0.8968, a Jaccard coefficient of 0.866, and a loss of 0.1378, but needed slightly more processing time than EfficientNet B0. Timm_Gernet_S obtained a Dice coefficient of 0.8917, a Jaccard coefficient of 0.8610, and a loss of 0.1351 in 2 hours and 26 minutes. ResNet 50 got the same Dice and Jaccard coefficients as Adam, with a loss of 0.1301 and a processing time of 2 hours and 39 minutes. Overall, the results indicate that EfficientNet B0 is the most effective encoder for segmenting GI organs.
Figure 3. Dice, Jaccard, and Loss Comparison of Different Encoders
Figure 4. Processing Time Comparison for Different Encoders
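The Dice and Jaccard coefficients used in these comparisons can be computed on binary masks as follows. The paper does not state its smoothing constant or how scores are averaged over classes and slices; the `eps` term here is an assumption that makes two blank masks score 1.0 instead of 0/0, which matters because many slices in this dataset contain no organ.

```python
import numpy as np


def dice_coefficient(y_true, y_pred, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks; eps avoids 0/0 on blank masks."""
    a, b = y_true.astype(bool), y_pred.astype(bool)
    intersection = np.logical_and(a, b).sum()
    return (2.0 * intersection + eps) / (a.sum() + b.sum() + eps)


def jaccard_index(y_true, y_pred, eps=1e-7):
    """Jaccard (IoU) = |A ∩ B| / |A ∪ B| for binary masks."""
    a, b = y_true.astype(bool), y_pred.astype(bool)
    intersection = np.logical_and(a, b).sum()
    return (intersection + eps) / (np.logical_or(a, b).sum() + eps)
```

For a single mask pair the two scores are linked by J = D / (2 − D), so the Jaccard index never exceeds the Dice coefficient; after averaging over many slices this identity holds only approximately.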
EfficientNet-B0 is a well-known convolutional neural network (CNN) architecture that is suitable as an encoder in semantic segmentation tasks. In the proposed design, EfficientNet-B0 serves as the backbone network that extracts features from the input image through downsampling. EfficientNet [26] was developed using a compound scaling strategy that balances the network's depth, width, and resolution, producing a model that is both accurate and efficient.
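The compound scaling rule from the EfficientNet paper [26] can be summarized as follows, where φ is a user-chosen compound coefficient and d, w, and r are the depth, width, and resolution multipliers. EfficientNet-B0 is the baseline network (φ = 0); the larger variants B1 to B7 are obtained by increasing φ. The constants α = 1.2, β = 1.1, and γ = 1.15 are those reported for EfficientNet.

```latex
\begin{aligned}
d &= \alpha^{\phi}, \qquad w = \beta^{\phi}, \qquad r = \gamma^{\phi}, \\
&\text{subject to } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \quad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1, \\
\text{FLOPS} &\propto d \cdot w^{2} \cdot r^{2} = \left(\alpha \beta^{2} \gamma^{2}\right)^{\phi} \approx 2^{\phi}, \\
\alpha \beta^{2} \gamma^{2} &= 1.2 \times 1.1^{2} \times 1.15^{2} = 1.2 \times 1.21 \times 1.3225 \approx 1.92.
\end{aligned}
```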
Figure 5. Results with the Best Encoder, EfficientNet B0: (a) Validation Dice Coefficient, (b) Validation Jaccard Coefficient, and (c) Validation Loss
EfficientNet-B0 is composed of multiple blocks, each combining convolutional layers, activation functions, and pooling operations, and it is widely used for image classification. In semantic segmentation, the output of EfficientNet-B0 is commonly used as the input to a decoder network. Using EfficientNet-B0 as an encoder for semantic segmentation has been reported to yield high accuracy and efficiency across a range of applications, including medical image segmentation [26]. Figure 5 shows the training plots for this encoder: Figure 5(a) shows the validation Dice coefficient, Figure 5(b) the validation Jaccard coefficient, and Figure 5(c) the model loss. EfficientNet B0 outperformed the other encoders: ResNet 50, MobileNet V2, and Timm Gernet.
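A practical consequence of feeding encoder outputs to a decoder is that feature maps at successive strides must line up. Assuming the encoder exposes features at output strides 2 to 32, as is typical for EfficientNet-style backbones, a 234 × 234 slice from this dataset does not divide evenly by 32, so it is usually padded or resized first. The paper does not say which it does; the sketch below shows zero-padding as one option, and both helper names are hypothetical.

```python
import numpy as np


def pad_to_multiple(image, multiple=32):
    """Zero-pad H and W (bottom/right) up to the next multiple.

    Returns the padded image and the original (H, W) so predictions can be cropped back.
    """
    h, w = image.shape[:2]
    new_h = -(-h // multiple) * multiple  # ceiling division
    new_w = -(-w // multiple) * multiple
    pad = [(0, new_h - h), (0, new_w - w)] + [(0, 0)] * (image.ndim - 2)
    return np.pad(image, pad), (h, w)


def pyramid_shapes(h, w, strides=(2, 4, 8, 16, 32)):
    """Spatial sizes of encoder feature maps at the given output strides."""
    return [(h // s, w // s) for s in strides]
```

For example, a 234 × 234 slice is padded to 256 × 256, giving feature maps of 128, 64, 32, 16, and 8 pixels per side; cropping the prediction back to `[:h, :w]` recovers the original extent.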
Figure 6 evaluates the four decoders, FPN, PAN, Linknet, and MAnet, for segmenting GI organs using the Dice coefficient, Jaccard coefficient, and loss. Figure 7 compares the processing time required by each decoder model. FPN achieved the highest Dice coefficient (0.8975) and Jaccard coefficient (0.8832), with a loss of 0.1251 and a processing time of 2 hours and 39 minutes. PAN performed similarly to FPN, with a Dice coefficient of 0.8936, a Jaccard coefficient of 0.8638, and a loss of 0.1278, but took significantly longer to process. Linknet produced a Dice coefficient of 0.8865, a Jaccard coefficient of 0.8567, and a loss of 0.1319 in 2 hours and 36 minutes. MAnet, on the other hand, had the lowest Dice and Jaccard coefficients (both 0.7141) and the highest loss (0.3685), and it also needed the most processing time (3 hours and 7 minutes). Overall, the results indicate that FPN is the most successful decoder for segmenting GI organs.
Figure 6. Dice, Jaccard, and Loss Comparison of Different Decoders
Figure 7. Processing Time Comparison for Different Decoders
The FPN segmentation model is a widely used deep learning architecture for medical image segmentation and other semantic segmentation problems. It consists of a backbone network, a top-down pathway, lateral connections, and a segmentation head. Through several upsampling and convolutional layers, the top-down pathway produces feature maps at varying spatial resolutions. Lateral connections link these feature maps to the corresponding feature maps from the backbone network, which lets the model represent details accurately across several scales. The segmentation head then uses the fused feature maps to predict the segmentation masks for the various object classes in the input image. Because of this design, FPN is widely used in a variety of image segmentation tasks [29]. Figure 8 shows the training plots for the FPN model: Figure 8(a) shows the validation Dice coefficient, Figure 8(b) the validation Jaccard coefficient, and Figure 8(c) the model loss. FPN outperformed the other decoders: PAN, Linknet, and MAnet.
Figure 8. Results with the Best Decoder, FPN: (a) Validation Dice Coefficient, (b) Validation Jaccard Coefficient, and (c) Validation Loss
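The FPN top-down pathway with lateral connections can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions: 1×1 convolutions are written as channel projections with arbitrary weights, upsampling is nearest-neighbour, each encoder level is exactly half the size of the previous one, and the 3×3 smoothing convolutions and segmentation head of a full FPN are omitted.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def conv1x1(x, weight):
    """1x1 convolution as a channel projection; weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def fpn_top_down(features, lateral_weights):
    """Merge encoder maps (ordered fine -> coarse) into maps with a common channel count."""
    laterals = [conv1x1(f, w) for f, w in zip(features, lateral_weights)]
    merged = [laterals[-1]]  # start from the coarsest level
    for lateral in reversed(laterals[:-1]):
        # Top-down signal (upsampled coarser map) plus the lateral connection.
        merged.insert(0, lateral + upsample2x(merged[0]))
    return merged  # ordered fine -> coarse, like the inputs
```

Every output level ends up with the same number of channels, which is what allows a single segmentation head to consume all of them.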
Figure 9 evaluates the performance of the proposed model with four optimizers for segmenting GI organs, using the Dice coefficient, Jaccard coefficient, and loss. Figure 10 compares the processing time required by the proposed model with each optimizer. The Adam optimizer obtained the highest Dice coefficient (0.8975) and Jaccard coefficient (0.8832), with the lowest loss (0.1251), and needed 2 hours and 28 minutes of processing time. RMSprop also performed well, with a Dice coefficient of 0.8905, a Jaccard coefficient of 0.8605, and a loss of 0.1377, but took slightly longer to train than Adam. SGD and Adadelta, on the other hand, achieved lower Dice and Jaccard coefficients and higher losses than the other optimizers: SGD had a Dice coefficient of 0.7531, a Jaccard coefficient of 0.7253, and a loss of 0.3571, whereas Adadelta had a Dice coefficient of 0.7472, a Jaccard coefficient of 0.7204, and a loss of 0.3692. Overall, the results indicate that Adam is the most effective optimizer for segmenting GI organs.
Figure 9. Dice, Jaccard, and Loss Comparison of Different Optimizers
Figure 10. Processing Time Comparison for Different Optimizers
The Adam optimizer is a common choice for training deep neural networks for semantic segmentation. Adam, short for "Adaptive Moment Estimation," is an extension of stochastic gradient descent (SGD) that uses an adaptive learning rate for each weight parameter in the network [33]. It adapts the step size for each weight based on running estimates of the first and second moments of that weight's gradients. This per-parameter adaptation often leads to faster convergence than classic gradient descent-based optimizers. Adam also handles sparse gradients well, which can help in segmentation tasks where many pixels carry no organ label. Its hyperparameters, such as the learning rate and the moment decay rates (the first of which plays the role of momentum), can be tuned to optimize segmentation performance on a given dataset. Adam is a popular choice for semantic segmentation because of its fast convergence, per-parameter learning rate adaptation, and ability to handle sparse gradients. Figure 11 shows the training plots with Adam: Figure 11(a) shows the validation Dice coefficient, Figure 11(b) the validation Jaccard coefficient, and Figure 11(c) the model loss. Adam outperformed the other optimizers: Adadelta, RMSprop, and SGD.
Figure 11. Results with the Best Optimizer, Adam: (a) Validation Dice Coefficient, (b) Validation Jaccard Coefficient, and (c) Validation Loss
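The per-parameter update that Adam performs can be written in a few lines of NumPy. This sketch follows the update rule of Kingma and Ba [33] with its commonly used default hyperparameters (learning rate 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8); it is a standalone illustration on a toy quadratic, not the training setup used in this study.

```python
import numpy as np


class Adam:
    """Minimal Adam optimizer for a single NumPy parameter array."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # running mean of gradients (first moment)
        self.v = None  # running mean of squared gradients (second moment)
        self.t = 0     # step counter for bias correction

    def step(self, params, grad):
        if self.m is None:
            self.m = np.zeros_like(params)
            self.v = np.zeros_like(params)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad**2
        m_hat = self.m / (1 - self.beta1**self.t)  # correct the bias toward zero
        v_hat = self.v / (1 - self.beta2**self.t)
        # Each coordinate gets its own effective step size lr / sqrt(v_hat).
        return params - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


# Toy usage: minimize f(theta) = ||theta - target||^2, whose gradient is 2 (theta - target).
target = np.array([3.0, -2.0])
theta = np.zeros(2)
opt = Adam(lr=0.1)
for _ in range(1000):
    theta = opt.step(theta, 2 * (theta - target))
```

Because of the bias-corrected ratio m̂ / √v̂, the first step has a magnitude close to the learning rate regardless of the gradient's scale, which is one reason Adam is less sensitive to gradient magnitudes than plain SGD.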
Figure 12 shows qualitative results of the model: the input image, the ground truth mask, and the predicted segmented image. Yellow represents the large bowel, green the small bowel, and red the stomach. The similarity between the ground truth mask and the segmented image indicates how accurately the proposed method segments the input image; in the examples shown, the segmented images closely match the ground truth masks. The proposed model can therefore segment MRI scans of the GI tract to assist radiation therapy and help speed up treatment.
Figure 12. Visualization of Results: Input Image, Ground Truth Mask, and Segmented Image
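The colour convention used for these visualizations (yellow for the large bowel, green for the small bowel, red for the stomach) can be applied with a simple alpha blend over the grayscale slice. This sketch assumes per-class binary masks keyed by hypothetical names and a blending factor of 0.5; where masks overlap, the later class is blended on top.

```python
import numpy as np

# Colour convention from the text, as RGB in the [0, 1] range.
CLASS_COLOURS = {
    "large_bowel": (1.0, 1.0, 0.0),  # yellow
    "small_bowel": (0.0, 1.0, 0.0),  # green
    "stomach": (1.0, 0.0, 0.0),      # red
}


def overlay_masks(image, masks, alpha=0.5):
    """Blend class colours over a grayscale slice.

    image: HxW array of any numeric dtype (e.g. 16-bit); masks: dict name -> HxW bool.
    Returns an HxWx3 float RGB image in [0, 1].
    """
    img = image.astype(np.float64)
    span = img.max() - img.min()
    img = (img - img.min()) / span if span > 0 else np.zeros_like(img)
    rgb = np.stack([img] * 3, axis=-1)
    for name, colour in CLASS_COLOURS.items():
        mask = masks.get(name)
        if mask is None:
            continue  # class not present for this slice
        m = mask.astype(bool)
        rgb[m] = (1 - alpha) * rgb[m] + alpha * np.asarray(colour)
    return rgb
```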
Table 1 summarizes several approaches to segmenting GI tract organs on the UW Madison GI tract dataset, listing each reference and year of publication, the technique used, and the results obtained. In 2022, the SIA UNet method achieved a Dice score of 0.78. The CNN Transformer earned a slightly higher Dice score of 0.79 and an IoU score of 0.72. The combination of UNet and Mask R-CNN yielded a Dice score of 0.51. UNet on 2.5D data produced a Dice score of 0.36 and an IoU score of 0.12. An ensemble of different architectures performed well, with a Dice score of 0.88, and a UNet [37] achieved a Dice score of 0.8854 and an IoU score of 0.8819. Finally, the proposed hybrid of EfficientNet B0 and FPN achieved the highest Dice score of 0.8975 and an IoU score of 0.8832. The table shows that the proposed model outperforms the state-of-the-art results for segmenting GI tract organs on the UW Madison GI tract dataset.
Table 1. State-of-the-Art Comparison

| Ref/Year | Technique | Dice | IoU/Jaccard |
| --- | --- | --- | --- |
| [17]/2022 | SIA UNet | 0.78 | --- |
| [18]/2022 | CNN Transformer | 0.79 | 0.72 |
| [19]/2022 | UNet and Mask R-CNN | 0.51 | --- |
| [20]/2022 | UNet on 2.5D | 0.36 | 0.12 |
| [21]/2022 | Ensemble of Different Architectures | 0.88 | --- |
| [37]/2022 | UNet | 0.8854 | 0.8819 |
| Proposed Model | EfficientNet B0 and FPN | 0.8975 | 0.8832 |
The gastrointestinal (GI) tract is a critical system in the human body that supports nutrition, digestion, and absorption. It breaks down food into smaller molecules that the body can absorb and use. GI malignancies have increased significantly among men and women in recent years. Radiation therapy is usually considered the most common treatment for GI cancer. It uses high-energy X-rays to target malignant cells while sparing the healthy organs of the GI system. It is therefore essential to develop an automated method for accurately segmenting GI tract organs to speed up treatment. Segmentation of GI tract organs offers several advantages for medical diagnosis: accurate segmentation enables accurate disease detection and localization, assisting early diagnosis and tailored therapy planning. This research proposes a hybrid encoder-decoder model for semantic segmentation of the GI tract. In the proposed hybrid model, EfficientNet B0 is used as the bottom-up encoder for downsampling, capturing contextual information by extracting meaningful and discriminative features from input images.
Feature Pyramid Network (FPN), in contrast, serves as the top-down decoder for upsampling to recover spatial information. The proposed model achieved a Dice coefficient of 0.8975 and a Jaccard index of 0.8832. The aim of this research was to find the most effective combination of these components for segmentation. The best-performing model used EfficientNet B0 as the encoder, FPN as the decoder, and Adam as the optimizer. This approach may improve the efficacy and timeliness of cancer therapy.