A significant amount of research has been conducted on gastrointestinal (GI) tract segmentation and classification [8, 9, 10]. In 2016, Yu et al. developed an architecture for polyp detection in the GI tract [11]. They combined offline and online knowledge to reduce the false positives produced by the offline model and further improve detection performance, and extensive experiments on a polyp segmentation dataset showed that their approach outperformed competing methods. In 2017, Yuan et al. proposed an automated computer-aided method for detecting polyps in colonoscopy videos: an unsupervised sparse autoencoder (SAE) learns discriminative features, and polyps are then identified with a unified bottom-up and top-down strategy [12]. In 2019, Kang et al. used the object detection and instance segmentation architecture Mask R-CNN to detect polyps in colonoscopy images. To improve results, they developed an ensemble technique that combines Mask R-CNN models with different backbone architectures, and they evaluated the approach on three public intestinal polyp datasets [13]. Also in 2019, Cogan et al. proposed improving classification results on an image collection by applying full-image pre-processing before a state-of-the-art deep learning model. They trained three transfer-learning-based architectures on the Kvasir dataset and assessed their performance on a validation set; in each case, 80% of the Kvasir images were used to train the model and the remaining 20% to validate it [14]. In 2020, Öztürk et al. developed a GI tract image classification approach in which the CNN output is enhanced by an efficient LSTM structure. To assess the contribution of this technique to classification performance, they ran experiments with the GoogLeNet, ResNet, and AlexNet architectures and, for comparison, repeated the same experiments with CNNs combined with ANN and SVM classifiers [15]. In 2021, Öztürk et al. presented an artificial intelligence approach for efficiently classifying GI image databases that contain a limited number of labeled images. The approach uses a CNN backbone and obtains the final classification by combining it with LSTM layers. To analyze the proposed residual LSTM architecture, all experiments were conducted with AlexNet, GoogLeNet, and ResNet, and the authors report that the technique outperforms previous state-of-the-art methods [16].

Several studies published in 2022 addressed GI tract segmentation. Ye et al. proposed SIA-Unet, an improved U-Net design for MRI data that adds an attention module to filter the spatial information of the feature maps and retain relevant features; extensive experiments were carried out to assess its performance [17]. Nemani et al. proposed a hybrid CNN–transformer architecture for segmenting individual organs in images. It achieved Dice and Jaccard coefficients of 0.79 and 0.72, respectively, and the authors describe it as robust, scalable, and computationally efficient, presenting it as an illustration of how deep learning can help increase treatment efficacy [18]. Chou et al. used U-Net and Mask R-CNN to segment organ regions; their best U-Net model reached a Dice score of 0.51 on the validation set, whereas the Mask R-CNN model reached 0.73 [19]. Niu et al. introduced a GI tract segmentation technique and evaluated it with the Jaccard index, for which higher values indicate better agreement with the ground truth; their results show a higher Jaccard index than those of competing methods [20]. Li et al. developed an improved 2.5D approach for GI tract image segmentation. They investigated and fused several methods of constructing 2.5D data to exploit the correlation between adjacent slices, and proposed a technique for combining 2.5D and 3D results [21]. Chia et al. introduced two baselines: a UNet with a ResNet50 backbone and a more efficient, streamlined UNet. Building on the better-performing streamlined UNet, they examined multi-task learning with supervised (regression) and self-supervised (contrastive learning) objectives, and found that contrastive learning offers some advantages when the test distribution differs significantly from the training distribution. Finally, they studied Feature-wise Linear Modulation (FiLM), which conditions the UNet on image metadata such as the position of the MRI cross-section and the pixel height and width [22]. Georgescu et al. proposed a technique for building ensembles of diverse architectures for medical image segmentation based on the diversity (decorrelation) of the member models. They measured the correlation between two models by the Dice score between their outputs and selected models with low pairwise Dice scores to promote diversity. In GI tract image segmentation experiments, they compared this diversity-promoting ensemble (DiPE) with an ensembling technique that simply picks the highest-scoring U-Net models [23].
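Several of the studies above report overlap metrics. As a minimal sketch, assuming binary NumPy masks (the function names are illustrative), the Dice coefficient 2|A ∩ B| / (|A| + |B|) and the Jaccard index |A ∩ B| / |A ∪ B| can be computed as follows:

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|); eps makes two empty masks score 1.0."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))


def jaccard_index(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Jaccard (IoU) = |A ∩ B| / |A ∪ B|; higher means better overlap."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float((intersection + eps) / (union + eps))


# Toy 2x3 masks: 2 overlapping pixels, 3 predicted, 3 in the ground truth.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(round(dice_coefficient(pred, target), 4))  # 0.6667
print(round(jaccard_index(pred, target), 4))     # 0.5
```

For a single pair of masks the two metrics are related by J = D / (2 − D), so they rank predictions identically; scores averaged over many images need not satisfy this identity exactly.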
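One common way to build a 2.5D input, shown here only as an illustrative sketch and not necessarily the variant used by Li et al. [21], is to stack each slice with neighboring slices of the same volume as input channels. The slice offsets and the clamping at the volume boundaries are assumptions:

```python
import numpy as np


def make_25d_input(volume: np.ndarray, index: int, offsets=(-2, 0, 2)) -> np.ndarray:
    """Stack slice `index` with neighboring slices as channels.

    volume: (depth, H, W) array of 2D slices from one scan.
    offsets: slice offsets relative to `index` (an assumed choice).
    Out-of-range neighbors are clamped to the first/last slice.
    """
    depth = volume.shape[0]
    indices = [min(max(index + o, 0), depth - 1) for o in offsets]
    return np.stack([volume[i] for i in indices], axis=0)  # (len(offsets), H, W)


# Toy volume whose slice i is filled with the value i.
volume = np.arange(5, dtype=float).reshape(5, 1, 1) * np.ones((1, 4, 4))
x = make_25d_input(volume, index=0)
print(x.shape)     # (3, 4, 4)
print(x[:, 0, 0])  # [0. 0. 2.]
```

The resulting array can be fed to a 2D network as a multi-channel image, so inter-slice context is available without the memory cost of a full 3D model.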
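FiLM, as used by Chia et al. [22], scales and shifts each feature channel with parameters predicted from a conditioning vector. The NumPy sketch below is not the authors' implementation; it assumes a single linear layer as the predictor, a `1 + gamma` parameterization so the layer starts near identity, and a hypothetical metadata vector (normalized slice position, pixel height, pixel width):

```python
import numpy as np


class FiLM:
    """Feature-wise Linear Modulation: out[c] = gamma[c] * x[c] + beta[c].

    gamma and beta are predicted from a conditioning vector by one linear
    layer (an assumption; any small network could be used).
    """

    def __init__(self, cond_dim: int, num_channels: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.num_channels = num_channels
        self.weight = rng.normal(0.0, 0.01, size=(cond_dim, 2 * num_channels))
        self.bias = np.zeros(2 * num_channels)

    def __call__(self, features: np.ndarray, cond: np.ndarray) -> np.ndarray:
        # features: (C, H, W) feature map; cond: (cond_dim,) metadata vector.
        params = cond @ self.weight + self.bias
        gamma = 1.0 + params[: self.num_channels]  # near identity at initialization
        beta = params[self.num_channels :]
        return gamma[:, None, None] * features + beta[:, None, None]


# Hypothetical metadata: normalized slice position, pixel height, pixel width.
metadata = np.array([0.4, 1.5, 1.5])
film = FiLM(cond_dim=3, num_channels=8)
feature_map = np.random.default_rng(1).normal(size=(8, 16, 16))
print(film(feature_map, metadata).shape)  # (8, 16, 16)
```

Because the modulation is per channel, the metadata can change which features the UNet emphasizes without altering the spatial structure of the feature map.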
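The diversity criterion described by Georgescu et al. [23] can be illustrated with a greedy selection loop. This is a hypothetical sketch, not the published DiPE algorithm: it assumes each candidate model's binary predictions on a shared validation set are available, starts from the best-scoring model, and repeatedly adds the candidate whose mean pairwise Dice with the already selected models is lowest. A top-k baseline mirrors the comparison with the highest-scoring U-Net models:

```python
import numpy as np


def dice(a: np.ndarray, b: np.ndarray, eps: float = 1e-7) -> float:
    """Dice overlap between two binary masks (used here as a correlation proxy)."""
    a, b = a.astype(bool), b.astype(bool)
    inter = np.logical_and(a, b).sum()
    return float((2.0 * inter + eps) / (a.sum() + b.sum() + eps))


def select_top_models(val_scores: dict, k: int) -> list:
    """Baseline: keep the k models with the highest validation Dice."""
    return sorted(val_scores, key=val_scores.get, reverse=True)[:k]


def select_diverse_models(preds: dict, val_scores: dict, k: int) -> list:
    """Hypothetical greedy diversity-promoting selection.

    preds: model name -> binary predictions on a shared validation set.
    val_scores: model name -> Dice against the ground truth.
    """
    names = list(preds)
    # Pairwise Dice between model outputs: high means correlated, low means diverse.
    pair = {(a, b): dice(preds[a], preds[b]) for a in names for b in names if a != b}
    selected = [max(names, key=lambda n: val_scores[n])]
    while len(selected) < min(k, len(names)):
        remaining = [n for n in names if n not in selected]
        # Add the candidate least correlated (on average) with the current ensemble.
        selected.append(min(remaining, key=lambda n: np.mean([pair[(n, s)] for s in selected])))
    return selected


# Toy example: models A and B make identical predictions, C differs.
preds = {
    "A": np.array([1, 1, 1, 0, 0, 0]),
    "B": np.array([1, 1, 1, 0, 0, 0]),
    "C": np.array([0, 0, 1, 1, 1, 0]),
}
scores = {"A": 0.90, "B": 0.85, "C": 0.80}
print(select_top_models(scores, 2))             # ['A', 'B']
print(select_diverse_models(preds, scores, 2))  # ['A', 'C']
```

The top-k baseline keeps the redundant pair A and B, whereas the diversity-driven loop replaces B with the less correlated model C.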