1. Introduction
Capsule endoscopy (CE) allows for a non-invasive and painless evaluation of the small bowel (SB) mucosa and is essentially a diagnostic modality [1][2]. This exam is fundamental not only to the diagnosis of obscure gastrointestinal bleeding (OGIB) but also to the study of Crohn's disease (CD), SB tumors, celiac disease (CeD) (extent and severity), and other conditions [1][2][3], as illustrated in Figure 1. However, it is essential to note that CE has some drawbacks, among them its dependence on the examiner's clinical experience and the time and labor involved in image review (previous series have reported reading times of 40 to 50 min or more), which makes reading a task prone to error [4][5]. Artificial intelligence (AI) will therefore probably contribute to minimizing these limitations and expanding the potential of CE. This topic has become increasingly popular, resulting in a growing number of recent research articles dedicated to it.
Figure 1. Capsule endoscopy images of several small-bowel pathologies. In the top row, the left image corresponds to normal mucosa, and the two images on the right illustrate vascular lesions. In the bottom row, the left image corresponds to a protruding lesion, the center image to an ulcer, and the right image to hematic residues.
2. Artificial Intelligence and Obscure Gastrointestinal Bleeding
Gastrointestinal (GI) bleeding can originate anywhere from the mouth to the rectum or anus [6]. GI bleeding is classified as upper, middle, or lower according to its location relative to two anatomical landmarks: the ampulla of Vater and the terminal ileum. Although the three usually have different presentations, overlap can occur, making it challenging to identify the bleeding source [7]. OGIB refers to GI bleeding of unidentified origin that persists despite a comprehensive upper and lower GI evaluation, including endoscopic evaluation of the terminal ileum. OGIB is divided into obscure overt and obscure occult, based on the presence or absence of clinically visible bleeding, respectively [6][7].
Different etiologies can cause OGIB, with angioectasias and nonsteroidal anti-inflammatory drug-induced ulcers being the most common in older patients (>40 years). The diagnostic approach depends on whether bleeding is overt or occult, whether signs of severe bleeding are present, and whether the patient is fit for endoscopic evaluation [6][8]. In general, video capsule endoscopy (VCE) is the first diagnostic step in OGIB in the absence of contraindications such as obstruction. Indeed, OGIB is the most common indication for CE [9]. When the bleeding site is identified during VCE, specific treatment should be initiated.
Many research articles have been dedicated to this field since at least 2007, mainly because of its prevalence in clinical activity. In 2009, Pan et al. developed a neural network to detect bleeding images using color texture features. This research used a total of 150 full CE videos, with 3172 bleeding images and 11,458 non-bleeding images, to test the algorithm, achieving a sensitivity and specificity at the image level of 93.1% and 85.6%, respectively. This study achieved better results than previous research and used a much larger dataset [10].
Fu et al. aimed to overcome some limitations of the suspected blood indicator, namely its ability to detect only active bleeding and its insufficient sensitivity and specificity. To that end, they created a computer-aided detection method based on a support vector machine (SVM) that detects bleeding regions with high sensitivity, specificity, and accuracy (99%, 94%, and 95%, respectively). They also used a different image analysis method, grouping pixels based on color and location through superpixel segmentation, which reduced the computational complexity [11]. Later, Jia et al. developed a deep convolutional neural network (CNN) to automatically detect bleeding in wireless capsule endoscopy (WCE) images, comparing their method with that of Fu et al. and others and achieving better results [12].
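Although the cited pipelines are not reproduced here, the superpixel idea is easy to sketch. The snippet below is a minimal illustration assuming SLIC superpixels, mean-RGB color features, an RBF-kernel SVM, and toy training data; it shows how pixels grouped by color and location can be classified as bleeding or normal:

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.svm import SVC

def superpixel_features(image, n_segments=200):
    """Group pixels by color and location (SLIC), then take mean RGB per superpixel."""
    labels = slic(image, n_segments=n_segments, compactness=10, start_label=0)
    feats = np.array([image[labels == i].mean(axis=0) for i in np.unique(labels)])
    return labels, feats

# Toy stand-ins for annotated CE data (bleeding regions are redder on average).
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal([0.8, 0.2, 0.2], 0.05, (100, 3)),   # "bleeding" colors
                     rng.normal([0.7, 0.5, 0.4], 0.05, (100, 3))])  # "normal" mucosa colors
y_train = np.array([1] * 100 + [0] * 100)

clf = SVC(kernel="rbf").fit(X_train, y_train)

test_image = rng.random((256, 256, 3))        # placeholder for a real CE frame
labels, feats = superpixel_features(test_image)
bleeding_mask = clf.predict(feats)[labels]    # per-pixel 0/1 bleeding map
```

Classifying a few hundred superpixels instead of every pixel is what reduces the computational cost noted above.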
Fan et al. used the AlexNet CNN to detect ulcers and erosions in the SB mucosa. This study reported an area under the ROC curve (AUC) above 0.98 for both ulcer and erosion detection, with accuracies of 95.16% and 95.34%, sensitivities of 96.80% and 93.67%, and specificities of 94.79% and 95.98%, respectively. This research pioneered the use of deep learning (DL) to assess two different lesions simultaneously [13]. In the following year, Aoki et al. also trained a deep CNN to automatically detect erosions and ulcerations in WCE images. The model reported an AUC of 0.958 and a sensitivity, specificity, and accuracy of 88.2%, 90.9%, and 90.8%, respectively [14].
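The training code of these studies is not published; as an illustration only, the sketch below shows a common PyTorch pattern for adapting an ImageNet-pretrained AlexNet to a two-class lesion task, with the class count, optimizer, and dummy batch being assumptions:

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from ImageNet-pretrained AlexNet and replace the 1000-class head
# with a 2-class head (e.g., lesion vs. normal mucosa).
model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
model.classifier[6] = nn.Linear(4096, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# One illustrative training step on a dummy batch of 224x224 RGB frames.
images = torch.randn(8, 3, 224, 224)   # placeholder for preprocessed CE frames
targets = torch.randint(0, 2, (8,))    # placeholder labels
optimizer.zero_grad()
loss = criterion(model(images), targets)
loss.backward()
optimizer.step()
```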
Wang et al. applied a deep learning framework to ulcer detection using a large dataset (1504 patient cases: 1076 with ulcers and 428 normal). The results of this study were moderate and indicated a strong correlation between ulcer size and detection performance [15].
Aoki et al. developed an unprecedented method to assess whether a CNN can reduce the reading time of endoscopists (trainees and experts). To achieve this, they compared the reading times and detection rates of mucosal breaks (erosions or ulcerations) between endoscopist-alone readings (process A) and endoscopist readings after a first screening by the CNN (process B). Using 20 full videos, they reported a significantly shorter duration for process B (expert: 3.1 min; trainee: 5.2 min vs. expert: 12.2 min; trainee: 20.7 min) without compromising the detection rate of mucosal breaks. This study reinforces the importance of these methods and their practical application in clinical settings [16]. The same author recently developed a CNN capable of detecting blood in the SB lumen using 27,847 images from 41 patients (6503 images depicting blood content from 29 patients and 21,344 images of normal mucosa from 12 patients). They compared the performance of the CNN with that of the suspected blood indicator (SBI), achieving significantly higher sensitivity (96.63%), specificity (99.96%), and accuracy (99.89%) than the SBI. This study suggests that a CNN could outperform the SBI software already used in real-time practice [17].
Ghosh developed a deep transfer learning framework for automated bleeding detection and bleeding zone identification in CE images, achieving satisfactory global accuracy [18]. More recently, a Portuguese group created a CNN-based algorithm that automatically detects blood and hematic residues within the SB lumen in CE images. Throughout three stages of development, the model's accuracy tended to increase as data were repeatedly loaded into the multi-layer CNN. In the last stage, it achieved an area under the ROC curve of 1.0, a sensitivity of 98.3%, a specificity of 98.4%, and an accuracy of 98.2%, with excellent reading times (186 frames/second) [9]. More recently, the same group developed, for the first time, a CNN capable of automatically identifying and classifying multiple SB lesions with different bleeding potential, using a dataset of 53,555 images from a total of 5793 CE exams of 4319 patients at two different medical centers. Each frame was evaluated for lesion type (lymphangiectasia, xanthomas, vascular lesions, ulcers, erosions, protruding lesions, and luminal blood), and hemorrhagic risk was graded according to Saurin's classification (P0, P1, and P2 for lesions with no, intermediate, or high hemorrhagic potential, respectively). This research reported a sensitivity of approximately 88% and a specificity of approximately 99% for the automatic detection of these abnormalities, as well as high sensitivity and specificity for detecting P0, P1, and P2 lesions. This study is particularly interesting because it sets a precedent for future advancements in this area, likely contributing to real-time implementation [19].
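To make the Saurin grading concrete, the sketch below maps hypothetical per-frame lesion predictions to P0/P1/P2 and keeps the highest risk per frame; the lesion-to-class assignments shown are illustrative assumptions rather than the mapping used in the cited study:

```python
# Hypothetical mapping of predicted lesion types to Saurin classes.
SAURIN_CLASS = {
    "normal_mucosa":     "P0",  # no hemorrhagic potential
    "lymphangiectasia":  "P0",
    "xanthoma":          "P0",
    "erosion":           "P1",  # intermediate hemorrhagic potential
    "ulcer":             "P2",  # high hemorrhagic potential (assumed)
    "vascular_lesion":   "P2",
    "protruding_lesion": "P2",
    "luminal_blood":     "P2",
}

def hemorrhagic_risk(predicted_lesions):
    """Return the highest Saurin class among a frame's predicted lesions."""
    order = {"P0": 0, "P1": 1, "P2": 2}
    return max((SAURIN_CLASS[l] for l in predicted_lesions),
               key=order.__getitem__, default="P0")

print(hemorrhagic_risk(["erosion", "luminal_blood"]))  # -> P2
```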
3. Artificial Intelligence and Vascular Lesions
Angioectasia is the most common vascular lesion in the GI tract and results from the formation of aberrant blood vessels. It causes more than 8% of bleeding episodes, and its prevalence is strongly linked to advanced age [1][6][20]. Previous studies mainly focused on detecting bleeding lesions rather than angioectasias specifically. In 2012, a study reported that only approximately 70% of angioectasias were detected by experts, highlighting the urgent need for improvement [20]. AI could be a tool with significant potential in this regard.
Indeed, in 2016, Vieira et al. developed an automatic segmentation method capable of identifying angioectasias using different color spaces [21]. At a later stage, the same group improved the segmentation algorithm used in their prior research, outperforming the earlier study and achieving sensitivity and specificity values of over 96% [20]. Later, Noya et al. developed a system for automatic angioectasia detection using color-based, textural, statistical, and morphological features. This study reported a sensitivity of 89.51%, a specificity of 96.8%, and an AUC of 82.33% [22]. Leenhardt et al. developed a CNN model that could automatically detect and localize angioectasias, using 6360 images from 4166 CEs and achieving a sensitivity and specificity of 100% and 96%, respectively [23].
Subsequently, further studies were conducted using CNNs. Tsuboi et al. developed a deep CNN system for the automatic detection of SB angioectasia in 2237 CE still frames, achieving an AUC of 0.998; the sensitivity, specificity, positive predictive value, and negative predictive value of the CNN were 98.8%, 98.4%, 75.4%, and 99.9%, respectively [24]. More recently, Chu et al. developed a DL algorithm that used ResNet50 as a backbone network to segment and recognize vascular lesions (angioectasia, Dieulafoy's lesion, and arteriovenous malformation). This study used a dataset of 378 patients and comprised a test set of 3000 images, 1500 without lesions and 1500 with lesions. They compared their network with other available models (PSPNet, DeeplabV3+, and UpperNet) against the ground truth, achieving an accuracy of 99%, a mean intersection over union of 0.69, a negative predictive value of 98.74%, and a positive predictive value of 94.27% [25].
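For reference, the sensitivity, specificity, predictive values, and accuracy quoted throughout this review derive from the confusion matrix, and the segmentation studies add region overlap (intersection over union); a minimal sketch with illustrative counts:

```python
import numpy as np

def classification_metrics(tp, fp, tn, fn):
    """Standard per-image metrics used across the cited CE studies."""
    return {
        "sensitivity": tp / (tp + fn),   # recall on lesion frames
        "specificity": tn / (tn + fp),
        "ppv":         tp / (tp + fp),   # positive predictive value
        "npv":         tn / (tn + fn),   # negative predictive value
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
    }

def iou(pred_mask, gt_mask):
    """Intersection over union between a predicted and a ground-truth mask."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return inter / union if union else 1.0

print(classification_metrics(tp=1480, fp=90, tn=1410, fn=20))  # illustrative counts
```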
4. Artificial Intelligence and Protruding Lesions
CE plays an essential role in investigating patients with clinically or imaging-suspected SB tumors, as well as in monitoring patients with hereditary polyposis syndromes [26]. SB protruding lesions comprise a variety of pleomorphic lesions, among which SB tumors are included. Their detection is challenging precisely because of this pleomorphism [27].
Barbosa and co-workers, in 2008, developed an algorithm based on textural analysis of the different color channels, capable of detecting tumor lesions. They used a small dataset and reported 98.7% sensitivity and 96.6% specificity in detecting tumor lesions in the SB [28]. The same group also developed an algorithm based on combined color and texture information for the detection of SB tumors. This algorithm built on the authors' previous study but used a more extensive dataset. It also achieved excellent performance, with 93.1% specificity and 93.9% sensitivity [29].
Li et al. also used an algorithm based on shape features, but it relied on data retrieved from only two patients, which limits its applicability to real practice [30]. The same authors also performed a comparative study of a computer-aided system for detecting tumors in CE images, analyzing four texture features across three color spaces. The best performance achieved was an average accuracy of 83.50%, with a specificity and sensitivity of 84.67% and 82.33%, respectively. They concluded that different color spaces have different impacts on the computer-aided system's performance [31]. In the following year, the same group developed a computerized tumor detection system for CE images; using texture features and a support vector machine, they achieved an accuracy of 92.4% [32]. Other studies were limited to detecting tumors in the SB [33][34]. Yuan et al. developed a computer-aided detection method to distinguish polyp images from other structures (bubbles, turbid images, and clear images) in CE images with an average accuracy of 98%. This study reinforces that luminal content makes frame evaluation difficult [35].
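The color-space comparisons above can be sketched in outline: the snippet below computes a simple gray-level co-occurrence texture descriptor per channel in two color spaces (RGB and HSV are assumed here; the cited studies' exact features and spaces differ):

```python
import numpy as np
from skimage import color
from skimage.feature import graycomatrix, graycoprops

def texture_features(channel):
    """Contrast/homogeneity/energy/correlation from a gray-level co-occurrence matrix."""
    img = (channel * 255).astype(np.uint8)
    glcm = graycomatrix(img, distances=[1], angles=[0], levels=256, symmetric=True)
    return [graycoprops(glcm, p)[0, 0]
            for p in ("contrast", "homogeneity", "energy", "correlation")]

rgb = np.random.rand(128, 128, 3)   # placeholder for a CE frame with values in [0, 1]
hsv = color.rgb2hsv(rgb)

# The same texture descriptor computed per channel in two color spaces.
feats_rgb = np.concatenate([texture_features(rgb[..., c]) for c in range(3)])
feats_hsv = np.concatenate([texture_features(hsv[..., c]) for c in range(3)])
```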
More recently, Saito and co-workers developed, for the first time, a CNN capable of identifying and classifying protruding lesions (polyps, nodules, epithelial tumors, submucosal tumors, and venous structures) in CE images using a large dataset. This research achieved an overall AUC of 0.911 and a sensitivity and specificity of 90.7% and 79.8%, respectively. This method brings the algorithm much closer to practical use in clinical settings [36].
Saraiva et al. developed a pioneering CNN designed to automatically detect SB protruding lesions and evaluate their hemorrhagic potential. From 1483 CE exams, a total of 18,625 images were extracted, 2830 showing protruding lesions and the rest normal mucosa. Each frame was evaluated for enteric protruding lesions (polyps, epithelial tumors, subepithelial lesions, and nodules), and the hemorrhagic potential was estimated according to Saurin's classification. Overall, the model achieved an accuracy of 92.5%, a sensitivity and specificity of 96.8% and 96.5%, respectively, and an excellent reading time (70 frames per second) [27].
5. Artificial Intelligence and Pleomorphic Lesion Detection
Most currently developed advanced systems can only detect one type of lesion at a time, which does not meet the requirements for clinical practice implementation [37]. Therefore, there has been a need to develop algorithms capable of detecting multiple pathologies in a single examination.
Ding and co-workers developed a CNN algorithm capable of classifying various lesions in SB CE images, unlike previous studies focusing only on specific lesions. This study used an extensive multicenter dataset—data from 6970 patients (158,235 images from 1970 cases for the training phase and 5000 cases for the validation phase)—to screen for different lesions (abnormal lesions and normal variants). The algorithm reported excellent performance and time efficiency, with a mean reading time of approximately 6 min compared with conventional reading times of 97 min [38].
The latter study was followed by other research using CNNs to detect a variety of mucosal abnormalities. Otani et al. trained the deep neural network system RetinaNet to diagnose various SB lesions using a training dataset of 167 patients (398 images of erosions and ulcers, 538 images of angioectasias, 4590 images of tumors, and 34,437 normal images from 11 patients), achieving AUC values for tumors, erosions and ulcers, and vascular lesions of 0.950, 0.996, and 0.950, respectively [39]. Aoki and co-workers, who had conducted prior research on detecting individual abnormalities, developed a deep CNN system capable of detecting various abnormalities and compared it with the QuickView mode, also reporting excellent results [40]. Vieira et al. applied multi-pathology classification and segmentation to the KID dataset. The model reported good performance in both lesion detection and segmentation, suggesting that the two tasks should be combined in future work [41]. Furthermore, Hwang et al. developed a CNN capable of automatically detecting various SB lesions (hemorrhagic and ulcerative). They trained the CNN in two ways: a combined model (identifying hemorrhagic and ulcerative lesions separately and then combining the results) and a binary model (identifying abnormal images without discrimination). Both models achieved high accuracy for lesion detection, and the difference between them was not significant; however, the combined model reported higher accuracy and sensitivity [37].
6. Artificial Intelligence and Small-Bowel Compartmentalization
AI has tremendous potential to assist in localizing the capsule within the GI tract and could decrease the time required to identify organ boundaries, which is necessary both for studies of automatic lesion detection and for locating lesions in clinical practice [42].
Prior to 2017, many research articles aimed to locate the pylorus, but they achieved neither excellent accuracy nor excellent reading times. In turn, Wang et al. developed an SVM-based method that achieved this aim, using 3801 images from the pyloric region, 1822 from the pre-pyloric region, and 1979 from the post-pyloric region. The study reported an accuracy of 97.1% and a specificity of 95.4% in a time-efficient manner (1.26 min on average) [42].
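Implementation details aside, locating the pylorus typically reduces to classifying frames as gastric or small-bowel and finding the transition point. A minimal sketch, assuming per-frame small-bowel probabilities from any classifier plus simple moving-average smoothing (both assumptions, not the cited SVM pipeline):

```python
import numpy as np

def locate_pylorus(p_small_bowel, window=50, threshold=0.5):
    """Estimate the pyloric boundary as the first frame whose smoothed
    small-bowel probability exceeds the threshold."""
    kernel = np.ones(window) / window
    smoothed = np.convolve(p_small_bowel, kernel, mode="same")
    above = np.flatnonzero(smoothed > threshold)
    return int(above[0]) if above.size else None

# Illustrative noisy probabilities with a gastric-to-small-bowel transition at frame 3000.
rng = np.random.default_rng(1)
probs = np.clip(np.r_[rng.normal(0.1, 0.1, 3000), rng.normal(0.9, 0.1, 5000)], 0, 1)
print(locate_pylorus(probs))  # approximately 3000
```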
7. Artificial Intelligence and Celiac Disease
CeD is an immune-mediated disorder characterized as a gluten-sensitive enteropathy. Diagnosis relies on a sequential approach combining clinical features, serology, and histology. Biopsy was long considered the 'gold standard' for diagnosing CeD and is still mandatory in most cases [43]. Although not a substitute for duodenal biopsies, CE seems to be a promising alternative for diagnosing CeD, excluding other diagnoses, and evaluating the extent of the disease.
Ciaccio et al. developed a threshold classifier able to predict CeD from images. Using image data from eleven CeD patients and ten controls and analyzing nine different features, they obtained a threshold classifier with 80% sensitivity and 96% specificity [3].
Later, Zhou et al. developed a CNN to objectively evaluate the presence and degree of villous atrophy. The training set comprised CE videos from six CeD patients and five controls, and each frame was rotated in 15-degree increments to generate new candidate samples for the training set, which improved sensitivity and specificity. The authors achieved 100% sensitivity and specificity in the testing set. This study introduced a new prospect: automatic correlation between the Marsh classification and video capsule images [44].
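Rotation augmentation of this kind is straightforward to reproduce. In the minimal sketch below, the 15-degree step comes from the study, while the library and interpolation settings are assumptions:

```python
import numpy as np
from scipy.ndimage import rotate

def augment_rotations(frame, step_deg=15):
    """Rotate one CE frame in fixed increments to multiply the training set:
    a 15-degree step yields 24 rotated copies per frame."""
    return [rotate(frame, angle, reshape=False, mode="nearest")
            for angle in range(0, 360, step_deg)]

frame = np.random.rand(128, 128, 3)   # placeholder for a real CE frame
augmented = augment_rotations(frame)
print(len(augmented))                 # -> 24
```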
Koh et al. used a combination of various image features to classify images as normal or CeD with a computer-aided detection (CAD) system. The study reported an accuracy of 86.47% and a sensitivity and specificity of 88.43% and 84.60%, respectively. This study reinforced that CAD systems can improve and change how CeD is diagnosed [45].
In 2020, Wang et al. developed a CNN system that combined different techniques, using data from 52 CeD video clips and 55 healthy video clips. Overall, it achieved remarkable results in the diagnosis of CeD, with an accuracy, sensitivity, and specificity of 95.94%, 97.20%, and 95.63%, respectively. This study highlights the role of integrating different technologies to achieve better results and robustness [46].
More recently, Stoleru et al. presented an algorithm showing that computer-aided CeD detection is possible even without complex algorithms: they processed images with two modified filters to analyze the texture of the intestinal wall, demonstrating that a diagnosis can be obtained through image processing alone [47]. Also, Chetcuti Zammit et al. developed an ML algorithm capable of quantitatively grading CeD severity. They used a training dataset of 334,080 frames from 35 patients with biopsy-proven CeD and 110,579 frames from 13 patients without CeD. A strong correlation was observed between the celiac severity scores provided by the algorithm and the average expert reader scores. This study used a large patient cohort, suggesting reproducibility in real time [48].
8. Artificial Intelligence and Inflammatory Bowel Activity
The principal forms of inflammatory bowel disease (IBD) are ulcerative colitis and CD. Approximately 70–90% of CD patients have SB disease [49]. The role of capsule endoscopy in established or suspected IBD, particularly in CD, is well established for both diagnosis and non-invasive follow-up [50]. Quantitative scores, namely the Lewis score and the capsule endoscopy CD activity index, are commonly used in medical practice to quantify mucosal inflammation during CE. However, these scores are subject to intra-examiner variability. AI can therefore play a significant role in reducing the limitations of CE and minimizing intra-observer variability, thereby improving clinical practice and minimizing risk and cost [51][52].
Klang et al. developed a CNN to classify images as normal mucosa or mucosa with ulcerations and aphthae. They used data from 49 patients, with 17,640 CE images in total. The model reported an excellent AUC, above 0.94, for detecting ulcerations in patients with CD [51]. In the same year, Barash et al., in collaboration with Klang, developed a DL algorithm capable of detecting and grading the severity of ulcers in CD. The study had two parts. In the first, 1108 pathological CE images were graded from 1 to 3 according to ulcer severity by two evaluators, and the inter-reader variability was calculated, showing an overall inter-reader agreement of only 31% (345/1108 images). In the second, Barash and co-workers used a CNN to classify ulcer severity automatically, achieving an overall agreement of 67% (166/248) between the consensus reading and the automatic algorithm. This study was the first to use AI to assess ulcer severity rather than perform a binary classification (ulcer vs. normal) [53].
The presence of ulcers suggests a worse disease prognosis, as does the presence of strictures. Accordingly, Klang and co-workers recently tested a DL network capable of detecting CE images of strictures in CD. They used a dataset of 27,892 CE images (1942 stricture images, 14,266 normal mucosa images, and 11,684 ulcer images). Overall, the algorithm reported an average accuracy of 93.5% in detecting strictures and excellent differentiation between strictures, normal mucosa, and different grades of ulcers [54].
9. Artificial Intelligence and Small-Bowel Cleansing
Properly evaluating CE images requires a well-prepared bowel, free of air bubbles, bile, and intestinal debris. A high-quality preparation ensures optimal visualization of the mucosa and allows meaningful and reliable conclusions to be drawn. This is particularly important in CE because the endoscopist has no control over the field of view, as illustrated in Figure 2. Furthermore, there is currently no established gold standard for intestinal CE preparation due to the lack of objective and automated methods for evaluating cleansing [5][55][56].
Figure 2. Images depicting the quality of small-bowel preparation. The left image corresponds to a satisfactory preparation. The right image corresponds to an excellent preparation.
Nowadays, both operator-dependent scores, such as the Brotz and Park scores, and automated scores exist to evaluate intestinal CE preparation. The automated scores are considered objective, reliable, and reproducible, thereby overcoming the limitations of operator-dependent scores [57].
Van Weyenberg et al. developed a computed assessment able to evaluate the quality of SB preparation with the PillCam® CE system (Medtronic, Dublin, Ireland), based on the color intensities in the red and green channels of the tissue color bar (visible mucosa is associated with red colors, whereas a fecally contaminated lumen is associated with green colors). Comparing this method with three previous quantitative and qualitative scores, they found high overall agreement, indicating that this method could be integrated into video CE reading [58]. Later, Ponte et al. adapted this computed assessment to the MiroCam® CE system (Intromedic, Seoul, South Korea); the results were inferior to those reported by Van Weyenberg but remained statistically significant, reinforcing the practicality of the automated score across different CE systems [59]. Abou Ali et al. also adapted this method to the PillCam® CE system, achieving a sensitivity of 91.3% and a specificity of 94.7% and reinforcing that this computed assessment score has potential for automated cleansing evaluation [60]. Later, similar computed scores were created: Oumrani et al. used a multi-criteria computer-aided algorithm with three parameters, tested individually or combined (the red/green ratio, the abundance of bubbles, and brightness), to assess the quality of SB visualization in third-generation still frames, achieving a sensitivity and specificity of 90% and 87.7%, respectively, with optimal reproducibility [61].
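The red/green-ratio idea behind these computed assessments can be sketched in a few lines; the per-frame averaging and the cut-off below are illustrative assumptions, not the published scores:

```python
import numpy as np

def red_green_score(frame):
    """Mean red/green intensity ratio of a frame: higher values suggest
    visible mucosa (red), lower values suggest fecal contamination (green)."""
    red = frame[..., 0].astype(float).mean()
    green = frame[..., 1].astype(float).mean() + 1e-8  # avoid division by zero
    return red / green

def adequate_preparation(frames, threshold=1.5):
    """Fraction of frames considered clean under an illustrative cut-off."""
    return float(np.mean([red_green_score(f) > threshold for f in frames]))

frames = [np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8) for _ in range(10)]
print(adequate_preparation(frames))  # placeholder frames, illustrative output
```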
More recently, studies in this area have used DL algorithms. Noorda et al. developed a CNN capable of automatically evaluating SB cleanliness in CE, classifying images as dirty or clean, using a dataset of over 50,000 images. Compared with algorithms that have more parameters, their algorithm achieved an excellent performance/complexity balance. They also compared its output with the assessments of two medical specialists, achieving acceptable agreement, with κ values of 0.643 and 0.608 for specialists one and two, respectively [55]. Leenhardt et al. developed a CNN algorithm capable of assessing SB cleanliness during CE; this method reported high sensitivity but moderate specificity, with a reading time of 3 ± 1 min [62].
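Agreement with human readers in such studies is typically quantified with Cohen's kappa, which corrects raw agreement for agreement expected by chance; a minimal sketch with illustrative binary clean/dirty labels:

```python
from sklearn.metrics import cohen_kappa_score

# Illustrative clean(0)/dirty(1) labels for ten frames.
algorithm  = [0, 1, 1, 0, 0, 1, 0, 0, 1, 1]
specialist = [0, 1, 0, 0, 0, 1, 0, 1, 1, 1]

# Values around 0.6, as reported by Noorda et al., sit on the boundary between
# moderate and substantial agreement on the Landis-Koch scale.
print(cohen_kappa_score(algorithm, specialist))
```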
Nam et al. developed software for calculating SB cleansing scores using a DL method. The training dataset used a five-step scoring system based on mucosal visibility (five meaning more than 90% of the mucosa is visible, one meaning less than 25% is visible). The score was compared with the clinical assessments of gastroenterologists, showing high correlation. This score aims to provide a standard criterion for quantitative evaluation of CE preparation [63]. Later, Ju et al. created a large-scale semantic segmentation dataset that, combined with a CNN, can differentiate mucosal cleanliness, identifying clean mucosa with an accuracy above 94.4% [64].
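To make the five-step system concrete, the sketch below maps a per-frame fraction of visible mucosa (as might come from a segmentation model) to a score from 1 to 5; the >90% and <25% end points follow the study's description, while the intermediate cut-offs are assumptions:

```python
def cleansing_score(visible_fraction):
    """Map the fraction of visible mucosa in a frame to a 1-5 cleansing score.
    The >90% and <25% boundaries follow the study; the middle bins are assumed."""
    if visible_fraction > 0.90:
        return 5
    if visible_fraction > 0.75:   # assumed cut-off
        return 4
    if visible_fraction > 0.50:   # assumed cut-off
        return 3
    if visible_fraction >= 0.25:  # assumed cut-off
        return 2
    return 1

print([cleansing_score(f) for f in (0.95, 0.8, 0.6, 0.3, 0.1)])  # -> [5, 4, 3, 2, 1]
```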
In January 2023, Ju et al. compared an AI algorithm with the judgment of five gastroenterologists across 300 video clips comprising 3000 frames collected from 100 patients. This study reinforces the intra-observer variability of human judgment and concludes that there was no significant difference between the AI evaluation and human judgment; in addition, the AI results were expressed on a numerical scale, providing more detailed information [56]. In April 2023, Ribeiro et al. designed a CNN capable of classifying the quality of intestinal preparation in CE using a three-level scale: excellent, satisfactory, and unsatisfactory. It achieved a high accuracy, sensitivity, and specificity of 92.1%, 88.4%, and 93.6%, respectively. The methodology was particularly robust, using images from two different centers, two different SB-CE systems, and a large dataset (CE exams from 4319 patients, 12,950 images of SB mucosa). This study suggests a high potential for replicating this algorithm in real-time practice [65].
Almost all studies emphasize the importance of validating these scores with different CE systems and bowel preparation types [58][59]. The implementation of CNN algorithms opens the possibility of conducting valid comparative analyses of different preparation regimens, and randomized controlled trials using computed assessment already aim to answer this question. Houdeville et al. conducted the first study to use an ML system to compare two polyethylene glycol (PEG)-based preparations, with and without simethicone, with bubble reduction as the primary outcome. Although there was no significant impact on diagnostic yield and transit time, there was a marked reduction in the abundance of bubbles throughout the SB, particularly in the distal ileum. This research was significant in that it strengthened the potential role of AI in establishing the gold standard for preparation in SB CE [66].
AI will probably play a role in optimizing cleanliness scores, which is becoming a hot topic in gastroenterology. To date, however, no such score has been incorporated into CE software [5].
10. Miscellaneous—Artificial Intelligence and Hookworms/Functional Bowel Disorders
Some studies aim to identify parasites in the GI tract, particularly hookworms. The initial studies did not achieve excellent results because the algorithms had difficulty differentiating hookworms from other luminal contents [67]. More recently, studies using CNNs achieved better results. Indeed, Gan et al. developed a CNN to automatically detect hookworms in SB CE images, reporting a sensitivity, specificity, and accuracy of 92.2%, 91.1%, and 91.2%, respectively [68].
There are also studies using AI to evaluate intestinal motility disorders, including the detection and analysis of contractions and the diagnosis of functional intestinal disorders [69][70].