The development of convolutional neural networks has led to impressive advances in machine learning in recent years, resulting in an increasing use of artificial intelligence (AI) in the field of gastrointestinal (GI) diseases. AI networks have been trained to differentiate benign from malignant lesions, analyze endoscopic and radiological GI images, and assess histological diagnoses, obtaining excellent results and high overall diagnostic accuracy. Nevertheless, data are lacking on the side effects of AI in gastroenterology, and high-quality studies comparing the performance of AI networks with that of health care professionals remain limited.
Continuous innovations have improved many aspects of gastroenterologists' daily clinical practice, from increasing early-stage diagnoses to expanding therapeutic boundaries. In recent decades, much attention has focused on the development of computer-assisted systems that could be applied in endoscopy, radiology, and pathology to improve the diagnosis, treatment, and prognosis of many gastrointestinal diseases. Indeed, machine learning has evolved in recent years thanks to convolutional neural networks (CNNs), improvements in the training of the networks that form the basis of artificial intelligence (AI), the development of powerful computers with advanced graphics processing, and the increasing use of these networks in many diagnostic fields. However, although AI has been applied to a wide range of gastrointestinal diseases, high-quality studies comparing the performance of AI networks with that of human health care professionals are lacking, especially prospective studies conducted in real-time clinical settings.
This narrative review gives an overview of some of the most relevant potential applications of AI to both upper and lower gastrointestinal diseases (Table 1), highlighting their advantages and main limitations and offering considerations for future development.
Table 1. Key points of AI application in GI disease.
| Field | Key Points |
|---|---|
| ESOPHAGUS | |
| STOMACH | |
| LOWER GI TRACT | |
Accumulating evidence shows the potential benefits of computer assistance (CA) in the management of esophageal conditions such as Barrett's esophagus (BE) and esophageal adenocarcinoma (EAC) [1]. In recent years, the ARGOS project has developed, validated, and benchmarked a computer-aided detection (CAD) system to assist endoscopists in the detection and delineation of Barrett's neoplasia. De Groof et al. showed that their system achieved higher diagnostic accuracy than non-expert endoscopists and was potentially fast enough for real-time use, taking 0.24 s to classify and delineate a Barrett's lesion within an endoscopic image [2]. The study used five independent image datasets to train, validate, and benchmark the system. The first two datasets were used for pre-training and training, respectively: the first contained 494,364 endoscopic images from all intestinal segments, and the second contained 1247 high-definition (HD) white-light imaging (WLI) images of confirmed early BE neoplasia and non-dysplastic BE. A third dataset of 297 images was used to refine the training and for internal validation. The fourth and fifth datasets, each containing 80 images of early BE neoplasia and non-dysplastic BE delineated by three and six experts, respectively, were used for external validation. The fifth dataset was also used to benchmark the algorithm against 53 general endoscopists, yielding an accuracy of 88% vs. 73%, a sensitivity of 93% vs. 72%, and a specificity of 83% vs. 74%, respectively [2]. Similarly, in 2020, Hashimoto and colleagues published a single-center retrospective study of a system developed to detect early esophageal neoplasia in BE. The algorithm was trained to distinguish images of lesions with or without dysplasia.
A total of 916 images of early esophageal neoplasia in BE and 919 images of BE without high-grade dysplasia were used to train the system. It was validated on 458 images (225 with dysplasia and 233 without), with a per-image sensitivity, specificity, and accuracy of 96.4%, 94.2%, and 95.4%, respectively. The authors also found that specificity was significantly higher for images taken with advanced imaging techniques, such as narrow-band imaging (NBI) and near focus, than for images taken with WLI and standard focus [3].
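The per-image figures reported in these studies (sensitivity, specificity, and accuracy) all derive from a 2 × 2 confusion matrix. The Python sketch below shows how they are computed, together with a Wilson score interval, one common way to attach an approximate 95% CI to such proportions. The counts are hypothetical and are not taken from De Groof et al. or Hashimoto et al.

```python
import math
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Per-image counts for a binary task (positive = dysplasia/neoplasia)."""
    tp: int  # neoplastic images flagged as neoplastic
    fn: int  # neoplastic images missed
    tn: int  # non-dysplastic images correctly cleared
    fp: int  # non-dysplastic images wrongly flagged


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    if n == 0:
        return (math.nan, math.nan)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (centre - half, centre + half)


def summarize(c: ConfusionCounts) -> dict[str, float]:
    """Sensitivity, specificity and accuracy as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp),
    }


# Hypothetical counts for illustration only (not from any cited study).
counts = ConfusionCounts(tp=90, fn=10, tn=80, fp=20)
metrics = summarize(counts)
sens_ci = wilson_interval(counts.tp, counts.tp + counts.fn)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
print(f"sensitivity 95% CI: {sens_ci[0]:.1%} to {sens_ci[1]:.1%}")
```

Because sensitivity is computed only on diseased images and specificity only on non-diseased ones, accuracy depends on the case mix, which is one reason per-image results from differently composed datasets are hard to compare.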
Dedicated models analyzing WLI and NBI images have shown high disease-specific diagnostic accuracy not only for BE and EAC [4] but also for esophageal squamous cell carcinoma (ESCC) [5]. Dedicated algorithms have also been developed to analyze enhanced endoscopic imaging, allowing evaluation of disease-specific mucosal and vascular patterns [6], the presence of submucosal invasion [7], and the depth of invasion [8], as well as analysis of microendoscopy images for both ESCC [9] and BE [10]. Moreover, a recent meta-analysis showed high overall accuracy in the detection of early EAC, with significantly better performance than endoscopists in terms of pooled sensitivity (0.94 vs. 0.82, p = 0.01). However, these results were based mainly on studies in which endoscopic images were reviewed retrospectively, whereas data from prospective trials are more limited [11].
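A pooled sensitivity, as in the meta-analysis above, combines study-level estimates into a single figure. As a simplified illustration only, the sketch below pools sensitivity with fixed-effect inverse-variance weighting on the logit scale. Diagnostic-accuracy meta-analyses more often use bivariate random-effects models that pool sensitivity and specificity jointly, and nothing here reproduces the cited analysis [11]. The per-study counts are hypothetical.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def pooled_sensitivity(studies: list[tuple[int, int]]) -> float:
    """Fixed-effect inverse-variance pooling of sensitivity on the logit scale.

    Each study is (true_positives, diseased_total). A 0.5 continuity correction
    is applied when a study has 0 or all positives, so the logit stays finite.
    """
    num = 0.0
    den = 0.0
    for tp, n in studies:
        if tp == 0 or tp == n:
            tp_c, fn_c = tp + 0.5, (n - tp) + 0.5
        else:
            tp_c, fn_c = float(tp), float(n - tp)
        p = tp_c / (tp_c + fn_c)
        var = 1 / tp_c + 1 / fn_c  # approximate variance of logit(p)
        weight = 1 / var
        num += weight * logit(p)
        den += weight
    return inv_logit(num / den)


# Hypothetical per-study counts (true positives, diseased cases), illustration only.
studies = [(45, 50), (88, 95), (30, 33)]
print(f"pooled sensitivity: {pooled_sensitivity(studies):.3f}")
```

Larger studies get more weight because their logit sensitivity has a smaller variance; a random-effects model would additionally widen the uncertainty to reflect between-study heterogeneity.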
A recent study on protruding lesions of the esophagus integrated standard white-light endoscopic images with endoscopic ultrasound (EUS) images [12]. In differentiating subtypes of protruding lesions, the AI system achieved higher diagnostic accuracy than most of the endoscopists enrolled to interpret the images. In addition, combining the CA model's and the endoscopists' predictions achieved higher diagnostic accuracy than the endoscopists alone [12]. CA has also been used for image recognition of histology and pathology specimens to categorize dysplastic and non-dysplastic BE and EAC [13], as well as cytology samples obtained by wide-area transepithelial sampling (WATS3D) [14] or by Cytosponge [15], achieving promising results and matching the diagnostic performance of experienced pathologists.
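How the AI and endoscopist predictions were combined is not described here. One simple possibility, shown purely as a hypothetical sketch, is soft voting: average the model's per-subtype scores with the endoscopist's (for example, elicited confidence scores) and select the subtype with the highest combined score. The `ai_weight` parameter and the placeholder subtype labels are illustrative assumptions, not the method of the cited study.

```python
def combine_predictions(ai_probs: dict[str, float],
                        reader_probs: dict[str, float],
                        ai_weight: float = 0.5) -> str:
    """Hypothetical soft-voting rule: weighted average of two score maps.

    ai_probs / reader_probs map a lesion subtype label to a probability-like score.
    ai_weight is an illustrative assumption (0 = endoscopist only, 1 = AI only).
    """
    if not 0.0 <= ai_weight <= 1.0:
        raise ValueError("ai_weight must lie between 0 and 1")
    labels = set(ai_probs) | set(reader_probs)
    combined = {
        label: ai_weight * ai_probs.get(label, 0.0)
        + (1.0 - ai_weight) * reader_probs.get(label, 0.0)
        for label in labels
    }
    # Pick the subtype with the highest combined score.
    return max(combined, key=combined.__getitem__)


# Hypothetical scores for three placeholder subtypes "A", "B", "C".
ai_scores = {"A": 0.7, "B": 0.2, "C": 0.1}
endoscopist_scores = {"A": 0.3, "B": 0.6, "C": 0.1}
print(combine_predictions(ai_scores, endoscopist_scores))  # → A
```

Here the AI's strong preference for "A" overrides the endoscopist's moderate preference for "B"; with `ai_weight=0.0` the rule simply returns the endoscopist's choice.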
Another relevant issue in the management of patients diagnosed with gastric cancer is establishing whether neoadjuvant chemotherapy (NAC) is actually needed. A study by Wang Y et al. [16], published in 2020, investigated the role of CT radiomics in differentiating T2- from T3/4-stage gastric cancer, with the aim of sparing the adverse events of NAC in patients who should proceed directly to surgery. A total of 244 consecutive patients with pathologically proven gastric cancer were retrospectively included and divided into a training cohort of 171 patients and a validation cohort of 73 patients. Preoperative arterial and portal phase contrast-enhanced CT images were retrieved. The arterial and portal phase-based radiomics models showed areas under the curve (AUCs) of 0.899 (95% CI 0.812–0.955) and 0.843 (95% CI 0.746–0.914), respectively, in the training cohort, and 0.825 (95% CI 0.718–0.904) and 0.818 (95% CI 0.711–0.899), respectively, in the validation cohort. These results suggest that radiomics models based on CT images may help differentiate the depth of tumor invasion in gastric cancer. Also using radiomics, Shin J et al. [17] aimed to develop a radiomics-based prognostic model for recurrence-free survival (RFS) using preoperative contrast-enhanced CT in locally advanced gastric cancer. This retrospective study included a training cohort of 349 patients and an external validation cohort of 61 patients who underwent curative resection for gastric cancer without neoadjuvant therapy. The integrated area under the curve (iAUC) values for RFS prediction were 0.616 (95% CI 0.570–0.663), 0.714 (95% CI 0.667–0.759), and 0.719 (95% CI 0.674–0.764) for the clinical, radiomic, and merged models, respectively. In external validation, the iAUC values were 0.584 (95% CI 0.554–0.636), 0.652 (95% CI 0.628–0.674), and 0.651 (95% CI 0.630–0.673) for the clinical, radiomic, and merged models, respectively.
The radiomic model showed significantly higher iAUC values than the clinical model.
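The AUC values above measure how well a score separates two groups (for example, T2 vs. T3/4 tumors). A minimal sketch, assuming each patient has a single radiomics score where higher means "more likely T3/4", computes the AUC as the Mann-Whitney probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting one half. The iAUC used for recurrence-free survival is a different, time-dependent quantity and is not shown. The scores are hypothetical.

```python
def auc_mann_whitney(scores_pos: list[float], scores_neg: list[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 1/2."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical radiomics scores: positives = T3/4, negatives = T2.
t34_scores = [0.9, 0.8, 0.4]
t2_scores = [0.3, 0.5, 0.2]
print(f"AUC: {auc_mann_whitney(t34_scores, t2_scores):.3f}")
```

The pairwise loop is quadratic in cohort size, which is fine for cohorts of a few hundred patients; a rank-based formula gives the same value more efficiently.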
In addition to lesion detection, AI has also been investigated for automatic polyp characterization (CADx), that is, whether it can distinguish precancerous from benign lesions and thus avoid unnecessary removal of polyps for histological evaluation. In this setting, a pioneering study was performed by Tischendorf et al. [18] with a CADx system able to discriminate non-adenomatous from adenomatous polyps based on vascularization features in magnified NBI. Although good performance was obtained, human observers outperformed the AI in terms of both sensitivity (93.8% vs. 90%) and specificity (85.7% vs. 70%).
As with CADe, CADx achieved better results with the introduction of deep-learning systems. A benchmark study in this setting was performed by Byrne et al. [19], who tested an AI system on 125 polyps histologically classified as adenomas or hyperplastic polyps. The AI performed a real-time evaluation of the polyps in non-magnified NBI according to the Narrow-band Imaging International Colorectal Endoscopic (NICE) classification [20]. The model did not reach sufficient confidence to predict the histology of 19 polyps; for the remaining 106 polyps, it showed an accuracy of 94% (95% CI 86–97%), a sensitivity for the identification of adenomas of 98% (95% CI 92–100%), a specificity of 83% (95% CI 67–93%), an NPV of 97%, and a PPV of 90%.
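The model above abstained on 19 of 125 polyps, and its metrics refer to the remaining 106. The sketch below illustrates that kind of evaluation under stated assumptions: a hypothetical model outputs a probability of adenoma, polyps whose confidence falls below a hypothetical `min_confidence` cut-off are deferred (for example, to histology), and accuracy, sensitivity, specificity, PPV, and NPV are computed only on the confident subset. It is not the published system's actual confidence mechanism.

```python
from dataclasses import dataclass


@dataclass
class Polyp:
    p_adenoma: float  # hypothetical model output: probability of adenoma
    is_adenoma: bool  # histological ground truth


def _ratio(numerator: int, denominator: int) -> float:
    """Safe division; NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def evaluate_with_abstention(polyps: list[Polyp], min_confidence: float = 0.8,
                             threshold: float = 0.5) -> dict[str, float]:
    """Classify only confident polyps; report metrics on that subset."""
    tp = fp = tn = fn = deferred = 0
    for polyp in polyps:
        confidence = max(polyp.p_adenoma, 1 - polyp.p_adenoma)
        if confidence < min_confidence:
            deferred += 1  # no optical diagnosis; polyp would go to histology
            continue
        predicted_adenoma = polyp.p_adenoma >= threshold
        if predicted_adenoma and polyp.is_adenoma:
            tp += 1
        elif predicted_adenoma:
            fp += 1
        elif polyp.is_adenoma:
            fn += 1
        else:
            tn += 1
    return {
        "deferred": float(deferred),
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Hypothetical predictions for illustration only.
example = [
    Polyp(0.95, True), Polyp(0.90, True), Polyp(0.10, False),
    Polyp(0.85, False), Polyp(0.55, True), Polyp(0.05, True),
]
print(evaluate_with_abstention(example))
```

Raising `min_confidence` usually improves the metrics on the confident subset at the cost of deferring more polyps, so the number of deferred cases should be reported alongside the metrics.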
CADx has also been evaluated with endocytoscopy (EC-CAD). This technique allows in vivo visualization of cell nuclei at ultra-magnification (×450). Mori et al. [21] reported the results of EC-CAD in four patients using EndoBRAIN (Cybernet Systems Corp., Tokyo, Japan), an AI-based system that analyzes cell nuclei, crypt structure, and micro-vessels in endoscopic images to identify colon cancers. This system was further investigated in a study comparing AI with humans (20 trainees and 10 expert endoscopists) [22]. Using methylene blue staining or NBI, EndoBRAIN identified colonic lesions significantly better than non-expert endoscopists, whereas, compared with experts, only its sensitivity and NPV were significantly higher. Two main studies analyzed the potential application of AI-based CADx to diminutive polyps [23][24][25], with promising results that were also confirmed by a recent meta-analysis [26] showing a sensitivity and specificity of 93.5% (95% CI, 90.7–95.6) and 90.8% (95% CI, 86.3–95.9), respectively.
This level of performance could justify a “resect and discard” or a “diagnose and leave” strategy. In the first case, polyps are still removed but not sent for histological analysis. According to Hassan and co-workers [27], this strategy could result in an annual saving of $25 per person, for a total of $33 million in the United States, with no relevant impact on the efficacy of CRC screening. A “diagnose and leave” strategy, on the other hand, could avoid the risks of unnecessary polypectomy and spare the cost of endoscopic polypectomy, estimated at approximately $179 per person, for a total saving of $1 billion per year to the US health care system [28]. However, this strategy could expose patients to the risk of interval CRC due to the misdiagnosis of precancerous colonic lesions that would be left in place.

Few data are available on CAD systems applied to computed tomography colonography (CTC) for the detection of colorectal polyps, mainly because of the high number of false positives (FPs). To overcome this issue, Ren et al. [29] proposed a CAD-CTC scheme using the shape index, multiscale enhancement filters, and a total of 440 radiomic features. This scheme was evaluated on 152 oral contrast-enhanced CT datasets from 76 patients with 103 polyps ≥ 5 mm. The detection results were encouraging, with high sensitivity and a low FP rate for polyps ≥ 5 mm. In addition, a recent proof-of-concept study [30] evaluated non-invasive, radiomics-based, machine-learning differentiation of benign and premalignant colorectal polyps on CT colonography datasets from an asymptomatic, average-risk colorectal cancer screening cohort of 59 patients. It reported a sensitivity of 82% (95% CI: 74–91), a specificity of 85% (95% CI: 72–95), and an AUC of 0.91 (95% CI: 0.85–0.96), providing a potential basis for future prospective studies of non-invasive analysis of CT colonography-detected polyps.
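The shape index used in CAD-CTC schemes summarizes the local geometry of the colonic surface from its two principal curvatures. A common definition, derived from Koenderink and van Doorn and rescaled to [0, 1] in the CT colonography literature, places cup-like shapes at 0, ruts at 0.25, saddles at 0.5, ridges (fold-like) at 0.75, and caps (polyp-like) at 1. The sketch below assumes that convex surfaces have positive curvature; the sign convention depends on the orientation of the surface normal, and the exact formulation used by Ren et al. may differ.

```python
import math


def shape_index(k1: float, k2: float) -> float:
    """Shape index in [0, 1] from principal curvatures.

    Assumed sign convention: convex (polyp-like) surfaces have positive
    curvature, so cup = 0, rut = 0.25, saddle = 0.5, ridge = 0.75, cap = 1.
    Returns NaN for planar points (k1 == k2 == 0), where it is undefined.
    """
    k_max, k_min = max(k1, k2), min(k1, k2)
    if k_max == 0 and k_min == 0:
        return math.nan
    # atan2 handles umbilic points (k_max == k_min) without dividing by zero.
    return 0.5 + math.atan2(k_max + k_min, k_max - k_min) / math.pi


def curvedness(k1: float, k2: float) -> float:
    """Magnitude of curvature, complementary to the shape index."""
    return math.sqrt((k1 * k1 + k2 * k2) / 2)


# Idealized curvature pairs for the five reference shapes.
for label, (a, b) in {"cap": (1.0, 1.0), "ridge": (1.0, 0.0), "saddle": (1.0, -1.0),
                      "rut": (0.0, -1.0), "cup": (-1.0, -1.0)}.items():
    print(label, round(shape_index(a, b), 2))
```

Because the shape index ignores scale, it is typically paired with curvedness (or size filters) so that tiny curvature noise on flat mucosa is not mistaken for a polyp cap.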
AI is rapidly being integrated into clinical practice [31] and, in a few years, has become a reliable tool for supporting physicians in the study of the GI tract. This review focused on AI and the diagnostic aspects (endoscopy, radiology, and pathology) of GI diseases and showed that AI appears to have great potential for the detection of inflammatory, precancerous, and cancerous lesions of the GI tract (Table 2).
Table 2. Summary of topics investigated.
| Field | Disease | Topic Investigated |
|---|---|---|
| ESOPHAGUS | BE | |
| | EAC | |
| | ESCC | |
| STOMACH | HP | |
| | CAG | |
| | GIM | |
| | GC | |
| LOWER GI TRACT | UC | |
| | CD | |
| | PCL | |
From the available data, AI seems to have high overall accuracy for the diagnosis of neoplastic lesions, whereas fewer studies have been performed for inflammatory disease, albeit with encouraging results. Nevertheless, major limitations should be carefully taken into account. First, AI performance results were sometimes heterogeneous across studies, making them difficult to compare [32]. Second, the size of the training and test datasets varied widely across studies. Third, most CAD or CNN systems were developed in single centers, and many data come from pre-clinical studies, raising concerns about selection and spectrum bias. Finally, most AI systems for endoscopy were derived from retrospective, non-randomized settings, and standardization remains an issue. In conclusion, AI is changing our work, with potentially enormous benefits, but guideline thresholds for standard patient care are also needed to overcome major limitations that, to date, represent important ethical issues and obstacles to its widespread use and implementation.