2. Application of AI in the Pre-Endoscopy Period for Patient Risk Assessment
Upon presentation at the hospital, stratification of patients in terms of gastrointestinal bleeding (GIB) risk is recommended
[6][7][9]. Accurately identifying (“phenotyping”) patients with GIB at initial assessment is the first step in patient management, particularly during the COVID-19 pandemic. Shung et al.
[10] used multiple natural language processing (NLP)-based approaches for automated phenotyping of patients in the emergency department. They found that a syntax-based NLP algorithm applied to patient triage information outperformed Systematized Nomenclature of Medicine (SNOMED) codes for the patient’s condition, which allows triage information to be used early to guide subsequent patient management.
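To make the idea of text-based phenotyping concrete, the following is a minimal sketch of a keyword-plus-negation rule applied to a free-text triage note. It is not the algorithm of Shung et al.; the term list, the negation cues, and the function name `flag_gib` are illustrative assumptions.

```python
import re

# Hypothetical keyword list for overt GI bleeding; not taken from the cited study.
GIB_TERMS = r"(hematemesis|melena|hematochezia|coffee[- ]ground emesis|bloody stools?)"
# Negation cues that cancel a finding when they appear earlier in the same clause.
NEGATION = r"\b(no|denies|without|negative for)\b"


def flag_gib(triage_note: str) -> bool:
    """Return True if any clause mentions a GIB term that is not negated."""
    # Split on sentence-level punctuation so a negation only affects its own clause.
    for clause in re.split(r"[.;]", triage_note.lower()):
        for match in re.finditer(GIB_TERMS, clause):
            preceding_text = clause[:match.start()]
            if not re.search(NEGATION, preceding_text):
                return True
    return False


if __name__ == "__main__":
    for note in (
        "Pt with melena x2 days, no chest pain",
        "Denies hematemesis or melena. Abdominal pain.",
    ):
        print(note, "->", flag_gib(note))
```

A production system would need a far richer vocabulary and proper syntactic parsing; the sketch only shows why handling negation matters when triage text is used as the input.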
In the past two decades, three widely validated scoring systems, namely, the Glasgow–Blatchford score (GBS) for outpatient management
[11], the Rockall score for mortality
[12], and the AIMS65 score
[13][14][15], have been used to identify low-risk patients. However, compared with these conventional scores
[16], ML can potentially improve risk assessment for the need for transfusion, endoscopic evaluation, or hospital admission for observation. ML is also more practical than such scores for busy clinicians because models can be deployed automatically on the electronic health records already available in many healthcare systems. Between 2003 and 2008, nine small studies investigated ML’s potential for PUB risk assessment in comparison with the conventional scores
[16]. The median area under the curve (AUC) was higher for artificial neural networks (0.93; range, 0.78–0.98) than for other ML models (0.81; range, 0.40–0.92) when predicting patient mortality, intervention requirement, or rebleeding. Overall, ML provided better prognostic performance in patients with GIB than conventional scores, and artificial neural networks tended to outperform other ML models.
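Because the comparisons above are expressed as AUCs, it may help to recall what the number measures: the probability that a randomly chosen patient with the outcome receives a higher score than a randomly chosen patient without it. The sketch below computes this directly from pairwise comparisons; the function name and the toy data are illustrative and do not come from any cited study.

```python
def auroc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    case has the higher score; tied pairs count as one half.
    """
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy predicted risks and observed outcomes (1 = event occurred).
    toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
    toy_outcomes = [1, 1, 1, 0, 0, 0]
    print(round(auroc(toy_scores, toy_outcomes), 3))
```

The same function applies to an integer clinical score such as the GBS or to a model's predicted probability, which is what makes the AUC a convenient common yardstick for the two.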
In 2020, Seo et al.
[17] prospectively analyzed 1439 PUB cases to compare the accuracy of ML and conventional scores in predicting patient instability, including hypotension, rebleeding, and mortality. Four ML algorithms, namely, logistic regression with regularization, random forest classifier (RF), gradient boosting classifier (GB), and voting classifier (VC), were compared with the GBS and Rockall scores. The RF model was the most accurate in predicting mortality (AUROC: RF 0.917 vs. GBS 0.710), while the VC model was the most accurate for hypotension (VC 0.757 vs. GBS 0.668) and rebleeding within 7 days (VC 0.733 vs. GBS 0.694). The global feature importance analysis identified clinically significant variables, including blood urea nitrogen, albumin, hemoglobin, platelet count, prothrombin time, age, and lactate. Thus, ML models may help in the early prediction of high-risk patients among those with initially stable upper GIB on admission to the emergency department. However, ML performance relies on data quality, and these studies usually had small sample sizes (<1000 cases) and no external validation of their performance.
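A voting classifier combines the outputs of several base models. The text does not state whether Seo et al. used soft or hard voting, so the sketch below shows both under assumed, made-up base-model probabilities; the function names are illustrative.

```python
def soft_vote(probabilities):
    """Soft voting: average the per-model probabilities of the adverse outcome."""
    if not probabilities:
        raise ValueError("at least one model probability is required")
    return sum(probabilities) / len(probabilities)


def hard_vote(probabilities, threshold=0.5):
    """Hard voting: each model casts one vote; a strict majority decides."""
    if not probabilities:
        raise ValueError("at least one model probability is required")
    votes_for_event = sum(p >= threshold for p in probabilities)
    return votes_for_event * 2 > len(probabilities)


if __name__ == "__main__":
    # Hypothetical outputs of three base models for a single patient.
    model_outputs = {
        "logistic_regression": 0.30,
        "random_forest": 0.70,
        "gradient_boosting": 0.65,
    }
    probs = list(model_outputs.values())
    print("soft vote probability:", soft_vote(probs))
    print("hard vote predicts event:", hard_vote(probs))
```

Soft voting keeps a continuous risk estimate, which is what an AUROC comparison requires; hard voting yields only a class label.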
Shung et al.
[18] were the first, in 2020, to conduct a large prospective international study that built an ML model for patients with PUB and compared its performance with that of conventional scoring systems. They collected patient data from medical centers in four countries (US, Scotland, England, and Denmark;
n = 1958) to build a model that can predict the need for hospital-based intervention (transfusion or hemostatic intervention) or 1-month mortality. Data from two Asia-Pacific sites (Singapore and New Zealand;
n = 399) were used for external validation. Only nonendoscopic features such as age, sex, clinical symptoms, and laboratory variables (hemoglobin, albumin, international normalized ratio, urea, and creatinine) were selected to build the model. The ML model showed a higher AUC (0.91) than GBS (0.88,
p = 0.001), Rockall score (0.73,
p < 0.001), and AIMS65 score (0.78,
p < 0.001). In the external validation cohort, the ML model still achieved a higher AUC (0.90) than GBS (0.87,
p = 0.004), Rockall score (0.66,
p < 0.001), and AIMS65 score (0.64,
p < 0.001). The proposed ML model improved the identification of low-risk patients who can be safely discharged early from the emergency department. Importantly, this ML model identified more than twice as many very-low-risk patients as the best-performing available clinical risk tool.
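One common way to compare tools on "how many very-low-risk patients they identify" is to fix a very high sensitivity and count the patients who fall below the resulting cutoff. The sketch below illustrates that idea; the target sensitivity, the function names, and the toy data are assumptions and are not taken from the study.

```python
import math


def low_risk_cutoff(scores, outcomes, min_sensitivity=0.99):
    """Highest cutoff such that calling `score >= cutoff` high risk keeps
    sensitivity at or above `min_sensitivity`."""
    positives = sorted(s for s, y in zip(scores, outcomes) if y == 1)
    if not positives:
        raise ValueError("need at least one patient with the outcome")
    # Minimum number of true events that must remain at or above the cutoff.
    required_hits = math.ceil(min_sensitivity * len(positives) - 1e-9)
    allowed_misses = len(positives) - required_hits
    return positives[allowed_misses]


def count_low_risk(scores, cutoff):
    """Number of patients scored strictly below the cutoff (candidate discharges)."""
    return sum(s < cutoff for s in scores)


if __name__ == "__main__":
    # Toy predicted risks; 1 = needed intervention or died.
    toy_scores = [0.02, 0.05, 0.08, 0.10, 0.30, 0.40, 0.60, 0.90]
    toy_outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
    cutoff = low_risk_cutoff(toy_scores, toy_outcomes, min_sensitivity=1.0)
    print("cutoff:", cutoff, "low-risk patients:", count_low_risk(toy_scores, cutoff))
```

Running the same two functions on a clinical score and on a model's output, at the same sensitivity, gives the kind of head-to-head count described above.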
After presentation in the hospital, initially stable patients who are at risk for hemodynamic instability requiring blood transfusion must be identified during the dynamic monitoring of the patient status. Levi et al.
[19] developed an ML model using publicly available intensive care unit (ICU) databases comprising 14,620 records, with input variables that included several laboratory tests and demographic information. Their model, which was based on changes in the patient’s vital signs and laboratory tests during the first 5 h of ICU admission, showed a high level of accuracy (overall AUC, 0.80) in predicting the need for transfusion in the subsequent 24 h.
Such an algorithm could therefore improve risk assessment by automatically retrieving information from electronic health records, thereby allowing timely decision support in an already crowded clinical setting.
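A model driven by "changes" in vital signs and laboratory values needs those changes turned into numeric features. The exact features used by Levi et al. are not described here, so the sketch below is only an assumed example: it summarizes a short time series by its net change and its least-squares slope per hour.

```python
def trend_features(hours, values):
    """Summarise one vital-sign or laboratory series by its net change and
    its ordinary least-squares slope (units per hour)."""
    n = len(hours)
    if n < 2 or n != len(values):
        raise ValueError("need two or more paired time points")
    mean_t = sum(hours) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in hours)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(hours, values))
    return {"delta": values[-1] - values[0], "slope_per_hour": sxy / sxx}


if __name__ == "__main__":
    # Hypothetical hourly systolic blood pressure (mmHg) after ICU admission.
    hours = [0, 1, 2, 3, 4]
    systolic_bp = [120, 115, 110, 105, 100]
    print(trend_features(hours, systolic_bp))
```

Features of this kind can be computed for each monitored variable and passed, together with demographic data, to any downstream classifier.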
3. Application of AI during Endoscopy
Forrest
[20] described the endoscopic classification of PUB in 1974 (
Figure 2). The classification requires endoscopist judgment of the risk for rebleeding and the need for endoscopic intervention. Current guidelines
[3][6][7] suggest that patients with high-risk ulcers, such as those with active spurting, active oozing, or a nonbleeding visible vessel, should receive endoscopic therapy because of the high risk for persistent bleeding or rebleeding, especially when relying on drug therapy alone. However, the ability to classify lesions correctly varies with the endoscopist’s experience, and an experienced endoscopist
[21][22] can reportedly make better clinical judgment than clinical risk scores
[23]. In the study of Laine et al.
[24], before a training course, the rate of correct identification of the endoscopic characteristics of hemorrhage rose with endoscopic experience (performing five cases per month), from 59% to 73%. After the training course, the improvement depended on training level: fellows, 15% increase; physicians with 0–20 years of experience since training, 8% increase; physicians with 20 or more years of experience since training, 3% increase. In an Italian study, Forrest Ia/b lesions showed high interobserver agreement, whereas Forrest II/III lesions showed low agreement
[25].
Figure 2. Forrest classification of bleeding peptic ulcers.
To explore whether AI is useful for identifying the endoscopic characteristics of hemorrhage during endoscopy, our study
[26] proposed a DL model that classifies endoscopic images by bleeding risk according to the Forrest classification, using 2378 still endoscopic images from 1694 patients with PUB (
Figure 3). On the testing dataset, agreement between the model and the senior endoscopist was moderate to substantial. The accuracy of the DL model was higher than that of a novice endoscopist. Therefore, the DL model has potential use, particularly in aiding less experienced endoscopists in decision making during emergent endoscopy.
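Descriptions such as "moderate to substantial" agreement usually refer to a chance-corrected statistic. Assuming Cohen's kappa and the Landis and Koch descriptive bands (the text does not state which statistic was used), agreement between a model and an endoscopist on Forrest classes could be computed as in the sketch below; the labels are made up for illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of images")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the product of each rater's marginal frequencies.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch descriptive band."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical Forrest labels for ten images.
    endoscopist = ["Ib", "Ib", "IIa", "IIa", "IIb", "IIc", "III", "III", "III", "III"]
    model = ["Ib", "Ib", "IIa", "IIa", "IIb", "III", "III", "III", "III", "IIc"]
    k = cohen_kappa(endoscopist, model)
    print(round(k, 3), landis_koch(k))
```

Unlike raw accuracy, kappa discounts the agreement that two raters would reach by chance when one class, such as Forrest III, dominates the dataset.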
Figure 3. Illustration of the DL approach for analyzing endoscopy images in peptic ulcer disease: (a) heatmap image showing an active bleeder in the endoscopy image (upper); (b) segmentation of the ulcer area (left) from the original endoscopy image (right).