Defining FPs by how long they persist on the screen is an objective way of classifying them. However, the duration threshold for reporting FPs is unsettled. One report suggested that only FPs lasting > 2 s be reported, and another reported only FPs lasting > 1 s, while the majority of FPs (i.e., more than 90%) lasted < 0.5 s. It is unknown whether ignoring transient FPs (i.e., those lasting < 1 or 2 s) would increase the risk of missing a real polyp.
Missed lesions account for 57.8% of interval colorectal cancers (i.e., cancers that occur within 3–5 years after a negative colonoscopy). To reduce the incidence of missed lesions and interval cancers, measures were proposed to improve the quality of colonoscopies. One of the most important quality metrics is the adenoma detection rate (ADR), defined as the proportion of patients undergoing colonoscopy in whom at least one adenoma is detected.
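The ADR definition above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `adenoma_detection_rate` and the example counts are hypothetical and do not come from any of the cited studies.

```python
from typing import Iterable


def adenoma_detection_rate(adenomas_per_patient: Iterable[int]) -> float:
    """Return the proportion of patients with at least one adenoma detected.

    `adenomas_per_patient` holds one adenoma count per colonoscopy patient
    (hypothetical input format chosen for this sketch).
    """
    counts = list(adenomas_per_patient)
    if not counts:
        raise ValueError("at least one patient is required")
    patients_with_adenoma = sum(1 for n in counts if n >= 1)
    return patients_with_adenoma / len(counts)


# Made-up example: 10 patients, 3 of whom had one or more adenomas.
example_counts = [0, 2, 0, 1, 0, 0, 3, 0, 0, 0]
print(f"ADR = {adenoma_detection_rate(example_counts):.1%}")
```

Note that the ADR counts patients, not polyps: a patient with three adenomas contributes exactly as much as a patient with one.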
Artificial intelligence (AI) is being used in the computer-aided detection (CADe) and diagnosis (CADx) of polyps. Randomized controlled trials (RCTs) showed that CADe-assisted colonoscopy significantly increased the ADR. A meta-analysis confirmed that the ADR was significantly higher in the CADe group than in the conventional group (36.6% vs. 25.2%; relative risk [RR], 1.44; 95% confidence interval, 1.27–1.62; p < 0.01; I² = 42%).
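As a rough arithmetic check on the figures above, the crude ratio of the two pooled ADRs can be computed directly. This is only a sketch: the helper `relative_risk` is a hypothetical name, and a meta-analytic RR is a weighted combination of study-level estimates, so the crude ratio is expected to be close to, but not necessarily identical to, the reported 1.44.

```python
def relative_risk(rate_intervention: float, rate_control: float) -> float:
    """Crude relative risk: ratio of two event proportions."""
    if rate_control <= 0:
        raise ValueError("control rate must be positive")
    return rate_intervention / rate_control


# Pooled ADRs quoted in the text: 36.6% (CADe) vs. 25.2% (conventional).
crude_rr = relative_risk(0.366, 0.252)
print(f"crude RR = {crude_rr:.2f}")
```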
An accompanying limitation of CADe is false positives (FPs), which occur when the algorithm identifies a “polyp” that the endoscopist would disagree with. FPs were ranked 3rd in importance among 59 future research questions related to CADe. We assessed CADe-overlaid video analyses, RCTs using real-time CADe to enhance polyp detection during colonoscopies, and studies that used FPs as the primary outcome. We tested the hypothesis that a systematic review of the literature on FPs would yield insight into methods of managing and limiting the adverse effects of this drawback of CADe.
The time spent differentiating an FP from a true lesion can potentially increase the withdrawal time. Although most RCTs on the real-time application of CADe found a longer withdrawal time in the CADe group than in the control group, the withdrawal time excluding biopsy was not significantly different. In a post hoc analysis of a small fraction (40/342, or about 11.7%) of the original CADe groups in the RCT studies, Hassan et al. found that 94% of FPs were discarded by the endoscopist immediately without further exploration, and the time spent on the remaining FPs contributed only about 1% of the withdrawal time. In real-life situations, where bowel preparation is usually less than optimal and endoscopists are less experienced, the impact of bowel preparation on FPs and withdrawal time requires more objective study.
The presence of FPs might lead to unnecessary biopsies of non-neoplastic tissues. Four of the six RCTs listed in Table 1 (with one not reporting this outcome and one showing no difference) showed a significant increase in the biopsy of non-neoplastic polyps in the CADe group, which was typically double the number reported for the control group. The removal of hyperplastic polyps, other than the diminutive ones in the distal rectosigmoid colon, is justified, as these polyps contribute to the serrated pathway of colorectal carcinogenesis. If these biopsies were, in fact, unwarranted, they would represent an avoidable, non-indicated use of medical resources.
Table 1. Recent RCTs comparing real-time CADe with control on adenoma detection during colonoscopy.
|Study||Location of Study||Control vs. CADe (n)||Overall ADR||Non-Neoplastic Polyps Detected, n (%)||CADe Used During Insertion||Number of Screens Used||Withdrawal Time, Mean (min)||Withdrawal Time Excluding Biopsy, Mean (min)|
|Wang et al.||China||536 vs. 522||20.3% vs. 29.1% *||94 (34.9) vs. 217 (43.6) * (hyperplastic plus inflammatory)||No||2||6.39 vs. 6.89 *||6.07 vs. 6.18|
|Wang et al.||China||478 vs. 484||28% vs. 34% *||113 (37) vs. 200 (40) * (hyperplastic plus inflammatory)||No||1||6.99 vs. 7.46 *||6.37 vs. 6.48|
|Repici et al.||Italy||344 vs. 341||40.4% vs. 54.8% *||57 (16.6) vs. 68 (19.9) (normal, hyperplastic, inflammatory, and others)||Yes||1||NA||7.0 vs. 7.3|
|Su et al.||China||308 vs. 315||16.5% vs. 28.9% *||NA||No||2||5.68 vs. 7.03 *||6.74 vs. 6.82 *|
|Liu et al.||China||518 vs. 508||23.9% vs. 39.1% *||92 (37.1) vs. 203 (41.8) * (proliferative and inflammatory)||No||2||NA||6.32 vs. 6.37|
|Liu et al.||China||397 vs. 393||20.9% vs. 29.0% *||87 (42.7) vs. 222 (52.7) * (hyperplastic and inflammatory)||No||1||6.94 vs. 7.29 *||6.62 vs. 6.71|
Applying CADx to characterize polyps after they are detected with CADe might help reduce the number of unnecessary polypectomies of non-neoplastic polyps. Preliminary results showed promise for simultaneously classifying polyps with endocytoscopic images, or even with white light images, after using CADe to detect the polyps in white light.
The recurrent appearance of FPs on the screen may lead to increased fatigue and decreased vigilance on the part of the endoscopist. Inundating the endoscopist with so many on-screen prompts, even if each demands only very transient attention, risks fatiguing the endoscopist. However, one study showed that a real-time CADe system integrated into a single primary endoscopy monitor, instead of the two monitors used in most RCTs (Table 1), improved the ADR without increasing the subjective fatigue level reported by the endoscopists during the colonoscopy. That report was unblinded and was developed by proponents of the CADe algorithm under study, which raises questions regarding the objectivity of the results.
False positives cause distractions and a need to refocus, potentially impairing the search for real polyps. To illustrate how difficult it is to refocus after a distraction, a study on mobile phone use while driving showed that the risk of a rear-end accident increased by 2.34–3.56 times, even though drivers increased their time headway by 0.41–0.59 s to offset the distraction of texting while driving.
Too many FPs may dampen the enthusiasm of endoscopists for applying CADe in clinical practice. One recent survey on the views of gastroenterologists regarding the potential use of artificial intelligence found that 33.9% of respondents worried about high numbers of FPs. Reports that downplay the importance of FPs based on subjective assessment need to be re-evaluated by studies with more objective and unbiased designs.
There is considerable variability in false-positive rates (FPRs) in the literature (Table 2). This variability suggests that FPs are defined in diverse ways and that various conditions inside the bowel lumen affect their occurrence. It also indicates an opportunity to minimize FPs by standardizing their definition and optimizing the condition of the bowel lumen.
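To make the per-frame metrics reported in Table 2 concrete, the sketch below computes sensitivity, specificity, and FPR from frame-level counts. It assumes a per-frame definition of an FP (one of several definitions in use, which is part of the variability noted above); the class `FrameCounts`, the function `per_frame_metrics`, and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameCounts:
    """Hypothetical per-frame confusion counts for a set of CADe videos."""
    true_positive: int   # polyp frames flagged by CADe
    false_negative: int  # polyp frames not flagged
    true_negative: int   # polyp-free frames not flagged
    false_positive: int  # polyp-free frames flagged (FPs)


def per_frame_metrics(c: FrameCounts) -> dict:
    polyp_frames = c.true_positive + c.false_negative
    clean_frames = c.true_negative + c.false_positive
    if polyp_frames == 0 or clean_frames == 0:
        raise ValueError("need both polyp and polyp-free frames")
    return {
        "sensitivity": c.true_positive / polyp_frames,
        "specificity": c.true_negative / clean_frames,
        # Per-frame FPR is the complement of per-frame specificity.
        "false_positive_rate": c.false_positive / clean_frames,
    }


# Made-up counts, for illustration only.
metrics = per_frame_metrics(FrameCounts(900, 100, 9500, 500))
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under this definition, a specificity of 95% still means that 1 in 20 polyp-free frames triggers a prompt, which at typical video frame rates can amount to many alerts per examination.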
Table 2. Recent studies using CADe-overlaid videos for real-time detection of polyps.
|Study||Primary Outcome||Videos Reviewed (n)||Polyps Detected||Sensitivity||Specificity|
|Misawa et al.||Accuracy of CADe||155 positive videos and 391 negative videos. Most of the polyps were flat.||NA||Per-frame: 90%||Per-frame: 63.3%|
|Urban et al.||Polyp detection by CADe||9 randomly selected colonoscopy videos||Performing endoscopist: 28; three expert reviewers without CADe: 36; one expert reviewer with CADe: 45||Per-polyp: 94%||Per-frame: 93%|
|Becq et al.||Polyp detection by CADe||50 colonoscopies from consecutive patients with various bowel preparations.||Performing endoscopist: 55; CADe: 401 possible polyps (100 definite polyps, 63 possible polyps, and 238 false positives)|
|Guo et al.||Accuracy of CADe||50 videos with small polyps and 50 videos without polyps.||NA||When confidence level ≥10%, per-frame: 66.9%; when confidence level ≥30%, per-frame: 56.8%||When confidence level ≥10%, per-frame: 92%; when confidence level ≥30%, per-frame: 98%|
|Wang et al.||Accuracy of CADe||138 videos with polyps and 54 videos without polyps||NA||Per-frame: 91.6%||Per-frame: 95.4%|
|Misawa et al.||Accuracy of CADe in a large, publicly accessible database.||100 videos||NA||Per-frame: 90.5%||Per-frame: 93.7%|
|Hassan et al.||Accuracy of CADe||138 polyp-positive short videos||NA||Per-frame: 99.7%||NA|
|Lee et al.||Accuracy of CADe||15 unaltered videos||Performing endoscopist: 38|
|Podlasek et al.||Accuracy of CADe||42 colonoscopy videos||Reviewer: 84|
One simple method that could reduce FPs is re-training the CADe algorithms with the scenarios that currently lead to FPs. Another approach could be the adoption of recurrent neural networks, which have memory and can process temporal sequences of frames in a way that is similar to the learning process of the human brain. With YOLOv3 (You Only Look Once, Version 3), a state-of-the-art, real-time object detection algorithm, better specificity was achieved (increasing from 90.9% to 93.7%). To filter out most short flashes, Podlasek et al. suggested setting a minimum persistence time before an FP is shown; however, this method might introduce a minor detection lag, depending on the desired sensitivity.
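The persistence-threshold idea can be sketched as a filter over per-frame detections. This is a minimal illustration under stated assumptions, not the method of Podlasek et al.: the function `persistent_alerts`, its parameters, and the example frame rate and threshold are all hypothetical.

```python
import math
from typing import List, Sequence


def persistent_alerts(
    detections: Sequence[bool], fps: float, min_duration_s: float
) -> List[bool]:
    """Show an alert only after a detection has persisted long enough.

    `detections[i]` is True when the detector fired on frame i. An alert is
    displayed on a frame only if the detector has fired on that frame and on
    enough immediately preceding frames to span `min_duration_s`.
    """
    # Small epsilon guards against floating-point overshoot before ceil().
    min_frames = max(1, math.ceil(min_duration_s * fps - 1e-9))
    run_length = 0
    shown = []
    for fired in detections:
        run_length = run_length + 1 if fired else 0
        shown.append(run_length >= min_frames)
    return shown


# A 3-frame flash followed by a sustained 8-frame detection, at an assumed
# 10 frames per second with an assumed 0.5 s persistence threshold.
frames = [True] * 3 + [False] * 2 + [True] * 8
alerts = persistent_alerts(frames, fps=10, min_duration_s=0.5)
print(alerts)
```

In this example the 3-frame flash is suppressed entirely, while the sustained detection is displayed only from its fifth frame onward. That delay of four frames is the detection lag mentioned above: a longer threshold removes more flashes but also delays, or drops, brief true detections.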
Optimal bowel preparation is a prerequisite for high-quality CADe-assisted colonoscopy and is associated with fewer FPs. Because wrinkled walls are the major source of CADe FP alerts, such alerts can be reduced by ensuring adequate luminal insufflation. The use of an anti-spasmodic agent, such as hyoscine-N-butylbromide, might help reduce contraction of the colon wall. Adding simethicone or rinse water to the bowel preparation regimen helps eliminate bubble-induced FPs.
Until FPs can be effectively reduced, endoscopists need proper training to recognize and ignore FPs so that CADe can be widely adopted for the detection of colon neoplasms.
The colonoscopist can optimize the condition of the bowel lumen by using water exchange colonoscopy, which is discussed in detail below.
Among the Gastrointestinal (GI) Endoscopy Editorial Board’s top 10 topics in endoscopy in 2019, water exchange (WE) and artificial intelligence (i.e., CADe) were both considered important advances in GI endoscopy. This coincidence brought both to the forefront of the discussion on improving ADR.
Compared with traditional gas (i.e., air or CO2) insufflation during colonoscopy, WE is an effective insertion method that minimizes insertion pain and enhances ADR. It involves infusing water to guide scope advancement in an airless lumen while simultaneously suctioning the infused water during insertion, with the aim of almost completely removing the infused water by the time cecal intubation is achieved. A network meta-analysis concluded that WE produced the highest ADR when compared with water immersion and gas insufflation. A modified Delphi review also endorsed WE as providing better bowel cleanliness, less insertion pain, and higher ADRs than gas insufflation.
WE can effectively salvage-clean bubbles and fecal debris during insertion, resulting in better bowel cleanliness during withdrawal. WE consistently showed better Boston Bowel Preparation Scale (BBPS) scores than air insufflation, both in the whole colon and in the right colon, which is usually the dirtiest colon segment. WE might also help reduce FPs associated with crumpled folds, as there is less need for suction cleaning, and thus fewer related spasms, during withdrawal. In an analysis of the CADe-overlaid withdrawal phase videos of colonoscopies from an RCT comparing right colon ADR between insertion with WE and with air insufflation, Tang et al.
WE and CADe both increase ADR, but through different mechanisms. WE increases ADR mainly through salvage cleaning during insertion, which reveals otherwise unexposed polyps (Table 3). CADe, on the other hand, works as a second observer and points out polyps that are exposed but not recognized due to human error. In other words, the strengths of WE and CADe each compensate for the weakness of the other.
Table 3. BBPS scores in key randomized controlled trials comparing ADRs between WE and air insufflation.
|Study||Sample Size, Air Insufflation vs. WE (n)||Primary Outcome: ADR (95% CI)||Overall BBPS Score||Right Colon BBPS Score|
|Jia et al.||1650 vs. 1653||13.4% vs. 18.3%; RR 1.45 (1.20–1.75) *||7.0 ± 2.3 vs. 7.3 ± 1.6 # (Mean ± SD)||2.3 ± 0.7 vs. 2.2 ± 1.5 #|
|Hsieh et al.||217 vs. 217||37.5% (31.6–44.4%) vs. 49.8% (43–56.4%) *||6.2 ± 1.1 vs. 7.1 ± 1.3 # (Mean ± SD)||NA|
|Cadoni et al.||408 vs. 408||43.4% (35.6–45.3%) vs. 49.3% (44.3–54.2%) *||8.0 (6.0–9.0) vs. 9.0 (7.0–9.0) # [Median (IQR)]||2.0 (2.0–3.0) vs. 3.0 (2.0–3.0) #|