Artificial Intelligence in Polyp Detection


Please note this is a comparison between Version 1 by Edward Young and Version 2 by Jason Zhu.

There has been an exponential rise in the availability of artificial intelligence systems in endoscopy in recent years. As a result, maintaining an informed understanding of the utility and efficacy of existing systems has become increasingly complex.

Keywords: colonoscopy; artificial intelligence; polyp

1. Introduction

Since 2016, researchers have published deep learning algorithms for computer-aided polyp detection (CADe) that have been tested in pre-clinical applications, such as polyp detection in still images or videos [1]. Only 3 years later, the first randomised controlled trials (RCTs) comparing CADe with existing standards were published [2]. Since then, a large body of research on real-time CADe systems has been published, with strong support for their efficacy in polyp detection. Of the 15 RCTs reviewed here, 10 demonstrated a statistically significant increase in adenoma detection, although baseline and CADe adenoma detection rates (ADRs) vary widely because of differing populations and study designs [2–16]. Although overall lesion detection is generally improved, many of these systems have been criticised for a lack of impact on the detection of advanced adenomas, which are of greater clinical significance. Many argue that these larger polyps are less likely to be missed by endoscopists, making CADe less pivotal for them. While this may be true, the lack of a demonstrable effect of CADe on advanced adenomas may simply reflect their lower prevalence and, hence, the larger sample sizes required to adequately power these studies. For example, in the largest RCT, by Xu et al., including 3059 patients, there was a statistically significant increase in advanced adenoma (>10 mm, villous component or high-grade dysplasia) detection in the CADe group versus the control group (6.6% vs. 4.9%, p = 0.041) [4].
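To illustrate why rarer outcomes need larger trials, the following minimal sketch applies the standard normal-approximation sample-size formula for comparing two proportions. The settings (two-sided α = 0.05, 80% power, equal arms) are assumptions, not the design parameters of any cited trial. It uses the advanced adenoma rates quoted for the Xu et al. RCT and, for contrast, a hypothetical overall ADR scenario (30% vs. 40.5%) with a similar relative effect of about 1.35. The function name `n_per_group` is illustrative only.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p_control: float, p_cade: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients needed per arm to detect p_control vs. p_cade with a
    two-sided two-proportion z-test (normal approximation, equal arm sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p_control + p_cade) / 2               # pooled proportion under the null
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control) + p_cade * (1 - p_cade))
    ) ** 2
    return ceil(numerator / (p_cade - p_control) ** 2)


# Advanced adenoma detection rates quoted for the Xu et al. RCT (4.9% vs. 6.6%).
advanced = n_per_group(0.049, 0.066)
# Hypothetical overall ADR scenario with a similar ~1.35-fold relative effect.
overall = n_per_group(0.30, 0.405)
print(advanced, overall)
```

Under these assumptions, the advanced adenoma comparison needs roughly 2900 patients per arm, about nine times as many as the hypothetical overall ADR comparison (roughly 320 per arm), even though the relative effect is similar. The rarer the outcome, the smaller the absolute difference, and the required sample size grows with the inverse square of that difference.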

In an effort to synthesise this expanding body of research, multiple meta-analyses have compared CADe with high-definition white light imaging (HD-WLI) control groups. These studies have universally found an increase in ADR with CADe, of 1.43- to 1.78-fold versus HD-WLI [17–27]. The largest difference has been in the detection of diminutive (<5 mm) adenomas. For larger polyps, the results have been mixed: four of the seven meta-analyses that specifically analysed >10 mm adenomas found a statistically significant improvement in detection. Interestingly, in their 2021 meta-analysis, Zhang et al. reported a reduction in the detection of advanced adenomas with CADe [26]. While this raises the possibility that the time and attention spent on the additional diminutive polyps detected with CADe may detract from the detection of advanced lesions, this has not been borne out in other meta-analyses and was not the case in the largest RCT to date [4]. Sessile serrated lesions (SSLs) are a polyp subtype prone to being missed during colonoscopy because they are inconspicuous: they are generally flat and difficult to differentiate from the surrounding normal mucosa. RCTs have not been powered to demonstrate an effect on SSL detection, as SSLs are considerably less common than adenomas. However, three meta-analyses assessed SSLs specifically, demonstrating a 1.37- to 1.52-fold increase in SSL detection with CADe, although one of these did not reach statistical significance [17,22,25].
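Meta-analyses commonly express a pooled effect such as "1.43 times higher ADR" as a risk ratio (RR). The minimal sketch below shows how per-trial results can be pooled, assuming a fixed-effect inverse-variance model on the log-RR scale and entirely hypothetical trial counts. Published meta-analyses often use random-effects models instead, which additionally account for heterogeneity between studies; the function names here are illustrative only.

```python
from math import exp, log, sqrt


def log_risk_ratio(events_cade, n_cade, events_ctrl, n_ctrl):
    """Log risk ratio and its approximate (delta-method) variance for one two-arm trial."""
    log_rr = log((events_cade / n_cade) / (events_ctrl / n_ctrl))
    variance = 1 / events_cade - 1 / n_cade + 1 / events_ctrl - 1 / n_ctrl
    return log_rr, variance


def pooled_risk_ratio(trials, z=1.96):
    """Fixed-effect inverse-variance pooled risk ratio with an approximate 95% CI."""
    total_weight = 0.0
    weighted_sum = 0.0
    for trial in trials:
        log_rr, variance = log_risk_ratio(*trial)
        weight = 1 / variance  # more precise trials receive more weight
        total_weight += weight
        weighted_sum += weight * log_rr
    pooled = weighted_sum / total_weight
    se = sqrt(1 / total_weight)
    return exp(pooled), exp(pooled - z * se), exp(pooled + z * se)


# Hypothetical trials: (CADe patients with >=1 adenoma, CADe patients,
#                       control patients with >=1 adenoma, control patients)
trials = [(150, 500, 100, 500), (120, 400, 90, 400), (200, 600, 140, 600)]
rr, lower, upper = pooled_risk_ratio(trials)
print(f"pooled RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The pooled estimate always lies between the smallest and largest trial RRs, pulled towards the trials with the smallest variance.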

Overall, the results of prospective studies of CADe for adenoma detection have been encouraging. Although many individual studies have not shown improved advanced adenoma detection, multiple meta-analyses and the largest RCT to date suggest that CADe likely does improve it, and CADe has been conclusively shown to improve the detection of diminutive adenomas. However, with the advent of commercially available CADe systems, real-world data are now available, which may be more generalisable than data from clinical trials. The largest of these studies, published by Ladabaum et al. in 2023, was a pragmatic, retrospective real-world study in which data were collected following the implementation of CADe in a single centre and compared with concurrent and historical controls [28]. The introduction of CADe resulted in no statistically significant difference in any detection metric, including ADR, adenomas per colonoscopy, or advanced adenoma detection. This was further supported by Levy et al., who reported a reduction in ADR from 35.2% to 30.3% (p < 0.001) in their single-centre cohort study [29]. These studies highlighted potential pitfalls of CADe use, including less thorough mucosal exposure due to a ‘false sense of security’ from AI assistance, proceduralists dismissing lesions not highlighted by the AI, and the cumulative effect of false positive detections and the resulting increase in withdrawal time. However, in two other large real-world propensity score-matched studies including a cumulative 2262 patients, the introduction of CADe resulted in a 1.32- to 1.59-fold higher ADR compared with HD-WLI [30,31].

The differing results of these real-world implementation studies may relate in part to differences in the impact of AI in expert referral centres with an already high ADR versus among lower-ADR proceduralists. Given the limited availability of CADe systems thus far, few studies have examined their impact on low-ADR endoscopists. Of the five studies not demonstrating a difference in ADR with CADe, only one had a baseline ADR of less than 36% [30,31]. In the study by Wang et al., the control group included a second observer and was, therefore, not strictly a ‘standard of care’ control [5]. In one study with a low baseline ADR, adenoma detection improved from 19.9% to 26.4% with the introduction of CADe [30]. Interestingly, proceduralists in that study were stratified by experience (experts being defined as having performed more than 1000 colonoscopies) rather than by ADR, and no improvement in ADR was found in the ‘non-expert’ group. This raises the possibility that baseline ADR is of greater significance than procedural experience when determining the impact of CADe. This is also supported by Repici et al., who compared ADR with and without CADe across 660 colonoscopies performed by non-experts (<2000 colonoscopies) and found no correlation between examiner experience and the impact of AI on ADR [9]. In contrast, although not a controlled comparative study, Biscaglia et al. showed that, with the assistance of CADe, trainee endoscopists (200–400 previous colonoscopies) could achieve the same ADR on tandem colonoscopy as expert, high-ADR endoscopists without AI assistance [32].

While ADR is often used as a surrogate marker, the adenoma miss rate (AMR) is the most direct correlate of the risk of developing bowel cancer despite surveillance colonoscopy. Few studies have directly examined the impact of CADe on this measure. AMR refers to the proportion of adenomas ‘missed’ during a colonoscopy, generally determined in tandem colonoscopy studies, in which an immediate repeat procedure detects additional adenomas. Three tandem colonoscopy studies have compared AMR for CADe versus HD-WLI, finding a significant reduction with CADe [33–35]. The SSL miss rate was higher with HD-WLI in all three studies, reaching statistical significance in two. In addition, non-polypoid and right-sided adenomas, both of which are frequently missed at colonoscopy, were less likely to be missed with CADe. These are promising data for the potential of CADe to standardise the quality of colonoscopy by reducing miss rates for these more inconspicuous polyp subtypes.
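The tandem-colonoscopy calculation of AMR can be sketched as follows. This is a minimal illustration assuming the commonly used definition (adenomas found only on the repeat procedure divided by all adenomas found across both procedures) and hypothetical patient counts; the class and function names are illustrative and not taken from any cited study. Each arm is labelled by the modality used for the first pass, since the miss rate describes that first procedure.

```python
from dataclasses import dataclass


@dataclass
class TandemPatient:
    """Adenoma counts for one patient in a hypothetical tandem colonoscopy study."""
    first_pass: int   # adenomas found on the first (index) colonoscopy
    second_pass: int  # additional adenomas found only on the immediate repeat


def adenoma_miss_rate(patients):
    """AMR = adenomas found only on the repeat / all adenomas found across both passes."""
    missed = sum(p.second_pass for p in patients)
    total = sum(p.first_pass + p.second_pass for p in patients)
    return missed / total if total else 0.0


# Hypothetical arms, named after the modality used for the first pass.
cade_first = [TandemPatient(2, 0), TandemPatient(1, 1), TandemPatient(3, 0), TandemPatient(0, 0)]
wli_first = [TandemPatient(1, 1), TandemPatient(2, 1), TandemPatient(0, 1), TandemPatient(2, 0)]

print(f"AMR, CADe first:   {adenoma_miss_rate(cade_first):.1%}")  # 14.3%
print(f"AMR, HD-WLI first: {adenoma_miss_rate(wli_first):.1%}")   # 37.5%
```

Because the denominator counts every adenoma seen in either pass, AMR is a per-adenoma measure, whereas ADR is a per-procedure measure.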

Multiple previous studies have demonstrated the impact of fatigue on ADR, presumably because of a higher likelihood of human error. A 2009 retrospective study of 3619 colonoscopies found an ADR of 29.3% in the morning versus 25.3% in the afternoon (p = 0.008) [36]. This was reinforced by a prospective study that found 27% more polyps detected per patient in early morning cases, with an hour-by-hour decrease in adenoma detection as the day progressed [37]. Given that CADe aims to reduce the likelihood of human error, two studies have assessed its role in preventing fatigue-related deterioration in ADR. Lu et al. undertook a post hoc analysis of two prospective RCTs comparing CADe with HD-WLI, finding that while ADR in the control group was higher in morning sessions than in afternoon sessions, there was no longer any statistically significant difference between sessions in the CADe group [38]. In this cohort, the odds ratio (OR) for adenoma detection during afternoon colonoscopy with versus without CADe assistance was 3.81 (95% CI 2.1–6.91) [38]. Similarly, Ritcher et al. performed a retrospective database analysis comparing ADR with CADe versus HD-WLI over the course of a day, demonstrating that while there was a statistically significant trend towards a reduction in ADR throughout the day with HD-WLI (p = 0.015), this trend was no longer present in the CADe-assisted group (p = 0.65) [39].
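Because ADR is the proportion of colonoscopies in which at least one adenoma is detected (unlike adenomas per colonoscopy, which counts every lesion), a morning versus afternoon comparison reduces to two proportions. The minimal sketch below computes ADR per session from a hypothetical procedure log; the noon cut-off and the function name `adr_by_session` are assumptions for illustration only.

```python
from collections import defaultdict


def adr_by_session(procedures, afternoon_from_hour=12):
    """ADR per session = share of colonoscopies with at least one adenoma detected.

    `procedures` is a list of (start_hour, adenoma_count) tuples; the noon
    cut-off between 'morning' and 'afternoon' is an assumed convention.
    """
    tallies = defaultdict(lambda: [0, 0])  # session -> [procedures with >=1 adenoma, all procedures]
    for start_hour, adenoma_count in procedures:
        session = "morning" if start_hour < afternoon_from_hour else "afternoon"
        tallies[session][1] += 1
        if adenoma_count > 0:  # a procedure with 2 adenomas still counts once
            tallies[session][0] += 1
    return {session: hits / total for session, (hits, total) in tallies.items()}


# Hypothetical log: (start hour, adenomas detected)
day = [(8, 1), (9, 0), (10, 2), (11, 0), (13, 0), (14, 1), (15, 0), (16, 0)]
print(adr_by_session(day))  # {'morning': 0.5, 'afternoon': 0.25}
```

A real analysis would also test whether the session difference is statistically significant and, as in the studies above, compare the size of that difference between CADe and HD-WLI procedures.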

2. Criticisms of CADe

The two main criticisms of CADe are the impact on procedure time and the high rates of distracting false positive polyp identifications. In a 2022 ESGE position statement, the overwhelming consensus was that, for the use of CADe to become widespread, it would need to have an acceptable false-positive rate such that it does not significantly prolong procedure times [40].

Despite initial concerns from image- and video-based studies, the rate of false positives that meaningfully affect withdrawal time appears to be low, with 91% of false positives lasting less than half a second [41]. In their post hoc analysis of an RCT, Hassan et al. found that while the overall false positive rate was high (27.3 per colonoscopy), only 5.7% of false positives required additional exploration (4.8 s per false positive), adding a negligible 1% to total withdrawal time [42]. Nevertheless, although the majority of false positives are short-lived, they still have a considerable impact on proceduralist fatigue: in a 2023 survey assessing one commercially available CADe system, more than 80% of gastroenterologists reported concerns regarding excessive false positive alerts [43]. These false positive alerts are most often caused by bubbles or faeces being falsely identified as polyps. Tang et al., therefore, examined whether they could be minimised using water exchange colonoscopy (in which water is used instead of CO₂ insufflation during colonoscope insertion while fluid is simultaneously suctioned to clear the lumen) to provide a clearer mucosal field of view. In their 2022 study, they demonstrated a significant increase in the additional polyp detection rate with CADe versus HD-WLI after water exchange colonoscopy (30.1% vs. 12.3%, p = 0.001), with fewer false positives related to faeces (p = 0.007) and bubbles (p = 0.001) owing to the clearer field on colonoscope withdrawal [44]. Techniques such as water exchange colonoscopy, therefore, stand to enhance the performance of CADe not only by improving mucosal visualisation but also by reducing rates of distracting false positives.
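The false positive figures reported by Hassan et al. can be turned into an expected time cost per procedure with simple arithmetic. The sketch below assumes that the 4.8 s applies to each false positive that prompted exploration and that the three figures can simply be multiplied; it is an illustrative back-of-the-envelope calculation, not necessarily how the original analysis derived its 1% figure. The function name is hypothetical.

```python
def false_positive_seconds(fps_per_colonoscopy, fraction_explored, seconds_per_explored_fp):
    """Expected extra withdrawal time (s) spent exploring false positives in one colonoscopy."""
    explored = fps_per_colonoscopy * fraction_explored  # false positives that prompt a closer look
    return explored * seconds_per_explored_fp


# Figures quoted from Hassan et al.'s post hoc analysis
extra = false_positive_seconds(27.3, 0.057, 4.8)
print(f"{27.3 * 0.057:.2f} explored false positives, {extra:.1f} s extra per colonoscopy")
```

Roughly 1.6 explored false positives per procedure add about 7.5 s, a small fraction of a withdrawal phase lasting several minutes, which is consistent with the negligible overall impact described above.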

Regarding withdrawal time, it remains difficult to measure the true mucosal inspection time independently of the additional time spent on polyp assessment and resection. Although studies generally pause a stopwatch during polypectomy, some delay is still introduced each time a polyp is found, for example, while the stopwatch is paused and restarted.
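The stopwatch convention described above amounts to subtracting paused intervals from the total withdrawal time. The minimal sketch below, using hypothetical timestamps and an illustrative function name, makes the residual problem explicit: any time spent recognising or assessing a polyp before the clock is paused, or after it is restarted, is still counted as mucosal inspection. It assumes that pause intervals do not overlap.

```python
def net_inspection_seconds(withdrawal_start, withdrawal_end, paused_intervals):
    """Withdrawal time minus the (non-overlapping) intervals when the stopwatch was paused."""
    paused = 0.0
    for start, end in paused_intervals:
        if not (withdrawal_start <= start <= end <= withdrawal_end):
            raise ValueError(f"pause {start}-{end} lies outside the withdrawal window")
        paused += end - start
    return (withdrawal_end - withdrawal_start) - paused


# Hypothetical timestamps in seconds from caecal intubation; two polypectomies paused the clock.
net = net_inspection_seconds(0, 600, [(120, 240), (400, 430)])
print(net)  # 450.0
```

If a polyp was first spotted at 110 s but the clock was only paused at 120 s, those 10 s of assessment remain inside the reported inspection time, which is the measurement bias the text describes.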

3. Cost Effectiveness

The cost-effectiveness of implementing CADe-assisted colonoscopy in screening programs is controversial. Initially, the increase in adenoma detection will increase the healthcare burden because of requirements for pathological evaluation and shorter surveillance intervals. Eventually, however, the reduction in adenoma miss rates may allow surveillance guidelines to be adjusted, and there are significant cost savings if advanced colorectal cancers can be prevented. In 2022, Mori et al. investigated this further in a pooled analysis of RCTs, demonstrating that the proportion of patients recommended more intensive surveillance according to US guidelines increased from 8.4% in the control group to 11.3% in the CADe group (risk ratio [RR] 1.35, 95% CI 1.16–1.57), which would place a significant burden on a strained healthcare system [45]. However, Areia et al. developed a microsimulation model of a hypothetical cohort showing that implementing CADe in a US population would result in the additional prevention of 7194 colorectal cancer cases and 2089 related deaths per year, with cost savings of USD 290 million [46]. This tension is aptly described in the World Endoscopy Organisation's 2023 position statement on AI in colonoscopy, which states that ‘In the short term, use of CADe is likely to increase health-care costs by detecting more adenomas’, but ‘the increased cost by CADe could be balanced by savings in costs related to cancer treatment due to CADe-related cancer prevention’ [47].
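The surveillance burden reported by Mori et al. can be made concrete by comparing the two proportions directly. The sketch below computes the crude ratio and the absolute difference per 1000 colonoscopies from the quoted percentages. Note that a pooled RR from a meta-analysis is not simply the ratio of pooled proportions, so this crude check only approximates the reported RR of 1.35.

```python
control_rate = 0.084  # proportion recommended more intensive surveillance, control group
cade_rate = 0.113     # same proportion, CADe group

crude_rr = cade_rate / control_rate                 # relative increase
extra_per_1000 = (cade_rate - control_rate) * 1000  # absolute increase per 1000 colonoscopies
print(f"crude RR {crude_rr:.2f}; about {extra_per_1000:.0f} extra patients per 1000 colonoscopies")
```

Expressed in absolute terms, roughly 29 additional patients per 1000 colonoscopies would be brought back for earlier surveillance, which is the short-term cost that any long-term cancer-prevention savings would need to offset.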

4. Summary

CADe systems improve adenoma detection, particularly of diminutive adenomas and of polyp subgroups more likely to be missed because of human error, including non-polypoid adenomas, right-sided adenomas, and SSLs. While this has not yet been consistently supported by ‘real-world’ studies, the existing retrospective studies are subject to forms of bias that may influence their results. What has been demonstrated, however, is that, with the support of CADe, regular endoscopists can achieve adenoma detection equivalent to that of expert high-ADR endoscopists in referral centres, standardising the quality of service provision. Given the dramatic increase in demand for colonoscopy with the implementation of population screening programs, not all patients will have access to expert referral centres. CADe systems, therefore, have the capacity to make equitable healthcare provision a reality despite inevitable resource limitations. This sentiment is echoed by the European Society of Gastrointestinal Endoscopy (ESGE) 2022 position paper on AI in gastrointestinal endoscopy, which states that ‘the task of AI is to lift the less experienced to the level of experienced endoscopists rather than to further increase the high ADR values of the high-detector experts’ [40]. In this way, CADe is clearly meeting its objective.