1. Introduction
Glaucoma, often referred to as the “silent thief of sight”, is the leading cause of irreversible blindness worldwide
[1]. Its insidious nature, characterized by a gradual loss of peripheral vision often unappreciated by the patient, underscores the critical importance of early detection and continuous monitoring
[2]. In the clinic, glaucoma is typically diagnosed and monitored using a multimodal approach, including tonometry to measure intraocular pressure (IOP), visual field tests, optical coherence tomography (OCT), and fundoscopic examinations
[3]. These methods, while foundational, have their limitations: tonometry can be influenced by corneal thickness, visual field tests depend on patient responsiveness, and OCT and fundoscopic exams require expert interpretation, often with some degree of subjectivity
[4]. Given these constraints, as the emphasis on early detection and intervention grows, there is an unmet need for more consistent, objective, and precise monitoring techniques
[5]. Artificial intelligence (AI) has emerged as a means to harness the extensive data generated by these examinations, aiming to offer automated, consistent, and predictive insights across all areas of glaucoma care
[6].
AI’s ability to analyze vast amounts of data, detect intricate patterns, and predict disease trajectories offers a paradigm shift in how researchers approach glaucoma diagnosis and monitoring
[7]. Historically, AI’s integration into glaucoma care has followed the broader evolution of medical technology. The late 20th and early 21st centuries saw a surge in ophthalmic imaging techniques, notably OCT, providing high-resolution views of the optic nerve and retinal layers
[8]. With this influx of data came increased challenges in interpretation. It was during this phase, particularly in the 2010s, that AI began making its mark. Leveraging machine learning algorithms, early applications of AI sought to automate the analysis of visual fields and OCT scans, aiming to identify subtle patterns indicative of glaucoma progression
[9]. Over the years, as datasets grew and algorithms became more sophisticated, AI’s role expanded from simple analysis to prediction, including forecasting disease trajectories and potential treatment outcomes.
In light of these developments, this research focuses on the multifaceted applications and implications of AI in glaucoma care, which are illustrated in Figure 1. At the core of this schematic is the AI continuum, representing various AI methodologies such as Machine Learning (ML), Neural Networks (NN), and Deep Learning (DL), which form the foundation for advanced data analysis in glaucoma research.
Figure 1. The spectrum of AI applications in glaucoma healthcare.
2. From Traditional to AI-Enhanced Data Collection
Data collection for glaucoma diagnosis has traditionally revolved around a combination of clinical assessments and specialized imaging techniques. Key metrics such as IOP are gathered using tonometry, while the optic nerve head and retinal nerve fiber layer (RNFL) are visualized through imaging modalities such as OCT and fundus photography. Additionally, visual field tests map out the patient’s field of vision, identifying any deficits or abnormalities characteristic of glaucoma. As the volume of diagnostic data grew, the limitations of these traditional methods became increasingly apparent, as did the need for more efficient and accurate data processing.
The integration of AI is beginning to transform the data collection process in glaucoma care, enhancing both efficiency and accuracy. For example, AI-powered imaging devices are now being developed to auto-calibrate based on patient specifics, which could potentially improve image quality
[10]. Additionally, emerging AI algorithms in these devices process data in real-time, providing immediate insights and the ability to predict trends based on historical data
[11]. The Retinal Fundus Glaucoma Challenge (REFUGE) represents a pivotal step in AI-driven ophthalmology
Established in conjunction with MICCAI 2018, it addressed the constraints of conventional glaucoma assessment using color fundus photography. REFUGE introduced a groundbreaking dataset of 1200 fundus images with detailed ground truth segmentations and clinical labels, the largest of its kind. This initiative was crucial for standardizing AI model evaluations in glaucoma diagnosis, allowing for consistent and fair comparisons. Notably, some AI models in the challenge surpassed human experts in glaucoma classification, demonstrating AI’s potential to enhance diagnostic precision through an advanced, large-scale dataset
[12]. Although this marks a significant advancement, the field is still in the early stages of transitioning into the AI era, with traditional methods and AI-based approaches coexisting.
This evolution is further augmented by wearable technology and smart devices, which make continuous monitoring and real-time data collection feasible and introduce new dimensions in glaucoma management
[13]. One of the most useful wearable technologies for glaucoma detection is the contact lens. The smart soft contact lenses (SSCLs) introduced by Zhang et al.
[14] allow continuous 24 h monitoring of IOP through an embedded wireless sensor built upon commercial soft contact lenses. In vivo testing in a dog model demonstrated the ability to wirelessly track circadian IOP fluctuations with a sensitivity of 662 ppm/mmHg (R² = 0.88) using a portable vector network analyzer coupled to a contact-lens reader coil. Measurements in human subjects exhibited an even higher sensitivity of 1121 ppm/mmHg (R² = 0.91), attributed to the superior fit enabled by the soft hydrogel lens base. This sensitivity exceeds that of previous wearable sensors more than twofold. The seamless interface of the SSCLs with the cornea was confirmed through anterior segment OCT imaging in human eyes. The wireless, 24 h IOP data obtained by these soft hydrogel-based sensors can aid in glaucoma detection and management through continuous monitoring of ocular hypertensive events and linking IOP trends to disease progression. Thus, from ensuring quality data collection to real-time processing, AI is beginning to embed itself deeply into the data collection process for glaucoma diagnosis and management, making it more robust and insightful.
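To make the reported sensitivity figures concrete, the sketch below (a simplified illustration, not the authors’ implementation) converts a measured resonance-frequency shift into an estimated IOP change under an assumed linear sensor response; the function name, variable names, and baseline frequency are hypothetical.

```python
# Illustrative only: assumes a linear sensor response, as implied by the
# ppm/mmHg sensitivity figures reported above.
HUMAN_SENSITIVITY_PPM_PER_MMHG = 1121   # reported for human subjects
CANINE_SENSITIVITY_PPM_PER_MMHG = 662   # reported for the dog model

def iop_change_mmhg(f0_hz: float, f_hz: float,
                    sensitivity_ppm_per_mmhg: float) -> float:
    """Estimate the IOP change corresponding to a resonance-frequency shift.

    A shift of `sensitivity_ppm_per_mmhg` ppm of the baseline frequency
    corresponds to a 1 mmHg change in IOP.
    """
    shift_ppm = (f_hz - f0_hz) / f0_hz * 1e6
    return shift_ppm / sensitivity_ppm_per_mmhg

# Example: a 5.6 MHz shift on a hypothetical 500 MHz baseline is ~11,200 ppm,
# i.e., roughly a 10 mmHg IOP change at the human-subject sensitivity.
print(iop_change_mmhg(500e6, 505.6e6, HUMAN_SENSITIVITY_PPM_PER_MMHG))
```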
However, recent studies in AI-assisted glaucoma diagnosis underscore the importance of training data diversity. For example, the REFUGE challenge, a significant initiative in AI-driven ophthalmology, utilized a dataset of 1200 fundus images, aiming for broad demographic representation. Yet, as highlighted in the literature, there is a recognized need for more inclusive data encompassing a wider range of ethnicities and age groups. This inclusivity is critical, given the variability in glaucoma presentation across different populations. Studies emphasize the potential risk of biased AI models due to non-diverse training datasets, which may not effectively represent glaucoma manifestations in underrepresented groups. Thus, the current shift towards more ethnically and demographically inclusive datasets is a vital step in developing universally applicable and unbiased AI models for glaucoma detection.
3. AI’s Role in Glaucoma Screening
3.1. Early Detection and Challenges
Early detection and treatment of glaucoma are essential, as vision loss from the disease is currently irreversible. However, the disease is difficult to detect in the early stages, as it is asymptomatic and typically begins with peripheral rather than central vision loss. As a result, almost 50% of glaucoma patients are undiagnosed, delaying treatment until irreversible vision loss has already occurred
[15]. Screening for glaucoma is therefore an important mechanism to detect signs of disease in undiagnosed individuals, allowing intervention while there is still vision left to preserve
Currently, the impact and reach of glaucoma screenings are limited by a reliance on individual examinations by glaucoma specialists, ophthalmologists, or optometrists. Screenings can be lengthy, labor-intensive, and challenging to implement in practice, ultimately limiting the number of screened individuals. This is especially true in developing countries, where there is a high burden of glaucoma and a limited number of trained eye professionals
[17]. As the prevalence of glaucoma continues to rise in an aging population, there is a growing mismatch between the need for glaucoma screenings and the supply of available resources
[18].
3.2. AI’s Potential to Transform Glaucoma Screening
In response to these challenges, AI-enabled screening for glaucoma could help fill this unmet need, increasing access to care and lessening the burden on healthcare systems
[19]. A system that accurately flags possible glaucoma on images in real time could allow for large-scale screenings to be conducted without the presence of a vision specialist. Patients with signs of glaucoma could then be referred to an ophthalmologist or optometrist for a comprehensive examination, diagnosis, and treatment. Ideally, glaucoma screenings utilizing AI would be low-cost, accurate, and easily translated to low-resource settings. These screenings could then take place in remote rural areas, underserved urban areas, or countries with a scarcity of ophthalmic specialists providing frontline eye care.
Fundus photography, a low-cost option that fits these criteria, has already been successfully incorporated into AI-enabled screening programs to detect diabetic retinopathy
[20]. Fundus images provide visualization of anatomic changes to the optic nerve head, such as optic disc cupping and thinning of the neuroretinal rim. These structural abnormalities often precede loss of visual fields
[21]. Among its benefits for screening, fundus photography is low-cost, non-invasive, quick, and portable, allowing application to low-resource settings
[22]. Starting around 2018, many studies have developed convolutional neural networks (CNNs) trained on thousands of labeled fundus photos to distinguish glaucomatous from healthy eyes
[23]. A range of CNN architectures have been applied including ResNet, InceptionNet, and VGGNet, often utilizing transfer learning and reporting high performance
[24].
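As a concrete illustration of this transfer-learning recipe, the following minimal PyTorch sketch fine-tunes an ImageNet-pretrained ResNet for binary glaucoma classification; the dataset path, directory layout, and hyperparameters are placeholders rather than those of any cited study.

```python
# Minimal transfer-learning sketch: ImageNet-pretrained ResNet-50 fine-tuned
# to distinguish glaucomatous from healthy fundus photographs.
import torch
import torch.nn as nn
from torchvision import models, transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.Resize((224, 224)),          # standard ImageNet input size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical layout: fundus_photos/train/{glaucoma,healthy}/*.png
train_set = ImageFolder("fundus_photos/train", transform=transform)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 2)   # replace the ImageNet head

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:                   # one illustrative epoch
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```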
Recent advancements in glaucoma detection have incorporated Vision Transformers like the data-efficient image transformer (DeiT), showing notable efficacy in analyzing fundus photography. These models utilize self-attention mechanisms, effectively capturing the global characteristics of fundus images and thereby enhancing classification accuracy. For instance, studies such as those by Wassel et al.
[25] and Fan et al.
[26] have demonstrated the competitive performance of Vision Transformer models, particularly in terms of generalizability across diverse datasets. Notably, the attention maps from DeiT models tend to concentrate on clinically relevant areas, like the neuroretinal rim, aligning with regions commonly assessed in manual image review. This alignment suggests that DeiT models can complement traditional diagnostic approaches by focusing on key areas used in glaucoma assessment. The emerging use of Vision Transformers, including DeiT, in glaucoma detection highlights their potential in contributing to the evolving landscape of AI applications in ophthalmology.
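A minimal sketch of the Vision Transformer approach is shown below, instantiating a DeiT backbone through the timm library with a two-class head; the checkpoint name is a standard published DeiT variant, and the class-index assignment is an arbitrary placeholder.

```python
# DeiT backbone with a two-class head for the glaucoma/healthy task above.
import timm
import torch

model = timm.create_model("deit_base_patch16_224",
                          pretrained=True, num_classes=2)
model.eval()

dummy_fundus = torch.randn(1, 3, 224, 224)   # stand-in for a preprocessed image
with torch.no_grad():
    logits = model(dummy_fundus)
probs = logits.softmax(dim=-1)
print(f"P(glaucoma) = {probs[0, 1]:.3f}")     # class index is arbitrary here
```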
3.3. AI Outperforming Human Experts and Challenges
Several studies have shown that deep learning models can achieve equal or better accuracy in differentiating normal from glaucomatous eyes when compared with expert glaucoma specialists
[27][28][29][30]. Ting et al.
[31] developed a deep learning model using 494,661 retinal images to detect diabetic retinopathy and achieved an AUC of 0.942 in detecting “referable” glaucoma. Similarly, Li et al.
[24] created a deep learning algorithm using 48,116 fundus images to detect glaucomatous optic neuropathy. The model achieved an AUC of 0.986, with a sensitivity of 95.6% and specificity of 92.0%. The Pegasus system (version v1.0)
[32], a cloud-based AI from Visulytix Ltd. (London, UK), evaluates fundus photos using specialized CNNs to extract and classify the optic nerve. Compared to medical professionals, it achieved an accuracy of 83.4% in identifying glaucomatous damage. Orbis International provides free access to an AI tool called Cybersight AI to eye care professionals in low- and middle-income countries
[33]. This open access tool can detect diabetic retinopathy, glaucoma, and macular disease on fundus images. At clinics in Rwanda, screening with this device led to accurate referrals for diabetic retinopathy and high rates of patient satisfaction, though more research is needed on diagnostic accuracy for glaucoma
[33].
Several challenges must be addressed in order to successfully integrate AI-enabled glaucoma screening into real-world settings. First, researchers must ensure that deep learning models maintain their accuracy when applied to images from different cameras with varying photographic quality. Studies show that these models currently underperform when images are captured on different cameras compared with those used in training datasets
[34]. Second, more research is needed on how co-morbid pathologies can impact the performance of such algorithms. Anatomic variability and pathologic conditions can affect the appearance of the optic nerve head, so many training datasets exclude images with ocular pathologies; however, real-world screening populations will include individuals with a variety of ocular conditions. Finally, more studies must test deep learning models in the actual settings where they will be implemented and ensure generalizability to diverse racial and ethnic groups. Many models with high accuracies upon testing do not demonstrate similar accuracy in the real world
[34].
The interpretability of AI models in clinical settings is a crucial aspect that warrants detailed discussion. AI models, particularly those based on deep learning, often function as ‘black boxes’, providing limited insight into how they derive their conclusions. This lack of transparency can be a significant barrier to the adoption of AI in clinical practice, where understanding the reasoning behind a diagnosis is fundamental for clinician trust and decision making.
Recent advancements in AI have seen the development of techniques aimed at unraveling these black boxes, thus enhancing the interpretability of AI systems. Methods such as Layer-wise Relevance Propagation (LRP) and Class Activation Mapping (CAM) are being explored to provide visual explanations of AI decisions. For instance, in glaucoma detection, these methods can highlight areas in fundus images or OCT scans that the AI model deems significant for its diagnosis. This not only aids clinicians in understanding AI decisions but also serves as a tool for validating the accuracy of the AI model. The integration of such interpretability frameworks into AI systems for glaucoma detection is a promising step towards their acceptance and effective utilization in clinical environments.
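For illustration, the sketch below implements Grad-CAM, a widely used member of the CAM family mentioned above, over a generic ResNet backbone; the model, layer choice, and random input stand in for a trained glaucoma classifier and a real fundus image.

```python
# Minimal Grad-CAM sketch: highlights the image regions that most influence
# the model's predicted class. Backbone and hooked layer are illustrative.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.eval()

activations, gradients = {}, {}

def fwd_hook(module, inp, out):
    activations["value"] = out
    out.register_hook(lambda grad: gradients.update(value=grad))

# Hook the last convolutional stage, whose spatial map we want to explain.
model.layer4.register_forward_hook(fwd_hook)

image = torch.randn(1, 3, 224, 224)            # stand-in for a fundus image
logits = model(image)
logits[0, logits.argmax()].backward()          # gradient of the top class

# Weight each channel by its average gradient, then sum into a heatmap.
weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=(224, 224), mode="bilinear",
                    align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
```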
4. AI’s Role in Glaucoma Diagnosis
Unlike screening, where the primary aim of AI is to flag potential glaucoma cases for further examination, AI in glaucoma diagnosis tackles a more nuanced challenge. Here, AI is tasked with confirming the presence of glaucoma in individuals who have been flagged during screening or who present with symptoms. This involves a detailed analysis of clinical data, requiring algorithms to be highly accurate and reliable in differentiating glaucoma from other conditions that may present similarly. Determining an official diagnosis of glaucoma is a more difficult application for AI than screening for suspected disease, and this is not yet established or accepted in many clinical practices. Despite these challenges, there has been exponential growth in research in AI applications for glaucoma diagnosis in the past decade. Most applications focus on OCT, visual fields, or hybrid models that combine structural and functional data.
4.1. Leveraging OCT for Glaucoma Diagnosis
OCT, which provides a three-dimensional view of the retina and optic nerve head, is the most widespread tool used to measure structural damage from glaucoma. In the clinic, structures of interest are automatically segmented by the machine’s software to generate relevant quantitative measures, such as RNFL thickness. AI models built on OCT data have reported AUCs ranging from 0.78 to 0.99, underscoring their effectiveness in differentiating glaucomatous from normal eyes and in predicting RNFL thickness and glaucoma stage. Early studies in the 2000s applied machine learning classifiers to time-domain OCT (TD-OCT), showing comparable or better glaucoma detection accuracy than standard OCT parameters alone
[35]. With the advent of spectral-domain OCT (SD-OCT) in the 2010s, newer parameters like RNFL thickness enabled sensitivity of 50–80% and specificity of 80–95% for glaucoma diagnosis when analyzed by classifiers
[36]. Recently, swept-source OCT (SS-OCT) with scanning speeds of 100,000 A-scans/second has shown potential for earlier glaucoma detection, with algorithms applied to SS-OCT achieving an AUC of 0.95
[37].
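The early classifier-on-parameters approach can be illustrated with a short scikit-learn sketch over tabular OCT-derived measures; the feature set, synthetic data, and labels are entirely illustrative.

```python
# Sketch of a classic ML classifier over OCT-derived parameters, in the
# spirit of the early studies above. Data here are synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
# Hypothetical features: mean RNFL thickness (µm), cup-to-disc ratio,
# and neuroretinal rim area (mm²).
X = np.column_stack([
    rng.normal(90, 15, n),
    rng.uniform(0.2, 0.9, n),
    rng.normal(1.3, 0.3, n),
])
y = (X[:, 0] < 80).astype(int)   # toy label: thin RNFL as a glaucoma proxy

clf = GradientBoostingClassifier()
scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(f"Cross-validated AUC: {scores.mean():.2f}")
```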
Different types of OCT images have been used to develop deep learning algorithms for glaucoma diagnosis, including the OCT conventional report, 2D B scans, 3D volumetric scans, anterior segment OCTs, and OCT-angiography (OCT-A) images. Deep learning models trained with images extracted from the OCT single report can achieve high accuracy in detection of glaucoma
[38][39][40][41]. Other models rely on raw OCT scans for training, rather than on features previously defined by automated segmentation software. The use of raw scans can help reduce the effects of segmentation error, which is present in 19.9% to 46.3% of SD-OCT scans
[42]. Mariottoni et al.
[43] trained a deep learning algorithm to predict RNFL thickness from raw OCT B-scans. These segmentation-free predictions were highly correlated with the actual RNFL thickness (r = 0.983, p < 0.001), with a mean absolute error of 2 μm in images of good quality. Thompson et al.
[44] also used OCT B-scans to develop a deep learning algorithm that discriminated glaucomatous from healthy eyes. The diagnostic performance of this algorithm was better than that of conventional RNFL thickness (AUROC 0.96 vs. 0.87 for global peripapillary RNFL thickness, p < 0.001). OCT volumetric scans of the optic nerve head can provide more comprehensive features and aid in glaucoma detection. Maetschke et al.
[45] developed a 3D deep learning model using volumetric OCT scans of the optic nerve head, which achieved a higher AUROC than a classic machine learning method using segmentation-based features (AUROC 0.94 vs. 0.89, p < 0.05).
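In the spirit of such volumetric models, the skeleton below shows a small 3D CNN that consumes a raw optic nerve head OCT volume; the volume dimensions and layer sizes are assumptions, not the published architecture.

```python
# Illustrative 3D-CNN skeleton: raw OCT volume in, glaucoma logits out.
import torch
import torch.nn as nn

class Volumetric3DNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.AdaptiveAvgPool3d(1),       # global pooling over the volume
        )
        self.classifier = nn.Linear(32, 2)

    def forward(self, volume):            # volume: (batch, 1, depth, H, W)
        x = self.features(volume).flatten(1)
        return self.classifier(x)

model = Volumetric3DNet()
oct_volume = torch.randn(1, 1, 64, 128, 128)   # stand-in OCT volume
print(model(oct_volume).shape)                  # torch.Size([1, 2])
```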
AI-based image analysis of anterior segment OCTs and OCT-A has not yet been explored in depth but does hold potential
[46]. Anterior segment OCTs, used to diagnose narrow angles or angle closures, have difficulties related to subjective interpretation. Fu et al.
[47] developed a deep learning system trained to detect angle closure from Visante OCT images, which achieved an AUROC of 0.96, sensitivity of 0.90 ± 0.02, and specificity of 0.92 ± 0.008, compared to clinician gradings of the same images. Xu et al.
[48] developed a model that could detect gonioscopic angle closure, achieving an AUC of 0.928 on a test dataset of Chinese-American eyes and 0.933 on the cross-validation dataset, as well as AUCs of 0.964 and 0.952 for detecting primary angle closure disease (PACD) based on 2- and 3-quadrant definitions, respectively. OCT-A provides dynamic imaging that maps red blood cell movement over time at a given cross-section. Bowd et al.
[49] trained a deep learning model on en face 4.5 × 4.5 mm radial peripapillary capillary OCT-A optic nerve head vessel density images. The model showed improved performance compared with gradient boosting classifier analysis of measurements from the OCT-A device’s built-in software.
In addition, machine-to-machine (M2M) approaches that predict RNFL thickness from fundus photographs are a growing area of research. OCT has become the standard of care for objectively quantifying structural damage in glaucoma
[50], but it is expensive and not easily portable. M2M approaches can be used to quantify, not merely qualitatively assess, glaucomatous damage, especially in low-resource settings without OCT access. Medeiros et al.
[51] developed a machine learning classifier for glaucomatous damage in fundus photos, using OCT-derived RNFL thickness as a reference. The model showed a strong correlation (r = 0.832) with actual RNFL values and identified glaucomatous damage with an AUC of 0.944, though 30% of OCT variance was unaccounted for. Thompson et al.
[52] employed a similar approach but used a different reference standard from OCT: the Bruch’s membrane opening-minimum rim width (BMO-MRW) parameter. Again, predictions from the deep learning model were well correlated with the actual BMO-MRW values (Pearson’s r = 0.88, p < 0.001), with an AUC of 0.933 for distinguishing glaucomatous from healthy eyes based on the predicted values.
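A minimal sketch of the M2M idea follows: a CNN regressor mapping a fundus photograph to an OCT-derived scalar (e.g., average RNFL thickness in μm), trained against OCT measurements as the reference standard; the data, backbone choice, and hyperparameters are placeholders.

```python
# M2M-style regression sketch: fundus photo in, OCT-derived RNFL value out.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)
backbone.fc = nn.Linear(backbone.fc.in_features, 1)  # single regression output

optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()    # mean absolute error in µm, as such models report

fundus_batch = torch.randn(8, 3, 224, 224)      # stand-in fundus photos
rnfl_targets = torch.rand(8, 1) * 60 + 60       # stand-in OCT RNFL values (µm)

pred = backbone(fundus_batch)                   # one illustrative update step
loss = loss_fn(pred, rnfl_targets)
loss.backward()
optimizer.step()
```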
As researchers venture into the realm of AI’s practical applications in glaucoma diagnosis, it is crucial to shift the focus from controlled research environments to real-world clinical settings. The efficacy and reliability of AI technologies must be critically evaluated in diverse clinical environments to understand their performance and applicability in routine clinical practice. The intricacies of real-world application, such as varied patient demographics, differing equipment, and non-standardized operating procedures, present unique challenges that are not typically encountered in controlled research settings.
Recent studies have begun to address this gap by conducting field trials and observational studies in various clinical settings. For example, the use of AI in community eye clinics and in regions with limited access to specialized care provides valuable insights into the performance of these technologies outside traditional research environments. These studies often highlight the need for robust AI models that can adapt to varying image qualities and different patient populations. Additionally, the integration of AI into existing healthcare workflows and its impact on clinical decision-making processes are being actively explored. These real-world evaluations are critical in ensuring that AI technologies not only meet the stringent requirements of clinical validation but also demonstrate practical utility and scalability in diverse healthcare settings.
4.2. Visual Fields and the Power of Hybrid Models
In addition to OCT scans, visual fields have been explored for AI-enabled diagnosis of glaucoma. Standard automated perimetry (SAP) using the Humphrey Field Analyzer has been the main method for assessing visual field defects in glaucoma. SAP provides numerical data on light sensitivity at different visual field locations, as well as summary indices like mean deviation. Beginning in the 1990s, machine learning techniques like artificial neural networks were applied to analyze and interpret SAP visual fields for glaucoma diagnosis. More recently, CNNs have also been trained using raw visual field data or probability maps to classify fields as normal versus glaucomatous
[53][54].
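As an illustration, a small CNN over visual field input might look like the sketch below, with the 54 test points of a 24-2 field padded onto an 8 × 9 grid; the layout and architecture are assumptions rather than any published design.

```python
# Minimal CNN over a 2D visual-field grid, classifying normal vs. glaucomatous.
import torch
import torch.nn as nn

class VisualFieldCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 8 * 9, 2),
        )

    def forward(self, field):             # field: (batch, 1, 8, 9) grid
        return self.net(field)

model = VisualFieldCNN()
vf_grid = torch.randn(4, 1, 8, 9)          # stand-in pattern-deviation maps
print(model(vf_grid).shape)                # torch.Size([4, 2])
```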
Li et al.
[54] trained a deep learning algorithm with the probability map of the pattern deviation image, showing that it distinguished normal from glaucomatous visual fields with higher accuracy (87.6%) than human graders (62.6%), the Glaucoma Staging System 2 (52.3%), or the Advanced Glaucoma Intervention Study criteria (45.9%). Although it is considered one of the most robust algorithms using visual fields, one limitation is that the input pattern deviation images may prevent early glaucoma from being identified. Elze et al.
[55] used “archetypal analysis” to classify patterns of visual field loss, such as arcuate defects, finding good correspondence to human classifications from the Ocular Hypertension Treatment Study (OHTS). With a follow-up study also using archetypal analysis, Wang et al.
[56] classified central visual field patterns in glaucoma, showing that specific subtypes with nasal defects were associated with more severe total central loss in the future. Brusini et al.
[57] developed a model that could identify local patterns of visual field loss and classify and quantify the degree of severity based on subjective assessments. Li et al.
[58] developed the iGlaucoma mobile software, a smartphone-based deep learning application that extracts visual field data points using optical character recognition techniques. This software outperformed ophthalmologist readers and has undergone real-world prospective external validation testing.
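The OCR step in such smartphone pipelines can be sketched as below using Tesseract; the file name and crop coordinates are placeholders, and a real system would first localize each test point on the printout before recognition.

```python
# Hedged OCR sketch: read numeric perimetry values off a photographed
# visual-field printout. Paths and crop box are hypothetical.
from PIL import Image
import pytesseract

report = Image.open("vf_report_photo.jpg")          # hypothetical photo
grid_region = report.crop((100, 200, 600, 700))      # placeholder crop

# Restrict the character set to digits and minus signs for perimetry values.
text = pytesseract.image_to_string(
    grid_region, config="--psm 6 -c tessedit_char_whitelist=0123456789-")
values = [int(tok) for tok in text.split() if tok.lstrip("-").isdigit()]
print(values)
```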
Early studies suggest that hybrid deep learning models that combine structural and functional tests have increased performance over models trained with either test alone
[59]. Such models better mimic a clinical diagnosis from eye specialists, which is typically multimodal and does not rely on a single imaging modality as input. Xiong et al.
[60] showed that a multimodal algorithm using both visual fields and OCT scans to detect glaucomatous optic neuropathy had superior performance compared with models that relied on either modality alone. Other groups have focused on prediction models, such as predicting visual field sensitivities from OCT-derived RNFL thickness
[61][62][63][64]. Using fundus photographs to predict RNFL thickness has been shown to predict future development of field defects in eyes of glaucoma suspects
[65]. Lee et al.
[66] trained a deep learning algorithm to predict mean deviation from optic disc photographs, which could be useful when SAPs are not available. Sedai et al.
[67] combined multimodal information into a model, using clinical data (age, IOP, inter-visit interval), circumpapillary (cp) RNFL thickness from OCT, and visual field sensitivities to predict cpRNFL thickness at the subsequent visit. This model showed consistent performance among suspects and cases and could potentially be used to personalize the frequency of follow-up visits for patients.
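To illustrate how such hybrid models can be structured, the sketch below fuses an OCT image branch, a visual field branch, and a small clinical-data branch (age, IOP, inter-visit interval) before a final prediction head; all layer sizes are illustrative assumptions, not a published architecture.

```python
# Hybrid structure-plus-function sketch: three branches fused by concatenation.
import torch
import torch.nn as nn
from torchvision import models

class HybridGlaucomaNet(nn.Module):
    def __init__(self):
        super().__init__()
        oct_backbone = models.resnet18(weights=None)
        oct_backbone.fc = nn.Identity()                # 512-d OCT embedding
        self.oct_branch = oct_backbone
        self.vf_branch = nn.Sequential(                # 54-point visual field
            nn.Linear(54, 64), nn.ReLU())
        self.clinical_branch = nn.Sequential(          # age, IOP, interval
            nn.Linear(3, 16), nn.ReLU())
        self.head = nn.Linear(512 + 64 + 16, 2)

    def forward(self, oct_img, vf, clinical):
        fused = torch.cat([
            self.oct_branch(oct_img),
            self.vf_branch(vf),
            self.clinical_branch(clinical),
        ], dim=1)
        return self.head(fused)

model = HybridGlaucomaNet()
logits = model(torch.randn(2, 3, 224, 224),   # stand-in OCT images
               torch.randn(2, 54),             # stand-in visual fields
               torch.randn(2, 3))              # stand-in clinical data
print(logits.shape)                            # torch.Size([2, 2])
```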
4.3. Challenges and Future Prospects for AI in Glaucoma Diagnosis
One major barrier shared by all algorithms trained to diagnose glaucoma is the lack of a gold standard definition for the presence and progression of this disease. Numerous studies cite high interprovider variability in glaucoma diagnosis
[68][69], even though expert diagnosis typically serves as the reference standard for evaluating algorithm outputs. A clear, concrete definition of glaucoma could help set the bar for model accuracy
[34]. For example, diabetic retinopathy has an agreed upon classification system, allowing a more straightforward approach to developing AI for diagnostic applications with proven success; in 2018, a deep learning system for diagnosis of this disease in diabetic patients received FDA approval for use in primary care clinics
[20]. This system documents the appearance of the optic nerve but is not approved to diagnose glaucoma at this time.
Another key challenge in diagnosis is how to design the interface between the clinician and the AI model. For instance, recent work has demonstrated that professional radiologists selectively comply with AI recommendations in a suboptimal way, which can lead to worse performance than desired
[70]. Relatedly, selective compliance has also led to issues with racial bias in other domains
[71]. Techniques such as explainability have been proposed to help bridge this gap, though significant challenges remain to ensuring reliability of explainability techniques
[72]. As an alternative, high-quality uncertainty quantification has been shown to help improve end-user trust in other domains
[73] and may be valuable for clinical decision support systems as well.
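As one simple illustration of uncertainty quantification, the sketch below applies Monte Carlo dropout to a toy classifier: dropout remains active at inference, and the spread across stochastic forward passes provides an uncertainty signal that could accompany a prediction shown to a clinician. The model and inputs are placeholders.

```python
# Monte Carlo dropout sketch: predictive mean plus an uncertainty estimate.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(54, 64), nn.ReLU(),
    nn.Dropout(p=0.3),
    nn.Linear(64, 2),
)
model.train()          # keep dropout active during inference (MC dropout)

x = torch.randn(1, 54)              # stand-in visual-field input
with torch.no_grad():
    samples = torch.stack([model(x).softmax(-1) for _ in range(50)])

mean_prob = samples.mean(0)
std_prob = samples.std(0)           # high std -> flag the case for human review
print(f"P(glaucoma) = {mean_prob[0, 1]:.2f} ± {std_prob[0, 1]:.2f}")
```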
Moreover, each imaging modality poses its own set of challenges to implementation in real-world settings. OCT machines are expensive and therefore less applicable in low-resource areas. Additionally, as with fundus images, anatomic abnormalities can influence results, and there is a lack of interchangeability across OCT devices
[46]. Visual field testing is subjective and can be affected by patient factors such as attention and fatigue
[74]. Additionally, most models are trained using visual field tests labeled as reliable and may not be able to identify unreliable exams, which are very common in clinical settings. Finally, because structural changes are known to precede functional damage in glaucoma, it can be difficult to provide an early diagnosis using visual fields. As a result, many AI applications using visual fields are better suited to assessing disease progression than to diagnosis. There are also several barriers to the development and implementation of hybrid models. Such models require paired data from multiple imaging modalities in training and testing datasets, which limits the availability and feasibility of data collection. When using multiple input types, there is also a need for more training data to avoid overfitting
[75].