Data-Driven AI in High Myopia and Pathologic Myopia

Please note this is a comparison between Version 1 by Ran Du and Version 2 by Camila Xu.

Myopia is a global health issue, and the prevalence of high myopia has increased significantly in the past five to six decades. Artificial intelligence (AI) has been identified as one of the key drivers of the Fourth Industrial Revolution. Because of the growth of digital databases, the number of AI-based applications in the medical field based on Python or C has increased immensely in recent years.

diagnosis and management
high myopia
pathologic myopia

1. Introduction

Myopia is a global health issue, and the prevalence of myopia has increased significantly in the past five to six decades [1]. In urban areas of China, Taiwan, Hong Kong, Japan, Singapore, and South Korea [2,3,4,5,6,7], 80–90% of high school students are myopic and 10–20% of them have high myopia [1,8]. The same prevalence has been observed in North America, Germany, Spain, and Russia [9,10,11,12]. The worldwide increase in the prevalence of myopia and pathologic myopia (PM) indicates that myopia-related blindness will increase in the future [13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29]. The shortage of myopia specialists is a great concern to governmental leaders, and the control of myopia has become a national policy in China [30].

In PM eyes, there is an increase in the axial length and the presence of a posterior staphyloma, a deformity of the posterior segment of the eye [31,32,33,34,35]. Following a deformation of the sclera, the neural retina is mechanically damaged and blinding pathologic changes develop in the macular region. The eyes are then said to have myopic maculopathy, which is the main sight-threatening complication. In addition, it has been reported that the cost for one myopic patient in Singapore is more than 700 United States dollars per year and 17,000 United States dollars over the patient’s lifetime [36]. In China, it is estimated that myopia-associated productivity loss is about 244 billion United States dollars per year [30,37]. These values indicate that myopia is an increasingly serious public health problem with a high economic burden. Because myopic maculopathy is generally progressive and irreversible, interventions to prevent the progression of myopic eyes to PM, continuous surveillance, and slowing the progression of PM are highly recommended. However, the number of well-trained myopia specialists is insufficient worldwide, and the diagnosis of myopic maculopathy is difficult for general eye care providers, e.g., optometrists or general ophthalmologists. For example, various lesions of myopic maculopathy often co-exist in the same eye, which makes their appearance difficult to interpret. In addition, continuous monitoring of every myopic patient is inefficient in both time and cost. Thus, there is a great need for automated methods that can be used in a cost-efficient way to assist physicians in monitoring PM and to manage PM patients who need the care of specialists.

Artificial intelligence (AI) has been identified as one of the key drivers of the Fourth Industrial Revolution [38]. Because of the growth of digital databases, the number of AI-based applications in the medical field based on Python or C has increased immensely in recent years [39,40]. One of the main parts of AI is machine learning (ML), which not only has a powerful capacity for statistical analysis but can also manipulate data and perform complex operations to find relationships among many biological characteristics. As a further development of ML, deep learning (DL) extends these advantages by processing data through multiple hidden layers.

Many successful models and platforms have been established for screening and diagnosing age-related macular degeneration [41,42,43], diabetic retinopathy [44,45], and glaucoma [46]. These applications focused on analyzing ophthalmic images to diagnose the disease and to determine prognosis from these images. However, in addition to the general workflow shown in Figure 1, high myopia and PM generate even more data because both the ophthalmic information and the morphological changes of the retina and choroid need to be analyzed.

Figure 1. General workflow of artificial intelligence analyses of high myopia and pathologic myopia.

2. Data-Driven AI in High Myopia and Pathologic Myopia

PM is associated with an elongation of the axial length of the eye, which is usually accompanied by morphological changes in the sclera, choroid, Bruch’s membrane, retinal pigment epithelium, and neural retina. In addition, because of the progressive and excessive increase in axial length, highly myopic eyes also have high refractive errors and related ophthalmic changes. These changes may be further amplified when the eye undergoes refractive or cataract surgery because of the excessive length of the eye. Thus, it is expected that high myopia will generate a considerable amount of data during a long-term follow-up period, which would require an efficient method to analyze and interpret the findings.

In the past, redundant and inconsistent data were collected because data management procedures were fragmented and not integrated. This led to information quality problems, which hampered accurate diagnosis and resulted in poor management of myopic eyes.

With the recent creation and general distribution of digital hospital information systems, an opportunity has opened up for determining the onset and progression of PM through a much larger set of data. This has advanced our understanding of PM with more comprehensive perspectives and on more solid theoretical bases.

Data-driven AI studies are usually performed using ML techniques because they can distinguish different categories, extract information buried in large amounts of data, and optimize the model that best fits the data. Models fitted to training data are then evaluated on their capacity to categorize data. ML techniques involve supervised learning, semi-supervised learning, and unsupervised learning. They include many methods such as kernel ridge regression, support vector machines (SVM), nearest neighbors, Gaussian processes, naive Bayes, random forests, neural networks, and others. More recent methods such as extreme gradient boosting (XGBoost) and light gradient boosting machine (LightGBM) offer further options for regression models and can reveal potential relationships that help explain the occurrence and progression of high and pathologic myopia. With these powerful methods, representative patterns can be statistically calculated and extracted for ensemble predictive models.
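To make the idea behind gradient boosting concrete, the following is a minimal sketch of plain gradient boosting for regression with one-split trees (stumps) under squared-error loss. It is not XGBoost or LightGBM, which add refinements such as regularization and faster split finding, and it is not the method of any cited study. The function names and the toy ages and refraction values are hypothetical and chosen only for illustration.

```python
def fit_stump(xs, residuals):
    """Find the single threshold on a 1-D feature that minimises squared error."""
    best = None
    ordered = sorted(set(xs))
    for lo, hi in zip(ordered, ordered[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def boosting_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: age in years and refraction in diopters (not real measurements).
ages = [6, 7, 8, 9, 10, 11, 12, 13]
refraction = [0.5, 0.25, 0.0, -0.5, -1.0, -1.75, -2.5, -3.0]

model = fit_boosting(ages, refraction)
boosted = [boosting_predict(model, a) for a in ages]
baseline = [sum(refraction) / len(refraction)] * len(refraction)
print("baseline MSE:", mse(refraction, baseline))
print("boosted MSE: ", mse(refraction, boosted))
```

Each round corrects part of the error left by the previous rounds, which is why the training error of the boosted model falls below that of the constant baseline. Performance on held-out data would still need to be checked separately.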

Earlier studies reported that the incidence of myopia had reached 84.6% in elementary school children and 95.5% in university students in China [47,48,49], and it is not difficult to believe that such levels are not unique to China. Thus, it is urgent to monitor eyes with high myopia at an earlier stage, which raises the need for AI-assisted screening techniques. In areas with high levels of myopia, several data-driven studies on high myopia have reported that ML models can be used to solve real problems with sensible solutions (Table 1). The ML models have shown that the refractive errors and the risk of high myopia (myopia ≤ −6.0 diopters) that develop within ten years are predictable in school-aged children [50]. In this approach, the random forest model, generalized estimating equation model, and mixed-effects model were fitted and evaluated by the coefficient of determination (R²), the root mean square error (RMSE), the mean absolute error (MAE), and the area under the receiver operating characteristic curve (AUC). The models were tested on both internal and external datasets. In general, the random forest model had the best performance, and the AUC ranged from 0.802 to 0.976. This approach provided evidence for transforming clinical practice, health policy-making, and precise individualized interventions regarding the practical control of school-aged myopia by employing big data and ML. However, in some circumstances where the clinical data are not available, it may be difficult for physicians to manage high myopia patients. To address this, ML models were also designed and trained to play roles in analyzing eyes with high myopia.
By training on wavefront aberrometry values with the XGBoost algorithm, ML models have been used to predict subjective refractive errors; the mean absolute error between true and predicted values ranged from 0.094 to 0.301 diopters. The combination of machine learning and aberrometry based on a wavefront decomposition basis will aid in the development of refined algorithms [51]. Furthermore, highly myopic eyes often have hyperopic refractive errors after cataract surgery, despite the use of partial coherence interferometry, which could eliminate biometric errors. Through XGBoost regression, AI models trained on medical records extracted from myopia patients could improve the accuracy of intraocular lens (IOL) power prediction in highly myopic eyes with cataracts [52].
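The evaluation metrics named above (R², RMSE, MAE, and AUC) can be computed directly from predictions. The sketch below uses only the Python standard library; the function names and the toy refraction values are hypothetical and are not taken from any of the cited studies. The AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical true and predicted refraction in diopters (illustration only).
true_refraction = [-1.0, -2.5, -4.0, -6.5, -7.0]
pred_refraction = [-1.5, -2.0, -4.5, -6.0, -7.5]

# High myopia is labelled as refraction <= -6.0 diopters, as in the text.
is_high_myopia = [1 if r <= -6.0 else 0 for r in true_refraction]
# A more negative predicted refraction is treated as a higher risk score.
risk_score = [-p for p in pred_refraction]

print("R2  :", r2_score(true_refraction, pred_refraction))
print("RMSE:", rmse(true_refraction, pred_refraction))
print("MAE :", mae(true_refraction, pred_refraction))
print("AUC :", auc(is_high_myopia, risk_score))
```

R², RMSE, and MAE describe how close continuous predictions are to the measured values, whereas AUC describes how well a score separates two classes, so the two groups of metrics answer different questions about the same model.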

Table 1. Data-driven artificial intelligence (AI) models in high myopia and pathologic myopia.

Research	Year	Materials	Participants	AI Methods	Main Outcome	Evaluation and Performance
Lin, H. et al. [50]	2018	Refraction data	School-aged children	ML	Predicting the presence of high myopia	AUC: 0.802–0.976
Kaya, C. et al. [53]	2018	Electrooculographic data	Adults (25–65 years old)	ML	Detecting hypermetropia and myopia refractive disorders	Sensitivity: 95.5%; specificity: 96%; classification accuracy: 90.91%
Ye, B. et al. [55]	2019	Luminance, ultraviolet light levels, and step number data	Myopia patients	ML	Differentiating indoor and outdoor locations	Accuracy: 0.827–0.996; AUC: 0.90–0.99
Rampat, R. et al. [51]	2020	Wavefront aberrometry data	General population	ML	Predicting subjective refraction	Mean absolute error: 0.094–0.301 diopters
Tang, T. et al. [54]	2020	Medical data	School-age myopic children	ML	Estimating physiological elongation of axial length	R²: 0.87
Wei, L. et al. [52]	2020	Medical data	Myopia patients	ML	Improving the accuracy of IOL power predictions	Mean absolute error: 0.25–0.29; median squared errors: 0.06–0.09
Yang, X. et al. [56]	2020	Medical data	Primary school children	ML	Studying influence of related factors on incidence of myopia in adolescents	Accuracy: 0.92–0.93; precision: 0.95; sensitivity: 0.94; F1: 0.94; AUC: 0.98; specificity: 0.94
Li, S.M. et al. [57]	2022	Medical data	Primary school children	ML	Detecting risk factors for myopia progression	Combined weight: 77%; accuracy: over 80%

AUC, area under the receiver operating characteristic curve; IOL, intraocular lens; ML, machine learning.
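Several of the classification measures listed in Table 1 (sensitivity, specificity, precision, accuracy, and F1) are all derived from the same four confusion-matrix counts. The following minimal sketch shows these standard definitions; the function name and the toy labels are hypothetical and do not reproduce any result in the table.

```python
def classification_metrics(y_true, y_pred):
    """Compute common binary classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    sensitivity = tp / (tp + fn)  # share of true cases that are detected
    specificity = tn / (tn + fp)  # share of non-cases correctly ruled out
    precision = tp / (tp + fp)    # share of positive calls that are correct
    accuracy = (tp + tn) / len(pairs)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical labels: 1 = myopic, 0 = not myopic (illustration only).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

for name, value in classification_metrics(actual, predicted).items():
    print(f"{name}: {value:.3f}")
```

Because accuracy depends on how common the condition is in the tested group, it is usually read together with sensitivity and specificity rather than on its own.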

In addition, in situations where only limited information can be accessed, electrooculographic (EOG) data could also be used to train ML models in classifying myopic refractive disorders. It has been reported that when the logistic regression model, Naïve Bayes model, and random forest model were trained on EOG data, the random forest model had the best performance with a sensitivity of 95.5% and a specificity of 96%. The total classification accuracy reached 90.91%, and the achieved models could inspire novel approaches to clinical screening of myopia when general data are not available [53]. Furthermore, because the axial length value is a key indicator for high myopia, simply assessing the change in axial length can be used to evaluate myopia progression. More specifically, these methods can be used by practitioners to judge the true extent of myopia progression before performing a cycloplegic refraction examination. Linear regression, SVM, and bagged trees have been used to predict increases in axial length in adolescents. When the performance of the models was evaluated by five-fold cross-validation, the linear model achieved a high level of precision with an R² value of 0.87 [54].
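Five-fold cross-validation of a linear model, as mentioned above, can be sketched as follows. This is a minimal illustration with a single predictor and ordinary least squares, not the procedure of the cited study; the function names, the fold assignment (every fifth sample), and the rule used to generate the toy axial length values are all assumptions made for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_r2(xs, ys, k=5):
    """Mean held-out R^2 over k folds; fold i holds out samples i, i + k, i + 2k, ..."""
    scores = []
    for i in range(k):
        test_idx = set(range(i, len(xs), k))
        train_x = [x for j, x in enumerate(xs) if j not in test_idx]
        train_y = [y for j, y in enumerate(ys) if j not in test_idx]
        intercept, slope = fit_line(train_x, train_y)
        test_x = [xs[j] for j in sorted(test_idx)]
        test_y = [ys[j] for j in sorted(test_idx)]
        preds = [intercept + slope * x for x in test_x]
        scores.append(r_squared(test_y, preds))
    return sum(scores) / k


# Hypothetical toy data: age in years and axial length in mm, generated by an
# arbitrary linear rule plus a small deterministic wobble (not clinical data).
ages = [6 + 0.5 * j for j in range(20)]
axial_length = [22.0 + 0.2 * a + 0.05 * ((j % 3) - 1) for j, a in enumerate(ages)]

print("mean held-out R2:", k_fold_r2(ages, axial_length, k=5))
```

Each sample is used for testing exactly once and for training in the other four folds, so the averaged R² reflects performance on data the model did not see during fitting.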

In addition to these methods of predicting the actual outputs, there are other ways to use AI algorithms. It is generally accepted that clinical data tend to be imperfect and may have missing items, because each examination is performed only when it is needed to test an evidence-based hypothesis. However, these imperfect data are a high barrier to research and to the understanding of these disease processes. One of the benefits of ML algorithms is that they can fill in the missing values in a principled, data-driven way, so that the results are closer to the true values. This will lead to a better understanding of the occurrence and progression of the disease processes. Furthermore, even with abundant data or features that can be assessed, physicians still need to determine how to filter out important values to test a hypothesis. In addition to traditional methods such as principal component analysis (PCA), ML algorithms supply multiple choices for data dimension reduction, such as randomized singular value decomposition-based PCA, spectral embedding, isomap embedding, and others. These algorithms offer opportunities for clinicians to analyze the abundant data and to determine ways to test their hypotheses.
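One common way to fill in missing values is nearest-neighbor imputation, in which a missing entry is replaced by the average of that variable in the most similar records. The sketch below is a simplified illustration of this idea and is not the method of any cited study; the function name, the column layout (age, refraction, axial length), and the toy records are hypothetical. For brevity the variables are not standardized before distances are computed, which a real analysis would normally do.

```python
import math


def knn_impute(rows, k=2):
    """Replace None entries with the mean of that column over the k most similar rows.

    Similarity is the root mean squared difference over the columns that are
    observed in both rows. The input rows are left unchanged.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for c, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for j, other in enumerate(rows):
                if j == i or other[c] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[c]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][c] = sum(nearest) / len(nearest)
    return filled


# Hypothetical records: [age in years, refraction in diopters, axial length in mm].
records = [
    [8, -1.0, 23.5],
    [8, -1.2, 23.7],
    [14, -5.0, 25.8],
    [14, -5.4, 26.0],
    [14, -5.2, None],  # axial length was not measured
    [8, None, 23.6],   # refraction was not measured
]

for row in knn_impute(records, k=2):
    print(row)
```

In this toy example the missing axial length is filled from the two other 14-year-old records and the missing refraction from the two other 8-year-old records, because those are the closest matches on the observed variables.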

For myopia control, it is widely known that the environment, especially luminance and ultraviolet light, plays an important role in the progression of myopia. Because collecting and monitoring environmental data is complex, it is difficult to implement such monitoring widely in the public. Using luminance, ultraviolet light levels, and step number data, AI models could be trained to differentiate indoor and outdoor locations. These methods can be useful monitoring tools for community- or school-based public health interventions or individual health management [55].
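As an illustration of how luminance, ultraviolet, and step data could feed a location classifier, the sketch below uses a nearest-centroid rule: each class is summarized by the average of its training samples, and a new reading is assigned to the closer average. This simple technique is swapped in for clarity and is not the model used in the cited study; the function names, the feature scaling, and all sensor readings are hypothetical.

```python
import math


def to_features(lux, uv_index, steps_per_min):
    """Scale raw readings; light is log-transformed because it spans orders of magnitude."""
    return (math.log10(lux + 1.0), uv_index, steps_per_min / 100.0)


def fit_centroids(samples, labels):
    """Average the feature vectors of each class."""
    sums = {}
    counts = {}
    for feats, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, v in enumerate(feats):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, feats):
    """Return the label whose centroid is closest in Euclidean distance."""
    def distance(label):
        return math.sqrt(sum((c - f) ** 2 for c, f in zip(centroids[label], feats)))
    return min(centroids, key=distance)


# Hypothetical readings: (luminance in lux, UV index, steps per minute).
training = [
    ((300, 0.0, 10), "indoor"),
    ((150, 0.0, 0), "indoor"),
    ((500, 0.1, 30), "indoor"),
    ((20000, 4.0, 80), "outdoor"),
    ((50000, 6.0, 100), "outdoor"),
    ((8000, 2.0, 60), "outdoor"),
]

samples = [to_features(*reading) for reading, _ in training]
labels = [label for _, label in training]
centroids = fit_centroids(samples, labels)

print(predict(centroids, to_features(250, 0.0, 5)))
print(predict(centroids, to_features(30000, 5.0, 90)))
```

A wearable device could apply such a rule to each reading and accumulate the time classified as outdoor, although a deployed system would need validation against recorded locations.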

ML models have typically been used to fill in missing clinical data and to select features that were highly correlated with myopia in adolescents [56]. Features selected by ML algorithms have been used to explore the potential risk factors that affect severe axial length elongation in highly myopic eyes. These approaches are particularly important because they provide reference data for physicians when faced with complex situations. To screen for high myopia in rural areas where myopia specialists or essential instruments are not available, these predictive values would be important indicators for high myopia screening and for monitoring the progression of myopia.
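A simple form of correlation-based feature selection ranks candidate features by the absolute value of their Pearson correlation with the outcome. The sketch below illustrates only this basic idea and is not the selection procedure of the cited studies; the function names, the generic feature names, and all values are hypothetical and carry no clinical meaning.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rank_features(features, outcome):
    """Sort features by the strength (absolute value) of their correlation with the outcome."""
    scores = {name: pearson(values, outcome) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical candidate features and outcome (illustration only).
features = {
    "feature_a": [3.0, 2.5, 2.5, 1.0, 1.0],
    "feature_b": [1, 2, 3, 4, 5],
    "feature_c": [130, 128, 131, 129, 130],
}
axial_elongation = [0.1, 0.2, 0.3, 0.4, 0.5]

for name, r in rank_features(features, axial_elongation):
    print(f"{name}: r = {r:+.3f}")
```

Ranking by absolute value keeps strongly negative associations alongside strongly positive ones. Correlation alone does not establish that a feature is a risk factor, so selected features still require clinical interpretation.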