Here, a fully automated system based on a deep learning architecture is proposed that can efficiently diagnose early-stage glaucoma on the given datasets. The main contributions of this work are as follows:
- The most notable recent machine learning and deep learning-based glaucoma detection research is thoroughly reviewed to define the problem, focusing on various features that can support an efficient diagnosis.
- For the diagnosis, a model is developed employing advanced deep learning methods along with transfer learning, and the model is tuned using various techniques to lower the likelihood of model overfitting.
- Multiple datasets of glaucomatous retinal images are adopted to train and test the model to achieve higher diagnostic accuracy.
- An end-to-end learning system that overcomes the drawbacks of current glaucoma screening methods is developed.
2. Techniques for the Detection of Glaucoma
Various researchers have applied CNNs to glaucoma detection [20][21][22][23][24][25][26][27][28][29]. CNN-based systems are computationally effective and provide robust results for disease classification. A CNN consists of different layers, such as convolutional, activation, pooling, and fully connected layers (FCLs), and each architecture combines these layers differently. Other retinal diseases, such as papilledema [22][30], diabetic retinopathy [22], central serous retinopathy (CSR) [31][32], and hypertensive retinopathy [21], can also be diagnosed and detected through deep learning (DL) and machine learning (ML) methodologies using OCT and fundus images [5][29]. Diabetes and other eye diseases have been successfully diagnosed by DL techniques [22].
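To make the layer types concrete, the following is a minimal pure-Python sketch of one forward pass through a convolutional layer, a ReLU activation, a max-pooling layer, and a fully connected layer. The image patch, kernel, weights, and bias are made-up illustrative values and are not taken from any of the cited models; real systems use a deep learning framework and learn these values from data.

```python
import math


def conv2d(image, kernel):
    """Valid 2D cross-correlation (what DL libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Activation layer: negative responses are clipped to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling; a trailing odd row/column is dropped."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]


def fully_connected(features, weights, bias):
    """Single-output fully connected layer (a weighted sum plus bias)."""
    return sum(f * w for f, w in zip(features, weights)) + bias


# Hypothetical 5x5 grayscale patch containing a vertical edge.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical 2x2 kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1], [-1, 1]]

pooled = max_pool2x2(relu(conv2d(image, kernel)))
features = [v for row in pooled for v in row]  # flatten before the FCL
logit = fully_connected(features, [0.5, -0.25, 0.5, -0.25], bias=-1.0)
probability = 1.0 / (1.0 + math.exp(-logit))  # sigmoid for a binary output
```

Stacking more convolution, activation, and pooling stages before the fully connected layer gives the deeper architectures discussed in this section.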
The application of CAD systems has widened the diagnostic horizon for several other diseases, such as CSR [32], lung tumor [33], brain tumor [34], skin tumor [17], and prostate cancer [18]. Fundus images provide a clear picture of the eye’s internal structure and are widely used for glaucoma diagnosis. Glaucoma classification using fundus images through DL models has shown encouraging results [35][36]. Fundus images clearly depict the optic nerve head and are readily available for training glaucoma detection models [37]. Models based on pre-trained CNNs [14][38], ensemble approaches [39][40][41], and CNN-based architectures for the detection of glaucoma are reviewed below.
Serte and Serener developed a glaucoma detection model using an ensemble approach based on a local dataset of 1542 fundus images [39]. The model cropped the optic disc (OD) using a graph saliency region technique. Three CNN architectures, namely ResNet-50, ResNet-152, and AlexNet, were used as the ensemble classifiers. Three configurations were tested: no saliency map, a saliency map with a single CNN model, and a saliency map with the ensemble approach. The ensemble approach gave the best results, with an AUC of 94% and an accuracy of 88%. Chaudhary and Pachori developed a glaucoma detection model based on two methods, using the RIM-ONE, ORIGA, and DRISHTI-GS datasets [40]. The 2D Fourier–Bessel series expansion-based empirical wavelet transform was used for the segmentation of the boundary. One method depended on an ML model, and the other used an ensemble of ResNet CNN architectures. The first model at full scale obtained the best results. The best results with the second method were obtained with the ensemble technique at full scale, with 91.1% accuracy, 91.1% sensitivity, 94.3% specificity, 83.3% AUC, and 96% ROC. GlaucomaNet [41] was proposed to identify primary open-angle glaucoma (POAG) based on dataset images from different populations and settings. The model comprises two CNNs intended to mimic the human grading process: the first CNN learns the discriminative features, whereas the second fuses the features for grading. This simulation of the human grading process, combined with an ensemble of network architectures, greatly enhanced the diagnostic accuracy.
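One common way to combine several classifiers into an ensemble is soft voting, where the per-model probabilities are averaged before thresholding. The sketch below illustrates that idea only; the source does not state how the cited ensembles combine their members, and the function names, model scores, and 0.5 threshold are assumptions made for illustration.

```python
def ensemble_probability(probabilities, weights=None):
    """Weighted average of per-model glaucoma probabilities for one image."""
    if not probabilities:
        raise ValueError("need at least one model output")
    if weights is None:
        weights = [1.0] * len(probabilities)  # plain (unweighted) averaging
    if len(weights) != len(probabilities):
        raise ValueError("one weight per model is required")
    return sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)


def ensemble_predict(probabilities, threshold=0.5, weights=None):
    """True if the averaged probability reaches the (assumed) threshold."""
    return ensemble_probability(probabilities, weights) >= threshold


# Hypothetical outputs of three classifiers for the same fundus image.
model_outputs = {"resnet50": 0.62, "resnet152": 0.71, "alexnet": 0.41}

avg = ensemble_probability(list(model_outputs.values()))
is_glaucoma = ensemble_predict(list(model_outputs.values()))
```

In this made-up case one member votes below the threshold, but the averaged score still crosses it, which is how an ensemble can smooth over the errors of a single model.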
Thakoor et al. developed a model based on different CNN architectures trained on OCT images and also used some pre-trained models to detect glaucoma [14]. The pre-trained ResNet, VGG, and InceptionNet were combined with random forest and compared with the CNN architectures trained on OCT images. A high accuracy of 96.27% was achieved with the CNN trained on the OCT images. Hemelings et al. proposed an approach for glaucoma detection using a pre-trained ResNet-128 architecture with 7083 OD-centered fundus images [38]. Transfer learning and active learning were used to enhance the diagnostic capability of the model. A saliency map highlighted the affected region to provide evidence of the disease. The model achieved robust results, with an AUC of 99.55%, a specificity of 93%, and a sensitivity of 99.2% for glaucoma detection.
Yu et al. [4] developed a model using a modified version of the U-Net architecture on fundus images for glaucoma diagnosis using multiple datasets. The U-Net used a pre-trained ResNet-34 as the encoder and the classical U-Net architecture as the decoder. The model performed well, reaching Dice values of 97.38% for the disc and 88.77% for the cup on the DRISHTI-GS test set. Other authors proposed an approach named AG-CNN, which detects glaucoma and localizes pathological areas using fundus images [6]. The model is based on attention prediction, localization of the affected area, and glaucoma classification. The deep features predicted glaucoma through the visual maps of necrotic areas in the LAG and RIM-ONE datasets. The use of attention maps for localizing the pathological area demonstrated high efficacy. The model prediction for glaucoma was superior to previous models, with an accuracy of 95.3%.
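The Dice values reported for disc and cup segmentation measure the overlap between a predicted mask and a reference mask. A minimal sketch of the standard Dice coefficient on binary masks follows; the two 3x3 masks are made-up examples, and treating two empty masks as a perfect match is a convention chosen here, not something the source specifies.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A and B| / (|A| + |B|) for two binary masks of equal shape."""
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")
    intersection = sum(
        1
        for p_row, t_row in zip(pred, target)
        for p, t in zip(p_row, t_row)
        if p and t
    )
    size = sum(map(sum, pred)) + sum(map(sum, target))
    if size == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / size


# Hypothetical predicted and reference masks (1 = structure, 0 = background).
predicted = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]
reference = [[0, 1, 1],
             [0, 1, 0],
             [0, 1, 0]]

score = dice_coefficient(predicted, reference)  # 3 shared pixels, 4 + 4 total
```

A score of 1.0 means the masks coincide exactly, and 0.0 means they do not overlap at all.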
Phan et al. developed a model based on three CNN architectures, ResNet-152, VGG19, and DenseNet201, for diagnosing glaucoma on 3312 retinal fundus images [24]. The proposed model was also tested on poor-quality images to examine its diagnostic accuracy. All the architectures achieved an AUC of 90% for detecting glaucoma. Liao et al. [42] proposed a novel CNN-based scheme that used a ResBlock architecture to diagnose glaucoma using the ORIGA dataset. The model diagnosed glaucoma and provided a transparent interpretation based on visual evidence by highlighting the affected area. The model, named EAMNet, contained three parts: a ResNet architecture for feature extraction and aggregation; multiple-layer average pooling (M-LAP), which linked the semantic detail with the localization information; and an evidence activation map (EAM), which provided evidence of the affected area for the physician to use in the final decision. The activation map provided the clinical basis for the glaucoma diagnosis. The proposed scheme efficiently diagnosed glaucoma, with an AUC of 0.88.
Researchers developed the G-Net model based on CNN to detect glaucoma in the DRISHTI-GS dataset [43]. The model used two neural networks (U-Net) to segment the disc and the cup. The cropped fundus images in the red channel were fed to the first model. The model contained 31 layers of convolutional, max-pooling, up-sampling, and merge layers. The filters applied were of sizes (3, 3), (1, 1), and (1, 32), and 64 filters were used on different layers. The model labeled a pixel black if it belonged to the OD in the real image and white otherwise. The output images were fed to the other model to segment the cup. The second model was similar to the first, differing only in the size of the filters (4, 4). The output of this model was a segmented cup. These two outputs were used to calculate the cup-to-disc ratio (CDR) for the glaucoma prediction. Using two neural networks, this algorithm obtained a high accuracy of 95.8% for OD and 93.0% for optic cup (OC) segmentation.
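Once the disc and the cup have been segmented, the CDR can be read directly from the two masks. The sketch below computes a vertical CDR (vertical cup diameter divided by vertical disc diameter), which is one common definition; the source does not state which CDR variant G-Net uses, and the masks are made-up examples with 1 marking the structure.

```python
def vertical_diameter(mask):
    """Number of rows spanned by the foreground of a binary mask."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0


def cup_to_disc_ratio(cup_mask, disc_mask):
    """Vertical CDR: vertical cup diameter over vertical disc diameter."""
    disc = vertical_diameter(disc_mask)
    if disc == 0:
        raise ValueError("disc mask is empty; CDR is undefined")
    return vertical_diameter(cup_mask) / disc


# Hypothetical segmentation outputs on a 6x6 crop.
disc_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
]
cup_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

cdr = cup_to_disc_ratio(cup_mask, disc_mask)  # cup spans 2 rows, disc spans 5
```

A larger CDR indicates a larger cup relative to the disc, which is the quantity such pipelines pass on to the glaucoma prediction step.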
Researchers developed a CNN-based model for glaucoma detection using 1110 OCT images and compared its performance with ML algorithms [44]. A total of 22 features were extracted and fed to different ML classifiers, such as NB, RF, SVM, LR, Gradient Adaboost, and Extra Trees. The CNN model achieved better results, with an AUC of 0.97, than the other ML approaches, such as logistic regression, with an AUC of 0.89.
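The AUC values used to compare these classifiers can be interpreted as the probability that a randomly chosen glaucomatous image receives a higher score than a randomly chosen healthy one. A minimal sketch of that pairwise definition is shown below; the labels and scores are made-up, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical ground truth (1 = glaucoma) and classifier scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why it is a convenient threshold-free way to compare a CNN with classical classifiers.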
Thakur et al. proposed a model capable of diagnosing glaucoma before the onset of the disease [45]. Three deep learning models were trained on 66,721 fundus images to detect glaucoma at different horizons (1 to 3 years earlier, 4 to 7 years earlier, and before the onset of glaucoma), achieving AUCs of 0.88, 0.77, and 0.97, respectively. Lima et al. developed a CNN model for optic cup segmentation for the detection of glaucoma [46]. The modified U-Net architecture segmented the optic cup from the green channel image, with the optic disc mask given as input. The model achieved a Dice value of 94% on the DRISHTI dataset.
Maheshwari et al. presented a model that converted the images into RGB channels after dividing the dataset images into training and testing images [15]. LBP-based augmentation was applied to obtain the best results. The model achieved 98.90% accuracy, 100% sensitivity, and 97.50% specificity. Lima et al. used a genetic model based on a CNN with 25 layers and the RIM-ONE dataset to diagnose glaucoma [12]. The model achieved an accuracy of 91% in detecting glaucoma. Saxena et al. developed a six-layer CNN model for glaucoma detection using the SCES and ORIGA datasets [13]. The ROI was extracted using the ARGALI approach, and data augmentation was used to avoid overfitting. The model achieved an AUC of 0.882 on the SCES dataset and 0.822 on the ORIGA dataset. Elangovan and Nath developed a CNN-based model consisting of 18 layers for glaucoma detection [47]. The model used the DRISHTI–GS1, ORIGA, RIM–ONE2 (release 2), ACRIMA, and LAG datasets. The best results were obtained with the ACRIMA dataset, achieving 96.64% accuracy, 96.07% sensitivity, 97.39% specificity, and 97.74% precision. Aamir et al. [48] developed a multi-level CNN model for diagnosing glaucoma. The fundus images were preprocessed to reduce noise with the adaptive histogram equalizer technique. The model classified the fundus images into advanced, moderate, and early glaucoma categories. The model achieved a sensitivity of 97.04%, a specificity of 98.99%, an accuracy of 99.39%, and a PRC of 98.2% on 1338 fundus images. Raja et al. [49] proposed a technique for diagnosing glaucoma using a dataset of 196 OCT images. The proposed model used a CNN and calculated the CDR, with 94% accuracy, 94.4% sensitivity, and 93.75% specificity in detecting glaucoma. Carvalho et al. [50] proposed a 3D CNN algorithm for diagnosing glaucoma through the fundus images of the RIM-ONE and DRISHTI-GS datasets. The 2D fundus images were converted into 3D volumes for each RGB and gray channel. The CNN was trained on all four channels and showed the best results on the gray channel, with 83.23% accuracy, 85.54% sensitivity, 80.95% specificity, 83.2% AUC, and a Kappa of 66.45.
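The accuracy, sensitivity, specificity, precision, and Kappa figures quoted for these models all derive from the four cells of a binary confusion matrix. The sketch below shows the standard formulas; the counts are made-up and do not reproduce any cited result, and the function assumes every denominator is non-zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts.

    Assumes each denominator is non-zero (both classes present and
    both labels predicted at least once).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # recall on glaucomatous images
    specificity = tn / (tn + fp)   # recall on healthy images
    precision = tp / (tp + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    # Cohen's kappa: agreement beyond what chance alone would produce.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((tn + fn) / total) * ((tn + fp) / total)
    expected = p_yes + p_no
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "kappa": kappa,
    }


# Hypothetical test set: 50 glaucomatous and 50 healthy images.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
```

Reporting sensitivity and specificity alongside accuracy matters for screening, because a model can reach high accuracy on an imbalanced dataset while still missing many glaucomatous eyes.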
Gheisari et al. developed a combined model based on a CNN and a recurrent neural network for diagnosing glaucoma using retinal fundus images [51]. The model achieved an F-measure of 96.2% on 295 videos and 1810 fundus images. Veena et al. developed a CNN model for the detection of glaucoma [52]. The images were preprocessed with a Gaussian filter to eliminate noise. The Sobel edge and watershed algorithms extracted the features from the fundus images. The model achieved OD and OC segmentation accuracies of 0.9845 and 0.9732, respectively, on the DRISHTI dataset. On the G1020 dataset, it achieved 98.48% accuracy, 99.3% sensitivity, 96.52% specificity, 97% AUC, and an F1-score of 98%.
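As an illustration of the edge-extraction step, the sketch below applies the standard 3x3 Sobel kernels to a small grayscale image and returns the gradient magnitude. It covers only the Sobel operator; the Gaussian smoothing and watershed stages mentioned above are omitted, and the test image is a made-up example rather than fundus data.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical intensity gradients.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def sobel_magnitude(image):
    """Gradient magnitude at each interior pixel (no border padding)."""
    height, width = len(image), len(image[0])
    out = [[0.0] * (width - 2) for _ in range(height - 2)]
    for i in range(height - 2):
        for j in range(width - 2):
            gx = sum(image[i + a][j + b] * SOBEL_X[a][b]
                     for a in range(3) for b in range(3))
            gy = sum(image[i + a][j + b] * SOBEL_Y[a][b]
                     for a in range(3) for b in range(3))
            out[i][j] = math.hypot(gx, gy)
    return out


# Hypothetical 4x4 image with a vertical step from intensity 0 to 10.
step_image = [[0, 0, 10, 10] for _ in range(4)]
edges = sobel_magnitude(step_image)

# A constant image has no edges, so every response is zero.
flat_edges = sobel_magnitude([[7] * 4 for _ in range(4)])
```

Strong responses mark intensity boundaries such as the rim of the optic disc, which is why edge maps are a natural input to a subsequent segmentation step.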
Fan et al. [53] assessed the diagnostic precision, generalizability, and explainability of a Vision Transformer deep learning method in diagnosing primary open-angle glaucoma and identifying the salient areas in retinal images. Thanki [54] proposed a dual learning-based technique that combines deep learning and machine learning. A deep neural network extracts deep features to identify distinctive retinal characteristics, and a hybrid classification algorithm then classifies the glaucomatous retinal images.