Multi-Label Fundus Image Classification: History

Fundus images are used by ophthalmologists and computer-aided diagnostic systems to detect fundus diseases such as diabetic retinopathy, glaucoma, age-related macular degeneration, cataracts, hypertension, and myopia.

  • attention mechanisms
  • deep learning
  • feature fusion

1. Introduction

Fundus images are used by ophthalmologists and computer-aided diagnostic systems to detect fundus diseases such as diabetic retinopathy, glaucoma, age-related macular degeneration, cataracts, hypertension, and myopia. Ophthalmologists have progressively adopted computer-aided diagnosis as its accuracy has increased in recent years. Such systems assist doctors in making partial diagnoses and save both doctors and patients time and effort [1][2][3].
Early detection of fundus disease is critical for patients to avoid blindness. When a single fundus image is analyzed across its three color channels, abnormalities in the fundus can indicate different types of disease. Because ocular diseases are complex and largely independent of one another, patients often develop them differently in each eye. Figure 1 shows the right and left fundus images, taken from the ODIR dataset [4], of a patient with diabetic retinopathy and myopia in the right eye but not the left. Most fundus image research focuses on segmenting fundus structures or detecting anomalies associated with specific fundus diseases [5][6]. The ability to classify the full range of diseases in fundus images is therefore critical for the development of future diagnosis systems.
Figure 1. Images of the ODIR dataset. (a) No disease. (b) Diabetic retinopathy and myopia.
Various algorithms for the enhancement [7], segmentation [8][9][10], and classification [11][12] of fundus images have been developed by combining image processing and deep learning principles. Deep learning models can be trained sufficiently, and are less prone to overfitting, on datasets with many images, where test accuracy can exceed 95%. The fundamental problem in classifying multi-label fundus images is insufficient data, which prevents models from being trained effectively. The second problem is that diseases such as glaucoma develop for a long time before obvious lesions appear; fundus images with more obvious lesions are easier to identify, while classification accuracy for these early-stage diseases is significantly lower.

2. Classification of Fundus Images

Most fundus image classification work today focuses on identifying the presence or absence of a single disease, such as diabetic retinopathy [13], myopia [14], glaucoma [15], age-related macular degeneration [16], or other eye disorders. Gour et al. [17] developed a convolutional neural network with a transfer learning model that achieves high classification accuracy for multi-labeled single fundus images; optimizing the network with SGD improved training-set accuracy from 85.25% to 96.49%. However, classification accuracy remains low for fundus images containing glaucoma. One reason is that early lesions of such diseases differ only slightly and are not easily detected during classification. Another is that the dataset contains significantly fewer images of these diseases than of others, making the model prone to overfitting when classifying them. Choi et al. [18] found that the number of classes has a significant impact on classification performance: a VGG-19 network was used to classify three types of fundus images, and accuracy fell to 41.9% when the number of classes was increased to five. The critical open problems are therefore how to balance the dataset so that classes are equally distributed and how to train a high-performance neural network that increases the classification accuracy of fundus images for each disease class.

3. Image Augmentation

A major challenge is balancing the distribution of positive and negative samples and enhancing image quality to increase classification accuracy. The number of input samples per class affects how well the network performs, and unequal image distribution is common in multi-label data. Three approaches are widely used. The first is data upsampling, in which images are rotated, flipped, cropped, and otherwise transformed to augment classes with insufficient samples. The second is transfer learning, in which weight parameters are obtained by pre-training on the large ImageNet dataset; starting from these pre-trained weights makes it easier to reach optimal results. Lu et al. [19] improved accuracy from 62.82% to 75.16% using transfer learning, although the model remained prone to overfitting on classes with few images. The third is changing the underlying network, which can improve performance even with small samples: Wang et al. [20] used VGG16 to classify multi-labeled fundus images with an accuracy of 86%, and switching to EfficientNet-B3 improved the accuracy to 90%.
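The upsampling idea described above can be sketched in a few lines: grow an under-represented class by appending geometric variants (rotations and flips) of its existing images until it reaches the size of the majority class. This is a minimal numpy illustration, not the paper's pipeline; `augment` and `upsample_minority` are hypothetical names, and a real system would use a richer set of transforms (random crops, color jitter, etc.).

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Produce simple geometric variants of one fundus image array (H, W, C)."""
    return [
        np.rot90(image, k=1),   # 90-degree rotation
        np.rot90(image, k=2),   # 180-degree rotation
        np.fliplr(image),       # horizontal flip
        np.flipud(image),       # vertical flip
    ]

def upsample_minority(images, target_count):
    """Grow an under-represented class toward target_count by
    appending augmented copies of its existing images."""
    pool = list(images)
    i = 0
    while len(pool) < target_count:
        for variant in augment(images[i % len(images)]):
            pool.append(variant)
            if len(pool) >= target_count:
                break
        i += 1
    return pool

# Toy example: a minority class of 3 "images" upsampled to 10 samples.
minority = [rng.random((8, 8, 3)) for _ in range(3)]
balanced = upsample_minority(minority, target_count=10)
print(len(balanced))  # 10
```

Because the transforms are label-preserving, the augmented copies can be added to the training set without changing the class annotations.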

4. Attention Mechanisms

Image augmentation addresses unequal sample distribution, but complicated lesions in the fundus, such as microaneurysms and hemorrhages, remain hard to identify. The shallow layers of a neural network learn an image's texture features; as the network deepens, it learns the image's semantic information, and richer semantic information improves the network's classification performance. Adding an attention module allows the network to properly learn the spatial positions of lesions. Such modules imitate how humans find significant regions in complicated scenes and are used in a variety of vision tasks [21], including image classification, target detection, image segmentation, and facial recognition. As the correlations in Figure 2 indicate, attention mechanisms can be split into six types based on the data domain: channel, spatial, temporal, and branch attention, as well as combined channel-and-spatial and spatial-and-temporal attention. Hu et al. [22] proposed the SENet channel attention network, built around a squeeze-and-excitation (SE) module. The SE module gathers global information, captures inter-channel relationships, and enhances the feature representation; its disadvantages are that it cannot capture complex global information and has high model complexity. Woo et al. [23] proposed the convolutional block attention module (CBAM) to improve the exploitation of global information. CBAM chains the channel attention and spatial attention mechanisms, allowing the network to focus on both features and their spatial locations, and its lightweight design lets it be added to any existing network architecture.
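The SE module described above has a simple structure: squeeze each channel to a single value by global average pooling, pass the result through a small bottleneck (FC, ReLU, FC, sigmoid) to get per-channel weights, then rescale the channels. The numpy sketch below shows that computation on one feature map; the weight matrices `w1` and `w2` are random stand-ins for learned parameters, and `r` is the usual reduction ratio.

```python
import numpy as np

def se_block(feature_map, w1, w2):
    """Squeeze-and-excitation on a (C, H, W) feature map.
    w1: (C//r, C) reduction weights; w2: (C, C//r) expansion weights."""
    # Squeeze: global average pooling collapses each channel to one value.
    z = feature_map.mean(axis=(1, 2))              # shape (C,)
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid gives channel weights.
    s = np.maximum(w1 @ z, 0.0)                    # ReLU, shape (C//r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))            # sigmoid, shape (C,)
    # Scale: reweight each channel by its learned importance in (0, 1).
    return feature_map * s[:, None, None]

rng = np.random.default_rng(0)
C, r = 16, 4
x = rng.random((C, 8, 8))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y = se_block(x, w1, w2)
print(y.shape)  # (16, 8, 8)
```

CBAM follows the same pattern but applies a spatial attention map after the channel reweighting, which is why it can localize lesions as well as emphasize informative channels.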
Figure 2. Classification of attentional mechanisms (ф indicates no relevant classification).

5. Model Optimization

When a high-performance model is employed as the base classification network, the overall performance metrics improve [24], because the network's prediction capacity is strongly related to how well it recognizes the features of the fundus image. A high-performance neural network must therefore be used to process the input fundus images. Multi-model fusion builds two models to extract feature vectors, which are then concatenated during fusion to increase classification accuracy. Wang et al. [20] classified binocular fundus images with model fusion, using EfficientNet to extract features that were then fed into a classifier, a two-stage technique with 90% accuracy. In general, the deeper the network, the better the classification, but also the greater the risk of overfitting. ResNet eases the optimization of deeper networks by incorporating a residual module, which increases classification performance [25][26]. Furthermore, dataset size is important in determining classification performance in deep learning. In transfer learning, weights pre-trained on the large ImageNet dataset are fine-tuned on the target dataset to obtain suitable training parameters. Gour et al. [17] trained neural networks on binocular fundus images through transfer learning; classification accuracy was 97% for cataract but only 54% for glaucoma, and such considerable variation in accuracy lowers confidence in the model.
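The two-stage fusion idea above reduces to: run both eyes through feature extractors, concatenate the feature vectors, and feed the joint vector to a multi-label classifier head (one sigmoid per disease). The sketch below uses trivial numpy stand-ins (`backbone_a`, `backbone_b`) in place of trained CNN backbones; all names and dimensions are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def backbone_a(image):
    """Stand-in for the first feature extractor (e.g. an EfficientNet)."""
    return image.mean(axis=(0, 1))               # crude per-channel features

def backbone_b(image):
    """Stand-in for the second feature extractor."""
    return image.std(axis=(0, 1))

def fused_logits(left_image, right_image, w, b):
    """Extract features from both eyes with both backbones, concatenate,
    and feed the joint vector to one linear multi-label classifier head."""
    feats = np.concatenate([
        backbone_a(left_image), backbone_b(left_image),
        backbone_a(right_image), backbone_b(right_image),
    ])
    return w @ feats + b                         # one logit per disease label

n_labels, feat_dim = 8, 12                       # 4 feature vectors x 3 channels
w = rng.standard_normal((n_labels, feat_dim)) * 0.1
b = np.zeros(n_labels)
left = rng.random((64, 64, 3))
right = rng.random((64, 64, 3))
logits = fused_logits(left, right, w, b)
probs = 1.0 / (1.0 + np.exp(-logits))            # independent sigmoid per label
print(probs.shape)  # (8,)
```

Using an independent sigmoid per label (rather than a softmax) is what makes this multi-label: each eye can carry several diseases at once, as in Figure 1.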

This entry is adapted from the peer-reviewed paper 10.3390/mi13060947


  1. Chen, T.-L.; Wang, J.; Yuan, F. A preliminary study of a deep learning assisted diagnostic system with an artificial intelligence for detection of retina disease. Int. Eye Sci. 2020, 20, 1452–1455.
  2. Aggarwal, A.; Chakradar, M.; Bhatia, M.S.; Kumar, M.; Stephan, T.; Gupta, S.K.; Alsamhi, S.H.; Al-Dois, H. COVID-19 Risk Prediction for Diabetic Patients Using Fuzzy Inference System and Machine Learning Approaches. J. Healthc. Eng. 2022, 2022, 1–10.
  3. Chakradar, M.; Aggarwal, A.; Cheng, X.; Rani, A.; Kumar, M.; Shankar, A. A Non-invasive Approach to Identify Insulin Resistance with Triglycerides and HDL-c Ratio Using Machine learning. Neural Process. Lett. 2021, 1–21.
  4. Li, C.; Ye, J.; He, J.; Wang, S.; Qiao, Y.; Gu, L. Dense Correlation Network for Automated Multi-Label Ocular Disease Detection with Paired Color Fundus Photographs. In Proceedings of the 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), Iowa City, IA, USA, 3–7 April 2020; pp. 1–4.
  5. Song, W.; Cao, Y.; Qiao, Z.; Wang, Q.; Yang, J.-J. An Improved Semi-Supervised Learning Method on Cataract Fundus Image Classification. In Proceedings of the IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), Milwaukee, WI, USA, 15–19 July 2019; pp. 362–367.
  6. Cho, H.; Hwang, Y.H.; Chung, J.K.; Lee, K.B.; Park, J.S.; Kim, H.-G.; Jeong, J.H. Deep Learning Ensemble Method for Classifying Glaucoma Stages Using Fundus Photographs and Convolutional Neural Networks. Curr. Eye Res. 2021, 46, 1516–1524.
  7. Sahu, S.; Singh, H.V.; Kumar, B.; Singh, A.K.; Kumar, P. Image processing based automated glaucoma detection techniques and role of de-noising: A technical survey. In Handbook of Multimedia Information Security: Techniques and Applications; Springer: Berlin/Heidelberg, Germany, 2019; pp. 359–375.
  8. Fraz, M.M.; Remagnino, P.; Hoppe, A.; Uyyanonvara, B.; Rudnicka, A.R.; Owen, C.G.; Barman, S.A. Blood vessel segmentation methodologies in retinal images: A survey. Comput. Methods Programs Biomed. 2012, 108, 407–433.
  9. Almazroa, A.; Burman, R.; Raahemifar, K.; Lakshminarayanan, V. Optic disc and optic cup segmentation methodologies for glaucoma image detection: A survey. J. Ophthalmol. 2015, 2015, 1–28.
  10. Madhu, G.; Govardhan, A.; Ravi, V.; Kautish, S.; Srinivas, B.S.; Chaudhary, T.; Kumar, M. DSCN-net: A deep Siamese capsule neural network model for automatic diagnosis of malaria parasites detection. Multimed. Tools Appl. 2022, 1–23.
  11. Amin, J.; Sharif, M.; Yasmin, M. A review on recent developments for detection of diabetic retinopathy. Scientifica 2016, 2016, 1–20.
  12. Thakur, N.; Juneja, M. Survey of classification approaches for glaucoma diagnosis from retinal images. In Advanced Computing and Communication Technologies; Springer: Berlin/Heidelberg, Germany, 2018; pp. 91–99.
  13. Krishnan, A.S.; Clive, D.; Bhat, R.V.; Ramteke, P.B.; Koolagudi, S.G. A Transfer Learning Approach for Diabetic Retinopathy Classification Using Deep Convolutional Neural Networks. In Proceedings of the IEEE India Council International Conference (INDICON), Coimbatore, India, 16–18 December 2018; pp. 1–6.
  14. Wan, C.; Li, H.; Cao, G.-F.; Jiang, Q.; Yang, W.-H. An Artificial Intelligent Risk Classification Method of High Myopia. J. Clin. Med. 2021, 10, 4488.
  15. Guo, F.; Li, W.; Zhao, X.; Zou, B. Glaucoma Screening Method Based on Semantic Feature Map Guidance. J. Comput.-Aided Des. Comput. Graph. 2021, 33, 363–375.
  16. Grassmann, F.; Mengelkamp, J.; Brandl, C.; Harsch, S.; Zimmermann, M.E.; Linkohr, B.; Peters, A.; Heid, I.M.; Palm, C.; Weber, B.H. A Deep Learning Algorithm for Prediction of Age-Related Eye Disease Study Severity Scale for Age-Related Macular Degeneration from Color Fundus Photography. Ophthalmology 2018, 125, 1410–1420.
  17. Gour, N.; Khanna, P. Multi-class multi-label ophthalmological disease detection using transfer learning based convolutional neural network. Biomed. Signal Process. Control. 2021, 4, 102329.
  18. Choi, J.Y.; Yoo, T.K.; Seo, J.G.; Kwak, J.J.; Um, T.T.; Rim, T.H. Multi-categorical deep learning neural network to classify retinal images: A pilot study employing small database. PLoS ONE 2017, 12, e0187336.
  19. Lu, Q.; He, C.; Chen, J.; Min, T.; Liu, T. A Multi-Label Classification Model with Two-Stage Transfer Learning. Data Anal. Knowl. Discov. 2021, 5, 91–100.
  20. Wang, J.; Yang, L.; Huo, Z.; He, W.; Luo, J. Multi-Label Classification of Fundus Images with EfficientNet. IEEE Access 2020, 8, 212499–212508.
  21. Guo, M.H.; Xu, T.-X.; Liu, J.-J.; Liu, Z.-N.; Jiang, P.-T.; Mu, T.-J.; Zhang, S.-H.; Martin, R.R.; Cheng, M.-M.; Hu, S.-M. Attention Mechanisms in Computer Vision: A Survey. arXiv 2022, arXiv:2111.07624.
  22. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. arXiv 2017, arXiv:1709.01507.
  23. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the Computer Vision-ECCV, Munich, Germany, 8–14 September 2018; p. 11211.
  24. Ju, C.; Bibaut, A.; Van Der Laan, M. The relative performance of ensemble methods with deep convolutional neural networks for image classification. J. Appl. Statist. 2018, 45, 2800–2818.
  25. Wang, F.; Jiang, M.; Qian, C.; Yang, S.; Li, C.; Zhang, H.; Wang, X. Residual Attention Network for Image Classification. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
  26. Veit, A.; Wilber, M.; Belongie, S. Residual Networks are Exponential Ensembles of Relatively Shallow Networks. arXiv 2016, arXiv:1605.06431.