Deep Convolutional Neural Networks and Retinal Vessels

Accurate segmentation of retinal vessels is an essential prerequisite for the subsequent analysis of fundus images. Recently, a number of deep learning methods have been proposed and have demonstrated promising segmentation performance, especially U-Net and its variants. However, tiny vessels and low-contrast vessels remain hard to detect, owing to the loss of spatial details caused by consecutive down-sample operations and the inadequate fusion of multi-level features caused by vanilla skip connections.

Keywords: fundus image; retinal vessel segmentation; deep learning

1. Introduction

Fundus imaging is a non-invasive, reproducible, and inexpensive method that shows retinal vessels and pathology [1]. In the medical domain, morphological changes of the retinal vessels, e.g., vessel diameter, branch angles, and branch lengths, can serve as clinical indicators for the detection and diagnosis of diabetes, hypertension, atherosclerosis, and other diseases [2]. In addition, the retinal vascular tree can serve as a unique identifier for identification systems in the social security domain, owing to its unique morphology in each individual [3]. Retinal vessel segmentation is the process of determining, for each pixel of a fundus image, whether it is a vessel or non-vessel pixel; it is the preliminary step in objectively assessing the retinal vasculature and quantitatively interpreting its morphometrics. Nevertheless, manual segmentation of retinal vessels by trained experts is expensive, time-consuming, and laborious, especially when screening large populations. Furthermore, manual segmentation cannot guarantee consistent performance, because results often vary from expert to expert owing to subjective judgment. Therefore, an automatic, high-precision method for retinal vessel segmentation is in high demand. However, the retinal vascular tree presents an extremely complicated morphological structure and contains many tiny vessels, often fewer than ten pixels or even a single pixel wide, which are therefore difficult to distinguish from the background. Similarly, owing to uneven illumination and lesion regions, the contrast between blood vessels and non-vascular structures is relatively low. Because of these problems, accurately segmenting retinal vessels from fundus images, especially tiny vessels and low-contrast vessels, remains a challenging task.
In 1989, Chaudhuri et al. were the first to address the problem of automatically segmenting retinal vessels [4]. Following this research, many methods have been proposed for retinal vessel segmentation, spurred by developments in digital image processing technology in recent decades [5]. Early studies based on various hand-crafted features, e.g., shape [6], color [7], and edges [4], usually exhibited low accuracy and poor robustness because such shallow features cannot express semantically rich information. Recently, deep learning methods, especially deep convolutional neural networks (DCNNs), have achieved superior results on many computer vision tasks, e.g., image classification [8], object detection [9], human pose estimation [10], and semantic segmentation [11]. Compared with conventional methods, DCNNs can automatically learn richer representations from raw input data and demonstrate superior segmentation performance [12]. In particular, Long et al. [13] proposed a novel end-to-end, pixel-to-pixel semantic segmentation network, called FCN, which introduced the most basic framework for natural image segmentation: the encoder–decoder structure. However, unlike natural image datasets, which are available in large numbers, medical image datasets are relatively scarce because patient privacy and ethical issues make them difficult to collect. In this regard, Ronneberger et al. [14] proposed U-Net, an improvement on FCN that can be trained with only a few images and still predict precise results. U-Net is a breakthrough for deep learning in medical image segmentation. Beyond its encoder–decoder structure, the success of U-Net is largely attributed to the skip connections between the encoder sub-network and the decoder sub-network, which combine multi-level features from different stages. As a general rule, the low-level features of shallow layers have abundant spatial detail but lack sufficient semantic information, while the high-level features of deep layers are semantically rich but have lost spatial detail. The skip connection is thus an intuitive way to fuse the spatial details of the encoder sub-network with the semantic information of the decoder sub-network.
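
To make this pattern concrete, here is a minimal sketch, assuming PyTorch, of an encoder–decoder with a vanilla skip connection in the spirit of U-Net [14]; the layer widths and depth are illustrative only, not those of the original architecture.

```python
# Minimal encoder-decoder with one vanilla skip connection (illustrative only).
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, out_ch=1):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)                        # down-sample: halves H and W
        self.bottleneck = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)  # up-sample back to full size
        # After concatenation the decoder sees 16 (skip) + 16 (up) = 32 channels.
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, out_ch, 1))

    def forward(self, x):
        e = self.enc(x)                    # low-level features: rich spatial detail
        b = self.bottleneck(self.down(e))  # high-level features: richer semantics
        u = self.up(b)
        # Vanilla skip connection: concatenate encoder features with
        # up-sampled decoder features along the channel dimension.
        return self.dec(torch.cat([u, e], dim=1))

seg_logits = TinyUNet()(torch.randn(1, 1, 64, 64))  # -> shape (1, 1, 64, 64)
```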
Even though U-Net and its variants have achieved state-of-the-art results on many medical image segmentation tasks, including kidney, pancreas, and liver segmentation, they still struggle to segment retinal vessels efficiently and effectively. In general, there are two main limitations. Firstly, consecutive down-sample operations in the encoder sub-network lose the spatial information of tiny vessels and vessel edges, and the final segmentation map cannot recover this lost information through skip connections and up-sample operations in the decoder sub-network. Clinically, tiny vessels consisting of only a few pixels provide an indispensable reference for diagnosing conditions such as neovascular diseases, so researchers should pay more attention to tiny vessels than to thick ones. Secondly, there is a certain semantic gap between low-level features and high-level features in fundus images, especially in low-contrast regions. The vanilla skip connection introduces too much irrelevant, redundant information, harming segmentation performance, especially for low-contrast vessels. It is therefore essential to selectively enhance vessel representations while suppressing background noise.
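
The first limitation can be illustrated numerically. The following sketch (assuming PyTorch; it is not taken from any cited work) shows that a vessel exactly one pixel wide survives three consecutive 2× max-poolings in value, but after naive up-sampling back to full resolution it is smeared to eight pixels wide, so its original boundary can no longer be recovered.

```python
# A 1-pixel-wide "vessel" through three 2x down-samples and a naive up-sample.
import torch
import torch.nn.functional as F

img = torch.zeros(1, 1, 32, 32)
img[:, :, :, 16] = 1.0                      # a vertical vessel exactly 1 pixel wide

x = img
for _ in range(3):                          # three consecutive 2x down-samples
    x = F.max_pool2d(x, 2)                  # resolution: 32 -> 16 -> 8 -> 4

rec = F.interpolate(x, size=(32, 32), mode="nearest")  # naive up-sample back
print((img[0, 0, 0] > 0).sum().item())     # vessel width before: 1 pixel
print((rec[0, 0, 0] > 0).sum().item())     # vessel width after:  8 pixels
```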

2. Deep Convolutional Neural Networks

To date, many segmentation networks based on the fully convolutional network (FCN) with the encoder–decoder (high-to-low followed by low-to-high in series) architecture have been proposed in the field of semantic segmentation. Among them, U-Net [14] and its variants have achieved remarkable performance in medical image segmentation, including retinal vessel segmentation. For instance, DUNet [15] replaced standard convolutions with deformable convolutions to cope with the highly complex structures of retinal vessels. Zhang et al. [16] introduced edge-aware flows into U-Net to make the predictions more sensitive to vessel edge information. For multi-source vessel image segmentation, Yin et al. [17] designed a deep fusion network, called DF-Net, composed of multi-scale fusion, feature fusion, and classifier fusion. Li et al. [18] proposed a multi-task symmetric network, called GDF-Net, which consists of three U-Net-shaped sub-networks: a global segmentation branch, a detail enhancement branch, and a fusion branch. As an alternative to the encoder–decoder architecture, Guo [19] put forward a low-to-high segmentation architecture, called CSGNet, which first obtains low-resolution representations and then uses them to learn high-resolution representations. Recently, some studies [20][21] have demonstrated that maintaining high-resolution representations throughout the network preserves the spatial details of tiny vessels and vessel edges, which benefits segmenting tiny vessels and locating vessel boundaries. A representative method is HRNet, originally proposed for human pose estimation and since applied to other position-sensitive vision tasks [20]. HRNet maintains high resolution from input to final output, with no need to restore it, and generates semantically rich high-resolution representations by repeatedly exchanging information among multi-resolution features. Motivated by HRNet, Lin et al. [21] proposed MPS-Net, a high-resolution representation network with multi-path scales. In MPS-Net, there are three paths with different resolutions: the main path maintains high resolution throughout the entire process, while two branch paths with low-resolution representations are added to the main path in parallel.
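
The information-exchange idea behind these high-resolution networks can be sketched as follows, assuming PyTorch; the channel counts and the two-branch layout are illustrative simplifications, not the actual configuration of HRNet [20] or MPS-Net [21].

```python
# One fusion step between a persistent high-res branch and a parallel low-res branch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExchangeUnit(nn.Module):
    """One information-exchange step between a high-res and a low-res branch."""
    def __init__(self, ch_hi=16, ch_lo=32):
        super().__init__()
        self.hi_conv = nn.Conv2d(ch_hi, ch_hi, 3, padding=1)
        self.lo_conv = nn.Conv2d(ch_lo, ch_lo, 3, padding=1)
        self.hi_to_lo = nn.Conv2d(ch_hi, ch_lo, 3, stride=2, padding=1)  # strided conv: down to low res
        self.lo_to_hi = nn.Conv2d(ch_lo, ch_hi, 1)                       # 1x1 conv, then bilinear up

    def forward(self, hi, lo):
        hi, lo = F.relu(self.hi_conv(hi)), F.relu(self.lo_conv(lo))
        # Each branch receives the other's features resampled to its own resolution.
        hi_out = hi + F.interpolate(self.lo_to_hi(lo), size=hi.shape[-2:],
                                    mode="bilinear", align_corners=False)
        lo_out = lo + self.hi_to_lo(hi)
        return hi_out, lo_out

hi = torch.randn(1, 16, 64, 64)             # high-resolution branch, full size throughout
lo = torch.randn(1, 32, 32, 32)             # parallel low-resolution branch
for unit in (ExchangeUnit(), ExchangeUnit()):
    hi, lo = unit(hi, lo)                   # repeated exchanges enrich hi with semantics
```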

3. Self-Attention Modules

Generally speaking, humans can analyze and understand complex scenes naturally and effectively. Motivated by this observation, attention mechanisms [22][23] were introduced into deep learning to dynamically re-weight feature maps. In particular, Vaswani et al. [24] proposed the self-attention mechanism to capture long-range dependencies in sequential signals, which facilitates machine translation and natural language processing. Wang et al. [25] then introduced the self-attention mechanism into computer vision to capture long-range dependencies via non-local operations. Based on the self-attention mechanism, Fu et al. [26] presented DANet for scene segmentation, which includes a position-attention module that models relationships in the spatial dimension and a channel-attention module that models interdependencies across channels. However, the self-attention mechanism needs to generate a huge attention matrix, whose complexity is O((H×W)×(H×W)), where H×W denotes the resolution of the input feature map; this seriously limits its practical applicability. Therefore, several variants of the self-attention mechanism have been proposed to reduce the computational complexity. For instance, Huang et al. [27] viewed the self-attention operation as a graph convolution and replaced the densely connected graph generated by the original self-attention mechanism with several sparsely connected graphs. To do so, they introduced a criss-cross attention module in which each position attends to only the H+W−1 positions on its criss-cross path rather than all H×W positions, reducing the computational complexity from O((H×W)×(H×W)) to O((H×W)×(H+W−1)). In addition, Li et al. [28] reformulated the self-attention mechanism in an expectation–maximization manner to obtain a much more compact set of bases, reducing the computational complexity from O((H×W)×(H×W)) to O((H×W)×K), where K denotes the number of compact bases. Li et al. [29] designed a lightweight dual-direction attention block that generates the attention matrix with computational complexity O(H×W) via horizontal and vertical pooling operations. However, these existing variants are insufficient for retinal vessel segmentation, as they fail to focus on the characteristics of vessel structures.
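
To make the complexity issue concrete, here is a minimal sketch (assuming PyTorch; the shapes are illustrative and not drawn from any cited paper) of vanilla spatial self-attention over a feature map. The attention matrix has N×N entries with N = H×W, which is exactly the O((H×W)×(H×W)) cost the variants above are designed to avoid.

```python
# Vanilla spatial self-attention over a feature map (illustrative shapes).
import torch

B, C, H, W = 1, 64, 96, 96
x = torch.randn(B, C, H, W)

N = H * W                                     # number of spatial positions
q = x.flatten(2).transpose(1, 2)              # queries: (B, N, C)
k = x.flatten(2)                              # keys:    (B, C, N)
v = x.flatten(2).transpose(1, 2)              # values:  (B, N, C)

attn = torch.softmax(q @ k / C**0.5, dim=-1)  # (B, N, N): the huge attention matrix
out = (attn @ v).transpose(1, 2).reshape(B, C, H, W)

print(attn.shape)    # torch.Size([1, 9216, 9216])
print(attn.numel())  # ~85 million entries for a mere 96x96 feature map
```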

References

  1. Liew, G.; Wang, J.J.; Mitchell, P.; Wong, T.Y. Retinal vascular imaging: A new tool in microvascular disease research. Circ. Cardiovasc. Imaging 2008, 1, 156–161.
  2. Grogan, A.; Barclay, K.; Colville, D.; Hodgson, L.; Savige, J. Retinal small vessel dilatation in the systemic inflammatory response to surgery. Sci. Rep. 2022, 12, 13291.
  3. Aleem, S.; Sheng, B.; Li, P.; Yang, P.; Feng, D.D. Fast and accurate retinal identification system: Using retinal blood vasculature landmarks. IEEE Trans. Ind. Inform. 2018, 15, 4099–4110.
  4. Chaudhuri, S.; Chatterjee, S.; Katz, N.; Nelson, M.; Goldbaum, M. Detection of blood vessels in retinal images using two-dimensional matched filters. IEEE Trans. Med. Imaging 1989, 8, 263–269.
  5. Khandouzi, A.; Ariafar, A.; Mashayekhpour, Z.; Pazira, M.; Baleghi, Y. Retinal vessel segmentation, a review of classic and deep methods. Ann. Biomed. Eng. 2022, 50, 1292–1314.
  6. Vlachos, M.; Dermatas, E. Multi-scale retinal vessel segmentation using line tracking. Comput. Med. Imaging Graph. 2010, 34, 213–227.
  7. Li, Q.; You, J.; Zhang, D. Vessel segmentation and width estimation in retinal images using multiscale production of matched filter responses. Expert Syst. Appl. 2012, 39, 7600–7610.
  8. Wang, S.; Han, Y.; Chen, J.; Pan, Y.; Cao, Y.; Meng, H. Weed classification of remote sensing by UAV in ecological irrigation areas based on deep learning. J. Drain. Irrig. Mach. Eng. 2018, 36, 1137–1141.
  9. Xu, Y.; Wen, D.; Zhou, J.; Fan, X.; Liu, Y. Identification method of cotton seedlings and weeds in Xinjiang based on faster R-CNN. J. Drain. Irrig. Mach. Eng. 2021, 39, 602–607.
  10. Chen, J.; Zhan, Y. Action recognition based on multiple time scale two-stream CNN and confidence fusion in video. J. Jiangsu Univ. Nat. Sci. Ed. 2021, 42, 318–324.
  11. Fan, Y.; Shi, L.; Su, W.; Yan, H. Lane detection algorithm based on PINet + RESA network. J. Jiangsu Univ. Nat. Sci. Ed. 2023, 44, 373–378.
  12. Galdran, A.; Anjos, A.; Dolz, J.; Chakor, H.; Lombaert, H.; Ayed, I.B. State-of-the-art retinal vessel segmentation with minimalistic models. Sci. Rep. 2022, 12, 6174.
  13. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
  14. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241.
  15. Jin, Q.; Meng, Z.; Pham, T.D.; Chen, Q.; Wei, L.; Su, R. DUNet: A deformable network for retinal vessel segmentation. Knowl. Based Syst. 2019, 178, 149–162.
  16. Zhang, Y.; Fang, J.; Chen, Y.; Jia, L. Edge-aware U-net with gated convolution for retinal vessel segmentation. Biomed. Signal Process. Control 2022, 73, 103472.
  17. Yin, P.; Cai, H.; Wu, Q. DF-Net: Deep fusion network for multi-source vessel segmentation. Inf. Fusion 2022, 78, 199–208.
  18. Li, J.; Gao, G.; Yang, L.; Liu, Y. GDF-Net: A multi-task symmetrical network for retinal vessel segmentation. Biomed. Signal Process. Control 2023, 81, 104426.
  19. Guo, S. CSGNet: Cascade semantic guided net for retinal vessel segmentation. Biomed. Signal Process. Control 2022, 78, 103930.
  20. Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep high-resolution representation learning for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3349–3364.
  21. Lin, Z.; Huang, J.; Chen, Y.; Zhang, X.; Zhao, W.; Li, Y.; Lu, L.; Zhan, M.; Jiang, X.; Liang, X. A high resolution representation network with multi-path scale for retinal vessel segmentation. Comput. Methods Programs Biomed. 2021, 208, 106206.
  22. Guo, M.; Xu, T.; Liu, J.; Liu, Z.; Jiang, P.; Mu, T.; Zhang, S.; Martin, R.R.; Cheng, M.; Hu, S. Attention mechanisms in computer vision: A survey. Comput. Vis. Media 2022, 8, 331–368.
  23. Corbetta, M.; Shulman, G.L. Control of goal-directed and stimulus-driven attention in the brain. Nat. Rev. Neurosci. 2002, 3, 201–215.
  24. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2017; pp. 5998–6008.
  25. Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803.
  26. Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3146–3154.
  27. Huang, Z.; Wang, X.; Huang, L.; Huang, C.; Wei, Y.; Liu, W. CCNet: Criss-cross attention for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 603–612.
  28. Li, X.; Zhong, Z.; Wu, J.; Yang, Y.; Lin, Z.; Liu, H. Expectation-maximization attention networks for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9167–9176.
  29. Li, K.; Qi, X.; Luo, Y.; Yao, Z.; Zhou, X.; Sun, M. Accurate retinal vessel segmentation in color fundus images via fully attention-based networks. IEEE J. Biomed. Health Inform. 2020, 25, 2071–2081.