Accurate segmentation of nasopharyngeal carcinoma is essential to effective treatment. CAFS addresses the challenges of this task through three mechanisms: the teacher–student cooperative segmentation mechanism, the attention mechanism, and the feedback mechanism. CAFS needs only a small amount of labeled nasopharyngeal carcinoma data to segment the cancer region accurately, and it achieves an average Dice similarity coefficient (DSC) of 0.8723 on the nasopharyngeal carcinoma segmentation task.
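The DSC reported above measures the overlap between a predicted mask and the ground-truth mask as twice the size of their intersection divided by the sum of their sizes. A minimal sketch of that computation, assuming flattened binary masks stored as plain Python lists (the function name and the empty-mask convention are illustrative choices, not taken from CAFS):

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient (DSC) between two binary masks.

    pred and target are equal-length flat sequences of 0/1 values,
    e.g. a flattened predicted mask and its ground-truth mask.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 6-pixel example: 2 overlapping pixels, 3 positives in each mask.
pred_mask = [0, 1, 1, 1, 0, 0]
true_mask = [0, 0, 1, 1, 1, 0]
print(dice_coefficient(pred_mask, true_mask))
```

A DSC of 1 means perfect overlap and 0 means no overlap, so an average of 0.8723 indicates that predicted and reference regions largely coincide.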
1. Introduction
Nasopharyngeal carcinoma
[1][2] is one of the most common cancers, occurring widely around the world. According to global cancer statistics, there were 133,354 new nasopharyngeal carcinoma cases and 80,008 deaths in 2020
[3]. Nasopharyngeal carcinoma is an epithelial carcinoma arising from the nasopharyngeal mucosal lining
[4], which is generally observed at the pharyngeal recess of the nasopharynx
[5]. In the clinic, nasopharyngeal carcinoma has three types: ascending, descending, and mixed
[6]. The ascending type invades the skull base and destroys nerves, the descending type metastasizes to distant tissues through cervical lymph, and the mixed type exhibits both behaviors. Thus, due to its particular location, nasopharyngeal carcinoma is exceptionally dangerous once it metastasizes.
Currently, radiotherapy has become one of the most effective methods for treating nasopharyngeal carcinoma
[7]. The quality of nasopharyngeal carcinoma image segmentation significantly affects the outcome of radiotherapy
[8]. Accurate segmentation would improve the effectiveness of radiotherapy and thus increase patient survival
[9]. Traditionally, segmentation is performed manually by the physician. However, because nasopharyngeal carcinoma tissues are irregular, manually delineating the boundaries is a time-consuming burden for doctors
[10]. Moreover, manual segmentation is often so subjective that doctors with different levels of expertise may come up with different segmentation results
[11].
To reduce the burden on physicians, more and more deep learning algorithms are now being utilized to segment medical images
[12][13][14]. However, it is difficult for many deep learning models to segment nasopharyngeal carcinoma boundaries accurately. First, many deep learning algorithms use a fully-supervised approach, in which all training data are labeled and the model is trained on these labeled data
[15]. This means that the model requires a large amount of labeled data to obtain the expected training results
[16]. However, the difficulty of annotating targets of interest hinders fully-supervised learning in medical imaging. In contrast, unlabeled data are readily available
[17]. Second, the imaging characteristics of nasopharyngeal carcinoma usually resemble the surrounding tissue
[18][19], making it challenging to identify. As a result, many algorithms mistake the surrounding tissue for nasopharyngeal carcinoma. Third, due to the irregular shape of the nasal cavity, the shape of nasopharyngeal carcinoma is usually very complex as well
[20][21], which causes many algorithms to segment the boundaries inaccurately.
To address these challenges of conventional fully-supervised segmentation of nasopharyngeal carcinoma, and thereby improve treatment efficacy and patient survival, this entry presents an attention-based co-segmentation semi-supervised method named CAFS for automatic segmentation of nasopharyngeal carcinoma. In the semi-supervised approach, only a portion of the training data is labeled, and the labeled and unlabeled data are used together to train the model
[22]. As shown in
Figure 1, CAFS contains three primary strategies: the teacher–student cooperative segmentation mechanism, the attention mechanism, and the feedback mechanism. The teacher–student model is typically used in knowledge distillation
[23]. In general, the teacher model uses the knowledge it has obtained to guide the training of the student model, so that the student model achieves performance comparable to that of the teacher model. In CAFS, the teacher–student cooperative segmentation mechanism aims to reduce the number of nasopharyngeal carcinoma labels required. The teacher model learns from a small amount of labeled nasopharyngeal carcinoma data and then generates pseudo-masks for the unlabeled nasopharyngeal carcinoma data. The student model uses the unlabeled nasopharyngeal carcinoma data and the pseudo-masks generated by the teacher model to train itself and segment the unlabeled nasopharyngeal carcinoma data, which reduces the need for labeled data. The attention mechanism pinpoints the location of the cancer: it zooms in on the target and thus captures more information to localize the nasopharyngeal carcinoma. The feedback mechanism aims to make the segmentation boundaries of nasopharyngeal carcinoma more accurate. The student model is trained on unlabeled data and pseudo-masks and then predicts the labeled data. The prediction results are compared with the ground truth to generate feedback that updates the model’s parameters.
Figure 1. The task of CAFS is to automatically segment the nasopharyngeal carcinoma boundaries using only a small amount of labeled data. However, there are several challenges in segmenting nasopharyngeal carcinoma. First, reliable labeled data are difficult to obtain. Second, nasopharyngeal carcinoma resembles the surrounding tissue. Third, the boundaries of nasopharyngeal carcinoma are irregular. CAFS uses the cooperative segmentation mechanism, the attention mechanism, and the feedback mechanism to address these difficulties, respectively.
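The teacher, student, and feedback steps described above can be illustrated with a deliberately tiny toy. This is a sketch under strong assumptions and not the CAFS implementation: each "image" is reduced to a list of scalar pixel intensities, each "model" is a single intensity threshold, and the feedback rule (shifting the teacher's threshold by the balance of the student's false positives and false negatives) is a hypothetical stand-in for the gradient-based update a neural network would use. The entry does not specify which model the feedback updates; the sketch updates the teacher, which is also an assumption.

```python
def fit_threshold(values, masks, fallback):
    """Fit a 1-D threshold as the midpoint between the two class means.

    Returns `fallback` when one class is missing from `masks`.
    """
    pos = [v for v, m in zip(values, masks) if m == 1]
    neg = [v for v, m in zip(values, masks) if m == 0]
    if not pos or not neg:
        return fallback
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def predict(values, threshold):
    """Label a pixel as tumor (1) when its intensity exceeds the threshold."""
    return [1 if v > threshold else 0 for v in values]


def cooperative_round(labeled_x, labeled_y, unlabeled_x, teacher_t, lr=0.5):
    """One teacher -> student -> feedback round.

    Returns (updated teacher threshold, student threshold, student errors).
    """
    # 1. The teacher generates pseudo-masks for the unlabeled data.
    pseudo = predict(unlabeled_x, teacher_t)
    # 2. The student trains only on unlabeled data and pseudo-masks.
    student_t = fit_threshold(unlabeled_x, pseudo, fallback=teacher_t)
    # 3. The student predicts the labeled data.
    student_pred = predict(labeled_x, student_t)
    # 4. Comparing with the ground truth yields the feedback signal.
    false_pos = sum(1 for p, y in zip(student_pred, labeled_y) if p == 1 and y == 0)
    false_neg = sum(1 for p, y in zip(student_pred, labeled_y) if p == 0 and y == 1)
    feedback = (false_pos - false_neg) / len(labeled_y)
    # Assumed update rule: excess false positives raise the teacher's
    # threshold, excess false negatives lower it.
    new_teacher_t = teacher_t + lr * feedback
    return new_teacher_t, student_t, false_pos + false_neg


# Hypothetical toy data: a few labeled pixels and more unlabeled ones.
labeled_x = [0.1, 0.2, 0.8, 0.9]
labeled_y = [0, 0, 1, 1]
unlabeled_x = [0.05, 0.15, 0.3, 0.7, 0.85, 0.95]

# The teacher first learns from the small labeled set.
teacher_t = fit_threshold(labeled_x, labeled_y, fallback=0.5)
for round_index in range(3):
    teacher_t, student_t, errors = cooperative_round(
        labeled_x, labeled_y, unlabeled_x, teacher_t
    )
    print(round_index, round(teacher_t, 3), round(student_t, 3), errors)
```

The point of the sketch is the data flow: ground-truth labels touch the student only indirectly, through the pseudo-masks and through the feedback computed on the labeled set.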
In general, the main contributions of CAFS are as follows:
- The teacher–student cooperative segmentation mechanism allows CAFS to segment nasopharyngeal carcinoma using only a small amount of labeled data;
- The attention mechanism helps prevent CAFS from confusing nasopharyngeal carcinoma with surrounding tissues;
- The feedback mechanism allows CAFS to segment nasopharyngeal carcinoma more accurately.
2. Fully-Supervised
The most common methods for the automatic segmentation of nasopharyngeal carcinoma are fully-supervised methods
[24][25][26][27][28]. In the last few decades, deep learning methods have been increasingly used in medical image segmentation
[29][30][31]. Among them, many fully supervised algorithms have been proposed for nasopharyngeal carcinoma segmentation. Convolutional neural networks (CNN)
[32] are an effective image segmentation method that captures contextual semantics by computing high-level feature maps
[33][34]. Since the pioneering CNN algorithm by LeCun et al. in 1990, more and more improved CNN algorithms for image segmentation have been proposed. Pan et al.
[35] improved the typical CNN network by applying dilated convolution at each layer of the feature pyramid network (FPN) to obtain contextual associations, and applied it to nasopharyngeal organ target segmentation. Other scholars segment nasopharyngeal carcinoma with CNN-based methods that use three-dimensional filters
[36][37][38]. Ronneberger et al.
[39] proposed a convolutional network called U-Net for biomedical image segmentation in 2015. After that, many segmentation algorithms for medical images were adapted from U-Net. Some scholars combined mechanisms such as attention and residual connections with U-Net to improve segmentation performance and segment the nasopharyngeal carcinoma
[40][41][42]. To accommodate volumetric segmentation of medical images, many U-Net-based 3D models have been developed as well
[43][44]. While these fully supervised methods can achieve excellent segmentation results, they depend on a large amount of labeled data. Reliable labeled data are often hard to obtain, because both specialized medical knowledge and time are required.
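Dilated convolution, mentioned above as a way to obtain contextual associations, spaces the kernel taps apart so that each output covers a wider input span with the same number of weights. A minimal one-dimensional sketch in plain Python, assuming no padding and the cross-correlation convention common in deep learning (the function and its example kernel are illustrative and not taken from any of the cited networks):

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1-D dilated convolution, cross-correlation style.

    Kernel tap i reads the input at offset i * dilation, so one output
    covers (len(kernel) - 1) * dilation + 1 input positions.
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output
    outputs = []
    for start in range(len(signal) - span + 1):
        outputs.append(
            sum(k * signal[start + i * dilation] for i, k in enumerate(kernel))
        )
    return outputs


signal = [1, 2, 3, 4, 5, 6, 7]
edge_kernel = [1, 0, -1]  # hypothetical 3-tap difference kernel

# dilation=1 compares samples 2 apart; dilation=2 compares samples 4 apart.
print(dilated_conv1d(signal, edge_kernel, dilation=1))
print(dilated_conv1d(signal, edge_kernel, dilation=2))
```

With the same three weights, the dilated version responds to structure over five input samples instead of three, which is how wider context is gathered without extra parameters.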
3. Semi-Supervised
More and more semi-supervised segmentation methods have been proposed in recent years to address the difficulty of obtaining annotated data
[45]. Self-training is one of the most commonly used semi-supervised methods
[46]. It first trains a model on a small amount of labeled data, then makes predictions on unlabeled data, and finally mixes the most reliable predictions with the labeled data for further training
[47][48]. Another common semi-supervised method is co-training, which uses the interaction between two networks to improve the segmentation performance
[49][50]. Hu et al.
[51] proposed an uncertainty- and attention-guided consistency semi-supervised method to segment nasopharyngeal carcinoma. Lou et al.
[52] proposed a semi-supervised method that extends the backbone segmentation network to produce pyramidal predictions at different scales. Zhang et al.
[53] then used the teacher’s uncertainty estimates to guide the student and performed consistency learning to uncover more information from the unlabeled data.
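The self-training loop described above hinges on deciding which predictions on unlabeled data are good enough to reuse as labels. A minimal sketch of one such step, assuming binary labels and a fixed confidence threshold as the selection rule (the threshold value, the function names, and the `predict_proba` callable are hypothetical; published methods use a variety of selection criteria):

```python
def select_confident(probabilities, threshold=0.9):
    """Return (index, pseudo_label) pairs for confident predictions.

    Each probability is the model's estimate that the sample is positive;
    confidence is the distance-from-uncertainty max(p, 1 - p).
    """
    selected = []
    for index, p in enumerate(probabilities):
        confidence = max(p, 1.0 - p)
        if confidence >= threshold:
            selected.append((index, 1 if p >= 0.5 else 0))
    return selected


def self_training_step(labeled, unlabeled_x, predict_proba, threshold=0.9):
    """Move confidently predicted samples from the unlabeled pool to the labeled set.

    `labeled` is a list of (x, y) pairs; `predict_proba` maps x to P(y = 1).
    Returns (enlarged labeled set, remaining unlabeled samples).
    """
    probabilities = [predict_proba(x) for x in unlabeled_x]
    chosen = select_confident(probabilities, threshold)
    chosen_indices = {index for index, _ in chosen}
    pseudo_labeled = [(unlabeled_x[index], label) for index, label in chosen]
    remaining = [x for i, x in enumerate(unlabeled_x) if i not in chosen_indices]
    return labeled + pseudo_labeled, remaining


# Hypothetical stand-in model: treats a scalar intensity as P(y = 1).
def toy_predict_proba(x):
    return x


labeled_set = [(0.1, 0), (0.9, 1)]
unlabeled_pool = [0.05, 0.5, 0.95, 0.7]
labeled_set, unlabeled_pool = self_training_step(
    labeled_set, unlabeled_pool, toy_predict_proba
)
print(labeled_set)
print(unlabeled_pool)
```

In a full loop, the model would be retrained on the enlarged labeled set and the step repeated until the unlabeled pool stops shrinking; retraining is omitted here to keep the selection logic visible.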
Sun et al.
[54] also apply the teacher–student paradigm to medical image segmentation. It is worth noting that the mixed supervision in
[54] stands for partial dense-labeled supervision from labeled datasets and supplementary loose bounding-box supervision for both labeled and unlabeled data, whereas CAFS uses only partial dense-labeled supervision. In addition, Ref.
[54] applies bounding-box supervision to provide localization information; in CAFS, the attention mechanism localizes the nasopharyngeal carcinoma target. Moreover, Sun et al.
[54] train the teacher model fully before it provides pseudo-label guidance for the student, while CAFS optimizes both teacher and student models simultaneously.
In addition to self-training and co-training, semi-supervised methods also include paradigms such as generative models, transductive support vector machines, and graph-based methods.