SAA-UNet: Comparison
Please note this is a comparison between Version 1 by Shroog Hawash Alshomrani and Version 2 by Jason Zhu.
The disaster of the COVID-19 pandemic has claimed numerous lives and wreaked havoc across the entire world due to the transmissible nature of the virus. One of the complications of COVID-19 is pneumonia. Different radiography methods, particularly computed tomography (CT), have shown outstanding performance in diagnosing pneumonia.
  • COVID-19 pneumonia segmentation
  • CT images
  • SAA-UNet model

1. Introduction

In December 2019, people began rushing to Wuhan hospitals with severe pneumonia of unknown cause. After the number of infected people increased, China notified the World Health Organization (WHO) of the outbreak on 31 December [1,2]. After several examinations, the virus was found on 7 January to be a coronavirus with more than 70% similarity to SARS-CoV [3]. The virus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), causes the disease named COVID-19 by the WHO in February 2020 [4]. It belongs to the betacoronavirus family, which is highly contagious and causes various diseases: one such virus appeared in 2003, causing severe acute respiratory syndrome (SARS), and another appeared in 2012, causing Middle East respiratory syndrome (MERS) [5,6]. The first fatal case of coronavirus was reported on 11 January 2020, and the WHO declared a global emergency on 30 January 2020. The number of cases then increased dramatically due to human-to-human transmission [7]. The infection is transmitted through droplets from the coughing and sneezing of patients, whether or not they show symptoms [8]. These infected droplets can spread one to two meters and accumulate on surfaces. COVID-19 continued to spread despite strict preventive efforts; consequently, the WHO declared coronavirus a global pandemic at the International Health Meeting held in March 2020 [9]. The number of confirmed cases has exceeded 758 million, and the number of deaths has reached 6,859,093 persons [10].
Pneumonia is a complication of viral diseases such as COVID-19, influenza, and the common cold, as well as of infection by bacteria, fungi, and other microorganisms. COVID-19 can affect any organ in the human body, and its presentation ranges from mild symptoms, like those of the common cold, to severe pneumonia, or it may even be asymptomatic. Pneumonia caused by COVID-19 is named “novel coronavirus-infected pneumonia (NCIP)” [11].
The formal diagnosis of COVID-19 infection is the reverse-transcription polymerase chain reaction (RT-PCR) test, which uses a swab from the mouth, nasopharynx, bronchial lavage, or tracheal aspirate. The RT-PCR test has a high error rate because of its low sensitivity. Furthermore, blood tests may show signs of COVID-19 pneumonia [12]. Computed tomography (CT) of the chest is a complementary diagnostic tool, even before patients develop symptoms, as CT images show the areas of lung damage caused by COVID-19 [13]. This helps to assess the extent of infection at any stage of the disease. CT uses X-rays and computer processing to create three-dimensional images of the human body: a series of X-ray images taken from different angles around an organ or the body is combined into cross-sectional images called slices. CT images therefore provide more detailed information than regular X-rays; the resulting 3D volumes show the parts of the organ, facilitate segmentation, and aid in diagnosing diseases. In CT scans of people with COVID-19, the lungs contain different forms of opacity, such as ground-glass opacity (GGO) and consolidation [14]. This infection results from the virus entering cells by attaching to the surface angiotensin-converting enzyme 2 (ACE2) receptor. After the virus enters, it inflames the tiny air sacs, causing them to fill with so much fluid and pus that breathing becomes difficult. These sacs are where inhaled oxygen is processed and delivered to the blood. The damage causes tissue rupture and blockage in the lungs; later, the walls of these sacs thicken, further impeding breathing. As a result, the lungs are the first organ affected by the coronavirus [15,16].
Artificial intelligence, specifically deep learning, has recently played an effective and influential role in medical image analysis. Diagnostic evaluation of medical image data is a human-based process that requires considerable time from expert radiologists. Recent advances in artificial intelligence have replaced many manual diagnostic procedures with computer-aided diagnostic (CAD) methods that can achieve effective real-time diagnoses. CAD therefore plays an essential role in diagnosing diseases such as infections, cancer, and many others by imaging an organ or even the whole body, helping radiologists make decisions and plan treatment. The segmentation task, the first stage in CAD, identifies the pixels or voxels that make up the contour or the interior of the region of interest (ROI) [17,18]. Many deep learning algorithms for image segmentation have succeeded on biomedical images. For example, the fully convolutional network (FCN) was proposed as an end-to-end, pixel-to-pixel network for image segmentation [19], as was SegNet [20]. UNet was proposed for biomedical image segmentation, in which an encoder–decoder structure with concatenated skip connections yielded significant performance improvements [21], and the modified UNet (UNet++ [22]) and PSPNet [23] have been widely used in medical image segmentation.
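The encoder–decoder data flow with concatenated skip connections described above can be sketched in miniature. The following is an illustrative sketch only, not the SAA-UNet implementation: it replaces learned convolutions with plain NumPy pooling and upsampling, purely to show how a skip connection concatenates encoder features with upsampled decoder features at matching resolutions. All function names here are invented for this example.

```python
import numpy as np

def max_pool2(x):
    """2x2 max pooling over an (H, W, C) feature map (encoder downsampling)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample2(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map (decoder)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def unet_like_forward(img):
    """UNet-style data flow without learned weights: the encoder halves the
    resolution twice, and each decoder stage concatenates its upsampled
    features with the encoder features of the same resolution (skip join)."""
    enc1 = img                # encoder stage 1 (full resolution)
    enc2 = max_pool2(enc1)    # encoder stage 2 (half resolution)
    bridge = max_pool2(enc2)  # bottleneck (quarter resolution)
    dec2 = np.concatenate([upsample2(bridge), enc2], axis=-1)  # skip join
    dec1 = np.concatenate([upsample2(dec2), enc1], axis=-1)    # skip join
    return dec1

x = np.random.rand(8, 8, 3)
out = unet_like_forward(x)
print(out.shape)  # (8, 8, 9): full resolution restored, skip channels stacked
```

In a real UNet, each stage also applies learned convolutions and the final layer maps the concatenated channels to per-pixel class scores; the skip connections are what let the decoder recover the fine spatial detail lost during pooling.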

2. Related Work

With artificial intelligence (AI) advancements in the health field, many deep learning algorithms have been proposed for medical image processing, as segmentation tasks play an essential role in treatment planning. For example, Ronneberger et al. [21] introduced the standard UNet for biomedical image segmentation. They evaluated UNet on several datasets, including the ISBI challenge for segmenting neuronal structures in electron microscopy stacks, achieving an average IOU of 0.92 on the PhC-U373 dataset and 0.777 on DIC-HeLa. Oktay et al. [24] proposed an extension to the UNet architecture, adding an attention mechanism to the skip connections of UNet to focus on the image’s region of interest and improve the segmentation. They evaluated attention UNet on a dataset of 150 abdominal 3D CT scans from patients diagnosed with gastric cancer, achieving a Dice score of 0.84; on a second dataset of 82 contrast-enhanced 3D CT scans of the pancreas, they achieved a Dice score of 0.831. In continuation, Zhao et al. [25] proposed a modification of the UNet architecture that includes a spatial attention module in the bridge to focus on the important regions of the image. They evaluated SA-UNet on the Digital Retinal Images for Vessel Extraction (DRIVE) dataset and the Child Heart and Health Study (CHASE-DB1) dataset, achieving F1-scores of 0.826 and 0.815, respectively. Building on the above, deep learning models can be used to find areas of lung damage caused by SARS-CoV-2. Athanasios Voulodimos et al. [26] used an FCN-8s to segment COVID-19 pneumonia and achieved a 0.57 Dice coefficient. They also proposed a light UNet model with three encoder and decoder stages to deal with the limited datasets for this problem, which achieved a 0.64 Dice coefficient. Sanika Walvekar and Swati Shinde proposed UNet with preprocessing and spatial, color, and noise data augmentation from the MIScnn library with Tversky loss [27].
The Dice similarity coefficient (DSC) was 0.87 for COVID-19 infection segmentation and 0.89 for the lungs. Imran Ahmed et al. [28] proposed adding an attention mechanism to the standard UNet architecture to improve feature representation, with a binary cross-entropy Dice loss and a boundary loss; the Dice score was 0.764 on the validation set. Tongxue Zhou et al. [29] added a spatial attention module and a channel attention module to a UNet architecture with focal Tversky loss. These modules reweight the feature representation spatially and channelwise to capture rich contextual relationships for better feature representation. The DSC was 0.831. Narges Saeedizadeh et al. [30] proposed a ground-glass recognition system called TV-Unet, a UNet model with a total variation gradient; the loss function was binary cross-entropy with a total variation term. The DSC reached 0.86 and 0.76 on two different splits. The combination of two UNet models proposed by Narinder Singh Punn and Sonali Agarwal [30], called the CHS-Net model, segments the lungs with one model and the infection with the other, using a weighted binary cross-entropy and Dice loss function. CHS-Net combines UNet, Google’s Inception model, a residual network, and an attention strategy. The DSC for the lungs was 0.96, whereas for COVID-19 infection, it was 0.81. Tal Ben-Haim et al. [31] proposed a VGG backbone in the encoders of two UNets: the first UNet segments the lung regions from CT images, and the second extracts the infection or lesion shapes (GGO and consolidation). For binary infection segmentation with the binary cross-entropy loss, the DSC was 0.80; for the multi-class setting with weighted cross-entropy (WCE) and Dice loss, the DSC was 0.79 for GGO and 0.68 for consolidation. A plug-and-play attention module [32] was proposed to extract spatial features, added to the UNet output.
The plug-and-play attention module contains a position offset to build the positional relationship between pixels. This framework achieved 0.839 for the DSC. Ziyang Wang and Irina Voiculescu [33] proposed the quadruple augmented pyramid network (QAP-Net) for multi-class segmentation by establishing four augmented pyramid networks on the encoder–decoder network: two pyramid atrous networks with different dilation rates, a pyramid average pooling network, and a pyramid max pooling network. The mean intersection over union (IOU) score with categorical focal loss was 0.816. Qi Yang et al. [34] used MultiResUNet [35] as the base model, introduced a new residual block structure in the encoder, added regularization and dropout, and replaced some rectified linear unit (ReLU) activations with LeakyReLU. The DSC with a combination of binary cross-entropy, focal, and Tversky losses was 0.884. Nastaran Enshaei et al. [36] proposed using pre-trained Inception-V3, Xception, InceptionResNet-V2, and DenseNet-121 encoders, each paired with a decoder in place of its fully connected layers, to segment COVID-19 infection; the results of the multiple models were then aggregated by soft voting for each image pixel. This achieved Dice scores of 0.627 for GGO and 0.592 for consolidation with the categorical cross-entropy loss. Moreover, Murat Ucar [37] proposed aggregating pre-trained VGG16, ResNet101, DenseNet121, InceptionV3, and EfficientNetB5 models with a pixel-level majority vote to obtain the final class probabilities for each pixel in the image. The Dice coefficient was 0.85 with the Dice loss. Hong-Yang Pei et al. [38] proposed a multi-point supervised network (MPS-Net) based on UNet. The proposed model gave a 0.833 DSC with a combination of binary cross-entropy and Tversky loss for detecting COVID-19 infection. Ümit Budak et al. 
[39] proposed an A-SegNet network that combines SegNet with the attention gate (AG) mechanism. The DSC was 0.896 on the validation set with focal Tversky loss. Alex Noel Joseph Raj et al. proposed an attention gate-dense network-improved dilation convolution UNet (ADID-UNET) based on UNet [40]. ADID-UNET achieved an average Dice score of 0.803 on the MedSeg + Radiopaedia dataset with the Dice loss. Ying Chen et al. proposed the HADCNet model based on UNet, which contains hybrid attention modules in the five stages of the encoder and decoder [41]. These modules help balance the semantic differences between different levels of features, refining the feature information. HADCNet was trained with five-fold cross-validation with the cross-entropy and Dice loss on the MedSeg, Radiopaedia 9P, 150 COVID-19 patients, and Zenodo datasets, achieving Dice scores of 0.792, 0.796, 0.785, and 0.723, respectively. Nour Eldeen M. Khalifa et al. proposed an architecture with three encoder and decoder stages to deal with the limited dataset problem [42]. The mean IOU score on Zenodo 20P reached 0.799. Yu Qiu et al. proposed the MiniSeg model, with only 83K parameters, to extract multiscale features and cope with limited datasets [43]. Trained with five-fold cross-validation with the cross-entropy loss on MedSeg, Radiopaedia 9P, Zenodo 20P, and MosMedData, MiniSeg achieved average Dice scores of 0.759, 0.80, 0.763, and 0.64, respectively. Xiaoxin Wu et al. proposed a focal attention module (FAM), inspired by the residual attention network, that contains channel and spatial attention with a residual branch in the feature map [44]. The focal attention module was applied to FCN, UNet, SegNet, PSPNet, UNet++, and DeepLabV3+ with binary cross-entropy (BCE) loss; the best was DeepLabV3+, which achieved an average Dice score of 0.885 on Zenodo 20P. Feng Xie et al. proposed the double-U-shaped dilated attention network (DUDA-Net) to enhance segmentation [45]. 
DUDA-Net is a coarse-to-fine network, with a coarse sub-network for lung segmentation and a fine sub-network for infection segmentation. It was trained with five-fold cross-validation with Tversky loss on infection slices of Radiopaedia 9P, achieving an average Dice score of 0.871 and a mean IOU of 0.771. Vivek Kumar Singh et al. proposed the LungInfseg model based on an encoder–decoder structure [46]. LungInfseg was applied to Zenodo 20P with a combination of blockwise loss (BWL) and total loss (TL), achieving an average Dice score of 0.8034. R. Karthik et al. proposed a contour-enhanced attention decoder CNN model with an encoder–decoder structure [47]. With the mean pixelwise cross-entropy loss, the model achieved an average Dice score of 0.88 on the Zenodo 20P dataset, 0.837 on MosMedData, and 0.854 on the combination of the two. Kumar T. Rajamani et al. proposed the deformable attention network (DDANet) [48] based on UNet and criss-cross attention (CCNet) [49]. The model has the same structure as attention UNet [24], with a criss-cross attention module inserted in the bottleneck to capture non-local interactions. DDANet was trained with five-fold cross-validation on the combined MedSeg and Radiopaedia 9P dataset with multiple classes and class-weighted cross-entropy loss; the Dice score was 0.734 for GGO and 0.614 for consolidation, with an average of 0.781. Three-dimensional algorithms can also be applied to the whole CT volume of a patient. Keno K. Bressem [50] proposed adding a pre-trained 3D ResNet block to the 3D UNet architecture for COVID-19 CT image segmentation; the DSC was 0.648 with a combination of Dice loss and pixelwise cross-entropy loss. Aswathy A. L. and Vinod Chandra [51] proposed a cascaded 3D UNet with two 3D UNets, the first segmenting lung volumes and the second the infection volume. 
The DSC was 0.925 for the lungs and 0.82 for the infection. However, 3D algorithms for segmenting COVID-19 from CT are rarely used for several reasons, including their computational cost and the limited datasets available for this problem.
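Nearly all of the methods surveyed above are compared via the Dice similarity coefficient, IOU, or Tversky-based losses. As a reference for how these overlap measures relate, here is a minimal NumPy sketch (the function names are ours, not from any cited paper). The Tversky index with alpha = beta = 0.5 reduces to the Dice coefficient; choosing unequal alpha and beta weights false positives against false negatives, which is what the Tversky and focal Tversky losses exploit for small lesions.

```python
import numpy as np

def tversky_index(pred, target, alpha=0.5, beta=0.5, eps=1e-7):
    """Tversky index between two binary masks; alpha weights false positives
    and beta weights false negatives. alpha = beta = 0.5 gives Dice."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.logical_and(pred, target).sum()   # true positive pixels
    fp = np.logical_and(pred, ~target).sum()  # false positive pixels
    fn = np.logical_and(~pred, target).sum()  # false negative pixels
    return tp / (tp + alpha * fp + beta * fn + eps)

def dice(pred, target):
    """Dice similarity coefficient: 2*TP / (2*TP + FP + FN)."""
    return tversky_index(pred, target, alpha=0.5, beta=0.5)

def iou(pred, target, eps=1e-7):
    """Intersection over union: TP / (TP + FP + FN)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / (union + eps)

pred = np.array([[1, 1, 0, 0]])  # toy predicted mask
gt   = np.array([[1, 0, 1, 0]])  # toy ground-truth mask
print(round(dice(pred, gt), 4))  # 0.5    (TP=1, FP=1, FN=1)
print(round(iou(pred, gt), 4))   # 0.3333
```

Note that Dice is always at least as large as IOU for the same masks, which is one reason the same model can report noticeably different numbers under the two metrics; the Tversky *loss* used by several of the papers above is simply one minus this index, often with class weighting or a focal exponent.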