Perceptual Encryption-Based Image Communication System for Tuberculosis Diagnosis

Block-based perceptual encryption (PE) algorithms are becoming popular for multimedia data protection because of their low computational demands and format compliance with the JPEG standard. In conventional methods, a color input image is a prerequisite for using a smaller block size and thereby achieving better security. However, in domains such as medical image processing, the unavailability of color images makes conventional PE methods inadequate for securing images during transmission and storage.

  • perceptual encryption
  • JPEG standard
  • EfficientNetV2
  • deep learning
  • tuberculosis diagnosis

1. Introduction

Cloud services provide a cost-effective solution to meet the Information and Communication Technology (ICT) needs of an organization. An organization can use the ICT resources, services and software of a Cloud Services Provider (CSP) via the internet without the need for on-site infrastructure or hardware installations. With the recent success of Machine Learning (ML) in computer vision, computer-aided diagnosis (CAD) systems have emerged in healthcare organizations to assist doctors and practitioners. In particular, Deep Learning (DL), a subfield of ML, has achieved state-of-the-art performance in image classification [1]. However, DL models are compute-intensive, and their training requires cutting-edge technology and substantial computational resources. Healthcare organizations can therefore use cloud-computing services to access the latest technology, speed up training and scale DL models efficiently at a lower capital cost [2,3]. In addition, training DL models requires a large volume of sample data, which in some domains, such as medicine, is expensive and difficult to acquire. To overcome this issue, healthcare organizations can benefit from a community cloud, in which services are shared by organizations with common interests. In this case, cloud storage services can serve as a shared central data repository for joint projects and collaboration among the organizations. However, as in all communication systems, outsourcing data to cloud services carries a risk of information leakage, and transmitting a large volume of data requires high bandwidth [4,5,6,7].
Compression and encryption are two processes that together satisfy the dual requirements of transmitting data over bandwidth-constrained and public channels. Image compression gives an image a compact representation that requires fewer bits. It can be performed in either lossless or lossy mode. In lossless mode, the original image can be recovered exactly, whereas in lossy mode the image quality degrades. Lossy compression offers greater savings than lossless compression; however, the resulting quality degradation may not be acceptable in certain domains. For example, medical images contain information crucial for the correct diagnosis of diseases; therefore, they should be compressed in such a way that their sizes are reduced while the diagnostic information remains intact [8,9,10]. A popular approach to achieve this goal is to compress the region of interest (ROI) necessary for diagnosis in lossless mode and the non-ROI in lossy mode [8,9,10]. Such methods can significantly reduce the image size while preserving its important details. However, they require the image to be segmented beforehand, which is computationally expensive and is itself one of the tasks that would be performed using cloud-computing resources. Therefore, ROI-based methods are not suitable for efficient image data transmission [2].
Encryption makes image data unintelligible; the data can be recovered only by the inverse decryption process. Encryption algorithms based on number theory and chaos theory have proven efficient for securing image data [2]. These conventional algorithms perform stream encryption and/or scrambling of pixel values; however, they are suitable only for encrypting raw images. For example, a JPEG-compressed image contains format markers, and any change to them by an external operation leaves the image uninterpretable. Similarly, re-encoding a cipher image as a JPEG image increases the file size. Unlike other forms of data, image data can be encrypted merely by disrupting their intrinsic properties. For example, changing the pixel correlation and/or redundancy in an image can produce an unintelligible image with the necessary level of security. Based on this observation, a new class of encryption algorithms, called Perceptual Encryption (PE) algorithms, has emerged to meet the aforementioned requirements of encrypting compressed images. The main idea is to reverse the conventional order, performing encryption before compression rather than compression before encryption. PE performs block-based operations that hide only the perceptual information of an image, thereby preserving the intrinsic properties necessary to carry out computations in the encryption domain. For example, refs. [11,12] proposed PE methods for enabling privacy-preserving DL applications. In addition, PE cipher images are JPEG compressible, which makes them suitable for numerous applications, such as cloud photo storage and social networking services [13,14] and image retrieval in the encryption domain [15]. Moreover, PE methods are resilient against various attacks, including brute-force and ciphertext-only attacks [16].
Based on the input image representation, PE methods can be grouped into Color-PE and Grayscale-PE methods. Color-PE represents an input color image as a three-component image and uses the same encryption keys for each component [17], whereas its extended versions encrypt each color component independently [12,18]. The extended methods have a larger keyspace because they effectively increase the number of blocks. However, this increase is limited by the smallest allowable block size in the JPEG algorithm; for instance, a block size no smaller than 16 × 16 should be used for color image compression. This recommended size is necessary to avoid block artifacts resulting from the JPEG chroma-subsampling step [2]. For grayscale images, a smaller block size such as 8 × 8 can be used in the JPEG algorithm without any adverse effect. Therefore, to exploit the smaller block size for an expanded keyspace, Grayscale-PE represents a color input as a pseudo-grayscale image by combining the color components along the horizontal or vertical direction [13,14]. Overall, in conventional methods, a color input image is a prerequisite for better security.
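To make the link between block count and keyspace concrete, the short Python sketch here estimates keyspace sizes for a hypothetical 256 × 256 image. It assumes the keyspace formula commonly used for block-based PE, n! · 8^n · 2^n for n blocks (permutation, rotation/inversion, negative-positive transformation), multiplied by 6^n when color channel shuffling is applied; the image size, the helper names and the three scenarios are illustrative choices, not values from the source.

```python
import math

def num_blocks(height: int, width: int, block: int) -> int:
    """Number of non-overlapping block x block tiles in a height x width image."""
    if height % block or width % block:
        raise ValueError("image dimensions must be multiples of the block size")
    return (height // block) * (width // block)

def log2_keyspace(n: int, channel_shuffle: bool) -> float:
    """log2 of n! * 8^n * 2^n, times 6^n if channel shuffling is used (assumed formula)."""
    log2_perm = math.lgamma(n + 1) / math.log(2)  # log2(n!) via the log-gamma function
    per_block = math.log2(8) + math.log2(2)       # rotation/inversion + negative-positive
    if channel_shuffle:
        per_block += math.log2(6)                 # 3! orderings of the color channels
    return log2_perm + n * per_block

H = W = 256  # hypothetical image size
scenarios = {
    "Color-PE, 16x16 blocks": (num_blocks(H, W, 16), True),
    "Grayscale-PE on R|G|B pseudo-grayscale, 8x8 blocks": (num_blocks(H, 3 * W, 8), False),
    "Native grayscale image, 8x8 blocks": (num_blocks(H, W, 8), False),
}
for name, (n, shuffle) in scenarios.items():
    print(f"{name}: n = {n}, log2(keyspace) = {log2_keyspace(n, shuffle):.0f}")
```

Under these assumptions the pseudo-grayscale layout has 12 times as many blocks as Color-PE (3072 versus 256), and because log2(n!) grows faster than linearly in n, its keyspace is by far the largest. A natively grayscale image of the same size, such as a chest X-ray, yields only a third of the pseudo-grayscale block count.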

2. Deep Learning-Based Tuberculosis Screening

Grivkov et al. [19] implemented InceptionV3 [20] for the diagnosis of tuberculosis (TB) on the Shenzhen (SH) and Montgomery (MG) datasets [21] and achieved 86.8% accuracy. Das et al. [22] exploited transfer learning to improve the accuracy of InceptionV3 to 91.7% on the same datasets. Priya et al. [23] applied transfer learning to the VGG19 [24], ResNet50 [25], DenseNet121 [26] and InceptionV3 models. In their analysis, the pre-trained VGG19 achieved the best accuracies of 89% and 95% on the MG and SH datasets, respectively. Cao et al. [27] implemented DenseNet121, VGG and ResNet152 [25] models and achieved the best classification accuracy of 90.38% with DenseNet121. Raman et al. [28] adopted a somewhat different approach from the aforementioned methods. They used three pre-trained models (ResNet101 [25], VGG19, and DenseNet201 [26]) to extract features from chest X-ray (CXR) images and an eXtreme Gradient Boosting (XGBoost) model (1.6.1, Tianqi Chen and Carlos Ernesto Guestrin, Seattle, WA, USA) [29] to classify the images as TB or non-TB. In their experiments, the DenseNet201 with XGBoost architecture achieved the highest accuracy of 99.92% among the compared architectures. Munadi et al. [30] proposed enhancing CXR image quality before feeding the images to pre-trained ResNet and EfficientNet [31] models. They used three image-enhancement techniques (unsharp masking, high-frequency emphasis filtering, and contrast limited adaptive histogram equalization). In their analysis, EfficientNet with unsharp masking achieved 89.92% accuracy on the SH dataset. Msnoda et al. [32] implemented ResNet, GoogLeNet [33], and AlexNet [34] with an extra Spatial Pyramid Pooling (SPP) [35] layer. Among the implemented architectures, GoogLeNet achieved the highest classification accuracy of 97%, which was then improved to 98% by using the SPP layer.
The methods discussed so far rely on the architecture of an individual model for classification efficiency. Other methods combine two or more models into an ensemble network to achieve better performance. For example, Rajaraman et al. [36] implemented VGG16, InceptionResNetV2 [37], InceptionV3, XceptionNet [38] and DenseNet121, and then ranked them by accuracy. In their experiments, the top-3 models were InceptionV3 (accuracy = 94%), DenseNet121 (accuracy = 92.8%) and InceptionResNetV2 (accuracy = 92.5%). They evaluated multiple ensemble methods for combining the top-3 models, such as majority voting, simple averaging, weighted averaging, stacking and blending. Their analysis showed that the stacking ensemble demonstrated better performance, achieving 94.1% accuracy. Dasanayaka et al. [39] implemented an ensemble of only two models (VGG16 and InceptionV3) and achieved 97.10% accuracy, which is higher than that of the three-model ensemble proposed in [36]. Oloko-Oba et al. [40] implemented an ensemble of VGG16, ResNet50 and InceptionV3 and achieved a best accuracy of 96.14%. In another study [41], they explored ensembles of EfficientNets [31] for the diagnosis of TB. Among the individual models, EfficientNet-B4 achieved the best accuracy of 94.35% on the SH dataset, which was then improved to 97.44% through ensemble learning. The ensemble was built by averaging the predictions of the three best individual EfficientNets (B2, B3, and B4). Saif et al. [42] proposed combining traditional hand-engineered features with an ensemble of DenseNet169, ResNet50 and InceptionV3 models. Their ensemble model achieved a best accuracy of 99.7% on the SH dataset. Overall, ensemble methods have shown superior performance over individual models for TB screening in CXR images.
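The simpler combination rules named above (majority voting, simple averaging and weighted averaging) can be sketched in a few lines of NumPy. This is a generic illustration, not the implementation of any cited study: it assumes each model outputs a TB probability per image and uses a 0.5 decision threshold, and the probabilities and weights are made up for three hypothetical models. Stacking and blending, which train a meta-learner on the models' outputs, are not shown.

```python
import numpy as np

def majority_vote(probs: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """probs: (n_models, n_samples) TB probabilities -> 0/1 labels by hard voting."""
    votes = (probs >= threshold).astype(int)        # each model casts a 0/1 vote
    return (votes.sum(axis=0) * 2 > probs.shape[0]).astype(int)  # strict majority

def average_ensemble(probs: np.ndarray, weights=None, threshold: float = 0.5) -> np.ndarray:
    """Simple (weights=None) or weighted soft averaging of the models' probabilities."""
    avg = np.average(probs, axis=0, weights=weights)
    return (avg >= threshold).astype(int)

# Hypothetical outputs of three models on four chest X-rays (illustrative only)
probs = np.array([
    [0.90, 0.40, 0.20, 0.60],
    [0.80, 0.90, 0.10, 0.45],
    [0.70, 0.45, 0.30, 0.40],
])
print(majority_vote(probs))
print(average_ensemble(probs))
print(average_ensemble(probs, weights=[0.6, 0.2, 0.2]))
```

With these inputs, hard majority voting labels only the first image as TB, simple averaging also flags the second, and weighting the first model more heavily flags the fourth as well, which shows how the combination rule alone can change an ensemble's decisions.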

3. Perceptual Encryption Methods

The PE algorithm is block-based and performs four steps: block permutation, rotation and inversion, negative-positive transformation, and color channel shuffling. Several methods based on these steps have been proposed in the literature. They can be classified as Color-PE and Grayscale-PE methods based on their input image representation. In Color-PE methods, an input color image is represented as a three-component image, whereas Grayscale-PE methods represent an input as a pseudo-grayscale image by concatenating its color components along the vertical or horizontal direction. In Grayscale-PE methods, the channel-shuffling step is omitted. This section summarizes PE-related work.
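The three steps that Grayscale-PE keeps can be illustrated with a minimal NumPy sketch for a single-channel 8-bit image. Assumptions: the key is represented by a hypothetical integer `seed` fed to NumPy's random generator, rotation and inversion are encoded as 0 to 3 quarter turns plus an optional horizontal flip (8 combinations per block), and the negative-positive transformation maps a pixel p to 255 - p. It is a generic illustration of the steps, not the exact scheme of any cited method, and a seeded pseudo-random generator is not a cryptographically secure key schedule.

```python
import numpy as np

def split_blocks(img: np.ndarray, b: int) -> np.ndarray:
    """(H, W) -> (n, b, b) blocks in row-major order; H and W must be multiples of b."""
    h, w = img.shape
    return img.reshape(h // b, b, w // b, b).swapaxes(1, 2).reshape(-1, b, b)

def merge_blocks(blocks: np.ndarray, h: int, w: int) -> np.ndarray:
    """Inverse of split_blocks."""
    b = blocks.shape[1]
    return blocks.reshape(h // b, w // b, b, b).swapaxes(1, 2).reshape(h, w)

def make_keys(n: int, seed: int):
    """Hypothetical key generation: permutation, quarter turns, flip and negation flags."""
    rng = np.random.default_rng(seed)
    return (rng.permutation(n), rng.integers(0, 4, n),
            rng.integers(0, 2, n), rng.integers(0, 2, n))

def pe_encrypt(img: np.ndarray, b: int, seed: int) -> np.ndarray:
    h, w = img.shape
    blocks = split_blocks(img, b)
    perm, rot, flip, neg = make_keys(len(blocks), seed)
    out = np.empty_like(blocks)
    for k, src in enumerate(perm):               # step 1: block permutation
        blk = np.rot90(blocks[src], rot[k])      # step 2: rotation ...
        if flip[k]:
            blk = np.fliplr(blk)                 # ... and inversion
        if neg[k]:
            blk = 255 - blk                      # step 3: negative-positive transformation
        out[k] = blk
    return merge_blocks(out, h, w)

def pe_decrypt(enc: np.ndarray, b: int, seed: int) -> np.ndarray:
    h, w = enc.shape
    blocks = split_blocks(enc, b)
    perm, rot, flip, neg = make_keys(len(blocks), seed)
    out = np.empty_like(blocks)
    for k, dst in enumerate(perm):               # undo the steps in reverse order
        blk = 255 - blocks[k] if neg[k] else blocks[k]
        if flip[k]:
            blk = np.fliplr(blk)
        out[dst] = np.rot90(blk, -rot[k])        # rotate back and return block to its slot
    return merge_blocks(out, h, w)

img = np.random.default_rng(0).integers(0, 256, (64, 64)).astype(np.uint8)
enc = pe_encrypt(img, 8, seed=42)
assert np.array_equal(pe_decrypt(enc, 8, seed=42), img)
```

Decryption regenerates the same keys from the seed and undoes the steps in reverse order. Because each 8 × 8 block is only moved, rotated, flipped or negated, its pixels stay together, which is the property that keeps such cipher images compatible with JPEG's 8 × 8 block-based coding.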
Kurihara et al. [17] proposed a block-based Color-PE method that performs the encryption steps on each color component using the same key. Since the input is a color image, a larger block size is used to avoid block artifacts in the decoded image caused by the chroma subsampling of the JPEG algorithm. However, the use of the same key for each color component and a larger block size result in a smaller number of blocks, which makes the scheme vulnerable to jigsaw puzzle solver attacks. To increase the number of blocks for better security, Imaizumi et al. [43] proposed performing the first three encryption steps independently in each color component. As a result, the scheme has a larger keyspace than that of [17]; however, processing each component individually causes JPEG compatibility issues. For example, the method can be used only with the lossless JPEG algorithm in the RGB colorspace. Ahmad et al. [12,18] proposed PE methods to address the compatibility issue of [43]. In their schemes, rotation, inversion and pixel value transformations are performed on each color component independently, while the same key is used for the permutation step in each component, which allows the JPEG algorithm to be used with the YCbCr colorspace for better compression savings. The extended PE methods in [12,18,43] have better security than the PE method proposed in [17] because the keyspace is expanded and the color distribution is altered significantly. However, the main limitation of the extended PE methods is that they cannot exploit JPEG chroma subsampling.
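The key structure described for the methods of Ahmad et al. [12,18], one block permutation shared by all components plus independent rotation/inversion and negative-positive keys per color component, can be sketched as follows. This is an illustrative reading of the description above, not the authors' code; the `seed`-based key generation, the function names and the block-processing details are assumptions, and decryption (which inverts the steps in reverse order) is omitted for brevity.

```python
import numpy as np

def to_blocks(ch: np.ndarray, b: int) -> np.ndarray:
    """(H, W) -> (n, b, b) blocks in row-major order."""
    h, w = ch.shape
    return ch.reshape(h // b, b, w // b, b).swapaxes(1, 2).reshape(-1, b, b)

def from_blocks(blocks: np.ndarray, h: int, w: int) -> np.ndarray:
    """Inverse of to_blocks."""
    b = blocks.shape[1]
    return blocks.reshape(h // b, w // b, b, b).swapaxes(1, 2).reshape(h, w)

def extended_pe_encrypt(img: np.ndarray, b: int, seed: int) -> np.ndarray:
    """img: (H, W, 3) uint8. Shared permutation; per-component rotation/inversion/negation."""
    rng = np.random.default_rng(seed)            # hypothetical key source
    h, w, c = img.shape
    n = (h // b) * (w // b)
    perm = rng.permutation(n)                    # ONE permutation key for all components
    out = np.empty_like(img)
    for ch in range(c):
        rot = rng.integers(0, 4, n)              # independent keys for this component
        flip = rng.integers(0, 2, n)
        neg = rng.integers(0, 2, n)
        src = to_blocks(img[:, :, ch], b)
        enc = np.empty_like(src)
        for k in range(n):
            blk = np.rot90(src[perm[k]], rot[k])
            if flip[k]:
                blk = np.fliplr(blk)
            if neg[k]:
                blk = 255 - blk
            enc[k] = blk
        out[:, :, ch] = from_blocks(enc, h, w)
    return out
```

Sharing the permutation keeps co-located blocks of the three components together after encryption, which is consistent with the statement above that a common permutation key lets the cipher image be coded in the YCbCr colorspace; the independent per-component keys are what expand the keyspace relative to [17].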
Chuman et al. [13] adopted an alternative approach that allows the use of a smaller block size. The main idea is to represent an input color image as a pseudo-grayscale image by concatenating the color components along the horizontal or vertical direction; the method therefore belongs to the Grayscale-PE class. Such a representation allows a smaller block size to be used without any adverse effect on decoded image quality or compression savings. In addition to the smaller block size, the lack of color information improves robustness against jigsaw puzzle solver attacks. When chroma subsampling is desirable, Sirichotedumrong et al. [14] proposed converting the input image from the RGB to the YCbCr colorspace, downsampling the chroma components and then concatenating them with the luminance component. For example, to concatenate them with the luminance component horizontally, the chroma components must first be concatenated with each other vertically, and vice versa.
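Both pseudo-grayscale layouts can be sketched with NumPy for an RGB image whose height and width are even. Assumptions: the RGB-to-YCbCr conversion uses the full-range ITU-R BT.601 coefficients of JFIF, chroma downsampling is a plain 2 × 2 average (JPEG encoders may use other filters), only the horizontal variant of each layout is shown, and the function names are hypothetical.

```python
import numpy as np

def rgb_concat_pseudo_gray(rgb: np.ndarray) -> np.ndarray:
    """Place R, G and B side by side: (H, W, 3) -> (H, 3W)."""
    return np.concatenate([rgb[:, :, 0], rgb[:, :, 1], rgb[:, :, 2]], axis=1)

def rgb_to_ycbcr(rgb: np.ndarray):
    """Full-range BT.601 (JFIF) conversion; returns float Y, Cb, Cr planes."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def downsample_2x2(ch: np.ndarray) -> np.ndarray:
    """4:2:0-style downsampling by averaging each 2 x 2 neighborhood (assumed filter)."""
    h, w = ch.shape
    return ch.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def ycbcr420_pseudo_gray(rgb: np.ndarray) -> np.ndarray:
    """Y (H x W) beside [Cb; Cr] stacked vertically (H x W/2): result is H x 3W/2."""
    y, cb, cr = rgb_to_ycbcr(rgb)
    chroma = np.concatenate([downsample_2x2(cb), downsample_2x2(cr)], axis=0)
    gray = np.concatenate([y, chroma], axis=1)
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)
```

For an H × W × 3 input, the first layout gives an H × 3W image and the second an H × 3W/2 image, so the 4:2:0 variant carries half as many samples overall, while both can be coded as a single grayscale component with 8 × 8 blocks. Stacking the two quarter-size chroma planes vertically is what makes their combined height equal to that of the luminance plane.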
Compared to Color-PE methods and their extensions, Grayscale-PE methods provide better security because their smaller block size increases the number of blocks and the pseudo-grayscale representation effectively disrupts the color information. However, block-based PE schemes face a tradeoff between compression and encryption efficiency in the choice of block size: smaller blocks enlarge the keyspace, but blocks smaller than the JPEG coding unit introduce block artifacts and reduce compression savings. For example, block sizes no smaller than 16 × 16 and 8 × 8 should be used in Color-PE and Grayscale-PE methods, respectively.
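The block-size rule in the last sentence can be expressed as a small check. This sketch assumes baseline JPEG coding units: a single-component (grayscale or pseudo-grayscale) image and a 4:4:4 color image are coded in 8 × 8 blocks, whereas a 4:2:0 color image is coded in 16 × 16 minimum coded units (MCUs); other subsampling modes are rejected rather than guessed, and the helper names are hypothetical.

```python
def min_pe_block_size(is_color: bool, subsampling: str = "4:2:0") -> int:
    """Smallest PE block size aligned with the JPEG coding unit (assumed baseline JPEG)."""
    if not is_color or subsampling == "4:4:4":
        return 8    # one 8x8 DCT block per component covers the PE block
    if subsampling == "4:2:0":
        return 16   # each 8x8 chroma DCT block spans 16x16 pixels after 2x2 downsampling
    raise ValueError(f"unsupported subsampling: {subsampling}")

def blocks_if_valid(h: int, w: int, block: int, is_color: bool, subsampling: str = "4:2:0"):
    """Number of PE blocks, or None if the block size breaks MCU alignment or tiling."""
    if block % min_pe_block_size(is_color, subsampling) or h % block or w % block:
        return None
    return (h // block) * (w // block)

print(blocks_if_valid(256, 256, 8, is_color=True))    # 4:2:0 color with 8x8 blocks
print(blocks_if_valid(256, 256, 16, is_color=True))   # 4:2:0 color with 16x16 blocks
print(blocks_if_valid(256, 768, 8, is_color=False))   # R|G|B pseudo-grayscale, 8x8 blocks
```

For a 256 × 256 image, for example, an 8 × 8 block size is rejected for 4:2:0 color input, 16 × 16 gives 256 blocks, and 8 × 8 on the 256 × 768 pseudo-grayscale layout gives 3072 blocks, which is the tradeoff the paragraph above describes.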

This entry is adapted from the peer-reviewed paper 10.3390/electronics11162514
