Combining State-of-the-Art Pre-Trained Deep Learning Models

Skin cancer poses a significant healthcare challenge, requiring precise and prompt diagnosis for effective treatment. While recent advances in deep learning have dramatically improved medical image analysis, including skin cancer classification, ensemble methods offer a pathway for further enhancing diagnostic accuracy.

  • skin cancer
  • classification
  • deep learning
  • transfer learning
  • max voting

1. Introduction

Skin cancer, a prevalent yet often misdiagnosed disease, poses a significant challenge in medical diagnostics. It is characterized by the uncontrolled growth of abnormal cells in the skin. The main types of skin cancer include basal cell carcinoma, squamous cell carcinoma, Merkel cell carcinoma, and melanoma [1]. Non-melanoma skin cancer is primarily caused by DNA damage from ultraviolet radiation exposure [2,3]. Skin cancer ranks among the most prevalent forms of cancer globally, making up around a third of all reported cancer diagnoses, with its frequency rising steadily each year [4]. In the United States alone, it is estimated that more than 9500 individuals receive a skin cancer diagnosis daily [5,6].
While skin cancer is often treatable, early detection and precise diagnosis play a pivotal role in achieving favourable treatment results and enhancing patient survival rates [7,8,9]. Skin cancer detection has traditionally relied on a mix of visual examination and histopathological analysis. Unfortunately, these methods can be time-consuming and subjective, have high inter-observer variability, and are limited in accuracy and scalability.
Numerous non-invasive imaging technologies for skin cancer detection and monitoring have been developed in recent years [10]. One notable example is the use of multi-spectral sensors to detect differences in the refractive index using millimeter-wave to terahertz photonic near-field imaging [11]. Recent advancements in different areas of artificial intelligence (AI) have significantly impacted many fields, suggesting that AI could greatly improve cancer diagnosis. The field of medical diagnostics is now benefiting from the rapid evolution of deep learning technologies, employing advanced models like Xception, InceptionResNetV2, and ResNet50V2 [12,13,14].
Despite these technological advancements, the difficulty in differentiating malignant from benign cases continues to hinder diagnostic accuracy. In recent years, deep neural networks, especially convolutional neural networks (CNNs) [15], have demonstrated significant potential in precisely detecting and categorizing skin cancer from medical images [7,16,17]. CNNs are a specialized class of neural networks ideally suited for tasks involving image classification, as they can autonomously acquire hierarchical representations of image features directly from raw pixel values [18,19].
The challenge in skin cancer detection lies in the high variability of lesions and the limitations of current diagnostic methods, which often lead to inaccuracies and inconsistent diagnoses. Transfer learning, which entails the utilization of pre-trained CNN models for feature extraction from medical images, has additionally demonstrated enhancements in classification accuracy [20,21,22]. The application of deep learning methods in the domain of skin cancer detection and classification has emerged as a vibrant research area in recent years. Numerous investigations have substantiated that CNN models can achieve high accuracy in detecting and classifying skin cancer from medical images, and that transfer learning can further improve classification accuracy [23,24,25].
Although the body of literature on skin cancer diagnosis using deep learning models is growing rapidly, with many researchers vying to create increasingly accurate detection methods, a substantial research gap remains in the investigation of a holistic and cohesive methodology. In particular, few studies harness the synergies that may be achieved using a Max Voting Ensemble approach, which, as far as is known, has not been thoroughly investigated on a diverse dataset in the context of skin cancer diagnosis.
While deep learning for medical image analysis has made significant strides, a particular area of interest is the application of cutting-edge pre-trained deep learning models to skin cancer detection. The primary aim of this work is to improve the precision and dependability of skin cancer diagnostic systems by examining the use of the Max Voting Ensemble approach. This study presents an approach for skin cancer detection and classification that applies the max voting ensemble technique to pre-trained deep learning models. The proposed method combines the strengths of multiple pre-trained models, including Xception, InceptionResNetV2, ResNet50V2, InceptionV3, DenseNet121, DenseNet201, ResNet50, VGG16, AlexNet, and MobileNetV2, to create an ensemble with enhanced accuracy and robustness. The pre-trained models employed in this investigation have been extensively trained on large-scale skin cancer datasets, enabling them to capture complex features and patterns specific to skin lesions.

2. Combining State-of-the-Art Pre-Trained Deep Learning Models

Skin cancer ranks as the most prevalent form of cancer in the United States, accounting for approximately 5.4 million diagnosed cases each year. Skin cancer is often curable, but effective treatment is contingent on early detection. Delayed diagnosis and treatment can result in adverse outcomes, such as increased rates of metastasis, as well as heightened morbidity and mortality. Conventional approaches to skin cancer diagnosis, including visual inspection and biopsy, are characterized by subjectivity, time-intensive procedures, and the potential for significant inter-observer variation. Consequently, there has been a rising interest in leveraging machine learning methods, notably deep learning techniques, to enhance the precision and efficiency of skin cancer diagnosis. Over the past few years, deep learning methods [26], particularly convolutional neural networks (CNNs) [17,23,26,27,28], have demonstrated significant potential for accurately identifying and categorizing skin cancer from medical images of skin lesions [7,16,24,25]. 
The advancements in deep learning techniques have spurred considerable progress in skin cancer detection. Recent review articles [29,30,31,32,33] explore key contributions in this domain, with a focus on the application of deep learning methods. For example, the comprehensive review by Dildar et al. (2021) [33] provides a contemporary overview of skin cancer detection leveraging deep learning techniques. The authors systematically analyze the latest advancements and methodologies in the field, offering valuable insights into the evolving landscape of skin cancer diagnostics. Brinker et al.’s [29] systematic review focuses on the application of convolutional neural networks (CNNs) for skin cancer classification. The paper critically assesses the state of the art in CNN-based skin cancer diagnosis, providing a comprehensive synthesis of existing literature and highlighting key trends in this rapidly evolving field. Adegun and Viriri (2021) conduct an in-depth exploration of deep learning techniques applied to skin lesion analysis and melanoma cancer detection [30]. Their survey not only summarizes existing methodologies but also critically examines the state of the art, offering insights into challenges and opportunities for further research in this domain. Munir et al.’s bibliographic review [31], featured in Cancers, presents a holistic examination of cancer diagnosis using deep learning, encompassing various cancer types, including skin cancer. The review consolidates knowledge from diverse studies, shedding light on the broad applications of deep learning in cancer diagnosis and underscoring its potential impact on improving diagnostic accuracy. Li et al. [32] conscientiously discuss the challenges faced in the application of deep learning to skin disease diagnosis. From data limitations to interpretability issues, the authors provide a balanced perspective on the obstacles that researchers and practitioners must navigate. 
Importantly, the review by Li et al. concludes with insights into potential future research directions, guiding the trajectory of advancements in this evolving field.
A number of recent studies of ensemble deep learning for biomedical imaging have advanced the field [34,35]. A case in point is Shokouhifar et al. [34], who used swarm-intelligence-empowered three-stage ensemble deep learning to measure arm volume, since a difference in arm volume indicates the presence of, and changes in the status of, lymphedema. Another prime example is Bao et al. [35], who utilized integrated stack-ensemble deep learning to enhance the preoperative prediction of prostate cancer Gleason grade.
Numerous investigations have delved into the utilization of CNNs for the detection and categorization of skin cancer over the past two decades [7,32,35,36]. As an illustration, Esteva et al. (2017) trained a CNN model on a dataset comprising more than 129,000 clinical images to identify skin cancer. This model achieved a sensitivity of 95%, a specificity of 85%, and classification accuracy of 91% for melanoma detection, a performance level on par with that of dermatologists [23]. Likewise, Tschandl et al. (2018) employed a CNN model for the classification of skin lesions into benign or malignant categories, achieving an Area Under the Curve (AUC) score of 0.94 on an independent test dataset [37]. Brinker et al. (2019) conducted an assessment of various CNN models to detect melanoma, the most lethal type of skin cancer. Their findings highlighted a substantial enhancement in classification accuracy through the application of transfer learning with pre-trained CNN models [29].
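The sensitivity, specificity, and accuracy figures quoted in studies like these are all derived from the same four confusion-matrix counts. The following is a minimal sketch of that arithmetic for a binary benign/malignant task. The function name `binary_metrics` and the toy labels are hypothetical illustrations and are not taken from any of the cited studies.

```python
def binary_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for 0/1 labels.

    Assumption for this sketch: 1 = malignant (positive), 0 = benign (negative).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malignant, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malignant, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged

    # Sensitivity: share of malignant cases that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of benign cases that were correctly cleared.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    # Accuracy: share of all cases classified correctly.
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


# Hypothetical toy data: 4 malignant and 6 benign lesions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec, acc = binary_metrics(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Because accuracy pools both classes, it can look high on an imbalanced dataset even when sensitivity is poor, which is one reason sensitivity and specificity are usually reported alongside it.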
Transfer learning, which encompasses the utilization of pre-trained CNN models to extract features from medical images, has also demonstrated an enhancement in classification accuracy when applied to the detection of skin cancer [23,24,25]. Han et al. (2018) developed a transfer learning-based CNN model that achieved a classification accuracy of 89.1% for melanoma detection using dermoscopic images [38]. Haenssle et al. (2018) used transfer learning to fine-tune a pre-trained VGG-19 model on a dataset of dermoscopic images to detect melanoma, achieving an AUC of 0.86 on an independent test set [38]. Codella et al. (2018) similarly used transfer learning to fine-tune a pre-trained Inception-v3 model on a dataset of dermoscopic images, achieving an AUC of 0.93 on an independent test set [39].
While numerous studies have showcased the potential of deep learning techniques for skin cancer detection and classification [29,30,31,32,33], it is necessary to compare the performance of various CNN models dedicated to this task. Comparative studies have focused specifically on comparing the performance of different CNN models for skin cancer detection and classification [7,9,14,40,41]. For example, Codella et al. (2018) compared the performance of three different CNN models (Inception-V3, ResNet50, and DenseNet-121) for classifying skin lesions as either benign or malignant. The authors found that DenseNet-121 achieved the highest classification accuracy, with an area under the receiver operating characteristic curve (AUC-ROC) of 0.91 [39,42].
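Several of the comparisons above are reported as AUC-ROC. One way to read that number is as the probability that a randomly chosen malignant lesion receives a higher score than a randomly chosen benign one. The sketch below computes it directly from that definition by pairwise comparison. The function name `roc_auc` and the scores are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def roc_auc(y_true, scores):
    """AUC-ROC via pairwise comparison of positive and negative scores.

    Assumption for this sketch: label 1 = malignant, label 0 = benign,
    and a higher score means "more likely malignant".
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for three malignant and three benign lesions.
labels = [1, 1, 1, 0, 0, 0]
model_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, model_scores)
print(f"AUC = {auc:.3f}")
```

Unlike accuracy, this measure does not depend on a single decision threshold, which makes it convenient for comparing architectures whose output scores are calibrated differently.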
Kawahara et al. (2018) used a pre-trained CNN model to extract features from skin lesion images and then trained a Support Vector Machine (SVM) classifier to distinguish between benign and malignant lesions. The authors found that their SVM classifier achieved an accuracy of 83.6%, which outperformed several other classification methods [43]. Brinker et al. (2019) compared the performance of five different CNN models on a dataset of dermoscopic images, finding that the Inception-v3 model achieved the highest AUC of 0.90 [29]. Patel et al.’s (2021) study suggests that transfer learning can improve the performance of CNN models for skin cancer detection and classification. Their findings also indicate that the InceptionV3 model may be particularly effective for this task [44]. Zhang et al. (2020) similarly compared the performance of six different CNN models on a dataset of dermoscopic images, finding that the DenseNet-121 model achieved the highest AUC of 0.95 [45]. Zaidan et al. (2021) reviewed the use of deep learning techniques for skin cancer detection and classification in their article published in the Journal of Healthcare Engineering. They discussed the effectiveness of different CNN models and transfer learning methods and the need for further research on larger and more diverse datasets [33].
Recent research endeavors have additionally investigated the application of alternative deep learning methodologies, including Generative Adversarial Networks (GANs), in the context of skin cancer detection and classification [46,47,48]. An illustrative example is Bi et al. (2020), who employed a GAN to generate synthetic skin lesion images and subsequently harnessed a CNN for their classification into benign or malignant categories. The researchers reported that their GAN-CNN model achieved a classification accuracy of 83% [49].
More recently, Guergueb and Akhloufi (2022) [50] used ensemble learning to achieve a predictive accuracy of just under 98% for melanoma, the most dangerous form of skin cancer. The authors used only one image dataset for training and testing, so the generalizability of their model to other datasets and other forms of skin cancer is untested. Avanija et al. (2023) [51] recorded an accuracy rate of 86% on the ISIC Skin Cancer Dataset using an ensemble learning approach harnessing three deep learning algorithms, namely, VGG16, CapsNet, and ResUNet. Even more recently, Sethanan et al. (2023) [8] reported a cancer detection rate of 99.7% and cancer classification rates of approximately 96%. Their ensemble model harnesses modified CNN architectures, refined image segmentation techniques, and an artificial multiple intelligence system algorithm for optimized decision fusion. The authors, however, note that further research is needed to establish the generalizability, robustness, and clinical applicability of their model.
The max voting ensemble technique is a powerful approach within ensemble learning [8,34,35]. It aggregates predictions from multiple models and selects the class that receives the most votes as the final prediction. This technique leverages the wisdom of crowds to arrive at a more accurate and stable prediction [52,53]. In the context of skin cancer detection, the max voting ensemble technique presents an opportunity to harness the collective intelligence of deep learning models and enhance classification outcomes [54]. For example, Kausar et al. (2021) introduced deep learning-based ensemble models for multiclass skin cancer classification. Their individual models achieved accuracies of up to 91.8%, whereas their ensemble techniques raised accuracy to 98% and 98.6%, outperforming recent approaches [55].
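The voting step itself is simple to express in code. The following is a minimal sketch of max (hard) voting over class labels, assuming each model has already produced one predicted label per image. The function name `max_vote`, the three stand-in models, and the tie-breaking rule (the earliest-listed model among the tied classes wins) are assumptions made for illustration, not details taken from the study.

```python
from collections import Counter


def max_vote(predictions_per_model):
    """Combine per-model label predictions by majority (max) voting.

    predictions_per_model: list of lists, one inner list per model,
    each holding one predicted label per image in the same order.
    Tie-break (an assumption of this sketch): among classes tied for
    the most votes, pick the one voted for by the earliest-listed model.
    """
    if not predictions_per_model:
        raise ValueError("at least one model's predictions are required")
    n_images = len(predictions_per_model[0])
    if any(len(p) != n_images for p in predictions_per_model):
        raise ValueError("all models must predict the same number of images")

    final = []
    for votes in zip(*predictions_per_model):  # votes for a single image
        counts = Counter(votes)
        top = max(counts.values())
        winner = next(v for v in votes if counts[v] == top)
        final.append(winner)
    return final


# Hypothetical predictions from three models on four lesion images.
model_a = ["mel", "nev", "bcc", "nev"]
model_b = ["mel", "mel", "bcc", "bcc"]
model_c = ["nev", "mel", "bcc", "mel"]
ensemble = max_vote([model_a, model_b, model_c])
print(ensemble)
```

With an odd number of models and two classes, ties cannot occur. In multiclass settings such as the last image above, the tie-breaking rule matters, and listing the most trusted model first is one simple way to resolve it.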
In summary, deep learning techniques, particularly CNNs and transfer learning, have shown great promise in accurately detecting and classifying skin cancer from medical images. Numerous studies have showcased the efficacy of CNN models for skin cancer detection and classification, and comparative studies have shown that certain CNN models, such as DenseNet-121 and Xception, achieve higher accuracy than others [56]. Transfer learning has also been shown to improve classification accuracy for skin cancer detection. However, further research is needed to evaluate the performance of different CNN models and transfer learning methods for skin cancer detection and classification on larger and more diverse datasets.
Table 1 below provides an overview of recent research conducted on diverse skin cancer datasets.
Table 1. Recently conducted research on various skin cancer datasets.
| Authors and Paper | Dataset | Model | Published Year | Performance |
|---|---|---|---|---|
| Gajera et al. [57] | ISIC 2016, 2017, PH2, HAM10000 | AlexNet, VGG16, VGG19 | 2023 | Accuracy = 98.33%, F1-score = 96% |
| Alenezi et al. [58] | ISIC 2017, HAM10000 | Deep residual network | 2023 | Accuracy = 96.97% |
| Inthiyaz et al. [59] | Xiangya-Derm | CNN | 2023 | AUC = 0.87 |
| Alwakid et al. [60] | HAM10000 | CNN, ResNet50 | 2023 | F1-score = 0.859 (CNN), 0.852 (ResNet50) |
| Alenezi et al. [61] | ISIC 2019, 2020 | ResNet-101 and SVM | 2023 | Accuracy = 96.15% (ISIC 2019), 97.15% (ISIC 2020) |
| Abbas and Gul [62] | ISIC 2020 | NASNet | 2022 | Accuracy = 97.7%, F1-score = 0.97 |
| Abdar et al. [12] | ISIC 2019 | ResNet15V2, MobileNetV2 | 2021 | Accuracy = 89%, F1-score = 0.91 |
| Jain et al. [13] | HAM10000 | Xception, InceptionV3, VGG19, ResNet50, and MobileNet | 2021 | Accuracy = 90.48% (Xception) |
| Aljohani and Turki [14] | ISIC 2019 | Xception, DenseNet201, ResNet50V2, MobileNetV2, VGG16, VGG19, and GoogleNet | 2022 | Accuracy = 76.09% |
| Bechelli and Delhommelle [63] | HAM10000 | CNN, VGG16, Xception, ResNet50 | 2022 | Accuracy = 88% (VGG16) |
| Demir et al. [64] | ISIC archive | ResNet101 and InceptionV3 | 2019 | F1-score = 84.09% (ResNet101) and 87.42% (InceptionV3) |
| Rashid et al. [65] | ISIC 2020 | MobileNetV2 | 2022 | Accuracy = 98.20% |
| Reis et al. [66] | HAM10000, ISIC 2019, 2020 | InSiNet, U-Net | 2022 | Accuracy = 94.59% (HAM10000), 91.89% (ISIC 2019), and 90.54% (ISIC 2020) |
| Khan et al. [67] | ISBI 16, 17, 18, PH2, HAM10000 | ResNet101, DenseNet201 | 2021 | Accuracy = 98.70% (PH2), Accuracy = 98.70% (HAM10000) |
| Khan et al. [68] | ISBI 2018 | A hybrid model | 2021 | Accuracy = 92.70% |
| Kaggle Compt. [69] | ISIC 2018 | Top 10 model average | 2020 | Accuracy = 86.7% |
| Gouda et al. [70] | ISIC 2018 | CNN | 2022 | Accuracy = 83.2% |
Table 1 shows that researchers have experimented with a diverse assortment of pre-trained deep learning models across multiple skin cancer datasets, with a particular emphasis on ISIC 2016 to 2020, PH2, and HAM10000. These investigations have yielded a spectrum of accuracy scores. Notably, the best results reported on every dataset except ISIC 2018 (Tasks 1-2) reach accuracies ranging from 95% to 99%. Conversely, results on ISIC 2018 fall short of this range, with accuracy below 92%.

This entry is adapted from the peer-reviewed paper 10.3390/diagnostics14010089
