Cutaneous melanomas are diagnosed during skin examination. Clinically concerning lesions are identified based on the features of asymmetry, border irregularity, color variation, large diameter, and history of change (evolution), frequently supplemented by dermoscopic features to improve diagnostic accuracy
[5]. Concerning lesions are removed by excisional or shave biopsy; very large lesions may be partially biopsied in the areas most concerning for invasion. The diagnosis of melanoma is then confirmed by histological analysis of the biopsy. Histopathology revealing an increased number of atypical melanocytes growing in a disorderly fashion in the epidermis or dermis may lead to a cancer diagnosis; however, histological diagnosis is not always clear-cut
[5]. Histological analysis of melanoma yields both diagnostic and prognostic conclusions about the disease. In addition to determining the histological type of melanoma, pathologists also make observations that bear prognostic significance. These prognostic histological features include the presence of ulceration, mitotic rate, lymphovascular invasion, microsatellites, neurotropism, and tumor-infiltrating lymphocytes
[6][7][8]. Despite many years of using these methods as the standard of care, there is still much room for improvement
[9][10]. In many cases, interobserver variability between pathologists is high
[10][11]. Additionally, these processes require substantial time and effort from trained pathologists. Analysis of clinical data using artificial intelligence has been shown to increase the accuracy of patient diagnosis and prognosis
[11][12] and has the long-term potential to lessen the current burden of analysis on pathologists.
2. Diagnostic Applications
The current standard risk assessment of pigmented lesions begins with a macroscopic and dermoscopic examination by a physician. Physicians look for several known clinical features of melanoma, including irregular borders, asymmetry, color, diameter, change in the lesion over time, and comparison of the lesion to the patient's other nevi
[16]. Visual examination and dermoscopy allow physicians to differentiate melanoma from nevi with great accuracy. ML algorithms have also shown great accuracy for melanoma vs. nevi differentiation based on clinical images
[17]. However, tissue biopsy is essential to achieve a formal diagnosis. Most deep-learning diagnostic applications for histological images are for the differentiation between melanoma and nevi
[18][19][20][21][22][23][24][25][26]. However, multiple studies show applications in the differentiation between melanoma, nevi, and normal skin
[27][28]; and differentiation between melanoma and nonmelanoma skin cancers
[29][30][31]. In addition, several studies showed deep-learning applications for the segmentation of whole tumor regions
[32][33][34][35] or individual diagnostic markers such as mitotic cells
[36][37], melanocytes
[38][39], and melanocytic nests
[40]. Several of these models were compared against the diagnostic accuracy of trained histopathologists and showed improved performance
[18][20][22][24].
The arrangement and location of melanocytic cells are essential factors for pathologists to consider when assessing the disease status of WSIs. However, cells of melanocytic origin can be visually challenging to differentiate from surrounding keratinocytes, even to the trained eye. Multiple groups have developed programs to identify proliferative melanocytes, aiding both the discovery of melanocytes and the characterization of overall melanocyte growth patterns. Liu et al. developed a model to segment melanocytic proliferations. Using sparse annotations generated by a pathologist, this pipeline refines the segmented regions by applying a CNN model to tiled WSI regions, achieving an overall accuracy of 92.7%
[39]. Kucharski et al. used a convolutional autoencoder neural network architecture to detect melanocytic nests. Slides were split into tiles, and individual tiles were classified as belonging or not belonging to a nest, eventually allowing for the segmentation of the nests
[40]. Andres et al. used random-forest classification to classify individual tiles as tumor regions based on color components and cell density. Individual cell nuclei are detected, and the probability of each nuclear pixel being part of a mitotic nucleus is calculated, resulting in an overall prediction of whether a cell is in mitosis. They found a significant correlation between the number of mitoses detected by their program and the number of Ki67-positive cells seen in Ki67-stained tissue slides. Their model achieved 83% accuracy in correctly predicting mitotic cells
[36].
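The nucleus-level aggregation step in this kind of pipeline can be sketched minimally as follows. The averaging rule, the 0.5 decision threshold, and the pixel probabilities are illustrative assumptions, not details taken from Andres et al.:

```python
# Sketch: aggregate per-pixel mitosis probabilities into per-nucleus calls.
# Pixel probabilities would come from an upstream classifier; the values
# below are hypothetical, for two detected nuclei.

def nucleus_mitosis_probability(pixel_probs):
    """Average the mitosis probability over all pixels of one nucleus."""
    return sum(pixel_probs) / len(pixel_probs)

def count_mitoses(nuclei, threshold=0.5):
    """Count nuclei whose mean mitosis probability exceeds the threshold."""
    return sum(
        1 for pixel_probs in nuclei
        if nucleus_mitosis_probability(pixel_probs) > threshold
    )

nuclei = [
    [0.9, 0.8, 0.7, 0.95],   # likely mitotic figure
    [0.1, 0.2, 0.05, 0.15],  # likely non-mitotic nucleus
]
print(count_mitoses(nuclei))  # → 1
```

A count like this, accumulated per unit area, is what would then be correlated against Ki67-positive cell counts.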
Wang et al. tested the efficacy of multiple pre-trained CNN models for predicting the malignancy of each slide tile. The tile predictions were used to generate a heatmap, from which additional features were extracted and fed to a random-forest algorithm to classify WSIs. Final predictions of the model on validation datasets were compared to those of seven pathologists. The model outperformed the human pathologists, achieving an accuracy of 98.2%
[22]. Xie et al. additionally used the Grad-CAM method to reveal the logic behind their CNN and understand the impact of specific image areas on the model. The Grad-CAM feature heatmaps revealed agreement between this group's model and accepted pathological features, and the model ultimately achieved an overall accuracy of 93.3%
[23].
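The heatmap-to-slide step of a pipeline like Wang et al.'s can be sketched as below. The three summary features (maximum, mean, and fraction of "hot" tiles) and the 0.5 cutoff are illustrative stand-ins for whatever features their random forest actually consumed:

```python
# Sketch: summarize a per-tile malignancy heatmap into slide-level features.
# Tile probabilities would come from a per-tile CNN; these are made up.

def heatmap_features(tile_probs, hot_threshold=0.5):
    """Reduce a list of per-tile probabilities to slide-level features
    suitable for a downstream classifier (e.g., a random forest)."""
    n = len(tile_probs)
    return {
        "max_prob": max(tile_probs),
        "mean_prob": sum(tile_probs) / n,
        "hot_fraction": sum(p > hot_threshold for p in tile_probs) / n,
    }

probs = [0.1, 0.2, 0.9, 0.8, 0.3]
print(heatmap_features(probs))
```

The feature vector, not the raw heatmap, is what the slide-level classifier sees; this is the standard way tile-level deep models are coupled to slide-level shallow models.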
Multiple publications show the efficacy of multi-class support vector machine models for classifying skin WSI samples as melanoma, nevi, or normal skin
[27][28]. Lu et al. first developed a pipeline for cell segmentation and feature extraction. This pipeline first segments keratinocytes and melanocytes in the epidermis and then constructs spatial-distribution and morphological features. Based on the most discriminative distribution and morphological characteristics, the final model achieved a classification accuracy of 90%
[28]. Xu et al. later expanded the model to first segment the epidermis and dermis from these images and analyze epidermal and dermal features in parallel. This model used similar epidermal features while adding a dermal analysis focused on textural and cytological features in those regions, achieving an improved accuracy of 95%
[27].
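One spatial-distribution feature of the kind Lu et al. construct can be sketched as the mean nearest-neighbor distance between detected melanocyte centroids. The feature choice and the coordinates are illustrative assumptions; real centroids would come from the segmentation step:

```python
# Sketch: a spatial-distribution feature over detected cell centroids.
import math

def mean_nearest_neighbor_distance(points):
    """Average distance from each cell centroid to its closest neighbor;
    small values indicate tightly clustered melanocytes."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        nearest = min(
            math.hypot(x1 - x2, y1 - y2)
            for j, (x2, y2) in enumerate(points) if i != j
        )
        total += nearest
    return total / len(points)

melanocytes = [(0, 0), (3, 4), (6, 8)]  # hypothetical centroids (pixels)
print(mean_nearest_neighbor_distance(melanocytes))  # → 5.0
```

Features like this, alongside morphological measurements per cell, are what the multi-class SVM would be trained on.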
Spitzoid melanocytic tumors are a subset of melanocytic lesions that are particularly challenging to diagnose; therefore, there is an acute need for improved diagnostic measures for these tumors. Using a weakly supervised CNN-based method, Amor et al. created a pipeline to identify tiles of tumor regions and then classify WSIs based on the output tiles. This group's ROI-extraction model achieved an accuracy of 92.31%, and their classification model an accuracy of 80%
[21].
Sankarapandian et al. further expanded the utility of WSIs for melanoma diagnosis by creating an algorithm to diagnose and classify subtypes. WSIs of nonmelanoma and melanocytic lesions of varying disease classifications first undergo quality control, followed by feature extraction and hierarchical clustering. Initial clustering led to a binary classification of nonmelanoma vs. melanocytic images, followed by further classification of the images as "high risk" (melanoma), "intermediate risk" (melanoma in situ or severe dysplasia), or "rest" (nonmelanoma skin cancers, nevus, or mild-to-moderate dysplasia). On their two independent validation datasets, their model achieved AUCs of 0.95 and 0.82
[31].
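The hierarchical labeling scheme described above can be made concrete with a small sketch. The function names, the label strings, and the exact tier boundaries are illustrative renderings of the text, not code from Sankarapandian et al.:

```python
# Sketch: two-stage labeling — melanocytic vs. nonmelanoma first,
# then a risk tier for the melanocytic branch.

RISK_TIER = {
    "melanoma": "high risk",
    "melanoma in situ": "intermediate risk",
    "severe dysplasia": "intermediate risk",
    "nevus": "rest",
    "mild-to-moderate dysplasia": "rest",
}

def classify(diagnosis, is_melanocytic):
    """Nonmelanoma lesions fall into 'rest'; melanocytic lesions are
    assigned their tier from the mapping above."""
    if not is_melanocytic:
        return "rest"
    return RISK_TIER[diagnosis]

print(classify("melanoma in situ", is_melanocytic=True))  # → intermediate risk
```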
Differentiation between melanoma and nonmelanoma skin cancers is typically performed by visual examination. Melanoma commonly appears as a darkly pigmented lesion, while basal and squamous cell carcinomas can display various visual characteristics, including lesion scaling, erythema, and hyperkeratosis
[41]. Ianni et al. used a dataset of over 15,000 WSIs acquired from multiple institutions to ensure the reproducibility of their developed model. The acquired images were diagnosed as basaloid, squamous, or melanocytic, or as having no visible pathology or no conclusive diagnosis. This group utilized multiple CNN models, each serving a unique purpose in their diagnostic pipeline. Tested on images from three different labs, the model achieved an overall accuracy ranging from 90 to 98% in predicting the correct skin cancer subtype
[29].
The plethora of algorithms shown to accurately diagnose nevi and melanoma on the histology of biopsied samples indicates great promise for the future of automatic diagnosis using deep-learning technologies. However, much progress can still be made in this field, and further advances may later allow for deciphering more specific melanoma traits, including melanoma subtypes and high-risk features.
3. Prognostic Applications
An accurate, individualized prognosis is essential for developing appropriate treatment and follow-up plans. Key melanoma prognostic factors include clinical, histological, and molecular features; sentinel lymph node status; and radiologic imaging information about locoregional and distant spread
[7]. Time-tested histological prognostic features are some of the best predictors of outcome; they include the presence of ulceration, the presence and rate of mitoses, the depth of invasion, and the Breslow thickness
[6][8]. In addition, immunohistological features of melanoma and the overlying epidermis are emerging as novel predictive biomarkers, along with tumor gene-expression profiles, markers of mutation burden, and specific features of the driver mutations that enable targeted melanoma therapy
[7][8][42][43][44].
Digital histology images contain far more pixels than other commonly used medical imaging techniques, such as magnetic resonance imaging (MRI) and computerized tomography (CT)
[45]. However, only a limited number of histological biomarkers are large enough to be observed by the human eye. Deep learning offers a path to access this hidden wealth of information in digital histology images. Kulkarni et al. developed a deep neural network to predict whether a patient would develop distant metastatic recurrence
[46]. This deep neural network uses a CNN to extract features, followed by a recurrent neural network (RNN) to identify patterns, ultimately outputting a prediction of distant metastatic recurrence. When tested on validation datasets, the models achieved AUCs of 0.905 and 0.88
[46].
The sentinel lymph node status is considered a critical prognostic factor in melanoma. However, determining it requires surgical excision of the first lymph node draining the melanoma to provide a marker of overall nodal status. Despite being a robust prognostic indicator, sentinel lymph node biopsy combined with completion regional lymph node dissection has been found to confer no disease-specific survival benefit
[47][48]. Brinker et al. developed an artificial neural network to predict the sentinel lymph node status from H&E-stained slides of primary melanoma tumors. WSIs were split into tiles, and cell detection classified cells as tumor cells, immune cells, or others. After classification, the cell features described in Kulkarni et al.
[46] were extracted, and image features were extracted with a pre-trained CNN model. The total slide classification was determined by the majority classification of the tiles. Clinical characteristics were also incorporated into the model, including tumor thickness, ulceration, and patient age. Overall, their most efficient model used a combination of image, clinical, and cell features and achieved an AUROC of 61.8% for classification between positive and negative sentinel lymph node status on the test dataset
[49].
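The AUROC figures quoted throughout this section can be computed directly from model scores via the rank (Mann–Whitney) formulation. The labels and scores below are invented for illustration; the function itself is the standard definition:

```python
# Sketch: AUROC as the probability that a randomly chosen positive case
# scores higher than a randomly chosen negative case (ties count half).

def auroc(labels, scores):
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0]          # 1 = node-positive, 0 = node-negative
scores = [0.9, 0.4, 0.6, 0.2]  # hypothetical model outputs
print(auroc(labels, scores))   # → 0.75 (3 of 4 pairs ordered correctly)
```

An AUROC of 0.618, as in Brinker et al.'s best model, thus means the model ranks a random positive above a random negative only about 62% of the time, modestly better than chance.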
Targeted therapy and immunotherapy have revolutionized melanoma care. However, not all melanoma patients benefit from these therapies. Genomic testing of melanoma samples identifies tumors that will respond to targeted therapy, but the immunotherapy response is harder to predict; despite existing tools, novel markers are needed to better select patients for individualized treatment
[50][51]. Multiple models have been created to predict the immunotherapy response using melanoma histology image features
[52][53]. Hu et al. predicted progression-free survival based on WSIs derived from melanoma patients who received anti-PD-1 monoclonal antibody monotherapy
[52]. Johannet et al. created a multivariable classifier to classify patients who received anti-PD-1 or anti-CTLA-4 monotherapy as having a high or low risk of cancer progression
[53]. This pipeline first used a segmentation classifier to distinguish between tumor, lymphocyte, and connective tissue slide tiles. They then implemented a response classifier to predict the response probability for each tile, ultimately producing a whole-slide classification based on the tile majority. Their final model achieved an AUC of 0.80–0.805 for the classification of progression-free survival after ICI treatment
[53].
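The tile-majority step described above can be sketched in a few lines. The 0.5 vote threshold and the mapping of a responder-majority slide to "low risk" are illustrative assumptions about how the aggregation might be wired, not specifics from Johannet et al.:

```python
# Sketch: whole-slide classification by majority vote over per-tile
# response probabilities (hypothetical values from a response classifier).

def classify_slide(tile_probs, threshold=0.5):
    """Threshold each tile's responder probability into a vote and
    return the slide label carried by the majority of tiles."""
    votes = sum(p > threshold for p in tile_probs)
    return "low risk" if votes > len(tile_probs) / 2 else "high risk"

print(classify_slide([0.8, 0.7, 0.3, 0.9]))  # 3 of 4 tiles vote responder
```

Majority voting is the simplest tile-aggregation rule; alternatives seen in this literature include averaging tile probabilities or learning a second-stage classifier over tile statistics.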
The presence and composition of tumor-infiltrating lymphocytes (TILs), lymphocytic cells that have migrated into the tumor, correlate with disease progression and the response to immunotherapies
[54]. The prognostic significance of TILs was initially somewhat controversial. However, recent evidence suggests that the absence of TILs is a poor prognostic factor, while the brisk presence of TILs is associated with better disease-free survival
[7][8]. There is also evidence that the quantity, localization, and phenotype of TILs are essential for predicting the response to immunotherapies and the risk of disease progression
[54]. Acs et al. developed an algorithm to recognize and segment TILs within WSIs and then calculate the frequency of these cells within each image
[55]. Automated TIL scoring was consistent with TIL scoring performed by a pathologist. Moore et al. then tested the ability of the automated TIL scores to predict patient outcomes
[56]. Separating patients into those who did or did not die of melanoma, they found a significant correlation between the TIL score and disease-specific survival. To show the ability of their model to enhance currently used methods of melanoma prognosis prediction, they tested its efficacy in combination with patient information on tumor depth and ulceration status. Overall, they found that the parameters discovered by their model contributed significantly to the overall prediction.
Studies published by Chou et al. further validated the findings of Acs et al., using a TIL percentage score to predict overall survival outcomes
[57]. Similar to previously described models, this model segmented regions of interest within the WSI, followed by the segmentation of various cell types, including TILs. Based on the well-known Clark grading system of TIL scoring, they found little difference in the probability of recurrence-free survival. However, when using a newly defined low vs. high TIL score, they found significant differences in both recurrence-free and overall survival probability, leading them to propose that this quantification of TILs may be more efficient for clinical use than the currently used methods. Using a predeveloped neural network classifier that generates an automated TIL score in addition to human pathological analysis, the group also sought to correlate automated TIL scoring with AJCC staging and found that the percentage of TILs in the slide significantly improved the prediction of survival outcomes compared with Clark's grading. Using a threshold score of 16.6% TILs, they found significant differences in RFS (p = 0.00059) and OS (p = 0.0022) between "high"- and "low"-TIL-scoring patients
[57].
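A TIL percentage score of this kind, with the 16.6% cutoff from the text used to stratify patients, can be sketched as follows. The cell counts are hypothetical; in practice they would come from the cell-segmentation model:

```python
# Sketch: automated TIL scoring and threshold-based stratification.

def til_score(n_til, n_total_cells):
    """Percentage of segmented cells classified as TILs."""
    return 100.0 * n_til / n_total_cells

def til_group(score, cutoff=16.6):
    """Stratify a patient as 'high' or 'low' TIL at the reported cutoff."""
    return "high" if score > cutoff else "low"

score = til_score(n_til=250, n_total_cells=1000)  # 25.0%
print(til_group(score))  # → high
```

The "high" and "low" groups produced this way are the strata compared in the RFS and OS analyses above.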
BRAF mutations are common in melanoma
[58]. Since the advent of targeted therapies, the BRAF mutation status has provided essential clinical information
[59]. Kim et al. initially trained an algorithm to distinguish melanoma from nonmelanoma regions. Focusing only on the melanoma regions, they then tested the efficacy of three published BRAF-mutation-prediction classifiers. To better understand how the deep-learning model distinguished BRAF-mutated cells, they performed a pathomics analysis on these slides and found that cells with BRAF mutations showed larger and rounder nuclei
[60]. In a later publication, they discovered that pixels located in the nuclei of cells were the most influential in predicting BRAF mutations. Their final prediction model combined clinical information, deep learning, and extracted nuclear features to predict the BRAF mutation status in H&E WSIs of melanoma
[61].
4. Future of Deep Learning Applications for Cutaneous Melanoma
Accurate melanoma diagnosis and precise prognostication are crucial for establishing appropriate management and follow-up recommendations. Immunotherapies and targeted therapies have revolutionized melanoma therapy, but their severe and sometimes fatal side effects argue for improved personalization of oncology care. Unfortunately, tools to predict the treatment response and select the appropriate therapy are scarce. In recent years, AI has emerged as a powerful tool to aid melanoma diagnosis and prognostication. Moreover, as new therapeutic options become available and treatment planning becomes increasingly complex, AI clinical models may even assist in identifying patterns of response that are not visible to physicians and scientists. Robust AI models are required to reach the full potential of ML-aided melanoma diagnosis and patient care. To achieve this, large datasets for training algorithms and external validation datasets are essential for developing powerful models; therefore, national or multinational consortia may aid in developing AI tools for melanoma diagnosis and prognostication. Researchers must make their datasets and code publicly available to foster high-quality team science and address concerns about ML's black-box effect. Machine learning and AI tools are destined to revolutionize melanoma diagnosis, prognostication, and personalized care, enhancing accuracy, providing tailored treatment options, and improving patient outcomes by integrating diverse datasets and developing robust predictive models.