To address these limitations, a novel framework based on deep learning (DL) is introduced. The rationale behind this proposal is that the intricate patterns learned by DL models for successful predictions/classifications might also contain valuable insights into the relationship between the characteristics of medical images and the associated disease. The DL model operates independently of handcrafted features or human intervention, potentially overcoming the aforementioned constraints.
2. Explainable AI
XAI is a field of Artificial Intelligence (AI) that seeks to offer insights into black-box models and their predictions. Trust, performance, legal (regulation), and ethical considerations are some reasons researchers advocate for XAI
[5]. This is increasingly critical as AI adoption reaches domains like healthcare.
External XAI techniques might explain single predictions through text or visualizations, or examine models comprehensively using examples, local changes, or transformations to simpler models. While text and visualization explanations offer a direct, human-understandable clarification, typically for a specific prediction, utilizing examples grants a broader understanding of a model by showcasing examples and predictions similar to the one in question. This method, however, does not provide an immediate explanation for a particular prediction. Local explanations focus on a subset of the problem, aiming to elucidate model behavior within that restricted context. Finally, to achieve higher interpretability, one can either employ a mimic model, an interpretable model that emulates the black-box model’s behavior, or replace the black-box model altogether.
In
[7], the authors detail four techniques that elucidate image classifiers by modifying the input. The methods include the method of concomitant variation, the method of agreement, the method of difference, and the method of adjustment. Each method offers a unique approach and insight into how models interpret and classify images.
The technical procedure to retrieve visual explanations from an image classifier comprises two parts: (1) an attribution algorithm furnishing the data for the explanation and (2) a visualization employing that data to generate a human-understandable elucidation. Broadly, image classification’s attribution algorithms can be classified as either gradient-based methods or occlusion-based methods.
Visualizations represent the interpretations derived from the attribution methods mentioned earlier. However, there is no consensus in the literature regarding what constitutes a “good” explanation. While some believe an explanation should detail parts of an image contributing to its classification, others focus on resolution quality or the trade-offs involved. Indeed, as 2D visualizations cannot fully depict a model’s intricacy, clarity about the limitations and trade-offs is essential when using such explanations.
Other research on improved visualization argues that past studies have overly concentrated on the positive alterations in an input image without contemplating the negative impacts
[8][9]. Both perspectives are necessary for a comprehensive explanation, especially for AI adoption in sensitive areas.
3. Unsupervised Learning-Clustering
Clustering is an unsupervised ML technique whose objective is to discern groupings in unlabeled data. It has diverse applications, including anomaly detection, compression, and the discovery of interesting properties in data.
3.1. K-Means and X-Means
K-means is a straightforward clustering algorithm whose cost grows linearly with the number of data points, i.e., O(n) per iteration for a fixed number of clusters and dimensions. The algorithm commences by initializing a centroid for each of the K clusters. Various strategies exist for this initialization. One method is to select K random points from the dataset as the initial centroids, although this makes K-means sensitive to its initialization. To mitigate this, one can execute K-means multiple times and keep the best result. Another method, K-means++, has been shown to be a more robust initialization scheme, outperforming standard K-means in both accuracy and time
[10]. K-means++ selects initial centroids using a probability based on a point’s distance to the current centroids. Once initialized, each point is assigned to its closest centroid. The primary loop of the algorithm then adjusts the centroids toward the mean of the points linked to them, and points are reassigned to the nearest centroid.
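The initialization and rerun strategies above can be sketched with scikit-learn, whose `KMeans` exposes both as parameters. The two-blob dataset below is synthetic and purely illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated synthetic blobs around 0 and 5.
X = np.vstack([rng.normal(0, 0.3, (50, 2)),
               rng.normal(5, 0.3, (50, 2))])

# init="k-means++" picks each initial centroid with probability proportional
# to its squared distance from the centroids chosen so far; n_init reruns
# the whole algorithm and keeps the best result, mitigating the sensitivity
# to initialization mentioned above.
km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0).fit(X)
centers = sorted(km.cluster_centers_[:, 0])
```

After fitting, the main loop has converged: each recovered centroid sits at the mean of its assigned points, close to the true blob centers.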
A limitation of K-means is the necessity to predefine K, the number of clusters. If the number of clusters is unknown, one can employ X-means, which repeatedly runs K-means with different values of K and selects the best-fitting result. The most fitting number of clusters for a dataset can be determined by evaluating one or more clustering performance metrics, or a model-selection criterion such as the Bayesian information criterion.
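A simple form of this model selection can be sketched as a sweep over candidate K values, scored here with the silhouette coefficient (a label-free metric). Note this is a simplified stand-in: X-means proper interleaves centroid splitting with a Bayesian information criterion test rather than an exhaustive sweep. The three-blob dataset is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
# Three synthetic clusters; pretend we do not know K in advance.
X = np.vstack([rng.normal(c, 0.4, (40, 2)) for c in (0, 5, 10)])

# Run K-means for each candidate K and keep the silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)  # K with the most defined clusters
```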
3.2. Clustering Performance Evaluation
Clustering performance evaluation metrics can be divided into two main categories: those requiring labeled data and those that do not.
The Rand index measures the similarity between the labels that the clustering has assigned to data points and the ground truth labels. This metric differs from standard accuracy measures because, in clustering, the label of the cluster to which a data point is assigned may not match its true label. To accurately measure the performance of clustering, one must therefore account for permutations of the cluster labels. The Rand index provides a score in the range [0, 1], reflecting the fraction of point pairs on which the cluster assignment and the ground truth agree. While it is highly interpretable, other methods must be employed when labels are not available.
The silhouette coefficient is a metric suitable for use when no labels are present. It produces a value that increases as clusters become more defined, meaning that the distances between points within a cluster are small while the distances to other clusters are large. The silhouette coefficient yields a value in the range [−1, 1]: −1 indicates an incorrect clustering, while 1 signifies highly dense clusters that are well separated.
In contrast, the Davies–Bouldin index relates within-cluster scatter to between-cluster separation. A low Davies–Bouldin index indicates compact clusters that are well separated. Its value is bounded below by 0, which represents the best possible score, and has no upper bound.
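All three metrics are available in scikit-learn, and a small synthetic example shows the contrast between the labeled and label-free families. The dataset below is illustrative, not from any of the cited studies.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import rand_score, silhouette_score, davies_bouldin_score

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (50, 2)),
               rng.normal(6, 0.3, (50, 2))])
y_true = np.array([0] * 50 + [1] * 50)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Rand index compares against ground truth and is invariant to label
# permutations, so it scores 1.0 even if the cluster IDs are swapped.
ri = rand_score(y_true, labels)
# Silhouette and Davies-Bouldin need no labels: a high silhouette and a
# low Davies-Bouldin both indicate compact, well-separated clusters.
sil = silhouette_score(X, labels)
db = davies_bouldin_score(X, labels)
```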
4. Image-to-Image Translation
Image-to-image (I2I) translation refers to the process of learning to map from one image to another
[11]. Such a mapping could, for instance, transform a healthy image into one with pathological identifiers. Differences between the input and output images can then be analyzed to extract medical insights. For this purpose, generative adversarial networks (GANs) and variational autoencoders (VAEs) have been employed.
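The analysis step mentioned above can be sketched as a per-pixel change map between the input and the translated output, with a threshold to suppress minor reconstruction noise. This is a minimal illustration on a toy array, not the analysis pipeline of any cited work; the threshold value is an assumption.

```python
import numpy as np

def change_map(source, translated, threshold=0.1):
    """Highlight where an I2I model altered the image: the per-pixel
    absolute difference, thresholded to suppress small reconstruction noise."""
    diff = np.abs(translated.astype(float) - source.astype(float))
    return np.where(diff > threshold, diff, 0.0)

# Toy example: the "translation" adds a bright lesion-like patch,
# which the change map isolates as a candidate pathological marker.
healthy = np.zeros((32, 32))
pathological = healthy.copy()
pathological[10:14, 10:14] = 0.9

mask = change_map(healthy, pathological)
```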
RegGAN has proven to be the most effective I2I solution for medical data
[12]. One challenge of I2I in the medical realm is the difficulty in finding aligned image pairs in real-world scenarios. To address this, the authors used magnetic resonance images of the brain and augmented them with varying levels of noise and synthetic misalignment through scaling and rotation. RegGAN surpassed previous state-of-the-art solutions for both aligned and unaligned pairs and across noise levels ranging from none to heavy.
In the realm of I2I translation, there are also initiatives leveraging newer architectures, such as Transformers. Specifically, the Swin transformer-based GAN has demonstrated promising results on medical data, even outperforming RegGAN on identical datasets
[13].
5. Data Mining Techniques
Data mining involves extracting valuable knowledge from large datasets by identifying pertinent patterns and relationships
[14]. Clustering is one such technique and is also employed here. While clustering has found application in diverse facets of medical knowledge discovery, its direct use in medical imaging remains relatively rare.
In
[15], the authors demonstrated that K-means clustering could identify subgroups of yet-to-be-treated patients. This approach unveiled four unique subgroups.
In another study
[16], a system was proposed for medical image retrieval. The process began by searching for an image within a known primary class and subsequently refining the search using identified markers that had not previously been labeled. The authors showcased that, by using clustering, previously unlabeled subclasses could be detected, facilitating the search for analogous images. This proved beneficial, enhancing a doctor’s diagnostic accuracy from 30% to 63%.
Other research focuses on how data mining techniques can offer insights not directly as new medical knowledge from AI, but rather to equip users with enriched information, allowing them to derive novel medical insights. In
[17], a visualization solution was proposed for practitioners, grounded in spectral clustering, to decipher information from 2D and 3D medical data. Although spectral clustering was central to their approach, they recognized that no single clustering method excels universally.
6. Explainable Artificial Intelligence
In the medical domain, XAI research predominantly serves ethical and legal imperatives, fostering trust and privacy, and revealing model biases
[18]. The deployment of XAI for medical knowledge discovery is less common, yet some recognize its untapped potential
[6].
In
[19], the authors illustrated a method to cluster images, assign groups of images importance scores, and subsequently obtain explanations regarding significant components across an entire class. This technique employs super-pixel segmentation to fragment the image. This procedure is replicated for all images in a class. The segments are then clustered, and the significance of each segment group is evaluated. This approach yields explanations highlighting crucial features across the class. Although their evaluation used a general dataset, it appears feasible to adapt this to the medical context. In such cases, this methodology could potentially expose medical insights by categorizing types of markers. This aligns with the objectives here, albeit through a different modality.
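The segment-then-cluster idea can be sketched as follows, with two loudly labeled simplifications: regular grid patches stand in for super-pixel segmentation, and per-patch mean attribution stands in for the paper's importance scoring. The function and its toy data are hypothetical constructions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

def class_level_segments(images, attributions, patch=8, n_groups=3):
    """Simplified sketch of class-level explanations: split every image in
    a class into patches (a stand-in for super-pixel segmentation), cluster
    the patches across the whole class, and score each patch group by the
    mean attribution its members received."""
    feats, scores = [], []
    for img, attr in zip(images, attributions):
        h, w = img.shape
        for y in range(0, h, patch):
            for x in range(0, w, patch):
                feats.append(img[y:y + patch, x:x + patch].ravel())
                scores.append(attr[y:y + patch, x:x + patch].mean())
    feats, scores = np.array(feats), np.array(scores)
    groups = KMeans(n_clusters=n_groups, n_init=10,
                    random_state=0).fit_predict(feats)
    # Importance of each segment group = mean attribution over its members.
    return {g: float(scores[groups == g].mean()) for g in range(n_groups)}

# Toy class: four images sharing a bright top-left marker with high
# attribution; the marker patches should form the one important group.
images = [np.zeros((16, 16)) for _ in range(4)]
attributions = [np.zeros((16, 16)) for _ in range(4)]
for im, at in zip(images, attributions):
    im[:8, :8] = 1.0
    at[:8, :8] = 1.0
importance = class_level_segments(images, attributions, patch=8, n_groups=2)
```

In a medical setting, the high-importance group would collect recurring visual markers across the class, which is what makes the approach interesting for knowledge discovery.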
In
[20], the authors exemplified how XAI can be harnessed for medical knowledge discovery. Using a ResNet, they autonomously analyzed ECG data. Their model could predict the sex of a subject with 86% accuracy, a feat they claim is nearly unachievable for human cardiologists. To elucidate the model’s learned insights, they turned to XAI. They modified Grad-CAM, presenting a visual justification of the segments deemed crucial for the prediction. This process revealed what the authors termed as “new insights into electrophysiology”.
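For reference, the standard Grad-CAM computation that the authors modified can be written in a few lines: weight each convolutional feature map by its global-average-pooled gradient, sum, and apply a ReLU. The activations and gradients below are synthetic stand-ins; in practice they come from a forward and backward pass through the network.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Standard Grad-CAM: weight each feature map by the global-average-pooled
    gradient of the target score w.r.t. that map, sum the weighted maps, and
    keep the positive part. Shapes are (channels, H, W)."""
    weights = gradients.mean(axis=(1, 2))             # alpha_k
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A_k
    return np.maximum(cam, 0.0)                       # ReLU

# Synthetic example: channel 0 fires on the left half with a positive
# gradient; channel 1 has a negative gradient, so ReLU suppresses it.
acts = np.zeros((2, 4, 4))
acts[0, :, :2] = 1.0
acts[1, :, 2:] = 1.0
grads = np.stack([np.full((4, 4), 0.5), np.full((4, 4), -0.5)])

cam = grad_cam(acts, grads)
```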