The proposed YOLO-based approach for image segmentation is fast and makes economical use of computing resources, which is of key importance considering the huge amounts of cardiological data that need to be processed. However, this efficiency may come at the cost of lower accuracy compared to more complex segmentation algorithms, and accuracy is crucial in the case of cardiac images. YOLO-based image segmentation may also lead to a reduction in spatial resolution in segmentation masks, especially for small or complex structures in radiological images.
2.2. Genetic Algorithms
The analysis of medical data can also be approached using metaheuristic methods such as Evolutionary Algorithms (EAs), Genetic Algorithms (GAs) in particular, and Artificial Immune Systems (AISs), which search the possible solution space based on mechanisms taken from the theory of evolution and natural immune systems. GAs can also be used to improve diagnosis as well as the selection of targeted therapy in the field of cardiology. Reddy et al.
[71] applied GAs to the diagnostics of early-stage heart disease, which has crucial implications in the selection of further therapy methods. For example, GAs allowed for the optimization of classification rules. As a consequence, the level of accuracy increased and the computational cost was reduced (due to the simplification of the selection process). GAs can also be applied to the determination of personalized parameters of the cardiomyocyte electrophysiology model
[76]. Here, the Cauchy mutation was applied. In most cases, GAs were used to limit the number of parameters that are then used as input to another AI-based algorithm, such as a Support Vector Machine (SVM)
[77][78]. Genetic Algorithms can effectively search for optimal segmentation solutions in the case of heart image segmentation, where anatomical structures may have different shapes. However, GAs may have difficulty incorporating complex constraints or domain-specific knowledge in cardiac image segmentation tasks. On the other hand, GAs can also be effective in the optimization of the input parameters to neural networks. They are inherently robust to noise and local optima. This is an important feature taking into account motion artifacts or imaging noise in cardiac image segmentation. A major disadvantage of GAs is the computational cost of exploring large search spaces or high-dimensional feature spaces, which is crucial, especially for real-time computations or in clinical settings (such as may occur in cardiology applications). Thus, finding the optimal parameters can be difficult and time-consuming.
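The use of a GA to limit the number of input parameters, as described above, can be illustrated with a minimal sketch. It assumes a binary chromosome (1 = keep the feature), tournament selection, one-point crossover, bit-flip mutation, and elitism; these operator choices, the function names, and the toy fitness function are illustrative assumptions rather than the method of any cited study. In practice the fitness would more plausibly be the validation accuracy of the downstream classifier (for example an SVM) trained on the selected features.

```python
import random

def tournament(pop, fitness, rng):
    """Pick the fitter of two randomly drawn individuals."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def run_ga(fitness, n_genes, pop_size=30, generations=40,
           crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Minimal generational GA over binary feature masks (needs n_genes >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_genes - 1)   # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: each gene flips with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        children[0] = best[:]                       # elitism: never lose the best mask
        pop = children
        best = max(pop, key=fitness)
    return best

# Hypothetical toy problem: features 0, 3 and 7 are informative, the rest are noise.
INFORMATIVE = {0, 3, 7}

def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    extras = sum(mask) - hits
    return hits - 0.5 * extras                      # reward signal, penalise clutter

best_mask = run_ga(toy_fitness, n_genes=10)
print(best_mask, toy_fitness(best_mask))
```

Each fitness evaluation here is trivial; when the fitness requires training a classifier, the number of evaluations (population size times generations) is what drives the computational cost noted above.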
2.3. Artificial Neural Networks
Artificial Neural Networks (ANNs) are networks whose structure and principle of operation are to some extent modeled on the functioning of fragments of the real nervous system (the brain)
[79][80]. ANNs contribute to the development of medical imaging, especially in cardiology, where their design, inspired by the human brain, enables them to interpret complex patterns within medical data effectively. ANNs consist of layers of neurons, each of which applies specific weights and a bias to its inputs. These neurons utilize non-linear activation functions that enable the network to detect complex patterns and relationships that linear functions might overlook. The output layer plays a pivotal role in making predictions or classifications based on the analysis, such as identifying signs of heart disease, classifying different cardiac conditions, or determining the severity of a disorder
[81][82]. In cardiology, the ability to detect conditions accurately and at an early stage is of paramount importance, and the application of ANNs for the analysis of medical images is an important development in this area. Considering the high global prevalence of cardiovascular diseases, the application of ANNs in cardiac imaging may substantially improve diagnostic techniques
[83]. ANNs provide an efficient computational tool to detect structural abnormalities in heart tissues. They also play a vital role in assessing cardiac function, evaluating important metrics such as ejection fraction, and analyzing blood flow patterns, all of which are essential for diagnosing heart failure or valvular heart disease.
ANNs can automatically learn hierarchical features from raw image data without the need to manually extract features, which is beneficial for segmenting complex organs such as the heart. However, ANN application in the field of medical image processing requires converting two-dimensional images to one-dimensional vectors. This increases the number of parameters and, with it, the computational cost. Moreover, as in the case of YOLO-based segmentation algorithms, an ANN-based approach also requires large and good-quality training data to provide high accuracy.
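The growth in the number of parameters caused by flattening can be made concrete with a short calculation. The image size (256 x 256, one channel), the 128 hidden units, and the 3 x 3 kernel used for comparison are arbitrary assumptions chosen only for illustration.

```python
def dense_params(height, width, channels, hidden_units):
    """Weights plus biases of one fully connected layer fed with a flattened image."""
    inputs = height * width * channels
    return inputs * hidden_units + hidden_units

def conv_params(kernel_size, in_channels, out_channels):
    """Weights plus biases of one 2D convolutional layer (shared across positions)."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

# A 256 x 256 grayscale slice flattened into a 65,536-element vector.
print(dense_params(256, 256, 1, 128))   # 8388736
# For comparison, a 3 x 3 convolution producing 128 feature maps.
print(conv_params(3, 1, 128))           # 1280
```

Under these assumptions a single fully connected layer already needs more than eight million parameters, which is one reason why weight-sharing architectures are often preferred for image data.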
2.4. Convolutional Neural Networks
Another neural network that has been applied to medical image processing is the Convolutional Neural Network (CNN). As opposed to traditional neural networks such as ANNs, which typically process data in a straightforward, sequential manner, CNNs can discern spatial relationships within datasets. This is because they are designed to maintain and interpret the spatial structure of input data, an attribute that is vital for the accurate assessment of medical images. For example, Roy et al.
[14] applied CNNs to cardiac image segmentation to diagnose coronary artery disease (CAD). CNNs were used to analyze 2D X-ray images, significantly enhancing image segmentation accuracy and setting new standards in medical image analysis. Similarly to Gao et al.
[84], Galea et al.
[85] proposed combining U-Net and DeepLabV3+ CNN architectures for the segmentation of cardiac images from smaller datasets. Tandon et al.
[86] applied CNNs in cardiology with a specific focus on cardiovascular imaging for patients with Repaired Tetralogy of Fallot (RTOF). A CNN originally designed for ventricular contouring was retrained and adapted to the complexities of RTOF. This enabled an increase in algorithm accuracy. In turn, Stough et al.
[87] developed a fully automatic method for segmenting heart substructures in 2D echocardiography images using CNNs that was validated against a robust dataset, and Sander and Išgum
[88] focused on enhancing the segmentation of cardiac structures in cardiac MRI. This method integrates automatic segmentation with an assessment of segmentation uncertainty to identify potential local failures. The measures of predictive uncertainty were calculated and trained by another CNN to detect local segmentation errors for potential expert correction. This approach combining automatic segmentation with manual correction of detected errors could significantly reduce the time required for expert segmentation.
In the context of cardiology, fully connected layers of CNNs are responsible for synthesizing information to perform critical analytical tasks. These include classifying different cardiac conditions, detecting anomalies such as irregularities in heart size or shape, and making predictive assessments based on a comprehensive analysis of cardiac structure and function. CNNs are particularly good at handling complex datasets from various imaging modalities in cardiology, including MRI, CT scans, and ultrasound
[89]. The strength of CNNs lies in their ability to handle high-dimensional data and to effectively capture the spatial structures within medical images in cardiology. This leads to more precise and comprehensive analyses of cardiac health. However, in the case of sparse or partial input data, their use is difficult and does not provide high prediction accuracy, while high segmentation accuracy is associated with high computational costs. Moreover, CNNs do not fully preserve the spatial relations between the features they detect, which is important in the case of cardiology. To overcome this limitation, Capsule Networks (CNs) were introduced
[90]. Their output is in the form of vectors that enable some spatial relations to be preserved. The disadvantage of this approach is the lack of verification on a large dataset.
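The way a convolutional layer keeps the spatial structure of its input can be seen in a minimal sketch of a single "valid" 2D convolution (implemented, as in most deep learning libraries, as a cross-correlation). The function name, the tiny image, and the hand-picked edge kernel are illustrative assumptions; in a trained CNN the kernel values are learned.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding; the output stays a 2D map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Toy image: dark region on the left, bright region on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Horizontal-gradient kernel: responds where intensity rises from left to right.
kernel = [[-1, 1]]

print(conv2d_valid(image, kernel))   # [[0, 1, 0], [0, 1, 0]]
```

The response is non-zero only at the column where the boundary lies, so the location of the feature is retained in the output map rather than being lost in a flattened vector.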
2.5. Recurrent Neural Networks
Recurrent Neural Networks (RNNs) are known for their ability to model long-term dependencies and are crucial for capturing the intricate details of cardiac structures. Unlike traditional feedforward neural networks that process inputs in a one-directional manner, RNNs are designed to handle sequences of data. This is achieved through their internal memory, which allows them to retain information from previous inputs and use it in the processing of new data
[91]. In echocardiography and cardiac MRI segmentation, RNNs have shown promising performance
[11]. They also excel in handling the sequential and temporal aspects of both MRI and CT data, crucial for monitoring dynamic changes in cardiac tissues over time
[59]. In turn, Wahlang et al.
[12] successfully combined RNNs and their Long Short-Term Memory (LSTM) variants in the segmentation and classification of 2D echo images, 3D Doppler images, and videographic images. Wang and Zhang
[39] also considered the segmentation of the left ventricle wall in four-chamber view cardiac sequential images. An RNN was applied to provide detailed information for the initial image, while an LSTM was used to generate the segmentation result; this approach increases accuracy. Another RNN application in the field of cardiology was presented by Muraki et al.
[13]. Here, simple RNNs, LSTM, and other RNN variations (such as Gated Recurrent Units (GRU)) were successfully used to detect acute myocardial infarction (AMI) in echocardiography.
RNNs have proven to be well suited to managing the sequential and temporal characteristics inherent in MRI and CT data, a capability that is essential for accurately tracking dynamic alterations in cardiac tissues. This is due to their ability to capture long-range non-linear dependencies effectively, for example when modeling the risk trajectory of heart failure
[92]. However, one limitation of RNNs is connected with vanishing or exploding gradients.
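The origin of vanishing and exploding gradients can be illustrated with a deliberately simplified scalar recurrence, h_t = tanh(w * h_{t-1}), and its linear counterpart h_t = w * h_{t-1}. This is a sketch under stated assumptions (one hidden unit, no input term, hypothetical function name), not a model of any cited network: by the chain rule, the gradient of the last state with respect to the first is a product of one factor per time step, so it shrinks or grows geometrically with sequence length.

```python
import math

def gradient_through_time(w, steps, h0=0.5, linear=False):
    """Return d h_T / d h_0 for a one-unit recurrence unrolled over `steps` steps."""
    h, grad = h0, 1.0
    for _ in range(steps):
        if linear:
            h = w * h
            grad *= w                      # derivative of w * h with respect to h
        else:
            h = math.tanh(w * h)
            grad *= w * (1.0 - h * h)      # derivative of tanh(w * h) with respect to h
    return grad

# Recurrent weight below 1: the gradient decays towards zero (vanishing).
print(gradient_through_time(0.5, steps=50))
# Recurrent weight above 1 without a saturating activation: the gradient blows up.
print(gradient_through_time(1.5, steps=50, linear=True))
```

Gated variants such as LSTM and GRU, mentioned above, were designed in part to reduce this effect.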
2.6. Spiking Neural Networks
Calculations related to the analysis of cardiac data are very time-consuming and involve a great deal of computing resources. One alternative that could reduce the computational cost is Spiking Neural Networks (SNNs). Currently, SNNs are not yet as accurate as traditional neural networks, but they have characteristics that are more similar to biological neurons
[93]. They may also be advantageous in wearable and implantable devices for their energy efficiency and real-time processing capabilities. This makes them ideal for continuous cardiac monitoring, as they require less frequent recharging or battery replacement, a significant benefit for devices like cardiac monitors and pacemakers. For example, Rana and Kim
[94] constrained the synaptic weights to be binary. This operation provides a reduction in computational complexity and power consumption. This is crucial, especially in the context of wearable monitors where continuous monitoring is key but the constraints of power and computational resources are limiting factors. Their binarized SNN model may be a highly efficient alternative for ECG classification, setting a new standard in continuous cardiac health monitoring technologies.
Overall, SNNs are computationally efficient, offering high computational speed and real-time performance. As a consequence, SNNs consume less energy, which translates into better use of hardware resources. However, their learning algorithms require improvement in terms of accuracy when compared, for example, with the accuracies achieved by CNNs
[95]. The need for increasingly powerful hardware is also an important consideration for SNNs. SNNs also have a significant limitation in practical applications due to the smaller number of available tools, libraries, and frameworks in comparison to other neural network types. To fully exploit the potential of SNNs, including detecting anomalies in biomedical signals and designing more detailed networks, their learning mechanisms and rules need to be improved.
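The event-driven behaviour that underlies the efficiency of SNNs can be sketched with a single leaky integrate-and-fire (LIF) neuron. The LIF model, the weight values in {-1, +1}, the threshold, and the leak factor are assumptions made for illustration; the binarization scheme of the cited work may differ. With binary weights, the weighted sum reduces to additions and subtractions of incoming spikes.

```python
def lif_neuron(input_spikes, weights, threshold=2.0, leak=0.5):
    """Simulate one LIF neuron.

    input_spikes: one list of 0/1 spikes per time step (one entry per synapse).
    weights: one binary weight (-1 or +1) per synapse.
    Returns the output spike train as a list of 0/1 values.
    """
    v = 0.0                     # membrane potential
    output = []
    for spikes in input_spikes:
        # Leak part of the stored potential, then integrate the weighted input spikes.
        v = leak * v + sum(w * s for w, s in zip(weights, spikes))
        if v >= threshold:
            output.append(1)    # fire
            v = 0.0             # reset after the spike
        else:
            output.append(0)
    return output

weights = [1, 1, -1]            # two excitatory synapses, one inhibitory
inputs = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
]
print(lif_neuron(inputs, weights))   # [0, 1, 0, 0, 0, 1]
```

The neuron produces output only at the two time steps where enough excitation has accumulated; at all other steps nothing is transmitted, which is where the energy saving in neuromorphic hardware is expected to come from.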
2.7. Generative Adversarial Networks
Generative Adversarial Networks (GANs) are network architectures that consist of two core components: the generator and the discriminator. The generator is responsible for creating artificial data that closely emulates real data in order to fool the discriminator. It starts from an input of random noise and refines it through multiple layers of the network, such as convolutional or fully connected layers. These layers operate together to progressively transform the initial noise into an output that becomes increasingly indistinguishable from the target data. The discriminator is designed to distinguish artificial data (produced by the generator) from real data based on small nuances. Thus, the core concept of this solution is to train two networks that compete with each other. As a consequence, they are expected to produce more authentic data
[96]. GANs seem to be promising computational tools to elevate patient care and improve clinical outcomes, in particular in the field of cardiology. The most important GAN application field is CVD diagnosis
[58]. Retinal fundus images were used as input to the network. This approach led to the analysis of microstructural alterations within retinal blood vessels to pinpoint pivotal risk factors associated with CVD, such as Hypertensive Retinopathy (HR) and Cholesterol-Embolization Syndrome (CES). Moreover, the incorporation of a retrained ImageNet model for customized image classification further bolstered predictive accuracy.
GANs have shown exceptional proficiency in handling complex and varied cardiac datasets. They generate highly realistic images, aiding training and research, particularly where access to real patient data is limited. GANs are instrumental in enlarging existing datasets and creating diverse and extensive data for training more accurate and robust diagnostic models. In addition to image generation, GANs are adept at image-to-image translation tasks, a significant feature in medical imaging
[97]. They can transform MRI images into CT scans, offering different perspectives of the same anatomical structure without needing multiple imaging modalities. This is particularly beneficial in scenarios where certain imaging equipment might be unavailable. However, the main disadvantages of GANs are a complex training process that does not necessarily lead to the hoped-for results, a tendency to overfit, and high computational costs. Moreover, GANs are difficult to interpret, and interpretability is of key importance in medicine, especially in cardiology.
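The competition between the generator and the discriminator described in this section is usually expressed through two opposing loss functions. The sketch below assumes the common binary cross-entropy formulation with the non-saturating generator loss; it takes the discriminator's output probabilities as plain numbers and does not implement the networks themselves. The function names and example values are illustrative.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Cross-entropy for the discriminator: real samples should score 1, fakes 0.

    d_real: discriminator outputs in (0, 1) for real images.
    d_fake: discriminator outputs in (0, 1) for generated images.
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: low when fakes are scored as real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# A discriminator that separates real from generated images well ...
print(discriminator_loss([0.9, 0.8], [0.1, 0.2]))
# ... leaves the generator with a high loss, pushing it to produce better images.
print(generator_loss([0.1, 0.2]))
```

Because an improvement for one network is a deterioration for the other, training alternates between the two updates, which is one source of the instability noted above.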
2.8. Graph Neural Networks
If the data are represented differently, in non-Euclidean space in the form of graphs, they can be understood in terms of vertices (i.e., objects) and the connections between them. Then, the concept of Graph Neural Networks (GNNs) can be applied
[98]. All relations in this type of neural network are expressed as those between nodes and edges of the graph. These networks are designed to handle graph data that form a critical aspect in medical fields, especially when the intricate relationships and connections between data points are essential for accurate diagnosis and health condition analysis. This principle of operation is useful in medical imaging, especially in neuroimaging and molecular imaging, where understanding complex relationships is crucial
[51][99]. In the field of cardiology, GNNs have been effectively employed in several key areas. They have been used in the classification of polar maps in cardiac perfusion imaging, a critical technique for assessing heart muscle activity and blood flow. Another significant application of GNNs in cardiology is the estimation of left ventricular ejection fraction in echocardiography. This measurement is vital for evaluating heart health, specifically in assessing the volume of blood the left ventricle pumps out with each contraction
[63]. This allows for more accurate analyses through an understanding of the intricate graph structures of the heart’s imagery. GNNs are also being utilized in analyzing CT/MRI scans. This approach can also be used to interpret the relationships and structures within the scan, providing detailed insights into various conditions and helping in diagnosis and treatment planning
[65].
GNNs provide a powerful tool for understanding and interpreting complex data structures, such as those found in medical image processing. One of the key strengths of GNNs is their adaptability to varying input sizes and structures, an essential feature in medical imaging where patient data can greatly differ. Their architecture is tailored to process and interpret graph-structured data, which sets them apart in their ability to handle data that is inherently interconnected, such as neurological networks or molecular structures. It is also worth stressing that GNNs were created for tasks that cannot be effectively solved by other types of networks based on input data in Euclidean space. However, GNNs are difficult to interpret, and their computational cost is also a crucial parameter. Here, QNNs may provide some insight, while GAs can effectively help in the optimization of the input parameters to neural networks.
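The node-and-edge principle of operation can be illustrated with one round of message passing, the basic building block of most GNNs. The sketch assumes an undirected graph, mean aggregation over each node's neighbours and itself, and no learned weights or activation; a real GNN layer would follow the aggregation with a trainable transformation. The node names and feature values are hypothetical.

```python
def message_passing_step(features, edges):
    """Replace each node's feature vector by the mean over itself and its neighbours.

    features: dict mapping node -> list of floats (all vectors the same length).
    edges: list of undirected (u, v) pairs.
    """
    neighbours = {node: [node] for node in features}   # self-loop keeps own signal
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    updated = {}
    for node, nbrs in neighbours.items():
        dim = len(features[node])
        updated[node] = [
            sum(features[n][k] for n in nbrs) / len(nbrs) for k in range(dim)
        ]
    return updated

# Three regions connected in a chain: a - b - c.
features = {"a": [1.0], "b": [0.0], "c": [2.0]}
edges = [("a", "b"), ("b", "c")]
print(message_passing_step(features, edges))
# {'a': [0.5], 'b': [1.0], 'c': [1.0]}
```

The same function works unchanged for any number of nodes and edges, which reflects the adaptability to varying input sizes and structures mentioned above.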
2.9. Transformers
One further type of neural network that has recently come into focus in the field of medicine is the transformer. Transformers learn context by tracking the relations between elements of the input data. Originally, they were used for natural language processing (NLP). Their effectiveness in these tasks resulted in the development of transformers for tasks related to vision analysis, such as the Detection Transformer (DETR)
[100], the Swin-Transformer
[101], the Vision Transformer (ViT)
[102], and the Data-Efficient Image Transformer (DeiT)
[102]. The DETR is dedicated to object detection which also includes manual analytical processes, and it uses a CNN to learn 2D representations of the input data (images). In turn, the ViT converts the input into a series of fixed-size non-overlapping patches and treats each of them as a token. Each token encodes the spatial position of its part of the image to provide spatial information, while the spatial information of the pixels is lost during tokenization. However, ViTs require large training datasets. On the other hand, DeiTs also provide high accuracy in the case of small training datasets, while Swin-Transformers allow the cost of calculations to be reduced. They process an image divided into local windows, representing tokens at multiple scales with a hierarchical structure and using shifted windows (local self-attention). The transformer principle of operation is based on the self-attention mechanism. This enables the network to decide on the importance (i.e., the weight) of different parts of the input data for the prediction. This may be beneficial for the evaluation of the relationships between different regions in medical images.
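The two steps described above, splitting an image into fixed-size non-overlapping patches and weighting the resulting tokens by self-attention, can be sketched as follows. This is a simplified illustration under several assumptions: the query, key, and value projections are taken to be the identity (a real transformer learns them), positional encodings and multiple heads are omitted, and the function names and the 4 x 4 toy image are hypothetical.

```python
import math

def split_into_patches(image, patch):
    """Cut a 2D image (list of rows) into non-overlapping patch x patch tokens."""
    tokens = []
    for r in range(0, len(image) - patch + 1, patch):
        for c in range(0, len(image[0]) - patch + 1, patch):
            tokens.append([float(image[r + i][c + j])
                           for i in range(patch) for j in range(patch)])
    return tokens

def softmax(scores):
    m = max(scores)                           # subtract the maximum for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)                   # importance of every token for q
        weights.append(w)
        outputs.append([sum(w[j] * tokens[j][c] for j in range(len(tokens)))
                        for c in range(d)])
    return outputs, weights

image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
tokens = split_into_patches(image, patch=2)   # four tokens of four pixels each
outputs, weights = self_attention(tokens)
print(tokens[0])                              # [0.0, 1.0, 4.0, 5.0]
print([round(sum(row), 6) for row in weights])
```

Each row of `weights` sums to one and states how strongly one patch attends to every other patch, regardless of how far apart they lie in the image, which is the basis of the global context modeling attributed to transformers.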
The application of transformer networks allows for a deeper understanding of cardiac function, which aids in refining diagnostic methods and improving treatment strategies. For example, Jungiewicz et al. focused on stenosis detection in coronary arteries, comparing different variants of the Inception Network with the ViT
[36]. They analyzed small fragments from coronary angiography videos, highlighting the role of dataset configuration in model performance. A key innovation in their approach is the use of Sharpness-Aware Minimization (SAM) alongside ViTs, which enhances the accuracy and reliability of stenosis detection. They also employed Explainable AI techniques to understand the differences in classification performance between the models. Their findings indicate that while Convolutional Neural Networks generally outperform transformer-based architectures, the gap narrows significantly with the addition of SAM to ViTs. In some measures, the ViT trained with SAM even surpasses the other models. Thus, the ViT can be applied effectively to diagnosis based on coronary angiography. Zhang et al.
[37] present a Topological Transformer Network (TTN) for automated coronary artery branch labeling in Cardiac CT Angiography (CCTA). The TTN, inspired by the success of transformers in sequence data analysis, treats vessel branch labeling as a sequence labeling learning problem. It introduces a unique topological encoding to represent spatial positions of vessel segments within the arterial tree, enhancing classification accuracy. The network also includes a segment-depth loss function to address the class imbalance between primary and secondary branches. The effectiveness of a TTN is demonstrated in CCTA scans, where it achieves unprecedented results, outperforming existing methods in overall branch labeling and side branch identification. TTNs mark a departure from traditional methods, representing the first transformer-based vessel branch labeling method in the field. The integration of this method into computer-aided diagnosis systems can enhance the generation of cardiovascular disease diagnosis reports, thereby improving patient outcomes in cardiac care.
Another transformer-based approach, the Co-Attention Spatial Transformer Network (CA-STN), significantly enhances the detection and analysis of myocardial ischemia and infarction by tracking wall-motion abnormalities in the left ventricle. The core innovation is the integration of a co-attention mechanism within the Spatial Transformer Network (STN), which improves feature extraction between frames for smoother motion fields and enhanced interpretability in noisy 3D echocardiography images. Additionally, a novel temporal regularization term guides the motion of the left ventricle, producing smooth and realistic cardiac displacement paths. The CA-STN outperforms traditional methods that rely on heavy regularization functions, marking a new standard in cardiac motion tracking. Strain analysis using the CA-STN aligns with matched SPECT perfusion maps, illustrating the clinical utility of 3D echocardiography for localizing and quantifying myocardial strain following ischemic injury. This study contributes a novel tool for cardiac imaging and opens new possibilities for early detection and interventions in myocardial injuries.
Thus, a transformer-based approach to cardiac image segmentation offers advantages such as global context modeling, parallel processing, attention mechanisms, transfer learning, and interpretability. However, transformers process the input data as a sequence of tokens, which may cause some important spatial information to be missed and the segmentation (especially for tasks requiring precise localization of anatomical structures in heart images) to be inaccurate. Like CNNs and the YOLO algorithm, this approach requires a large amount of good-quality data and significant computational resources. Careful hyperparameter tuning and regularization techniques can overcome this disadvantage, but potentially increase the complexity of the training process.
2.10. Quantum Neural Networks
Recently, some work has also been devoted to the development of Quantum Neural Networks (QNNs), which are based on the principles of quantum mechanics
[103][104]. These may have huge potential to speed up calculations and reduce the computational costs associated with them. This approach can be developed in two ways related to the segmentation of medical images. The first is the use of quantum circuits to train classical neural networks, and the second is the design and training of quantum networks, as proposed by Mathur et al.
[81]. Indeed, Shahwar et al.
[105] showed the potential of QNNs in the detection of Alzheimer’s disease, and Ullah et al.
[20] proposed a quantum version of the Fully Convolutional Neural Network (FCNN) as applied to a challenge that concerned the classification of ischemic heart disease. This allowed for a prediction accuracy of over 80 percent. However, the approach based on quantum neural networks requires further improvement. When it comes to interventional practice, QNNs have the potential for stenosis detection in X-ray coronary angiography
[106], and they can also be applied to selecting medicines for patients with high accuracy
[107][108]. Thus, QNNs may also provide some insight into the reduction in computational cost.
3. Evaluation Metrics in Medical Image Segmentation
Artificial Intelligence has the chance to become a high-precision tool in medicine. However, there are certain technical risks (TERs) connected with the application of AI in clinical and educational practice, including algorithm performance, legal regulation, and safety. For example, it is known that small, even imperceptible changes in the training dataset can drastically change the results of predictions, which in medicine can have very serious consequences and influence learning. The key to the evaluation of AI adaptability is to use an appropriate metric to assess the correctness and accuracy of different kinds of forecasts, including clinical prognoses, and to ensure that this metric is understood by users
[109]. For example, overfitting between training and testing datasets will reduce the accuracy of the algorithm. Other crucial factors that influence the qualitative efficiency of the AI-based algorithm’s dataset include data availability issues. However, even if developers do not have sufficient quantity and quality of data, cross-validation can be applied
[110]. This procedure helps to limit overfitting by repeatedly holding out a subset of the data for validation. The choice of a proper evaluation metric depends on the specific task type. The Dice coefficient (also called the Sørensen–Dice index) and the Intersection over Union (IoU) are the metrics most commonly used in medical image segmentation. However, in the field of cardiology, accuracy is of particular concern.
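For binary segmentation masks, both metrics can be computed directly from the counts of true positives (TP), false positives (FP), and false negatives (FN): Dice = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN). The following minimal sketch assumes masks given as nested lists of 0/1 values; the function name and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_and_iou(pred, truth):
    """Return (Dice, IoU) for two equally sized 2D binary masks."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # pixel correctly labelled as structure
            elif p and not t:
                fp += 1          # predicted structure, actually background
            elif t and not p:
                fn += 1          # missed structure pixel
    union = tp + fp + fn
    if union == 0:
        return 1.0, 1.0          # assumed convention: two empty masks agree perfectly
    return 2 * tp / (2 * tp + fp + fn), tp / union

pred = [
    [1, 1, 0],
    [0, 1, 0],
]
truth = [
    [1, 0, 0],
    [0, 1, 1],
]
dice, iou = dice_and_iou(pred, truth)
print(round(dice, 4), iou)   # 0.6667 0.5
```

The two scores are monotonically related (Dice = 2 IoU / (1 + IoU)), so they rank methods identically on a single image, although their averages over a dataset can differ.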
Moreover, an important element in improving the effectiveness of cardiology data segmentation is the collection of as much reliable, good-quality data as possible while keeping class balance in mind. This procedure should take into account the diversity of input data, which helps AI models to generalize better to unseen cases and improves their reliability. It is also necessary to provide diverse and representative input data whenever possible, which can help mitigate bias in AI-based algorithms. Another issue related to data is the application of an open data policy following UNESCO guidelines (especially for scientific applications and research) so that more efficient AI algorithms can be developed in the area of cardiology. Moreover, compliance with ethical and bioethical standards in the collection, storage, and use of medical data is essential for the development of reliable AI systems in cardiology. As a consequence, the establishment of standards for the quality, integrity, and interoperability of data used in AI applications in cardiology, as well as the development of protocols for the validation and regulation of AI-based algorithms, is of high importance. It is also necessary to develop guidelines on how to integrate artificial intelligence technologies into cardiology workflows as well as strategies for managing the risks associated with their implementation. Finally, it should be the responsibility of the cardiology community to ensure the control of results and feedback loops by implementing mechanisms for monitoring the performance of AI algorithms in cardiology and collecting feedback from clinicians and patients.