Multimodal Medical Image Fusion Techniques: Comparison

Image fusion has become one of the most promising fields in image processing since it plays an essential role in applications such as medical diagnosis and the clarification of medical images. Multimodal Medical Image Fusion (MMIF) enhances the quality of medical images by combining two or more images from different modalities to obtain a fused image that is clearer than the originals. Selecting the MMIF technique that produces the best quality is one of the central problems in the assessment of image fusion techniques. Several image modalities exist, such as Computed Tomography (CT), Positron Emission Tomography (PET), Magnetic Resonance Imaging (MRI), and Single Photon Emission Computed Tomography (SPECT). Medical image fusion techniques fall into six main categories: spatial domain, transform domain, fuzzy logic, morphological, sparse representation, and deep learning methods.

  • image fusion
  • image modality
  • multi-scale decomposition

1. Medical Imaging Modalities

Each imaging modality has unique characteristics and properties that facilitate the study of specific human organs, the diagnosis of illness, and patient follow-up therapies. Microscopy, 3D reconstruction, visible-light photography, radiology, and printed signals (waves) are some examples of imaging modalities [19][1]. The advancement of medical diagnosis owes much to recent progress in medical image capture and enhancement. Medical imaging can be divided into structural systems and functional systems, and both types can be used to determine the location of a lesion. Functional and structural data from medical imaging can be combined to generate more meaningful information. When examining the same human organ, medical image fusion is crucial since it enables more accurate disease monitoring and analysis.

1.1. Structural Systems

Structural images such as MRI, CT, X-ray, and Ultrasound (US) provide high-resolution images with anatomical information. Specifically, CT can better distinguish tissues with different densities, such as bones and blood vessels, whereas MRI can clearly show different soft tissues rather than bones.

1.1.1. Magnetic Resonance Image (MRI)

Magnetic resonance imaging is a medical imaging procedure that uses magnetic fields and radio-frequency signals. It is used to produce images of the region surrounding a lesion, of various organ functions, and of anatomical structures. The essential feature of MRI is that it uses magnetic signals to create cross-sectional slices of the human body and provides detail on diseased soft tissues.

1.1.2. Computed Tomography (CT)

CT is one of the main modalities used in image fusion. The CT approach, which is based on detecting X-ray attenuation, can visualize thin cross-sections. CT is one of the major non-invasive diagnostic methods in contemporary medicine. The advantages of CT images are that they exhibit little distortion, provide detail on dense structures such as bone, and reveal small differences in tissue composition.

1.1.3. X-rays

X-rays were originally used to create “shadowgrams” of the human body. Radiography is the use of X-rays to visualize internal organs. Today, the radiation is not typically recorded directly; instead, its intensity is measured and then converted to an image. As a result, the image’s more subtle features become more visible. The main task of X-rays is to identify bone anomalies and fractures in the human body. Mammography is a breast cancer screening technique that uses X-rays as its primary imaging source. Ultrasound with X-ray mammography and vibroacoustography with X-ray mammography are two examples of current modality combinations that fuse X-ray information with other sources.

1.1.4. Ultrasound (US)

Ultrasound imaging relies on low-frequency vibrations induced in the body. Vibroacoustography (VA), a distinct imaging technique made possible by ultrasound-stimulated acoustic emission, is used in conjunction with mammography to improve the detection of breast cancer. Ultrasound imaging is also being tested as a non-invasive tool for studying excised human tissues such as the liver, breast, prostate, and arteries by measuring changes in the mechanical response to vibration in an excited state. X-ray mammography does not record the depth and thickness of objects, whereas tissue thickness does not interfere with ultrasound imaging. Applications of ultrasound images include the analysis of mass lesions and breast development. More auxiliary and diagnostic data can be observed by combining images from the two distinct imaging modalities, ultrasound and X-ray mammography, using either pixel-based or color-based fusion techniques.

1.2. Functional Systems

Functional images such as PET and SPECT provide functional information with low spatial resolution.

1.2.1. Positron Emission Tomography (PET)

PET is a crucial component of nuclear medicine imaging. It is a non-invasive imaging technique that provides a representation and evaluation of the metabolism of a preselected tracer. The essential characteristics of PET images are that they provide important information about the human brain, allow changes in healthy brain activity to be recorded, and provide indications of various disorders. PET has emerged as one of the most frequently utilized clinical technologies for imaging any area of the body, for example in whole-body cancer diagnosis. The PET image’s exceptional sensitivity is its greatest benefit. PET-CT, MRI-PET, MRI-CT-PET, and MRI-SPECT-PET are some of the contemporary modality combinations that integrate PET information in fusion.

1.2.2. Single Photon Emission Computed Tomography (SPECT)

SPECT is a nuclear medicine tomographic imaging technique that commonly uses gamma rays to evaluate blood flow to tissues and organs. A SPECT scan is used to examine the functionality of internal organs. It provides true 3D data, frequently displayed as slices through the body. The SPECT scan is one of the most popular scans for tissues outside the brain, where tissue placement varies widely. SPECT-CT and MR-SPECT are two examples of modality combinations that use SPECT information in fusion with some of the existing image fusion techniques.

2. Fusion Steps

Multimodal image fusion methods attempt to increase quality and accuracy without altering the complementary information of the images by integrating many images from one or more imaging modalities. Several medical image modalities exist, such as MRI, CT, PET, US, and SPECT. MRI, US, and CT deliver images with anatomical information about the body at high spatial resolution. PET and SPECT provide images with functional information about the body, such as blood flow and the movement of soft tissue, albeit with low spatial resolution. A functional image can be merged with a structural image to yield a multimodal image containing better information for health specialists to diagnose diseases.
Image fusion is the process of producing a more representative image by merging the input images. Two images are first geometrically aligned using medical image registration, and then an image fusion technique combines the two input source images into a new image with additional and complementary information. During the image fusion process, two requirements must be satisfied: (1) the fused image must retain all relevant medical information present in the input images, and (2) it must not contain any information that was absent from the input images. Fusion can be applied to multimodal images widely used in medicine, to multi-focus images usually obtained from the same modality, and to multi-sensor images taken from various imaging sources.
In the multimodal fusion process, the researcher first determines the body organ of interest and then selects two or more imaging modalities on which to perform fusion with an appropriate fusion algorithm. Performance metrics are required to validate the fusion algorithm. In the final step, the fused image contains more information about the scanned organ than the input images.
The process consists of five steps, and a minimal end-to-end sketch follows this list:
  1. Image Registration: the input source image is registered by mapping it to the reference image so that corresponding structures match.
  2. Image Decomposition: the source images are first divided into sub-images using decomposition algorithms, and fusion algorithms such as Intensity–Hue–Saturation (IHS), pyramid, distinct wavelet, Non-Subsampled Contourlet Transform (NSCT), shearlet transform, Sparse Representation (SR), and others are applied to merge features from these sub-images [22][2].
  3. Fusion Rules: fusion algorithms such as fuzzy logic, Human Visual System (HVS) fusion, Artificial Neural Networks (ANNs), Principal Component Analysis (PCA), Mutual Information (MI) fusion, and Pulse-Coupled Neural Networks (PCNNs) are used to extract critical information and features from the sub-images that are helpful in the following steps.
  4. Image Reconstruction: the fused image is reconstructed by assembling the sub-images using an inverse algorithm.
  5. Image Quality Evaluation: the quality of the fusion result is assessed using both subjective and objective measures; radiologists are asked to assess the fusion outcome subjectively [23][3].
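As a concrete illustration of steps (2) through (5), the Python sketch below fuses two pre-registered grayscale images with a simple additive two-scale (base/detail) decomposition. The decomposition, fusion rules, and entropy metric here are minimal stand-ins chosen for brevity, not any specific published method, and the random arrays merely substitute for registered CT/MRI slices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def two_scale_fuse(img_a, img_b):
    """Toy fusion of two pre-registered grayscale images (step 1, registration, is assumed done)."""
    # Step 2: decompose each image into a smooth base layer and a detail layer.
    base_a, base_b = gaussian_filter(img_a, 5), gaussian_filter(img_b, 5)
    det_a, det_b = img_a - base_a, img_b - base_b
    # Step 3: fusion rules -- average the bases, keep the larger-magnitude detail.
    base_f = 0.5 * (base_a + base_b)
    det_f = np.where(np.abs(det_a) >= np.abs(det_b), det_a, det_b)
    # Step 4: reconstruct by inverting the (additive) decomposition.
    return base_f + det_f

def entropy(img, bins=256):
    """Step 5: one simple objective quality measure (higher = more information)."""
    p, _ = np.histogram(img, bins=bins, density=True)
    p = p[p > 0] / p[p > 0].sum()
    return float(-(p * np.log2(p)).sum())

a = np.random.rand(128, 128)   # stand-ins for registered source slices
b = np.random.rand(128, 128)
fused = two_scale_fuse(a, b)
print(entropy(fused))
```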

 

3. Image Fusion Techniques

Image fusion combines images from multiple modalities to produce results that are more accurate, more comprehensive, and more easily interpreted. Combining multimodal images has several benefits, such as precise geometric adjustment, complementary data for improved classification, and enhanced features for investigation. Image fusion is used extensively, and its effectiveness has been demonstrated, in several fields of study, including computer vision, multimedia analysis, medical research, and materials science. Registration can be viewed as an optimization problem that exploits similarities between images while lowering a cost function; a toy similarity metric is sketched below. Image registration is a technique for aligning the corresponding aspects of multiple images with respect to a reference image: one source image serves as the reference, and the remaining source images are aligned to it. The important features of the registered images are then extracted in feature extraction to generate feature maps.
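One widely used similarity measure that a registration optimizer can maximize is mutual information, which works across modalities because it compares intensity statistics rather than raw intensities. The sketch below is a plain histogram-based estimate in numpy, illustrative only; production registration would use a dedicated toolkit.

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Mutual information between two grayscale images, a common registration similarity metric."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()                     # joint intensity distribution
    px = pxy.sum(axis=1, keepdims=True)           # marginal of image a
    py = pxy.sum(axis=0, keepdims=True)           # marginal of image b
    nz = pxy > 0                                  # avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())
```

An optimizer would repeatedly warp the moving image and keep the transform that yields the highest score.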

3.1. Spatial Domain Fusion Techniques

Early research focused heavily on spatial-domain medical image fusion techniques. In this type, the fusion rules are applied directly to the pixels of the source images, i.e., the actual pixel values of the source images are combined to produce the fused image.
Spatial domain approaches work at the pixel level of the image, which is a fundamental pixel-level strategy. Since they are applied to the original images, the generated images have less spatial distortion and a lower signal-to-noise ratio (SNR). They work by manipulating the values of an image’s pixels to achieve the desired effect. PCA, IHS, Brovey, high-pass filtering, ICA, simple maximum, simple average, weighted average, and gradient filtering are some spatial domain methods; the simplest of these rules are sketched below. However, spatial domain techniques produce spectral and spatial distortion in the final fused image, which is viewed as a detriment to the fusion process.
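The three simplest pixel-level rules named above reduce to one-liners on equal-size, pre-registered arrays; this is a minimal sketch, with the weight `w` a user-chosen constant.

```python
import numpy as np

def simple_maximum(a, b):
    return np.maximum(a, b)          # keep the brighter pixel at each position

def simple_average(a, b):
    return 0.5 * (a + b)             # equal-weight pixel average

def weighted_average(a, b, w=0.7):
    return w * a + (1.0 - w) * b     # fixed weight w chosen by the user
```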
Changtao et al. [27][4] proposed a methodology that enhances the quality of the fused images by combining the benefits of the IHS and PCA fusion techniques. They compared their approach with other fusion methods such as PCA, Brovey, and the discrete wavelet transform (DWT); visual and quantitative analysis shows that their algorithm greatly enhances fusion quality. Bashir et al. [20][5] introduced a PCA- and Stationary Wavelet Transform (SWT)-based model that was evaluated using a range of medical images. Results show that PCA performs better when the input images have different contrast and brightness levels in multimodal fusion, while SWT performs better when the inputs are multimodal and multi-sensor images. Five different sets of images were used: X-ray, MRI, CT, satellite, and stereo images, and several evaluation performance metrics were computed for each set to determine which approach performs better for each type of imagery. Depoian et al. [28][6] proposed an approach to obtaining better image fusion by combining PCA with a neural network: compared to conventional weighted fusion approaches, employing an auto-encoder neural network to combine the information leads to better data visualization. Rehal et al. [29][7] suggested a method that uses a hybrid of the Hilbert transform and intensity–hue–saturation to extract features and intensity from a group of images, applying grey wolf optimization in the IHS domain to obtain the optimal result. Testing shows that the proposed methodology gives better results than the conventional methods employed in previous research.
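A common formulation of PCA fusion, used as a baseline in several of the comparisons above, derives fusion weights from the leading eigenvector of the two images' covariance matrix. The sketch below assumes pre-registered grayscale inputs and is illustrative rather than any particular cited implementation.

```python
import numpy as np

def pca_fuse(a, b):
    """Weight each source by the components of the leading eigenvector of their covariance."""
    data = np.stack([a.ravel(), b.ravel()])    # 2 x N matrix of pixel values
    cov = np.cov(data)
    vals, vecs = np.linalg.eigh(cov)           # eigh: the covariance matrix is symmetric
    v = np.abs(vecs[:, np.argmax(vals)])       # principal eigenvector
    w = v / v.sum()                            # normalize components into fusion weights
    return w[0] * a + w[1] * b
```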

3.2. Transform Fusion Techniques (Multi-Scale Decomposition Methods)

Recently, owing to the spectral and spatial distortion introduced in the fused image by spatial domain methods, many researchers have turned their focus to the transform domain for a better fusion effect. The transform domain approach transforms the input images, for example with a Fourier transform, to obtain low-frequency and high-frequency coefficients. The transformed quantities are then subjected to the fusion process, followed by an inverse transformation to generate the fused image in its final form. Transform domain techniques are highly effective at handling spatial distortion; however, it is difficult to extend a one-dimensional transform to two or more dimensions. Multi-resolution analysis is a highly helpful technique for evaluating images from remote sensing, medical imaging, and similar domains, and the discrete wavelet transform is now becoming a key technique for fusion. The transform domain approach is based on the multi-scale transform, and multi-scale transform fusion proceeds in three steps: decomposition, fusion, and reconstruction; a wavelet-based sketch of this pattern follows.
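The decompose-fuse-reconstruct pattern can be made concrete with a discrete wavelet transform. This sketch uses the PyWavelets package, averages the low-frequency approximation, and applies a max-absolute rule to the high-frequency details; these are common but by no means the only choices of rules.

```python
import numpy as np
import pywt  # PyWavelets

def dwt_fuse(a, b, wavelet="db2", level=2):
    ca = pywt.wavedec2(a, wavelet, level=level)      # step 1: decomposition
    cb = pywt.wavedec2(b, wavelet, level=level)
    fused = [0.5 * (ca[0] + cb[0])]                  # step 2a: average the approximation band
    for (ha, va, da), (hb, vb, db) in zip(ca[1:], cb[1:]):
        fused.append(tuple(np.where(np.abs(x) >= np.abs(y), x, y)   # step 2b: max-abs details
                           for x, y in [(ha, hb), (va, vb), (da, db)]))
    return pywt.waverec2(fused, wavelet)             # step 3: reconstruction
```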
Transform-domain techniques are divided into pyramidal fusion techniques, wavelet fusion techniques, and multi-scale decomposition techniques. Firstly, the most common pyramidal fusion methods are the Laplacian, morphological, ratio, and Gaussian pyramids [30][8]. Liu et al. [31][9] proposed an approach that combines the Laplacian Pyramid (LP) with Convolutional Sparse Representation (CSR). The LP transform is applied to each pair of pre-registered computed tomography and magnetic resonance images to produce detail layers and a base layer. The base layers are then combined using a CSR-based method, and the detail layers are combined using the well-known “max absolute” criterion. By applying the inverse LP transform to the combined base and detail layers, the fused image is finally rebuilt. Their approach properly extracts texture detail from the source images while maintaining the overall contrast of the combined image, and experimental results show the superiority of the approach. Zhu et al. [32][10] proposed a multi-modality medical image fusion method based on phase congruency and local Laplacian energy. The non-subsampled contourlet transform separates each pair of source images into high-pass and low-pass sub-bands. A phase congruency-based fusion rule integrates the high-pass sub-bands and improves the detailed characteristics of the fused image for medical diagnosis, while a local Laplacian energy-based rule is used for the low-pass sub-bands. The local Laplacian energy comprises a weighted local energy term and a weighted sum of Laplacian coefficients, which reflect, respectively, the structural information and the specific characteristics of the source image pair, so the proposed rule incorporates both crucial elements for low-pass fusion at the same time. The combined high-pass and low-pass sub-bands are inversely transformed to create the fused image. Three kinds of multi-modality medical image pairs were employed in comparison tests, and the experimental results demonstrate that the approach performs competitively in terms of both image quality and processing cost.
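A minimal Laplacian-pyramid fusion in the spirit of the LP stage of Liu et al. [31][9] might look like the following; their CSR base-layer rule is replaced here by simple averaging for brevity, and the inputs are assumed to be equal-size grayscale images whose sides are powers of two.

```python
import cv2
import numpy as np

def laplacian_pyramid_fuse(a, b, levels=4):
    """a, b: float32 grayscale images of equal power-of-two size (e.g., 256 x 256)."""
    def pyramids(img):
        g = [img]
        for _ in range(levels):
            g.append(cv2.pyrDown(g[-1]))                     # Gaussian pyramid
        lap = [g[i] - cv2.pyrUp(g[i + 1]) for i in range(levels)]
        return lap, g[-1]                                    # detail layers, base layer

    lap_a, base_a = pyramids(a)
    lap_b, base_b = pyramids(b)
    fused = 0.5 * (base_a + base_b)                          # base rule: average (CSR in [31][9])
    for la, lb in zip(reversed(lap_a), reversed(lap_b)):
        detail = np.where(np.abs(la) >= np.abs(lb), la, lb)  # "max absolute" detail rule
        fused = cv2.pyrUp(fused) + detail                    # inverse LP transform, level by level
    return fused
```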
Secondly, wavelet fusion techniques include the Discrete Wavelet Transform (DWT), Stationary Wavelet Transform (SWT), Redundant Wavelet Transform (RWT), and Dual-Tree Complex Wavelet Transform (DT-CWT). Most published research on multimodal medical image fusion algorithms depends on DWT [33][11]. DWT decomposes the input into frequency sub-bands while remaining stable in both the time and frequency domains, which helps preserve the specific information of the image [34][12]. It has a good visual and quantitative fusion effect and overcomes PCA’s drawbacks. DWT-based fusion methods are used for MRI and PET image fusion, among others [35,36,37][13][14][15]. Bhavana et al. [35][13] proposed a fusion method for MRI and PET brain images based on DWT that loses no anatomical information and introduces little color distortion; unlike the usual DWT fusion method, it applies wavelet decomposition in four stages for low- and high-activity regions, respectively. Cheng et al. [37][15] produced a good PET/CT fused image that discovers and localizes diseased regions of the pancreas using a weighted, wavelet-transform-based fusion algorithm. Georgieva et al. [38][16] gave a brief overview of the benefits and current directions of multidimensional wavelet-transform-based medical image processing techniques and of how they might be combined with other approaches that treat each coefficient set as a tensor. Because wavelet tensor modifications have only one parameter to tune, as opposed to standard wavelet decompositions, they simplify the computerization of different medical processing tasks, including noise reduction, segmentation, classification, and medical image fusion. Due to the uniqueness of medical imaging data, these approaches have not yet been widely applied in research and practice despite their advantages; the latest recommendations in this area concern the selection of suitable techniques and algorithms that exploit their advantages in accordance with the uniqueness of the various medical objects in the images.
Wang et al. [39][17] presented a multimodal color medical image fusion algorithm based on the discrete cosine transform in geometric algebra (GA-DCT). The GA-DCT algorithm exploits the ability of GA to represent a multi-vector signal as a whole, which increases the quality of the fused image and reduces the number of complex operations associated with encoding and decoding.
Finally, multi-scale decomposition techniques include Non-Subsampled Contourlet Transform (NSCT), Non-Subsampled Shearlet Transform (NSST), and Pulse-Coupled Neural Network (PCNN) techniques. The following sections will focus on the most common techniques used in Multi-scale Decomposition techniques.

3.2.1. Non-Subsampled Contourlet Transform (NSCT)

Do et al. [40][18] created the contourlet transform with the intention of capturing the intrinsic geometrical features of an image and improving on the isotropic property of wavelets. Because it is a directional multi-resolution transform, the decomposition consists of two filtering steps: first, the Laplacian pyramid is applied to capture point discontinuities; second, a local directional decomposition through a directional filter bank links these discontinuities into linear structures. The contourlet transform is a more useful tool for selecting intrinsic contours than curvelets or ridgelets because of its set of basis functions oriented along various scales and orientations.
Li et al. [41][19] created a fusion technique for multi-modality medical images based on NSCT. They use an improved novel sum-modified Laplacian (INSML) feature, with which the complementary information of the multi-modality images is retrieved and used in the fusion rules for the low-frequency NSCT coefficients. Additionally, WLE-INSML features are used to extract the high-frequency NSCT coefficients and build the fusion rules for these coefficients. They assessed their fusion approach on an open dataset of twelve pairs of multi-modality medical images.
Li et al. [42][20] proposed an MMIF method that combines the advantages of NSCT with fuzzy entropy to improve the quality of target recognition and the accuracy of the fused images, providing a basis for clinical diagnosis. The image is divided into high- and low-frequency sub-bands through NSCT, and fusion rules are adopted according to the different features of the high- and low-frequency components. The membership degree of the low-frequency coefficients is calculated, and fuzzy entropy is computed and used to guide the fusion of coefficients so as to preserve image details, while the high-frequency components are fused by maximizing regional energy. The transformation is then inverted to obtain the final fused image. Based on subjective and objective assessment criteria, experimental results show that the approach provides a satisfactory fusion effect; it preserves the features of the fused image while obtaining excellent average gradient, standard deviation, and edge preservation, and its outcomes can serve as an efficient guide for clinicians evaluating patient conditions.
Alseelawi et al. [43][21] proposed a hybrid medical image fusion approach using wavelet and curvelet transforms with multi-resolution processing. Their algorithm enhances fused image quality by combining wavelet and curvelet transform techniques after the decomposition stage, and it uses a sub-band coding algorithm in place of separate curvelet and wavelet fusion algorithms, which makes the technique more efficient.
Xia et al. [44][22] proposed a medical image fusion technique combining a pulse-coupled neural network with sparse representation. First, the NSCT is used to decompose the source images into low- and high-frequency sub-band coefficients. Second, the low-frequency sub-band coefficients are trained using the K-Singular Value Decomposition (K-SVD) method to obtain an over-complete dictionary D, and they are sparsely coded with the Orthogonal Matching Pursuit (OMP) algorithm to fuse the low-frequency sparse coefficients. The spatial frequency of the high-frequency sub-band coefficients is then used to excite the Pulse-Coupled Neural Network (PCNN), and the fusion coefficients of the high-frequency sub-bands are chosen based on the number of firing times. Finally, the inverse NSCT reconstructs the fused medical image. Experimental results and analysis show that the fusion performance is better than that of existing algorithms, and for both gray and color image fusion the algorithm scores higher than the comparison algorithms on the edge information transfer factor (QAB/F) index.
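The dictionary-plus-OMP stage of such methods can be sketched with scikit-learn, substituting `MiniBatchDictionaryLearning` for K-SVD and a max-L1 activity rule for the fusion of sparse codes; this is an illustrative stand-in under those assumptions, not the cited pipeline.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d, reconstruct_from_patches_2d

def sparse_fuse_lowfreq(a, b, patch=8, atoms=64):
    """a, b: small grayscale low-frequency bands (e.g., 64 x 64) of equal shape."""
    pa = extract_patches_2d(a, (patch, patch)).reshape(-1, patch * patch)
    pb = extract_patches_2d(b, (patch, patch)).reshape(-1, patch * patch)
    means_a, means_b = pa.mean(1, keepdims=True), pb.mean(1, keepdims=True)
    pa, pb = pa - means_a, pb - means_b                      # code only the fluctuations
    dl = MiniBatchDictionaryLearning(n_components=atoms, transform_algorithm="omp",
                                     transform_n_nonzero_coefs=5, random_state=0)
    dl.fit(np.vstack([pa, pb]))                              # stand-in for K-SVD training
    ca, cb = dl.transform(pa), dl.transform(pb)              # OMP sparse coding
    pick_a = np.abs(ca).sum(1) >= np.abs(cb).sum(1)          # max-L1 activity rule per patch
    codes = np.where(pick_a[:, None], ca, cb)
    means = np.where(pick_a[:, None], means_a, means_b)
    patches = (codes @ dl.components_ + means).reshape(-1, patch, patch)
    return reconstruct_from_patches_2d(patches, a.shape)     # average overlapping patches
```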

3.2.2. Pulse-Coupled Neural Network (PCNN)

The purpose of image fusion is not only to keep the characteristic information of the source images but also to ensure low distortion and good visual effect; therefore, the high-frequency coefficients are often processed using a PCNN. The PCNN is a single-layer, two-dimensional, laterally connected array of neurons that is used widely in the image processing field [45][23]. It is a biologically inspired feedback neural network in which each neuron consists of a receptive field, a modulation field, and a pulse generator. Owing to its biological grounding, it can extract useful information from source images without any training process, though it also has drawbacks, such as the difficulty of setting its parameters.
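To make the mechanism concrete, a simplified PCNN, the variant most fusion papers use, with the coefficient magnitude as the external stimulus, can be written in a few lines. The parameter values below are illustrative only; as noted, parameter setting is the method's main difficulty. Coefficients are then taken from whichever source fires more often.

```python
import numpy as np
from scipy.ndimage import convolve

def pcnn_firing_counts(S, iters=40, beta=0.2, alpha=0.2, v_theta=20.0):
    """Simplified PCNN: S is the stimulus (e.g., |high-frequency coefficients|)."""
    W = np.array([[0.5, 1.0, 0.5],
                  [1.0, 0.0, 1.0],
                  [0.5, 1.0, 0.5]])                      # linking weights to 8 neighbours
    Y = np.zeros_like(S)                                 # pulse output
    theta = np.full_like(S, S.max())                     # dynamic threshold
    T = np.zeros_like(S)                                 # firing-time counter
    for _ in range(iters):
        L = convolve(Y, W, mode="constant")              # linking input from neighbours
        U = S * (1.0 + beta * L)                         # modulated internal activity
        Y = (U > theta).astype(S.dtype)                  # fire where activity beats threshold
        theta = np.exp(-alpha) * theta + v_theta * Y     # decay, then raise where fired
        T += Y
    return T

def pcnn_fuse(coef_a, coef_b):
    fire_a = pcnn_firing_counts(np.abs(coef_a))
    fire_b = pcnn_firing_counts(np.abs(coef_b))
    return np.where(fire_a >= fire_b, coef_a, coef_b)    # pick the more "active" coefficient
```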
Ouerghi et al. [46][24] proposed a fusion method based on a simplified pulse-coupled neural network (S-PCNN) and NSST. First, the PET image is converted into YIQ components; the NSST is applied only to the Y component of the PET image and to the MRI image. The weighted regional standard deviation and local energy are used to fuse the low-frequency sub-bands, and the high-frequency coefficients are fused using the S-PCNN. The algorithm achieves good fused-image quality.
Duan et al. [47][25] proposed an MMIF framework based on a PCNN and low-rank representation of image blocks. The NSCT is used to decompose the image. The low-frequency sub-band adopts a low-rank fusion strategy based on the K-SVD dictionary learning algorithm, which strengthens the extraction of local features, while the high-frequency sub-band adopts PCNN fusion. Finally, after the inverse NSCT, a guided filter is used to deepen edge details by fusing the result with the MRI gray image. The proposed method achieves great results over other existing algorithms in both objective metrics and visual effect.

3.2.3. Non-Subsampled Shearlet Transform (NSST)

Because the NSCT is computationally demanding, the shearlet has emerged as a newer tool that can capture mathematical and geometric properties of images, such as scale, elongated shapes, oscillations, and directionality. Shearlets are optimally sparse in representing images with edges because they form a tight frame across a variety of scales and directions. The shearlet decomposition of an image resembles that of a contourlet, but there is no restriction on the number of shearing directions. In the inverse shearlet transform, the shearing filters need only be aggregated rather than inverting a directional filter bank as in the contourlet, which improves computational efficiency. Qiu et al. [48][26] proposed an image fusion method that transforms both CT and MR images into the NSST domain to obtain low- and high-frequency components. They use the absolute-maximum rule to merge the high-frequency components and a sparse representation-based approach to merge the low-frequency components, proposing a dynamic group sparsity recovery algorithm to improve the performance of the sparse representation. Finally, they perform the inverse NSST on the merged components to obtain the fused image. Their approach provides better fusion results in terms of both objective and subjective quality evaluation.
Yin et al. [49][27] proposed a multimodal medical image fusion method in the NSST domain. They perform NSST decomposition on the source images to obtain their multidirectional and multi-scale representations. They use a Parameter-Adaptive Pulse-Coupled Neural Network (PA-PCNN) model to fuse the high-frequency bands, in which all PCNN parameters are estimated from the input band, and a new strategy of energy preservation and detail extraction that addresses two crucial issues in medical image fusion simultaneously. Finally, the inverse NSST is performed on the fused low- and high-frequency bands to reconstruct the fused image. To test the viability of the approach, extensive tests were carried out on 83 pairs of source images across four categories of medical image fusion problems. The experimental results show that the method achieves state-of-the-art performance in terms of both visual perception and objective assessment.

3.3. Fuzzy-Logic-Based Methods

In 1965, Zadeh [50][28] was the first to establish Fuzzy Logic (FL) theory, and MMIF algorithms have since made heavy use of it. Medical images have indistinct regions as a result of inadequate illumination, and in light of this, Fuzzy Set Theory (FST) is used in medical image processing; it has made tremendous progress in overcoming uncertainty. An FL model consists of a fuzzifier, an inference engine, a de-fuzzifier, fuzzy sets, and fuzzy rules [51][29]. FL is used in image fusion as both a feature transform operator and a decision operator [52][30].
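As a toy illustration of FL as a decision operator, the sketch below fuzzifies each pixel's local contrast with a Gaussian membership function for the set "salient" and de-fuzzifies by normalizing the memberships into fusion weights. The membership shape, feature, and parameters are illustrative choices, not those of any cited method.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def salience(img, size=7):
    """Local variance as a crude per-pixel salience feature."""
    mean = uniform_filter(img, size)
    return np.clip(uniform_filter(img ** 2, size) - mean ** 2, 0, None)

def fuzzy_fuse(a, b, c=1.0, sigma=0.5):
    """Gaussian membership to the fuzzy set 'salient', then weight-normalize."""
    sa, sb = salience(a), salience(b)
    sa, sb = sa / (sa.max() + 1e-12), sb / (sb.max() + 1e-12)
    mu_a = np.exp(-((sa - c) ** 2) / (2 * sigma ** 2))   # fuzzifier: membership of image a
    mu_b = np.exp(-((sb - c) ** 2) / (2 * sigma ** 2))   # fuzzifier: membership of image b
    w = mu_a / (mu_a + mu_b + 1e-12)                     # de-fuzzifier: normalized weights
    return w * a + (1 - w) * b
```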
Kumar et al. [53][31] fused medical images by implementing intuitionistic fuzzy-logic-based image fusion. They suppress noise, enhance the input images, and integrate them efficiently in the IHS domain. Intuitionistic fuzzy sets are used to overcome the uncertainties caused by vagueness and ambiguity in the images.
Tirupal et al. [54][32] proposed a method based on Sugeno’s intuitionistic fuzzy sets to fuse medical images. First, the medical images are converted into Sugeno intuitionistic fuzzy images (SIFIs). Second, the SIFIs are split into blocks, and the whiteness and blackness counts of each block are calculated. Finally, the fused image is rebuilt by recombining the SIFI image blocks.
Tirupal et al. [55][33] introduced a technique based on interval-valued intuitionistic fuzzy sets (IVIFSs) for effectively fusing multimodal medical images, with a median filter used to remove noise from the final fused image. Several sets of multimodal medical images were simulated and compared with existing fusion techniques, such as intuitionistic fuzzy sets and the fuzzy transform.

3.4. Morphological Methods

In the early eighties, most multimodal medical image fusion algorithms made broad use of mathematical morphology, which treats its objects as sets of points and defines operations between two sets [56][34]. Filters built from morphological operators compare the objects against a structuring element, and extracting features from a subset of spatially localized pixels has been consistently successful [8][35]. The image opening and closing filters used in morphological pyramid decomposition were found to be ineffective for edge detection, while mathematical morphology algorithms retain important image regions and details at the cost of increased computation time [57][36].
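A single-scale version of top-hat-based morphological fusion can be sketched with OpenCV as follows; multi-scale variants such as [57][36] repeat this over several structuring-element sizes, so this is a simplified illustration rather than that method.

```python
import cv2
import numpy as np

def tophat_fuse(a, b, ksize=9):
    """a, b: grayscale float32 images. Bright/dark details are extracted morphologically."""
    se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (ksize, ksize))
    bright = np.maximum(cv2.morphologyEx(a, cv2.MORPH_TOPHAT, se),
                        cv2.morphologyEx(b, cv2.MORPH_TOPHAT, se))   # brighter-than-background details
    dark = np.maximum(cv2.morphologyEx(a, cv2.MORPH_BLACKHAT, se),
                      cv2.morphologyEx(b, cv2.MORPH_BLACKHAT, se))   # darker-than-background details
    base = 0.5 * (a + b)                                             # smooth base: simple average
    return base + bright - dark                                      # re-inject the extracted details
```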
Yang et al. [58][37] presented an algorithm for fusing CT and MRI medical images with a shift-invariant multi-scale decomposition scheme, produced by eliminating the downsampling operators from a morphological wavelet. An experiment on actual medical images demonstrates how much the strategy enhances the quality of the fused image; the method outperforms competing approaches in maintaining both “pixel” and “edge” information.

3.5. Sparse Representation Methods

In recent years, Sparse Representation (SR) methods applied to image fusion have become a prevalent and important research topic in the research community. SR has attracted significant attention and performed successfully, so transform-domain image fusion algorithms that incorporate sparse representation techniques have been widely used [59][38]. Wang et al. [60][39] proposed an image fusion framework that integrates NSCT with SR, which achieved better fusion than single-transform algorithms, although its processing time was longer than that of multi-scale transform-based methods. Li et al. [61][40] proposed an image fusion framework that integrates SR with NSCT and achieves improved fusion with respect to detail preservation and structural similarity for visible-infrared images. Maqsood et al. [62][41] proposed a multimodal image fusion scheme based on two-scale medical image decomposition merged with SR, in which the edge details in CT-MRI image fusion are improved, although it cannot be used with color images. Chen et al. [63][42] proposed a target-enhanced multi-scale decomposition fusion technique for infrared and visible images in which the texture details of the visible images are preserved and the thermal target in the infrared images is enhanced.
Shabanzade et al. [64][43] presented an image fusion framework for PET and MRI based on sparse representation in the NSCT domain. The low-pass and high-pass sub-bands of the source images are obtained by performing NSCT. The low-pass sub-bands are then fused by sparse representation using clustering-based dictionary learning, while the high-pass sub-bands are merged by applying a salience match measure rule.
Kim et al. [65][44] proposed an efficient dictionary learning method for multimodal image fusion based on joint patch clustering. They build an over-complete dictionary with a sufficient number of useful atoms to represent the fused image. Since different sensor modalities convey shared image information and structural similarities, all patches from the different source images are clustered together, and the joint patch clusters are integrated to build an informative over-complete dictionary. Finally, the sparse coefficients of the multimodal images are estimated with the learned common dictionary.
Polinati et al. [66][45] presented an approach, convolutional sparse image decomposition (CSID), that fuses CT and MR images. To locate edges in the source images, CSID employs contrast stretching and the spatial gradient approach, together with cartoon-texture decomposition, which produces an over-complete dictionary. In addition, they introduced a modified convolutional sparse coding approach and used enhanced decision maps and a fusion rule to create the final fused image.

3.6. Deep Learning Fusion Methods

In recent years, deep learning has become a new research field in medical image fusion. It is already widely used in medical image registration [67,68,69][46][47][48] and medical image segmentation [70,71,72][49][50][51]. Deep learning uses a number of layers, each taking its input from the previous layer; this layered architecture helps structure complicated frameworks and provides the capability to handle enormous amounts of data [73][52]. Convolutional Neural Networks (CNNs), Convolutional Sparse Representation (CSR), and Deep Convolutional Neural Networks (DCNNs) are examples of deep learning fusion approaches. The CNN model is the one most often used in deep learning approaches. Each layer in a CNN is composed of a number of feature maps whose neurons hold coefficients, and the feature maps are connected across stages using operations such as spatial pooling, convolution, and non-linear activation [74][53]. Convolutional Sparse Coding (CSC) is another popular deep learning fusion technique.
Liu et al. [75][54] proposed a convolutional neural network-based technique for fusing medical images. They used a Siamese convolutional network to create a weight map that integrates the pixel activity information from the source images, and they conduct the fusion in a multi-scale manner via image pyramids so that it is more consistent with human visual perception. Some well-known image fusion techniques, such as multi-scale processing and adaptive fusion mode selection, are appropriately used to provide perceptually pleasing outcomes. According to experimental results, the method produces high-quality outcomes in terms of both visual quality and objective metrics.
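The weight-map idea can be sketched in PyTorch: a single shared (Siamese) convolutional branch scores per-pixel activity in each source, and a softmax across the two scores yields the fusion weight map. This toy network is untrained and omits the patch-classification training and pyramid-based multi-scale fusion of Liu et al. [75][54]; it only illustrates the architectural pattern.

```python
import torch
import torch.nn as nn

class WeightMapNet(nn.Module):
    """Toy Siamese branch: both sources pass through the same conv stack."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),              # per-pixel activity score
        )

    def forward(self, a, b):
        scores = torch.cat([self.features(a), self.features(b)], dim=1)
        w = torch.softmax(scores, dim=1)                 # weight map sums to 1 per pixel
        return w[:, :1] * a + w[:, 1:] * b               # weighted per-pixel fusion

net = WeightMapNet()                                     # untrained here; [75][54] train theirs
a = torch.rand(1, 1, 128, 128)                           # stand-ins for registered sources
b = torch.rand(1, 1, 128, 128)
fused = net(a, b)
```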
Rajalingam et al. [76][55] introduced an effective multimodal medical image fusion method based on deep convolutional neural networks, using CT, MRI, and PET as the input modalities for their experimental work. They use a Siamese convolutional network to produce a weight map that incorporates the pixel activity information from two or more multi-modality medical images, and they carry out the fusion in a multi-scale manner via medical image pyramids to make the process more consistent with human visual perception. A local comparison-based strategy is applied to adjust the fusion mode for the decomposed coefficients. Experimental results show that the method outperforms current alternatives in processing performance and in both subjective and objective evaluation criteria.
Xia et al. [77][56] proposed a multi-modal medical image fusion scheme that uses both deep convolutional neural networks and multi-scale transformation. Firstly, Gauss–Laplace and Gaussian filters divide the source images into several sub-images in the first layer of the network. The convolution kernels of the remaining layers are then initialized, and the basic unit is constructed using the HeK-based method and trained with a backpropagation algorithm. A number of such basic units are stacked, following the idea of a Stacked Auto-Encoder (SAE), to create a deep stacked neural network. The network decomposes the input images into their low-frequency and high-frequency images, a fusion rule merges the low- and high-frequency images, and the merged images are returned to the network’s final layer to produce the final fused image. The effectiveness of the fusion method was assessed through numerous experiments on various medical image datasets. Experimental results show that, in comparison to other approaches in use, the method not only fuses the images successfully to provide superior results but also improves many quantitative measures; furthermore, the revised CNN approach runs significantly faster than similar algorithms while maintaining high fusion quality.
Wang et al. [78][57] introduced a medical image fusion algorithm that combines medical images from multiple modalities to improve the accuracy and reliability of clinical diagnosis, which plays an important role in many clinical applications. The approach uses a trained Siamese convolutional network to produce a weight map from the pixel activity data of the source images; meanwhile, each source image is decomposed using a contrast pyramid, and the sources are combined across the spatial transform bands with a weighted fusion operator. Comparative experiments demonstrated that the fusion method can retain the intricate structural details of the source images while producing results that are aesthetically pleasing to human observers.
Wang et al. [79][58] introduced an MMIF algorithm based on CNN and NSCT that exploits the advantages of both to obtain better fusion results. In the proposed algorithm, the source images are divided into high- and low-frequency sub-bands. A new fusion rule, Perceptual High-Frequency CNN (PHF-CNN), produces the fused high-frequency sub-bands, while for the low-frequency sub-band the decision map is generated by adopting two result maps. Finally, the inverse NSCT integrates the fused frequency sub-bands. According to experimental results, the approach is superior to existing algorithms in terms of assessment and increases the quality of the fused images.

References

  1. MITA. Available online: https://www.medicalimaging.org/about-mita/modalities (accessed on 22 December 2022).
  2. Dinh, P.-H. A novel approach based on three-scale image decomposition and marine predators algorithm for multi-modal medical image fusion. Biomed. Signal Process. Control 2021, 67, 102536.
  3. Chang, L.; Ma, W.; Jin, Y.; Xu, L. An image decomposition fusion method for medical images. Math. Probl. Eng. 2020, 2020, 4513183.
  4. He, C.; Liu, Q.; Li, H.; Wang, H. Multimodal medical image fusion based on IHS and PCA. Procedia Eng. 2010, 7, 280–285.
  5. Bashir, R.; Junejo, R.; Qadri, N.N.; Fleury, M.; Qadri, M.Y. SWT and PCA image fusion methods for multi-modal imagery. Multimed. Tools Appl. 2019, 78, 1235–1263.
  6. Depoian, A.C.; Jaques, L.E.; Xie, D.; Bailey, C.P.; Guturu, P. Neural network image fusion with PCA preprocessing. In Proceedings of the Big Data III: Learning, Analytics, and Applications, Online Event, 12–16 April 2021; pp. 132–147.
  7. Rehal, M.; Goyal, A. Multimodal Image Fusion based on Hybrid of Hilbert Transform and Intensity Hue Saturation using Fuzzy System. Int. J. Comput. Appl. 2021, 975, 8887.
  8. Azam, M.A.; Khan, K.B.; Ahmad, M.; Mazzara, M. Multimodal medical image registration and fusion for quality Enhancement. CMC-Comput. Mater. Contin 2021, 68, 821–840.
  9. Liu, F.; Chen, L.; Lu, L.; Ahmad, A.; Jeon, G.; Yang, X. Medical image fusion method by using Laplacian pyramid and convolutional sparse representation. Concurr. Comput. Pract. Exp. 2020, 32, e5632.
  10. Zhu, Z.; Zheng, M.; Qi, G.; Wang, D.; Xiang, Y. A phase congruency and local Laplacian energy based multi-modality medical image fusion method in NSCT domain. IEEE Access 2019, 7, 20811–20824.
  11. Kavitha, C.; Chellamuthu, C.; Rajesh, R. Medical image fusion using combined discrete wavelet and ripplet transforms. Procedia Eng. 2012, 38, 813–820.
  12. Osadchiy, A.; Kamenev, A.; Saharov, V.; Chernyi, S. Signal processing algorithm based on discrete wavelet transform. Designs 2021, 5, 41.
  13. Bhavana, V.; Krishnappa, H. Multi-modality medical image fusion using discrete wavelet transform. Procedia Comput. Sci. 2015, 70, 625–631.
  14. Jaffery, Z.A.; Zaheeruddin; Singh, L. Computerised segmentation of suspicious lesions in the digital mammograms. Comput. Methods Biomech. Biomed. Eng. Imaging Vis. 2017, 5, 77–86.
  15. Cheng, S.; He, J.; Lv, Z. Medical image of PET/CT weighted fusion based on wavelet transform. In Proceedings of the 2008 2nd International Conference on Bioinformatics and Biomedical Engineering, Shanghai, China, 16–18 May 2008; pp. 2523–2525.
  16. Georgieva, V.; Petrov, P.; Zlatareva, D. Medical image processing based on multidimensional wavelet transforms-Advantages and trends. In Proceedings of the AIP Conference Proceedings, Sofia, Bulgaria, 7–3 June 2022; p. 020001.
  17. Wang, R.; Fang, N.; He, Y.; Li, Y.; Cao, W.; Wang, H. Multi-modal Medical Image Fusion Based on Geometric Algebra Discrete Cosine Transform. Adv. Appl. Clifford Algebr. 2022, 32, 1–23.
  18. Do, M.N.; Vetterli, M. The contourlet transform: An efficient directional multiresolution image representation. IEEE Trans. Image Process. 2005, 14, 2091–2106.
  19. Li, B.; Peng, H.; Wang, J. A novel fusion method based on dynamic threshold neural P systems and nonsubsampled contourlet transform for multi-modality medical images. Signal Process. 2021, 178, 107793.
  20. Li, W.; Lin, Q.; Wang, K.; Cai, K. Improving medical image fusion method using fuzzy entropy and nonsubsampling contourlet transform. Int. J. Imaging Syst. Technol. 2021, 31, 204–214.
  21. Alseelawi, N.; Hazim, H.T.; Salim ALRikabi, H.T. A Novel Method of Multimodal Medical Image Fusion Based on Hybrid Approach of NSCT and DTCWT. Int. J. Online Biomed. Eng. 2022, 18, 28011.
  22. Xia, J.; Chen, Y.; Chen, A.; Chen, Y. Medical image fusion based on sparse representation and PCNN in NSCT domain. Comput. Math. Methods Med. 2018, 2018, 2806047.
  23. Xiong, Y.; Wu, Y.; Wang, Y.; Wang, Y. A medical image fusion method based on SIST and adaptive PCNN. In Proceedings of the 2017 29th Chinese Control and Decision Conference (CCDC), Chongqing, China, 28–30 May 2017; pp. 5189–5194.
  24. Ouerghi, H.; Mourali, O.; Zagrouba, E. Non-subsampled shearlet transform based MRI and PET brain image fusion using simplified pulse coupled neural network and weight local features in YIQ colour space. IET Image Process. 2018, 12, 1873–1880.
  25. Duan, Y.; He, K.; Xu, D. Medical Image Fusion Technology Based on Low-Rank Representation of Image Blocks and Pulse Coupled Neural Network. In Proceedings of the 2022 7th International Conference on Image, Vision and Computing (ICIVC), Xi’an, China, 26–28 June 2022; pp. 473–479.
  26. Qiu, C.; Wang, Y.; Zhang, H.; Xia, S. Image fusion of CT and MR with sparse representation in NSST domain. Comput. Math. Methods Med. 2017, 2017, 9308745.
  27. Yin, M.; Liu, X.; Liu, Y.; Chen, X. Medical image fusion with parameter-adaptive pulse coupled neural network in nonsubsampled shearlet transform domain. IEEE Trans. Instrum. Meas. 2018, 68, 49–64.
  28. Zadeh, L.A. Fuzzy sets. Inf. Control 1965, 8, 338–353.
  29. Biswas, B.; Sen, B.K. Medical image fusion technique based on type-2 near fuzzy set. In Proceedings of the 2015 IEEE International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), Kolkata, India, 20–22 November 2015; pp. 102–107.
  30. Das, A.; Bhattacharya, M. Evolutionary algorithm based automated medical image fusion technique: Comparative study with fuzzy fusion approach. In Proceedings of the 2009 World Congress on Nature & Biologically Inspired Computing (NaBIC), Coimbatore, India, 9–11 December 2009; pp. 269–274.
  31. Kumar, M.; Kaur, A.; Amita. Improved image fusion of colored and grayscale medical images based on intuitionistic fuzzy sets. Fuzzy Inf. Eng. 2018, 10, 295–306.
  32. Tirupal, T.; Mohan, B.C.; Kumar, S.S. Multimodal medical image fusion based on Sugeno’s intuitionistic fuzzy sets. ETRI J. 2017, 39, 173–180.
  33. Tirupal, T.; Chandra Mohan, B.; Srinivas Kumar, S. Multimodal medical image fusion based on interval-valued intuitionistic fuzzy sets. In Machines, Mechanism and Robotics; Springer: Berlin/Heidelberg, Germany, 2022; pp. 965–971.
  34. Soille, P. Morphological Image Analysis: Principles and Applications; Springer: Berlin/Heidelberg, Germany, 1999; Volume 2.
  35. James, A.P.; Dasarathy, B.V. Medical image fusion: A survey of the state of the art. Inf. Fusion 2014, 19, 4–19.
  36. Bai, X. Morphological image fusion using the extracted image regions and details based on multi-scale top-hat transform and toggle contrast operator. Digit. Signal Process. 2013, 23, 542–554.
  37. Yang, B.; Jing, Z. Medical image fusion with a shift-invariant morphological wavelet. In Proceedings of the 2008 IEEE Conference on Cybernetics and Intelligent Systems, Chengdu, China, 21–24 September 2008; pp. 175–178.
  38. Zhu, Z.; Chai, Y.; Yin, H.; Li, Y.; Liu, Z. A novel dictionary learning approach for multi-modality medical image fusion. Neurocomputing 2016, 214, 471–482.
  39. Wang, J.; Peng, J.; Feng, X.; He, G.; Wu, J.; Yan, K. Image fusion with nonsubsampled contourlet transform and sparse representation. J. Electron. Imaging 2013, 22, 043019.
  40. Li, Y.; Sun, Y.; Huang, X.; Qi, G.; Zheng, M.; Zhu, Z. An image fusion method based on sparse representation and sum modified-Laplacian in NSCT domain. Entropy 2018, 20, 522.
  41. Maqsood, S.; Javed, U. Multi-modal medical image fusion based on two-scale image decomposition and sparse representation. Biomed. Signal Process. Control 2020, 57, 101810.
  42. Chen, J.; Li, X.; Luo, L.; Mei, X.; Ma, J. Infrared and visible image fusion based on target-enhanced multiscale transform decomposition. Inf. Sci. 2020, 508, 64–78.
  43. Shabanzade, F.; Ghassemian, H. Multimodal image fusion via sparse representation and clustering-based dictionary learning algorithm in nonsubsampled contourlet domain. In Proceedings of the 2016 8th International Symposium on Telecommunications (IST), Tehran, Iran, 27–28 September 2016; pp. 472–477.
  44. Kim, M.; Han, D.K.; Ko, H. Joint patch clustering-based dictionary learning for multimodal image fusion. Inf. Fusion 2016, 27, 198–214.
  45. Polinati, S.; Bavirisetti, D.P.; Rajesh, K.N.; Naik, G.R.; Dhuli, R. The Fusion of MRI and CT Medical Images Using Variational Mode Decomposition. Appl. Sci. 2021, 11, 10975.
  46. Balakrishnan, G.; Zhao, A.; Sabuncu, M.R.; Guttag, J.; Dalca, A.V. VoxelMorph: A learning framework for deformable medical image registration. IEEE Trans. Med. Imaging 2019, 38, 1788–1800.
  47. Hu, Y.; Modat, M.; Gibson, E.; Li, W.; Ghavami, N.; Bonmati, E.; Wang, G.; Bandula, S.; Moore, C.M.; Emberton, M. Weakly-supervised convolutional neural networks for multimodal image registration. Med. Image Anal. 2018, 49, 1–13.
  48. Yang, X.; Kwitt, R.; Styner, M.; Niethammer, M. Quicksilver: Fast predictive image registration–a deep learning approach. NeuroImage 2017, 158, 378–396.
  49. Çiçek, Ö.; Abdulkadir, A.; Lienkamp, S.S.; Brox, T.; Ronneberger, O. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Athens, Greece, 17–21 October 2016; pp. 424–432.
  50. Ronneberger, O. Invited talk: U-net convolutional networks for biomedical image segmentation. In Bildverarbeitung für die Medizin 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 234–241.
  51. Milletari, F.; Navab, N.; Ahmadi, S.-A. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; pp. 565–571.
  52. Zhou, T.; Ruan, S.; Canu, S. A review: Deep learning for medical image segmentation using multi-modality fusion. Array 2019, 3, 100004.
  53. Nguyen, K.P.; Fatt, C.C.; Treacher, A.; Mellema, C.; Trivedi, M.H.; Montillo, A. Anatomically informed data augmentation for functional MRI with applications to deep learning. In Proceedings of the Medical Imaging 2020: Image Processing, Houston, TX, USA, 15–20 February 2020; pp. 172–177.
  54. Liu, Y.; Chen, X.; Cheng, J.; Peng, H. A medical image fusion method based on convolutional neural networks. In Proceedings of the 2017 20th International Conference on Information Fusion (Fusion), Xi’an, China, 10–13 July 2017; pp. 1–7.
  55. Rajalingam, B.; Priya, R. Multimodal medical image fusion based on deep learning neural network for clinical treatment analysis. Int. J. ChemTech Res. 2018, 11, 160–176.
  56. Xia, K.-j.; Yin, H.-s.; Wang, J.-q. A novel improved deep convolutional neural network model for medical image fusion. Clust. Comput. 2019, 22, 1515–1527.
  57. Wang, K.; Zheng, M.; Wei, H.; Qi, G.; Li, Y. Multi-modality medical image fusion using convolutional neural network and contrast pyramid. Sensors 2020, 20, 2169.
  58. Wang, Z.; Li, X.; Duan, H.; Su, Y.; Zhang, X.; Guan, X. Medical image fusion based on convolutional neural networks and non-subsampled contourlet transform. Expert Syst. Appl. 2021, 171, 114574.