Image Fusion Methods

Image fusion generates a single informative image that combines complementary information from the original sensor images, such as texture details and salient targets. Existing methods have designed a variety of feature extraction algorithms and fusion strategies to achieve this.

Keywords: image fusion; shared feature; differential feature

1. Introduction

In many monitoring fields, it is difficult for a single sensor to capture enough information to meet the requirements of the monitoring task [1]. Sensors operating in different wavebands (for example, infrared and visible light) have complementary advantages when observing the same scene. However, on the one hand, multiple sensors bring data storage challenges, and on the other hand, the image information captured by any single sensor is incomplete. Taking infrared and visible light images as an example, infrared sensors reflect the radiation characteristics of foreground targets via thermal radiation imaging, but infrared images often lack structural and texture information. Visible light sensors describe the background details of the scene via light reflection, but they are greatly affected by changes in lighting and weather conditions [2]. Therefore, image fusion has become a popular research field [3]. Image fusion combines the input images into a single image that retains the information of all inputs and can even highlight more significant information.
According to the application scenario, image fusion is mainly divided into multi-focus [4], multi-spectral [5], and medical [6] image fusion. The most studied multi-spectral case is the fusion of infrared and visible light images, which also includes the fusion of hyperspectral images in the field of remote sensing. A fused multi-focus image renders both the background and the foreground in sharp focus. A fused multi-spectral image contains imaging information from multiple spectral bands. A fused image of magnetic resonance imaging (MRI) and computed tomography (CT) shows soft tissue and bone clearly at the same time.
The two core tasks of image fusion are feature extraction and feature fusion. The original images are transformed into a feature domain, fusion rules are designed to fuse the features there, and the fused features are then reconstructed back to the original pixel space to obtain the fused image. For feature extraction, existing image fusion works fall into two major categories: methods based on artificially designed transformations and methods based on feature representation learning. Feature fusion strategies are likewise divided into two types: manual design and global optimization learning.
Artificially constructed feature extraction methods and feature fusion rules at all levels are the most intensively studied areas of image fusion. Since such methods require no training and are completely unsupervised, they have good versatility. The main feature transformation methods include the discrete wavelet transform (DWT) [7], shearlet transform [8], nonsubsampled contourlet transform [9], low-rank representation (LRR) [10], and bilateral filter [11]. Manually designed fusion rules mainly include the maximum value, average value, and nuclear norm. The usual approach is to fuse the base parts with the average rule and the detail parts with the maximum rule [12] (see the sketch after this paragraph). In the representation learning domain, typical methods are based on sparse representation (SP) [8][13][14]. SP learns a single-layer, shared over-complete dictionary from the input images, sparsely represents each input image, fuses the sparse coefficients, and reconstructs the fused image. Deep learning has better representation learning capabilities than SP and has become a popular research direction in the field of image fusion [2][4][15][16][17][18]. Deep learning-based methods first train a common encoder and decoder on a large number of images, use the encoder to extract features from each input image, fuse these feature maps with a fusion rule, and finally reconstruct the fused image with the decoder [15][19].
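A minimal sketch of this classical base/detail scheme, assuming the PyWavelets library, a single decomposition level, and pre-registered grayscale inputs:

```python
import numpy as np
import pywt  # PyWavelets; assumed available


def dwt_fuse(img_a: np.ndarray, img_b: np.ndarray, wavelet: str = "db1") -> np.ndarray:
    """Fuse two registered grayscale images with a single-level DWT:
    average the low-frequency (base) coefficients, keep the detail
    coefficient with the larger absolute value."""
    cA_a, (cH_a, cV_a, cD_a) = pywt.dwt2(img_a, wavelet)
    cA_b, (cH_b, cV_b, cD_b) = pywt.dwt2(img_b, wavelet)

    # Base parts: average rule.
    cA_f = 0.5 * (cA_a + cA_b)

    # Detail parts: max-absolute rule, applied coefficient-wise.
    def max_abs(x, y):
        return np.where(np.abs(x) >= np.abs(y), x, y)

    cH_f, cV_f, cD_f = max_abs(cH_a, cH_b), max_abs(cV_a, cV_b), max_abs(cD_a, cD_b)

    # Inverse transform reconstructs the fused image in pixel space.
    return pywt.idwt2((cA_f, (cH_f, cV_f, cD_f)), wavelet)
```

The same average/maximum split carries over to other transforms such as the NSCT or shearlets; only the forward and inverse transforms change.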
For the fusion of representation features, manually designed fusion rules lack interpretability. Some works therefore try to learn fusion rules by defining loss functions. Global optimization methods such as particle swarm optimization [20], the grasshopper optimization algorithm [21], and membrane computing [22] have been used for fusion rule learning. Another class of methods learns the fusion decision map via deep learning [23][24].
Therefore, the core of image fusion is to transform the image from the original pixel space into a feature representation space that is easy to fuse; after fusion is performed in the new feature space, the fused image is obtained via the inverse transformation. Judging from current research trends, it is difficult to develop fundamentally new methods in the traditional field of multi-scale transformation, and deep learning-based methods are the current research hotspot. Their general idea is to extract features with an encoder, fuse the features, and reconstruct the fused image with a decoder, as sketched below. The core problems of these methods are weak interpretability and the lack of criteria for judging the quality of the extracted features.
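The following is a minimal sketch of this encode-fuse-decode pipeline, assuming PyTorch; the layer sizes and the element-wise maximum fusion rule are illustrative placeholders rather than the architecture or rule of any cited method.

```python
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Toy convolutional encoder shared by both source images."""

    def __init__(self, channels: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)


class TinyDecoder(nn.Module):
    """Toy decoder that maps fused features back to a single-channel image."""

    def __init__(self, channels: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, f):
        return self.net(f)


def fuse(encoder: TinyEncoder, decoder: TinyDecoder,
         ir: torch.Tensor, vis: torch.Tensor) -> torch.Tensor:
    """Encode both sources with a shared encoder, fuse the feature maps
    with an element-wise maximum (a placeholder rule), and decode back
    to pixel space. Inputs are (N, 1, H, W) tensors."""
    feat_ir, feat_vis = encoder(ir), encoder(vis)
    fused_feat = torch.maximum(feat_ir, feat_vis)
    return decoder(fused_feat)
```

In practice the encoder and decoder are pre-trained (for example, as an autoencoder on a large image corpus), and the fusion rule is the part that different methods design or learn.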

2. Model-Based Feature Extraction Method

Performing pixel-level transformations on the input source images and extracting multi-scale features of the original images were the focus of early research. The image features extracted by such methods have good interpretability. These methods decompose the input images into low-frequency (base) parts and high-frequency (detail) parts: the low-frequency part reflects the basic semantic information of the scene, and the high-frequency part reflects the target and texture information. The nonsubsampled contourlet transform (NSCT) [9] is a pioneering transform used for image fusion. To combine the advantages of multi-scale analysis and deep learning, Wang et al. [25] proposed an image fusion method based on a convolutional neural network and the NSCT. MDLatLRR [26] is a baseline method in this field; it first performs a multi-level latent low-rank representation decomposition of the input images and then fuses the base and detail parts separately. Li et al. [27] performed norm optimization on the fused images of MDLatLRR to obtain more significant fused images. The Gaussian of differences has also been used for image fusion and is simple, efficient, and versatile [28].
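To make the base/detail idea concrete, here is a simplified two-scale sketch in the spirit of such decompositions (it is not a reimplementation of MDLatLRR or of the method in [28]): a Gaussian-blurred image serves as the base layer and the residual as the detail layer, using SciPy's gaussian_filter.

```python
import numpy as np
from scipy.ndimage import gaussian_filter  # SciPy; assumed available


def two_scale_fuse(img_a: np.ndarray, img_b: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Two-scale fusion of registered grayscale images: the Gaussian-blurred
    image is the low-frequency (base) part, the residual is the high-frequency
    (detail) part. Bases are averaged; details are picked by larger magnitude."""
    base_a, base_b = gaussian_filter(img_a, sigma), gaussian_filter(img_b, sigma)
    detail_a, detail_b = img_a - base_a, img_b - base_b

    fused_base = 0.5 * (base_a + base_b)
    fused_detail = np.where(np.abs(detail_a) >= np.abs(detail_b), detail_a, detail_b)
    return fused_base + fused_detail
```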

3. Generative Model-Based Methods

GAN-based methods use generative neural network models to generate fused images conditioned on the input multi-source images [29]. DDcGAN [30] drives a deep neural network to learn complementary features and reconstruct the fused image based on a defined loss function. GAN-FM [31] introduces a full-scale skip-connected generator and Markovian discriminators, and Fusion-UDCGAN [32] adopts a U-type densely connected generative adversarial network. AT-GAN [33] proposes a generative adversarial network with intensity attention modules and semantic transition modules. Depending on how the loss function is defined, this type of method can also obtain additional image enhancement effects.
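The exact objectives differ across these works; as a rough, hedged sketch (assuming PyTorch and single-channel tensors of shape (N, 1, H, W)), a typical generator objective in this family combines an adversarial term with content terms that keep infrared intensities and visible-light gradients:

```python
import torch
import torch.nn.functional as F


def sobel_grad(x: torch.Tensor) -> torch.Tensor:
    """Approximate gradient magnitude of a (N, 1, H, W) image with Sobel kernels."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=x.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(x, kx, padding=1)
    gy = F.conv2d(x, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)


def generator_loss(fused, ir, vis, d_fake, lam_adv=1.0, lam_grad=10.0):
    """Illustrative generator objective: fool the discriminator (adversarial
    term), stay close to infrared intensities, and preserve visible-light
    gradients. The weights lam_adv / lam_grad are placeholders, not values
    from any cited paper."""
    adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    intensity = F.mse_loss(fused, ir)
    gradient = F.l1_loss(sobel_grad(fused), sobel_grad(vis))
    return lam_adv * adv + intensity + lam_grad * gradient
```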

4. Task-Driven Approach

Fusion that improves target segmentation accuracy in low-light environments is one of the current research hotspots. These methods aim for fused images with higher brightness and more prominent target contours. SCFusion [24] achieves target saliency enhancement via a mask of the target area. SGFusion [34] achieves saliency guidance in the fusion process through multi-task learning with target segmentation. TIM [35] proposes a constrained strategy that incorporates information from downstream tasks to guide the unsupervised learning of image fusion. SOSMaskFuse [36] also uses a target segmentation mask to achieve target enhancement. PIAFusion [37] addresses image fusion under low-light conditions with an illumination-aware network. These methods generally enhance the original multi-source images, and the fused images have better gradients and visual saliency, but their consistency with the original images is poorer.
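The common mechanism, weighting the fusion loss with a target mask so that salient regions follow the infrared image and the background follows the visible image, can be sketched as follows (assuming PyTorch; this is a generic illustration, not the loss of any specific cited paper):

```python
import torch
import torch.nn.functional as F


def mask_guided_loss(fused: torch.Tensor, ir: torch.Tensor, vis: torch.Tensor,
                     mask: torch.Tensor) -> torch.Tensor:
    """Pull salient regions (mask == 1, e.g. from a segmentation network or
    saliency detector) towards the infrared image, and the remaining
    background towards the visible image. All tensors are (N, 1, H, W)."""
    target_term = F.l1_loss(mask * fused, mask * ir)
    background_term = F.l1_loss((1.0 - mask) * fused, (1.0 - mask) * vis)
    return target_term + background_term
```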

5. Autoencoder-Based Methods

Autoencoder-based fusion methods assume that neurons respond with larger amplitudes to salient areas. Fu et al. [38] proposed a dual-branch encoder network to learn richer features. DeepFuse [39] performs feature extraction on multiple channels and is used for multi-exposure image fusion. DenseFuse [40] introduces a dense block in the encoder to extract multi-scale features. FusionDN [41] also uses a densely connected network and defines a multi-task loss function. NestFuse [42] introduces a nest connection architecture together with a spatial attention mechanism to enhance salient features. RFN-Nest [43] proposes a residual fusion network that better retains detailed features. PSFusion [44] presents a practical infrared and visible image fusion network based on progressive semantic injection and scene fidelity constraints, and its fused images have good visual appeal. CDDFuse [45] is inspired by multi-scale decomposition and uses neural networks to decompose images into base and detail parts; the fused image is reconstructed after fusing the two parts separately. This method requires two training stages.
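A common, manually designed fusion strategy in this family weights the encoder features by an activity measure such as the channel-wise l1-norm; the following minimal sketch (assuming PyTorch, and not tied to any single cited network) illustrates the idea:

```python
import torch


def l1_activity_fusion(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Softly weight two feature maps of shape (N, C, H, W) by their
    channel-wise l1-norm activity, so the more 'active' source dominates
    at each spatial location."""
    act_a = feat_a.abs().sum(dim=1, keepdim=True)  # (N, 1, H, W)
    act_b = feat_b.abs().sum(dim=1, keepdim=True)
    w_a = act_a / (act_a + act_b + 1e-8)           # soft weight for source A
    return w_a * feat_a + (1.0 - w_a) * feat_b
```

The fused feature map is then passed to the trained decoder to reconstruct the fused image, as in the pipeline sketched in Section 1.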

References

  1. Zhang, X. Benchmarking and comparing multi-exposure image fusion algorithms. Inf. Fusion 2021, 74, 111–131.
  2. Wang, Z.; Wu, Y.; Wang, J.; Xu, J.; Shao, W. Res2Fusion: Infrared and visible image fusion based on dense Res2net and double nonlocal attention models. IEEE Trans. Instrum. Meas. 2022, 71, 1–12.
  3. Tang, L.; Yuan, J.; Ma, J. Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network. Inf. Fusion 2022, 82, 28–42.
  4. Zhang, X. Deep learning-based multi-focus image fusion: A survey and a comparative study. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 4819–4838.
  5. Yilmaz, C.S.; Yilmaz, V.; Gungor, O. A theoretical and practical survey of image fusion methods for multispectral pansharpening. Inf. Fusion 2022, 79, 1–43.
  6. Hermessi, H.; Mourali, O.; Zagrouba, E. Multimodal medical image fusion review: Theoretical background and recent advances. Signal Process. 2021, 183, 108036.
  7. Ben Hamza, A.; He, Y.; Krim, H.; Willsky, A. A multiscale approach to pixel-level image fusion. Integr. Comput. Aided Eng. 2005, 12, 135–146.
  8. Gao, Z.; Yang, M.; Xie, C. Space target image fusion method based on image clarity criterion. Opt. Eng. 2017, 56, 053102.
  9. Da Cunha, A.L.; Zhou, J.; Do, M.N. The nonsubsampled contourlet transform: Theory, design, and applications. IEEE Trans. Image Process. 2006, 15, 3089–3101.
  10. Li, H.; Wu, X.J. Multi-focus image fusion using dictionary learning and low-rank representation. In Proceedings of the Image and Graphics: 9th International Conference, ICIG 2017, Shanghai, China, 13–15 September 2017; Revised Selected Papers, Part I 9. Springer: Cham, Switzerland, 2017; pp. 675–686.
  11. Li, X.; Zhou, F.; Tan, H.; Zhang, W.; Zhao, C. Multimodal medical image fusion based on joint bilateral filter and local gradient energy. Inf. Sci. 2021, 569, 302–325.
  12. Liu, Y.; Wang, L.; Cheng, J.; Li, C.; Chen, X. Multi-focus image fusion: A survey of the state of the art. Inf. Fusion 2020, 64, 71–91.
  13. Zhang, Q.; Liu, Y.; Blum, R.S.; Han, J.; Tao, D. Sparse representation based multi-sensor image fusion for multi-focus and multi-modality images: A review. Inf. Fusion 2018, 40, 57–75.
  14. Yang, B.; Li, S. Multifocus image fusion and restoration with sparse representation. IEEE Trans. Instrum. Meas. 2009, 59, 884–892.
  15. Zhang, H.; Xu, H.; Tian, X.; Jiang, J.; Ma, J. Image fusion meets deep learning: A survey and perspective. Inf. Fusion 2021, 76, 323–336.
  16. Sun, H.; Xiao, W. Similarity Weight Learning: A New Spatial and Temporal Satellite Image Fusion Framework. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–17.
  17. Xiao, Y.; Guo, Z.; Veelaert, P.; Philips, W. DMDN: Degradation model-based deep network for multi-focus image fusion. Signal Process. Image Commun. 2022, 101, 116554.
  18. Wang, W.; Fu, X.; Zeng, W.; Sun, L.; Zhan, R.; Huang, Y.; Ding, X. Enhanced deep blind hyperspectral image fusion. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 1513–1523.
  19. Zhang, Y.; Liu, Y.; Sun, P.; Yan, H.; Zhao, X.; Zhang, L. IFCNN: A general image fusion framework based on convolutional neural network. Inf. Fusion 2020, 54, 99–118.
  20. Shehanaz, S.; Daniel, E.; Guntur, S.R.; Satrasupalli, S. Optimum weighted multimodal medical image fusion using particle swarm optimization. Optik 2021, 231, 166413.
  21. Dinh, P.H. A novel approach based on grasshopper optimization algorithm for medical image fusion. Expert Syst. Appl. 2021, 171, 114576.
  22. Li, B.; Peng, H.; Luo, X.; Wang, J.; Song, X.; Pérez-Jiménez, M.J.; Riscos-Núñez, A. Medical image fusion method based on coupled neural P systems in nonsubsampled shearlet transform domain. Int. J. Neural Syst. 2021, 31, 2050050.
  23. Ma, B.; Yin, X.; Wu, D.; Shen, H.; Ban, X.; Wang, Y. End-to-end learning for simultaneously generating decision map and multi-focus image fusion result. Neurocomputing 2022, 470, 204–216.
  24. Liu, H.; Ma, M.; Wang, M.; Chen, Z.; Zhao, Y. SCFusion: Infrared and visible fusion based on salient compensation. Entropy 2023, 25, 985.
  25. Wang, Z.; Li, X.; Duan, H.; Su, Y.; Zhang, X.; Guan, X. Medical image fusion based on convolutional neural networks and non-subsampled contourlet transform. Expert Syst. Appl. 2021, 171, 114574.
  26. Li, H.; Wu, X.J.; Kittler, J. MDLatLRR: A novel decomposition method for infrared and visible image fusion. IEEE Trans. Image Process. 2020, 29, 4733–4746.
  27. Li, G.; Lin, Y.; Qu, X. An infrared and visible image fusion method based on multi-scale transformation and norm optimization. Inf. Fusion 2021, 71, 109–129.
  28. Kurban, R. Gaussian of Differences: A Simple and Efficient General Image Fusion Method. Entropy 2023, 25, 1215.
  29. Zhou, T.; Li, Q.; Lu, H.; Cheng, Q.; Zhang, X. GAN review: Models and medical image fusion applications. Inf. Fusion 2023, 91, 134–148.
  30. Ma, J.; Xu, H.; Jiang, J.; Mei, X.; Zhang, X.P. DDcGAN: A dual-discriminator conditional generative adversarial network for multi-resolution image fusion. IEEE Trans. Image Process. 2020, 29, 4980–4995.
  31. Zhang, H.; Yuan, J.; Tian, X.; Ma, J. GAN-FM: Infrared and visible image fusion using GAN with full-scale skip connection and dual Markovian discriminators. IEEE Trans. Comput. Imaging 2021, 7, 1134–1147.
  32. Gao, Y.; Ma, S.; Liu, J.; Xiu, X. Fusion-UDCGAN: Multifocus image fusion via a U-type densely connected generation adversarial network. IEEE Trans. Instrum. Meas. 2022, 71, 1–13.
  33. Rao, Y.; Wu, D.; Han, M.; Wang, T.; Yang, Y.; Lei, T.; Zhou, C.; Bai, H.; Xing, L. AT-GAN: A generative adversarial network with attention and transition for infrared and visible image fusion. Inf. Fusion 2023, 92, 336–349.
  34. Liu, J.; Dian, R.; Li, S.; Liu, H. SGFusion: A saliency guided deep-learning framework for pixel-level image fusion. Inf. Fusion 2023, 91, 205–214.
  35. Liu, R.; Liu, Z.; Liu, J.; Fan, X.; Luo, Z. A Task-guided, Implicitly-searched and Meta-initialized Deep Model for Image Fusion. arXiv 2023, arXiv:2305.15862.
  36. Li, G.; Qian, X.; Qu, X. SOSMaskFuse: An Infrared and Visible Image Fusion Architecture Based on Salient Object Segmentation Mask. IEEE Trans. Intell. Transp. Syst. 2023, 24, 10118–10137.
  37. Tang, L.; Yuan, J.; Zhang, H.; Jiang, X.; Ma, J. PIAFusion: A progressive infrared and visible image fusion network based on illumination aware. Inf. Fusion 2022, 83, 79–92.
  38. Fu, Y.; Wu, X.J. A dual-branch network for infrared and visible image fusion. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Yichang, China, 16–18 September 2021; pp. 10675–10680.
  39. Ram Prabhakar, K.; Sai Srikar, V.; Venkatesh Babu, R. Deepfuse: A deep unsupervised approach for exposure fusion with extreme exposure image pairs. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4714–4722.
  40. Li, H.; Wu, X.J. DenseFuse: A fusion approach to infrared and visible images. IEEE Trans. Image Process. 2018, 28, 2614–2623.
  41. Xu, H.; Ma, J.; Le, Z.; Jiang, J.; Guo, X. Fusiondn: A unified densely connected network for image fusion. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12484–12491.
  42. Li, H.; Wu, X.J.; Durrani, T. NestFuse: An infrared and visible image fusion architecture based on nest connection and spatial/channel attention models. IEEE Trans. Instrum. Meas. 2020, 69, 9645–9656.
  43. Li, H.; Wu, X.J.; Kittler, J. RFN-Nest: An end-to-end residual fusion network for infrared and visible images. Inf. Fusion 2021, 73, 72–86.
  44. Tang, L.; Zhang, H.; Xu, H.; Ma, J. Rethinking the necessity of image fusion in high-level vision tasks: A practical infrared and visible image fusion network based on progressive semantic injection and scene fidelity. Inf. Fusion 2023, 101870.
  45. Zhao, Z.; Bai, H.; Zhang, J.; Zhang, Y.; Xu, S.; Lin, Z.; Timofte, R.; Van Gool, L. Cddfuse: Correlation-driven dual-branch feature decomposition for multi-modality image fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 5906–5916.