Near-Infrared Band Simulation Using Conditional Generative Adversarial Network: Comparison
Please note this is a comparison between Version 1 by Xiangtian Yuan and Version 2 by Rita Xu.

Multispectral sensors are important instruments for Earth observation. In remote sensing applications, the near-infrared (NIR) band, together with the visible spectrum (RGB), provides abundant information about ground objects.

  • multispectral
  • remote sensing
  • NIR
  • RGB

1. Introduction

Multispectral remote sensing is an important means of Earth observation. It has been extensively employed to explore the physical, chemical, and biological properties of the Earth’s surface. In addition to the visible spectrum that the human visual system can perceive, multispectral sensors capture signals from additional spectral ranges. Since ground objects respond differently to light of certain wavelengths, the wider spectral range allows additional information to be extracted from ground objects.
Due to limitations of budget, technology, intended application, and various other reasons, not every sensor is capable of capturing a wide range of wavelengths across the electromagnetic spectrum. Moreover, differences in the wavelength characteristics of different sensors can make it challenging to use data from multiple sensors simultaneously, necessitating a process of harmonization [1]. Some pixels might be corrupted during the data down-link from satellites, which can hinder further analysis [2]. In the case of unmanned aerial vehicle (UAV) remote sensing, low-cost, off-the-shelf cameras typically capture only visible light in the red, green, and blue wavelengths, which limits their potential for downstream applications that require near-infrared spectra, such as vegetation monitoring. As a result, researchers have modified commercial cameras to capture the NIR bands, but registration of each band is often required [3][4][5][6].
Among the common spectral bands, the near-infrared (NIR) bands have been used extensively in Earth observation tasks. In combination with visible spectra, NIR bands contain additional features of ground objects, especially vegetation. For example, indices using NIR bands have been developed for tasks such as land cover classification. These indices include the normalized difference vegetation index (NDVI) and the normalized difference water index (NDWI), which have been shown to be effective in highlighting vegetation and open water in remotely sensed imagery [7]. In addition to identifying vegetation and water, NIR spectroscopy can also help detect materials such as plastics [8][9], certain minerals [10], and tree health problems [11][12]. Moreover, NIR-derived indices have been used in tasks such as atmospheric correction [13]. In data-hungry machine learning or deep learning methods for land cover classification, NIR bands are able to improve coarse ground truth or correct mislabelling for some classes that are sometimes challenging for the human eye to interpret. Therefore, the generation of additional spectral bands from known bands has potential practical applications in Earth observation but has not yet been extensively explored. However, the underlying problem is that there are no precise physical models that map a spectral response from another wavelength. The signal that a sensor receives depends on many factors, including atmospheric effects, weather conditions, land cover type, and terrain. Ignoring these effects, researchers want to test whether a simple end-to-end model is sufficient to generate additional bands from known bands on a large scale, without the myriad of input parameters of complicated models.
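Both indices mentioned above are simple normalized band ratios. A minimal NumPy sketch, using the standard NDVI definition and McFeeters' NDWI (the function names and the small epsilon guard against division by zero are illustrative choices, not from the original work):

```python
import numpy as np

def ndvi(nir, red, eps=1e-8):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)

def ndwi(green, nir, eps=1e-8):
    """Normalized difference water index (McFeeters): (Green - NIR) / (Green + NIR)."""
    green = green.astype(np.float64)
    nir = nir.astype(np.float64)
    return (green - nir) / (green + nir + eps)
```

High NDVI values indicate vigorous vegetation (strong NIR reflectance relative to red absorption), while high NDWI values highlight open water, which absorbs NIR strongly.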
The generation of artificial NIR bands using only the visible spectrum can be considered a nonlinear mapping problem. Neural networks have been shown to be effective in nonlinear mapping [14]. It could also be viewed as a generative problem, which can be addressed by neural networks and especially GANs. Unlike computer vision tasks, which usually have some level of abstraction from the input, the task here is to ensure that the generated NIR band is also consistent in structure and spatial distribution. To this end, additional loss functions, such as L1 or L2, are added to the GAN loss to ensure that the output is close to the ground truth [15]. However, such losses are prone to outliers. Several robust loss functions are able to handle outliers by being less sensitive to large errors. A single robust loss function proposed by Barron [16] integrates several common robust loss functions that are controlled by a single continuous-valued parameter that can also be tuned when training neural networks.
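A minimal sketch of Barron's single robust loss, following the published general formulation; the special cases reproduce the limits given in [16] (α = 2 recovers L2, α = 0 the Cauchy/log loss, and α → −∞ the Welsch loss). Function and argument names are illustrative:

```python
import numpy as np

def robust_loss(x, alpha, c=1.0):
    """Barron's general robust loss of residual x, with shape alpha and scale c.

    General case: (|a-2|/a) * (((x/c)^2 / |a-2| + 1)^(a/2) - 1),
    with the removable singularities at a = 2, 0, and -inf handled explicitly.
    """
    xc2 = (x / c) ** 2
    if alpha == 2.0:                      # L2 (quadratic) limit
        return 0.5 * xc2
    if alpha == 0.0:                      # Cauchy / log loss limit
        return np.log1p(0.5 * xc2)
    if np.isneginf(alpha):                # Welsch loss limit
        return 1.0 - np.exp(-0.5 * xc2)
    b = abs(alpha - 2.0)
    return (b / alpha) * ((xc2 / b + 1.0) ** (alpha / 2.0) - 1.0)
```

For small residuals the loss behaves quadratically regardless of α, while α controls how quickly the penalty flattens for large residuals, which is what makes it less sensitive to outliers than plain L1 or L2.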

2. Near-Infrared Band Simulation

Traditional techniques for solving remote sensing problems have been continuously challenged by machine learning methods over the last decades. Machine learning methods have been extended to remote sensing images, which have some peculiarities compared to natural images. Among machine learning methods, GANs have gained increasing attention due to their versatility and performance. For example, GANs have been applied to tasks such as image super-resolution and pan-sharpening. Sajjadi et al. [17] proposed a GAN-based network to super-resolve natural images using automated texture synthesis combined with a perceptual loss that produced realistic texture, instead of just approximating the ground truth pixel during training. Jiang et al. [18] applied a GAN-based model with an edge enhancement subnetwork for super-resolution, which effectively removed noise in the reconstructed image and enhanced the contour of ground objects. This model takes advantage of the adversarial learning strategy, which is insensitive to noise. GAN-based models have also been applied in remote sensing image pre-processing, such as image dehazing and cloud removal. Enomoto et al. [19] adopted NIR bands with an RGB image in a GAN-based framework to remove clouds from satellite images. Grohnfeldt et al. [20] fused Sentinel-1 SAR data and Sentinel-2 optical multi-spectral imagery in a GAN framework to produce cloud-free optical images, which showed more robust performance due to the properties of the synthetic aperture radar (SAR) sensor. GAN-based models have been applied to treat many problems as generative, such as monocular height estimation [21][22], DSM refinement [23], PAN-sharpening [24], image super-resolution [25], and change detection [26]. One way of approaching the spectral band simulation problem is to treat it as a classification problem.
Within this framework, the mapping process could be viewed as involving the injection of spectral signatures into corresponding classes of ground objects. In recent years, neural networks have been widely used for land cover classification and have become a popular classification paradigm in the remote sensing community. The problem could also be viewed as a generative one and, thus, can be tackled by neural-network- and GAN-based methods. Therefore, it is theoretically possible to simulate spectral reflectance from other bands using neural-network-based methods. Specifically, hyperspectral or multispectral image reconstruction, in which responses at multiple electromagnetic wavelengths need to be simulated, is an active research topic in spectral simulation using deep learning methods. Fu et al. [14] proposed a network structure for hyperspectral image reconstruction from RGB bands. The network consists of a spectral sub-network, which performs the spectral nonlinear mapping, and a spatial sub-network, which models the spatial correlation. The network uses a traditional loss function (mean square error) to force the generated bands to be numerically similar to the real ones. Deng et al. [27] proposed a neural-network-based method (M2H-Net) to reconstruct hyperspectral images from an arbitrary number of input bands within the spectral range of 300 to 2500 nm. The method was verified on UAV and satellite data captured at different locations in China. Zhao et al. [28] used a model trained on the hyperspectral benchmark dataset WHU-Hi-Honghu HSI [29] to convert true RGB to natural RGB, which was subsequently used with its multispectral pair to train an HSCNN-R network [30] for reconstruction. The model was trained with a multi-temporal and multispectral dataset of a maize field and successfully tested on the imagery of a rice field.
Like many other remote sensing issues, the hyperspectral reconstruction problem can be tackled by GAN-based methods as well. Alvarez-Gila et al. [31] used a conditional GAN to reconstruct a hyperspectral image from an RGB image. The method was trained and tested on 201 natural images of 1392 × 1300 pixels. Liu and Zhao [32] proposed a scale attention pyramid UNet (SAPUNet) that adopted a dilated convolution for feature extraction and an attention mechanism for feature selection. SAPW-Net was proposed in the same work, with an additional branch for boundary supervision. The work achieved improved results on the Interdisciplinary Computational Vision Lab at Ben Gurion University (ICVL) dataset [33]. Due to the prohibitively high cost of hyperspectral imagery, the number of open-source hyperspectral datasets is small, and those available are relatively limited in size. According to the review in [34], the available open-source hyperspectral datasets are only of small to medium size. However, multispectral datasets have better availability, and many satellite missions have global coverage, such as the Sentinel missions, enabling large-scale experiments and analysis. The integration of additional NIR bands in cameras has practical applications in various remote sensing research fields, including vegetation monitoring. However, the widespread adoption of NIR-capable cameras is limited by cost and technical constraints [3]. To overcome this, numerous researchers have modified commercial RGB cameras to enable the capture of additional near-infrared band radiation for vegetation and soil monitoring by unmanned aerial vehicles (UAVs). For instance, Rabatel et al. [3] removed the NIR blocking filter and added an external long-wavelength pass filter to a single commercial camera (Canon 500D).
The optimal external filter was determined by BSOP (band simulation by orthogonal projection), which relies on known sensitivity curves of the camera. Other studies employed two cameras to capture RGB and NIR images separately [4][5], which requires accurate pixel alignment. Brown and Süsstrunk [6] created a 477-image RGB-NIR dataset captured by a modified single-lens reflex (SLR) camera. The NIR and RGB bands were registered using a feature-based alignment algorithm [35] with robust estimation of a similarity motion model. The joint entropy analysis suggested that NIR contains significantly different information from the visible bands. Different from the methods that involve hardware modification, learning-based approaches that operate directly on the image data using generative methods have also been studied. The authors' previous work [36] was one of the first to apply GANs to simulate NIR bands from RGB bands. Subsequently, several studies have been published to explore the potential of simulated NIR bands. Koshelev et al. [37] synthesized an NIR band to boost the performance of hogweed segmentation. Sa et al. [38] proposed a fruit detection dataset with RGB and NIR bands. Aslahishahri et al. [39] curated a dataset with blue, green, red, NIR, and red edge bands covering canola, lentil, dry bean, and wheat breeding fields. These works provide valuable datasets that can be used for specific remote sensing tasks.