Spectral reconstruction of remote sensing images has mainly focused on recovering hyperspectral images from RGB or multispectral inputs. Spectral reconstruction methods can be divided into two branches: prior-driven and data-driven methods. Earlier researchers adopted the sparse dictionary method. With the development of deep learning and its strong feature extraction and reconstruction capabilities, a growing number of researchers are adopting deep learning methods in place of the traditional sparse dictionary approach.
GF-6 was successfully launched in 2018 as China’s first medium-high-resolution agricultural observation satellite. It operates in cooperation with GF-1, China’s first high-resolution Earth observation satellite, which was launched in 2013. This cooperation not only reduces the remote sensing data acquisition time from 4 days to 2 days, but also significantly improves the ability to monitor agriculture, forestry, grassland, and other resources, providing remote sensing data support for agricultural and rural development, ecological civilization construction, and other significant needs. GF-6 also achieved the localization (domestic production) of the 8-band CMOS detector and added the red-edge band, which can effectively reflect the unique spectral characteristics of crops.
However, GF-1 was launched earlier and with a different mission orientation, so it contains only four multispectral bands. Compared with the GF-6 satellite (Table 1), GF-1 lacks four bands (purple, yellow, red-edge I, and red-edge II), which greatly constrains its use in crop-related joint monitoring. Therefore, researchers have sought a spectral reconstruction method to recover the four missing bands.
Table 1. Band specification of the GF-1 PMS and GF-6 WFV images.
In recent years, spectral reconstruction has mainly focused on recovering hyperspectral images from RGB or multispectral images. Earlier researchers adopted the sparse dictionary method. With the development of deep learning and its strong feature extraction and reconstruction capabilities, a growing number of researchers are adopting deep learning methods in place of the traditional sparse dictionary approach.
In addition, most studies on spectral reconstruction focus on images with three visible bands (red, green, and blue), while remote sensing images usually contain at least four bands (red, green, blue, and near-infrared (NIR)). The essential NIR band is therefore missing from the input, so the original information is not fully used. Some studies of remote sensing spectral reconstruction have already considered this problem. However, few studies have addressed large-scale and highly complex scenarios such as satellite remote sensing; most have been conducted over relatively small areas. Most deep learning methods designed for ground-level images rely heavily on up-sampling, down-sampling, and nonlocal attention structures. Because remote sensing images cover large areas containing numerous and complex ground objects, these structures struggle to perform well in the spectral reconstruction of remote sensing images.
2. Spectral Reconstruction Methods for Remote Sensing Images
Due to the limitations of hardware resources (bandwidth and sensors), researchers have had to make trade-offs among the temporal, spatial, and spectral dimensions of remote sensing images. To address the problem of low spectral dimension, researchers mainly used principal component analysis (PCA), Wiener estimation (WEN), and the pseudoinverse (PI) to construct a spectral mapping matrix. In recent years, spectral reconstruction methods have been divided into two branches: prior-driven and data-driven methods.
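As a rough illustration of the pseudoinverse idea mentioned above, the following sketch fits a linear spectral mapping matrix by least squares. The function names, the 4-to-8 band setup, and the synthetic training spectra are assumptions made for this example only; they are not taken from the cited works.

```python
import numpy as np


def fit_pseudoinverse_mapping(low, high):
    """Least-squares spectral mapping matrix W such that low @ W ~= high.

    low:  (n_samples, n_low_bands)  training spectra with few bands
    high: (n_samples, n_high_bands) matching spectra with more bands
    """
    return np.linalg.pinv(low) @ high


def reconstruct(low, mapping):
    """Apply the learned mapping matrix to new low-band spectra."""
    return low @ mapping


# Synthetic stand-in data (an assumption): 4 input bands, 8 target bands,
# related by an exactly linear map so that the fit can be checked.
rng = np.random.default_rng(0)
true_mapping = rng.uniform(0.0, 1.0, size=(4, 8))
train_low = rng.uniform(0.0, 1.0, size=(500, 4))
train_high = train_low @ true_mapping

mapping = fit_pseudoinverse_mapping(train_low, train_high)
test_low = rng.uniform(0.0, 1.0, size=(10, 4))
predicted = reconstruct(test_low, mapping)
print(mapping.shape, predicted.shape)
```

On real imagery the relationship between bands is not exactly linear, so a single global matrix of this kind can only approximate the missing bands.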
The first type is mainly based on sparse dictionary learning, which aims to extract the most important spectral mapping features. It represents as much knowledge as possible with as few resources as possible, and this representation has the added benefit of being computationally fast. For example, Arad and Ben-Shahar were the first to apply an overcomplete dictionary to recover hyperspectral images from RGB. Jonas et al. used the A+ algorithm to improve Arad’s sparse dictionary approach. The A+ algorithm directly constructs the mapping from RGB to hyperspectral at local anchor points, which significantly improves the running speed. However, the sparse dictionary method considers only the sparsity of spectral information and does not use local linearity. As a result, the reconstruction is inaccurate, and the reconstructed image suffers from metamerism. Li et al. proposed a locally linear embedding sparse dictionary method to improve the representation ability of sparse coding. This method selects only the locally best samples and introduces texture information into the reconstruction, which reduces metamerism. Geng et al. proposed a spectral reconstruction method that preserves contextual information. Gao et al. performed spectral enhancement of multispectral images by jointly learning low-rank dictionary pairs from overlapping regions.
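The general dictionary-based idea can be sketched as follows: hyperspectral dictionary atoms are projected to RGB, an RGB pixel is sparsely coded over the projected atoms, and the same weights are applied to the hyperspectral atoms. This is a minimal sketch, not the implementation of any cited method. The random dictionary (in place of a learned one), the box-shaped camera response, and all names are hypothetical.

```python
import numpy as np


def omp(dictionary, signal, n_nonzero):
    """Orthogonal matching pursuit: sparse code of `signal` over dictionary columns."""
    norms = np.linalg.norm(dictionary, axis=0)
    residual = signal.copy()
    support = []
    weights = np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        scores = np.abs(dictionary.T @ residual) / norms
        if support:
            scores[support] = -np.inf  # never pick the same atom twice
        support.append(int(np.argmax(scores)))
        # Refit all selected atoms jointly, then update the residual.
        weights, *_ = np.linalg.lstsq(dictionary[:, support], signal, rcond=None)
        residual = signal - dictionary[:, support] @ weights
    code = np.zeros(dictionary.shape[1])
    code[support] = weights
    return code


def reconstruct_spectrum(rgb, hs_dictionary, response, n_nonzero=1):
    """Code the RGB pixel over projected atoms, then reuse the weights in HS space."""
    rgb_dictionary = response @ hs_dictionary  # project each atom to RGB
    code = omp(rgb_dictionary, rgb, n_nonzero)
    return hs_dictionary @ code


rng = np.random.default_rng(0)
n_bands, n_atoms = 31, 20
# Hypothetical dictionary: random atoms stand in for a learned dictionary.
hs_dictionary = rng.uniform(0.0, 1.0, size=(n_bands, n_atoms))
# Hypothetical camera response: three non-overlapping box filters.
response = np.zeros((3, n_bands))
response[0, :10] = 1.0
response[1, 10:20] = 1.0
response[2, 20:] = 1.0

true_spectrum = hs_dictionary[:, 5]
rgb_pixel = response @ true_spectrum
recovered = reconstruct_spectrum(rgb_pixel, hs_dictionary, response, n_nonzero=1)
print(np.allclose(recovered, true_spectrum))
```

Because only three RGB values constrain the code, different combinations of atoms can reproduce the same RGB pixel once more atoms are allowed, which is one way to see the metamerism problem noted above.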
The second type is mainly based on deep learning. With the development of deep learning, a large number of excellent models have gradually replaced the first type of method owing to their powerful generalization ability. Compared with the first type, however, deep learning usually requires enormous amounts of data, and the training process is computationally expensive. Nevertheless, as computing power has increased, deep learning has become much more effective, and the related methods are used by more and more researchers. Xiong et al. proposed a deep learning framework for recovering spectral information from spectrally undersampled images. Koundinya et al. compared 2D and 3D kernel-based CNNs for spectral reconstruction. Alvarez-Gila et al. posed spectral reconstruction as an image-to-image mapping problem and proposed a generative adversarial network for spatial context-aware spectral image reconstruction. In NTIRE 2018, the first spectral reconstruction challenge, the entries of Shi et al. ranked first (HSCNN-D) and second (HSCNN-R) on both the “Clean” and “Real World” tracks. The main difference between the two networks is that the former fuses features by concatenation (the series method), while the latter fuses them by addition. Concatenation can learn the mapping relationship between spectra very well. Considering shallow and deep feature extraction separately, Li et al. proposed an adaptive weighted attention network (AWAN), which ranked first on the “Clean” track of the NTIRE 2020 challenge. Zhao et al. proposed a hierarchical regression network (HRNet) that took first place on the “Real World” track of the same challenge; it is a 4-level multi-scale structure that uses down-sampling and up-sampling to extract spectral features. For remote sensing images, Deng et al. proposed a more suitable network (M2H-Net) to meet the needs of multiple bands and complex scenes. Li and Gu proposed a progressive spatial-spectral joint network for hyperspectral image reconstruction.
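The difference between the two feature fusion styles mentioned above, addition versus series (concatenation), can be illustrated with a small NumPy sketch. This is not the HSCNN-R or HSCNN-D architecture: the 1x1 convolution, the channel counts, and the function names are simplifying assumptions chosen to show only how the channel dimension behaves.

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution on x of shape (C_in, H, W) with weight of shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def relu(x):
    return np.maximum(x, 0.0)


def addition_fusion(x, weight):
    """Residual-style fusion: new features are added, channel count is unchanged."""
    return x + relu(conv1x1(x, weight))


def concatenation_fusion(x, weight):
    """Dense-style fusion: new features are stacked, channel count grows."""
    return np.concatenate([x, relu(conv1x1(x, weight))], axis=0)


rng = np.random.default_rng(0)
features = rng.standard_normal((16, 8, 8))   # 16 channels, 8x8 pixels
w_add = rng.standard_normal((16, 16)) * 0.1  # must keep 16 channels to allow addition
w_cat = rng.standard_normal((4, 16)) * 0.1   # adds 4 new channels per block

added = addition_fusion(features, w_add)
stacked = concatenation_fusion(features, w_cat)
print(added.shape, stacked.shape)
```

With addition, earlier features are merged into the new ones; with concatenation, they are passed on unchanged alongside the new channels, at the cost of a growing channel count.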