Single-Image Super-Resolution Techniques: History

The purpose of multisource map super-resolution is to reconstruct high-resolution maps from low-resolution ones, which is valuable for content-based map tasks such as map recognition and classification. However, no super-resolution method has been designed specifically for maps, and existing image super-resolution methods often lose details when reconstructing them.

  • deep learning
  • low-resolution map
  • map recognition
  • raster map
  • single image super-resolution

1. Introduction

Earth observation systems and big data techniques have driven exponential growth in the volume of professionally generated and volunteered raster maps of various formats, themes, styles, and other attributes. As a great number of activities, including land cover/land use mapping [1], navigation [2], trajectory analysis [3], socio-economic analysis [4], etc., benefit from the geospatial information included in these raster maps, precisely retrieving maps from these massive datasets has become a pressing task. Traditional approaches to map retrieval, such as online search engines, generally rely on map annotations or the metadata of map files rather than on map content. The annotations assigned to a raster map can vary due to subjective understanding as well as diverse map generation goals, map themes, and other factors [5]. In comparison to annotations and metadata, content-based map retrieval focuses on the information contained in a map itself to determine whether a retrieved map is truly what the user or task needs. Map text and symbols are the primary map language and an essential part of map features with respect to map content [6][7][8]. Thus, map text and symbol recognition has become a main research direction in big map data retrieval. Recently, deep learning techniques such as convolutional neural networks (CNNs) have shown great strength in map text and symbol recognition. Furthermore, Zhou et al. [9] and Zhou [10] reported that deep learning approaches could effectively support the retrieval of topographic and raster maps by recognizing text information.
However, poor spatial resolution and limited data size remain the two main obstacles to applying state-of-the-art deep learning approaches to map text and symbol recognition. In the era of big data and volunteered geographic information, a majority of available maps are designed and created in unprofessional ways, which often renders their text characters and symbols too poorly resolved or too small for visual recognition.
Super-resolution techniques convert low-resolution images into high-resolution ones by learning the mappings between low-resolution images and their high-resolution counterparts. They fall into two classes: multi-frame super-resolution (MFSR) and single-image super-resolution (SISR) [11][12]. Because generating a map is time-consuming, producing multiple raster maps that share a similar theme or style is generally impractical, so SISR is the appropriate choice for map reconstruction research. The machine intelligence and computer vision communities have reported that CNNs achieve great success in SISR [13][14]. However, the limited receptive field of convolution kernels makes it difficult to exploit global features effectively. Although increasing the network depth can expand the receptive field of a CNN to some extent, this strategy cannot fundamentally solve the limited-receptive-field problem in the spatial dimension. Specifically, increasing the depth can introduce an edge effect: the reconstruction of image edges is significantly worse than that of the image center. In contrast, the vision transformer (ViT) models features over a global receptive field using the attention mechanism [15].
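To make the receptive-field limitation concrete, the standard receptive-field recurrence can be computed in a few lines. The following Python sketch is our own illustration, not from the entry; it shows that even twenty stacked 3 × 3 convolutions cover only a 41 × 41 pixel region:

def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, applied in order."""
    rf, jump = 1, 1  # receptive field size and cumulative stride
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# Twenty stacked 3x3 convolutions with stride 1 see only a 41x41 region,
# so widely separated map features cannot interact directly.
print(receptive_field([(3, 1)] * 20))  # -> 41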
Unlike natural images, maps contain information at multiple scales: geographical content with global structure as well as detailed elements such as legends and annotations. The former follows Tobler's first law of geography and consists mainly of low-frequency information, which is well suited to reconstruction with a Transformer performing global modeling. The latter carries a large amount of high-frequency information and requires CNN modules that focus on reconstructing local map details. To date, no super-resolution method that fuses global and local information for maps has been reported.

2. SISR Based on Deep Learning

2.1. CNN-Based SR

CNNs have a long history. In the 1990s, LeCun et al. [16] proposed LeNet, trained with the backpropagation algorithm, which established the basic structure of CNNs. Dong et al. proposed the first super-resolution network, SRCNN [17], in 2014; using only three convolutional layers, it surpassed previous interpolation-based methods. As the pioneering work introducing CNNs to super-resolution, SRCNN suffered from limited learning ability due to its shallow network, but it established the basic three-stage structure of image super-resolution: feature extraction, nonlinear mapping, and high-resolution reconstruction. The same authors proposed FSRCNN [18] in 2016, which moved the upsampling layer to the end of the network so that feature extraction and mapping are performed on low-resolution images, replaced a single large convolution with several small ones (both changes reduce computation), and replaced interpolation-based upsampling with transposed convolution, enhancing the learning ability of the model.
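The three-stage structure is compact enough to sketch in PyTorch. The following is a minimal illustration of the SRCNN pipeline described above, using the 9-1-5 kernel configuration and 64/32 filter counts from the SRCNN paper; the layer names are ours, and the input is assumed to be pre-upscaled by bicubic interpolation as in the original method:

import torch.nn as nn

class SRCNN(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        self.extract = nn.Conv2d(channels, 64, kernel_size=9, padding=4)     # feature extraction
        self.map = nn.Conv2d(64, 32, kernel_size=1)                          # nonlinear mapping
        self.reconstruct = nn.Conv2d(32, channels, kernel_size=5, padding=2) # HR reconstruction
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # x is a bicubic-upscaled low-resolution image.
        x = self.relu(self.extract(x))
        x = self.relu(self.map(x))
        return self.reconstruct(x)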
ESPCN [19], VDSR [20], DRCN [21], and LapSRN [22] improved on existing super-resolution models from different perspectives. ESPCN introduced the PixelShuffle (sub-pixel convolution) method for upsampling, which proved superior to transposed convolution and interpolation and has been widely adopted in later super-resolution models. Inspired by ResNet and RNNs, Kim et al. proposed two methods, VDSR and DRCN, which deepened the model and improved its feature extraction ability. LapSRN introduced progressive upsampling, which was faster and more convenient for multi-scale reconstruction than single-step upsampling.
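A minimal PyTorch sketch of ESPCN-style sub-pixel upsampling follows (the module name and sizes are illustrative assumptions): a convolution expands the channel count by the square of the scale factor, and PixelShuffle rearranges those channels into space:

import torch
import torch.nn as nn

class SubPixelUpsampler(nn.Module):
    def __init__(self, in_channels=64, out_channels=3, scale=4):
        super().__init__()
        # Produce out_channels * scale^2 channels, then rearrange into space.
        self.conv = nn.Conv2d(in_channels, out_channels * scale ** 2,
                              kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x):
        return self.shuffle(self.conv(x))

features = torch.randn(1, 64, 24, 24)       # low-resolution feature map
print(SubPixelUpsampler()(features).shape)  # torch.Size([1, 3, 96, 96])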
Lim et al. [23] found that batch normalization (BN) layers normalize image statistics and destroy the original contrast information of the image, hindering training convergence. They therefore proposed EDSR, which removes the BN layers to permit a deeper network, and introduced residual scaling to address the numerical instability that overly deep networks exhibit during training. Inspired by DenseNet, Haris et al. [24] proposed DBPN, which uses an iterative upsampling and downsampling process to provide an error feedback mechanism at each stage and achieved excellent results in large-scale-factor image reconstruction.
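Both EDSR changes are visible in a few lines of code. The sketch below is our own illustration, not the official implementation: a BN-free residual block whose branch output is scaled by a small constant (0.1 in the paper's large model) before the skip addition:

import torch.nn as nn

class EDSRBlock(nn.Module):
    def __init__(self, channels=256, res_scale=0.1):
        super().__init__()
        # No batch normalization anywhere in the block.
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.res_scale = res_scale

    def forward(self, x):
        # Residual scaling damps the branch output to stabilize training.
        return x + self.body(x) * self.res_scale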

2.2. Transformer-Based SR

In 2017, Vaswani et al. [25] proposed the Transformer model for machine translation, using stacked self-attention layers and fully connected (MLP) layers to replace the recurrent structure of the original Seq2Seq architecture. Following the great success of the Transformer in NLP, Kaiser et al. [26] soon applied it to image generation, and Dosovitskiy et al. [27] proposed the Vision Transformer (ViT), which segments images into blocks and serializes them so that a Transformer can perform image classification. In recent years, the Transformer, and ViT in particular, has gradually attracted the attention of the SISR research community.
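ViT's "segment into blocks, then serialize" step is commonly implemented with a single strided convolution, as in this minimal PyTorch sketch (sizes follow the ViT-Base defaults; the class name is ours):

import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, patch=16, in_channels=3, dim=768):
        super().__init__()
        # A patch-sized, patch-strided convolution splits and projects at once.
        self.proj = nn.Conv2d(in_channels, dim, kernel_size=patch, stride=patch)

    def forward(self, x):                    # x: (B, C, H, W)
        x = self.proj(x)                     # (B, dim, H/patch, W/patch)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, dim)

tokens = PatchEmbed()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768])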
RCAN [28] introduced the attention mechanism to image super-resolution, proposing a channel attention (CA) mechanism that adaptively rescales features according to the interdependencies between channels and further improves the expressive capability of the network over plain CNNs. Dai et al. [29] proposed a second-order channel attention mechanism that adjusts channel features more adaptively, observing that global covariance pooling yields higher-order and more discriminative feature information than the first-order pooling used in RCAN.
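A minimal sketch of RCAN-style channel attention follows (our own illustration): global average pooling summarizes each channel, a small bottleneck produces per-channel weights, and the feature map is rescaled channel-wise:

import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.attend = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # first-order (mean) pooling
            nn.Conv2d(channels, channels // reduction, 1),  # squeeze
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),  # excite
            nn.Sigmoid(),                                   # weights in (0, 1)
        )

    def forward(self, x):
        # Rescale each channel by its learned importance weight.
        return x * self.attend(x)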
Inspired by the Swin Transformer, Liang et al. [30] proposed SwinIR, which partitions the feature map into local windows, applies the Transformer within each window, and fuses information across windows by shifting the windows at the next layer. Applying the Transformer within partitioned windows greatly reduces computation and makes it practical to process larger images, while window shifting retains the Transformer's ability to model global dependencies. Because SwinIR's window partitioning limits the receptive field and cannot establish long-range dependencies at an early stage, Zhang et al. [31] extended SwinIR with a fast Fourier convolutional layer that has a global receptive field, while Chen et al. [15] combined SwinIR with a channel attention mechanism to propose a hybrid attention Transformer model. Both approaches enable SwinIR to establish long-range dependencies at an early stage and improve model performance by different means.
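The window mechanics are compact enough to sketch directly (an illustration under assumed shapes, not SwinIR's actual code): attention runs inside non-overlapping windows, and a cyclic shift of half a window between layers moves features across the old window borders:

import torch

def window_partition(x, ws):
    """(B, H, W, C) feature map -> (num_windows * B, ws * ws, C) token groups."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

x = torch.randn(1, 64, 64, 96)        # (B, H, W, C) feature map
windows = window_partition(x, ws=8)   # self-attention runs per window
print(windows.shape)                  # torch.Size([64, 64, 96])

# Before the next layer, a cyclic shift by ws // 2 lets information
# cross the previous window boundaries.
shifted = torch.roll(x, shifts=(-4, -4), dims=(1, 2))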
By relying on the attention mechanism rather than convolution, the Transformer gains a global receptive field and can achieve better image super-resolution results than CNNs. However, the Transformer lacks the capability to capture local features and is not sensitive enough to some local details of maps. Therefore, a Transformer model that fuses local features may be more effective for map super-resolution.

This entry is adapted from the peer-reviewed paper https://doi.org/10.3390/ijgi12070258

References

  1. Herold, M.; Liu, X.; Clarke, K.C. Spatial metrics and image texture for mapping urban land use. Photogramm. Eng. Remote Sens. 2003, 69, 991–1001.
  2. Foo, P.; Warren, W.H.; Duchon, A.; Tarr, M.J. Do humans integrate routes into a cognitive map? Map- versus landmark-based navigation of novel shortcuts. J. Exp. Psychol. Learn. Mem. Cogn. 2005, 31, 195–215.
  3. Qi, J.; Liu, H.; Liu, X.; Zhang, Y. Spatiotemporal evolution analysis of time-series land use change using self-organizing map to examine the zoning and scale effects. Comput. Environ. Urban Syst. 2019, 76, 11–23.
  4. Sagl, G.; Delmelle, E.; Delmelle, E. Mapping collective human activity in an urban environment based on mobile phone data. Cartogr. Geogr. Inf. Sci. 2014, 41, 272–285.
  5. Li, H.; Liu, J.; Zhou, X. Intelligent map reader: A framework for topographic map understanding with deep learning and gazetteer. IEEE Access 2018, 6, 25363–25376.
  6. Pezeshk, A.; Tutwiler, R.L. Automatic feature extraction and text recognition from scanned topographic maps. IEEE Trans. Geosci. Remote Sens. 2011, 49, 5047–5063.
  7. Leyk, S.; Boesch, R. Colors of the past: Color image segmentation in historical topographic maps based on homogeneity. Geoinformatica 2010, 14, 1–21.
  8. Pouderoux, J.; Gonzato, J.; Pereira, A.; Guitton, P. Toponym recognition in scanned color topographic maps. In Proceedings of the Ninth International Conference on Document Analysis and Recognition, Curitiba, Brazil, 23–26 September 2007; Volume 1, pp. 531–535.
  9. Zhou, X.; Li, W.; Arundel, S.T.; Liu, J. Deep convolutional neural networks for map-type classification. arXiv 2018, arXiv:1805.10402.
  10. Zhou, X. GeoAI-Enhanced Techniques to Support Geographical Knowledge Discovery from Big Geospatial Data; Arizona State University: Tempe, AZ, USA, 2019.
  11. Li, J.; Pei, Z.; Zeng, T. From beginner to master: A survey for deep learning-based single-image super-resolution. arXiv 2021, arXiv:2109.14335.
  12. Li, K.; Yang, S.; Dong, R.; Wang, X.; Huang, J. Survey of single image super-resolution reconstruction. IET Image Process. 2020, 14, 2273–2290.
  13. Yang, Z.; Shi, P.; Pan, D. A Survey of Super-Resolution Based on Deep Learning. In Proceedings of the 2020 International Conference on Culture-Oriented Science & Technology, Beijing, China, 28–31 October 2020; pp. 514–518.
  14. Lu, Z.; Li, J.; Liu, H.; Huang, C.; Zhang, L.; Zeng, T. Transformer for single image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 21–24 June 2022; pp. 457–466.
  15. Chen, X.; Wang, X.; Zhou, J.; Dong, C. Activating More Pixels in Image Super-Resolution Transformer. arXiv 2022, arXiv:2205.04437.
  16. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
  17. Dong, C.; Loy, C.C.; He, K.; Tang, X. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 295–307.
  18. Dong, C.; Loy, C.C.; Tang, X. Accelerating the super-resolution convolutional neural network. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; pp. 391–407.
  19. Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883.
  20. Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654.
  21. Kim, J.; Lee, J.K.; Lee, K.M. Deeply-recursive convolutional network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1637–1645.
  22. Lai, W.; Huang, J.; Ahuja, N.; Yang, M. Deep Laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 624–632.
  23. Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144.
  24. Haris, M.; Shakhnarovich, G.; Ukita, N. Deep back-projection networks for super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1664–1673.
  25. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
  26. Kaiser, L.; Bengio, S.; Roy, A.; Vaswani, A.; Parmar, N.; Uszkoreit, J.; Shazeer, N. Fast decoding in sequence models using discrete latent variables. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 2390–2399.
  27. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
  28. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 286–301.
  29. Dai, T.; Cai, J.; Zhang, Y.; Xia, S.; Zhang, L. Second-order attention network for single image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 11065–11074.
  30. Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. SwinIR: Image restoration using Swin Transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Montreal, QC, Canada, 11–17 October 2021; pp. 1833–1844.
  31. Zhang, D.; Huang, F.; Liu, S.; Wang, X.; Jin, Z. SwinFIR: Revisiting the SWINIR with fast Fourier convolution and improved training for image super-resolution. arXiv 2022, arXiv:2208.11247.