Deep Learning for Land Use

Image super-resolution (SR) techniques can improve the spatial resolution of remote sensing images to provide more feature details and information, which is important for a wide range of remote sensing applications, including land use/cover classification (LUCC). Convolutional neural networks (CNNs) have achieved impressive results in the field of image SR, but the inherent locality of convolution limits the performance of CNN-based SR models.

  • super-resolution
  • land use/cover classification
  • deep learning
  • remote sensing

1. Introduction

Long time series and high-spatial-resolution remote sensing images play a crucial role in high-precision land use/cover classification (LUCC) [1]. However, due to hardware and cost limitations, publicly available remote sensing data with high spatial resolution usually do not span long time series. For example, the Sentinel-2 satellites offer a spatial resolution of up to 10 m, but their temporal coverage only begins in 2015, and even expensive commercial satellite data are usually available only from 2000 onwards. Conversely, remote sensing data with long time series usually do not have high spatial resolution. The Landsat series of satellites, for example, has been providing valuable data since 1972, and these data are frequently used for time series land use analysis; however, their spatial resolution is limited to 30 m, which restricts their application in long-term, high-precision LUCC analysis. It is therefore crucial to improve the spatial resolution of long-time-series, low-spatial-resolution remote sensing images algorithmically. Traditional SR methods for remote sensing images mainly include interpolation [2], pan-sharpening [3], sparse representation [4], and projection onto convex sets (POCS) [5]. Interpolation has the advantage of simplicity and speed, but its results are usually blurred. Pan-sharpening requires the sensor to provide a high-spatial-resolution panchromatic band, which is then fused with the other bands to improve their spatial resolution. Methods based on sparse representation and POCS have high computational complexity and struggle to recover the high-frequency details of an image; POCS, in particular, demands a substantial amount of prior knowledge [6]. In recent years, deep learning techniques have developed rapidly and achieved impressive results in various computer vision (CV) tasks, including image super-resolution. With deep learning, low-resolution (LR) data can be super-resolved to a higher spatial resolution, which provides an opportunity to obtain higher-quality LUCC maps [7].
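As a point of reference for the learned methods discussed below, here is a minimal PyTorch sketch of the simplest traditional baseline, interpolation (bicubic, with an assumed 4× scale factor); the tensor shapes are illustrative:

```python
# Minimal sketch: bicubic interpolation as the simplest SR baseline.
# Assumes a low-resolution image tensor of shape (N, C, H, W).
import torch
import torch.nn.functional as F

lr = torch.rand(1, 3, 64, 64)  # placeholder LR input
sr = F.interpolate(lr, scale_factor=4, mode="bicubic", align_corners=False)
print(sr.shape)  # torch.Size([1, 3, 256, 256]) -- learned SR methods must beat this
```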

2. Deep Learning for Image Super-Resolution

The super-resolution convolutional neural network (SRCNN) [8] was the first convolutional neural network used for image super-resolution. SRCNN uses a stack of three convolutional layers to directly learn the mapping between LR and high-resolution (HR) images end-to-end. Deep residual learning [9] shifted the design of deep networks towards greater depth. The very deep super-resolution network (VDSR) [10] improves super-resolution performance by using residual connections and stacking very deep convolutional layers. SRCNN and VDSR up-sample the image before it is fed into the network, which slows training and increases computational resource usage. The fast super-resolution convolutional neural network (FSRCNN) [11] and the enhanced deep super-resolution network (EDSR) [12] instead up-sample the feature maps at the end of the network, and achieved better super-resolution results in terms of the peak signal-to-noise ratio (PSNR) metric. Subsequently, the residual channel attention network (RCAN) [13] outperformed deep convolutional networks such as VDSR and EDSR by integrating an attention mechanism into the super-resolution network, demonstrating the mechanism’s ability to reconstruct intricate texture details; attention mechanisms have since become a widespread component of super-resolution networks. The multispectral remote sensing image super-resolution convolutional neural network (msiSRCNN) [14] verified the feasibility of applying a convolutional neural network to the super-resolution of multispectral remote sensing images by fine-tuning SRCNN, and achieved better results than the traditional methods; convolutional neural networks thus became the mainstream method for the super-resolution of remote sensing images. Remote sensing images differ from ordinary optical images in their diverse feature types and the varying scales among those features. To address these problems, researchers have proposed many new structures [15][16][17][18][19] that enhance the feature-learning capability of super-resolution networks for remote sensing images. Although CNN-based methods have achieved significant results in remote sensing image super-resolution tasks, the inherent locality of CNNs makes it difficult to model the global pixel dependencies of remote sensing images, which limits further improvement of CNN performance in super-resolution tasks.
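To make these building blocks concrete, the following is a minimal PyTorch sketch of an SRCNN-style three-layer network together with an RCAN-style channel-attention block; layer widths and kernel sizes follow commonly cited configurations but are illustrative rather than the exact published code:

```python
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Three stacked convolutions mapping a pre-upsampled LR image to HR (9-1-5 setup)."""
    def __init__(self, channels=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=9, padding=4),  # patch extraction
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=1),                   # non-linear mapping
            nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, kernel_size=5, padding=2),  # reconstruction
        )

    def forward(self, x):  # x: bicubic-upsampled LR image, (N, C, H, W)
        return self.body(x)

class ChannelAttention(nn.Module):
    """RCAN-style channel attention: reweight feature channels by global statistics."""
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # global average pooling
            nn.Conv2d(channels, channels // reduction, 1),  # squeeze
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),  # excite
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.attn(x)  # per-channel reweighting of the feature map
```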
The Transformer [20], with its powerful global modeling capability, quickly became the dominant approach in natural language processing (NLP). The Vision Transformer (ViT) [21] introduced the Transformer to the CV domain, achieving performance beyond CNNs on large datasets. Many works have since combined CNNs with Transformers for image super-resolution [22][23][24][25][26][27]. These methods use a CNN as a shallow feature extractor and a Transformer for deep feature extraction, combining the local feature extraction capability of the CNN with the global modeling capability of the Transformer to further improve the quality of SR images. Although the Transformer can effectively compensate for the locality of the CNN, multi-scale feature learning is just as important as local–global learning for the super-resolution of remote sensing images [17][18][27]. Unfortunately, the Transformer does not learn multi-scale features by itself. Numerous researchers have explored integrating multi-scale information into the Transformer [28][29][30][31], but these methods generally increase the number of network parameters, further impeding the training of the already large Transformer model. There are also approaches [29][32] that realize a multi-scale hierarchical representation of images through gradual down-sampling, but this is not applicable to the image super-resolution task. Finally, although CNN- and Transformer-based methods outperform traditional super-resolution methods and achieve higher PSNR values, their outputs tend to be perceptually blurrier.
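The global modeling that Transformers contribute can be summarized in a few lines: every patch token attends to every other token, so pixel dependencies are not restricted to a local receptive field. A minimal single-head sketch over flattened patch tokens (all dimensions and weights are illustrative):

```python
import torch
import torch.nn.functional as F

def global_self_attention(tokens, w_q, w_k, w_v):
    """tokens: (N, L, D) patch embeddings; each token attends to all L tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    # (N, L, L) score matrix: global interactions, at O(L^2) cost in tokens
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

tokens = torch.rand(1, 256, 64)              # e.g., 16x16 patches, 64-dim embeddings
w = [torch.rand(64, 64) for _ in range(3)]   # toy projection weights
out = global_self_attention(tokens, *w)      # (1, 256, 64)
```

The quadratic score matrix is also why plain global attention is expensive on large remote sensing images, which motivates the windowed and dilated variants discussed below.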
Generative adversarial networks (GANs) [33] have powerful image-generation capabilities and have been applied to image generation [34], style transfer [35], and image super-resolution [36][37][38]. A GAN consists of two sub-networks, a generator and a discriminator, which are trained against each other in a “zero-sum game”: the generator aims to produce realistic images that deceive the discriminator, while the discriminator aims to judge the authenticity of its input images, and the generator updates its gradients based on the discriminator’s feedback. Adversarial training allows GANs to generate images that are visually superior to those of CNNs. The super-resolution GAN (SRGAN) [36] applied the GAN framework to the image super-resolution task, using a pretrained VGG19 [39] network as a feature extractor to compute a perceptual loss that optimizes the perceptual quality of the SR images. The enhanced super-resolution GAN (ESRGAN) [37] improved on SRGAN by using dense residual blocks to enhance the network’s feature-learning capability and by removing the BatchNorm layers [40]; ESRGAN remains one of the most advanced image super-resolution methods. For remote sensing image super-resolution, researchers have made many improvements to GANs, including introducing attention mechanisms [41][42], post-processing after super-resolution [43], and improving the discriminator [44]. GAN-based methods have more powerful image-generation capabilities than CNN-based methods and generate SR images with more detail. Therefore, the researchers chose to train their model using the GAN framework.
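A hedged sketch of one SRGAN-style adversarial training step follows, with a frozen VGG19 feature extractor for the perceptual loss. The `generator` and `discriminator` modules are placeholders, the discriminator is assumed to output logits, and the loss form and weighting are illustrative rather than the published recipe:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

# Frozen VGG19 features as the perceptual-loss extractor (as in SRGAN/ESRGAN).
vgg_feats = vgg19(weights="IMAGENET1K_V1").features[:36].eval()
for p in vgg_feats.parameters():
    p.requires_grad = False

bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

def train_step(generator, discriminator, g_opt, d_opt, lr_img, hr_img):
    # --- Discriminator: classify real HR vs. generated SR images ---
    sr_img = generator(lr_img)
    d_opt.zero_grad()
    d_real = discriminator(hr_img)
    d_fake = discriminator(sr_img.detach())  # detach: no generator gradients here
    d_loss = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))
    d_loss.backward()
    d_opt.step()

    # --- Generator: fool the discriminator + match VGG features (perceptual loss) ---
    g_opt.zero_grad()
    d_fake = discriminator(sr_img)
    adv_loss = bce(d_fake, torch.ones_like(d_fake))
    perceptual_loss = l1(vgg_feats(sr_img), vgg_feats(hr_img))
    g_loss = perceptual_loss + 1e-3 * adv_loss  # illustrative weighting
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```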

3. Deep Learning for Land Use/Cover Classification

Land use/cover classification extracts information on natural land-cover types as well as artificially utilized land types from remote sensing images, which is important in fields such as ecological protection, urban planning, and precision agriculture. Traditional LUCC methods [45][46][47] often rely on hand-crafted features, such as spectral indices [48], and ignore the spatial correlation of pixels. In contrast, deep-learning-based approaches eliminate the dependence on hand-crafted features and effectively capture both the spatial and spectral features inherent in remote sensing images [49], leading to superior classification accuracy and enhanced robustness. Fully convolutional networks (FCNs) [50] were a seminal deep-learning approach to semantic segmentation, enabling pixel-level classification of images. U-Net [51] was initially proposed for biomedical image segmentation but, owing to its strong performance, has been widely used for image segmentation in many fields, including remote sensing. Like U-Net, the DeepLab family of models [52][53][54][55] is a classic set of approaches to image segmentation; in contrast to the stepwise down-sampling structure of U-Net, DeepLab employs dilated convolutions [56] to facilitate multi-scale feature learning, thereby enhancing segmentation accuracy (see the sketch after this paragraph). At present, Transformer-based classification is one of the research hotspots in remote sensing LUCC, since the self-attention mechanism of the Transformer can model spectral features well. Many researchers have opted to integrate CNNs and Transformers, using a CNN to extract spatial features and a Transformer to capture spectral features; methods incorporating both spatial and spectral features have achieved better accuracy in the LUCC task [57][58][59][60]. The morphFormer, proposed by Roy et al. [61], integrates a learnable spectral morphological convolution operation with a self-attention mechanism; this combination enhances the interaction between spectral features and improves the representation of structure and shape information in the tokens. Compared to traditional CNNs and other Transformer-based LUCC models, morphFormer achieves higher classification accuracy in experiments, making it one of the most advanced LUCC methods available at present. In this research, morphFormer was directly employed on the SR data in the second stage for the LUCC task.
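To illustrate how dilated convolutions provide DeepLab-style multi-scale context without down-sampling, here is a minimal atrous-spatial-pyramid-like block; the branch rates and channel widths are illustrative, not the published configuration:

```python
import torch
import torch.nn as nn

class MiniASPP(nn.Module):
    """Parallel dilated convolutions: same kernel size, growing receptive fields."""
    def __init__(self, in_ch=64, out_ch=64, rates=(1, 6, 12, 18)):
        super().__init__()
        # padding = dilation keeps the spatial size unchanged for 3x3 kernels
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        # Each branch sees a different context scale; concatenation fuses the scales.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

feats = torch.rand(1, 64, 32, 32)
print(MiniASPP()(feats).shape)  # torch.Size([1, 64, 32, 32])
```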
The objective of this research is to enhance the spatial resolution of remote sensing images using deep learning techniques, thereby providing richer and more accurate surface information for LUCC tasks and further improving LUCC precision. The research is divided into two stages: remote sensing image SR and LUCC. In the SR stage, the researchers propose a new model named the dilated Transformer GAN (DTGAN) for real remote sensing image super-resolution. The generator of this model combines a CNN and a Transformer, using the CNN as a shallow feature extractor and the Transformer for deep feature extraction. At the same time, the researchers seek to address the Transformer’s inability to learn multi-scale features, as well as its slow computation and large resource consumption. Influenced by [32][62][63][64], the work introduces an attention mechanism called dilated window multi-head self-attention (DW-MHSA), which injects multi-scale information into the Transformer and improves the computational efficiency of self-attention without increasing the number of network parameters. The discriminator of DTGAN uses PatchGAN [38]. In the LUCC stage, the researchers directly adopt morphFormer [61] for the LUCC of the SR data to verify the availability of the SR data.
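The following is a loose sketch of the general idea behind dilated window attention in the spirit of Swin- and DiNAT-style mechanisms [32][63][64], not the authors’ DTGAN implementation: each window gathers tokens that are `dilation` pixels apart, so the window size (and hence attention cost and parameter count) stays fixed while the covered receptive field grows with the dilation rate; mixing heads or blocks with different rates then injects multi-scale context at no extra parameter cost. Shapes, window size, and dilation rate below are illustrative:

```python
import torch
import torch.nn.functional as F

def dilated_window_attention(x, w_qkv, window=4, dilation=2):
    """x: (N, H, W, D) feature map; H and W must be divisible by window*dilation.
    Attention runs inside windows whose tokens are `dilation` pixels apart, so a
    window x window window covers a (window*dilation)^2 region at fixed cost."""
    n, h, w, d = x.shape
    q, k, v = (x @ w_qkv).chunk(3, dim=-1)  # w_qkv: (D, 3D) joint projection

    def partition(t):
        # (N, H, W, D) -> (num_windows*N, window*window, D) via dilated sampling:
        # splitting each axis as (blocks, window, dilation) places tokens that are
        # `dilation` apart into the same window, one window per dilation offset.
        t = t.view(n, h // (window * dilation), window, dilation,
                   w // (window * dilation), window, dilation, d)
        t = t.permute(0, 1, 4, 3, 6, 2, 5, 7)  # (n, hb, wb, off_h, off_w, j1, j2, d)
        return t.reshape(-1, window * window, d)

    q, k, v = partition(q), partition(k), partition(v)
    attn = F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    out = attn @ v  # (num_windows*N, window*window, D)
    # In a full block, the inverse of `partition` would restore the (N, H, W, D) layout.
    return out

x = torch.rand(1, 16, 16, 32)               # toy feature map
w_qkv = torch.rand(32, 96)                   # toy joint QKV projection
print(dilated_window_attention(x, w_qkv).shape)  # torch.Size([16, 16, 32])
```

Note that the same projection weights serve every dilation rate, which is why varying the rate adds no parameters; only the sampling pattern of the windows changes.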

References

  1. Vuolo, F.; Neuwirth, M.; Immitzer, M.; Atzberger, C.; Ng, W.T. How much does multi-temporal Sentinel-2 data improve crop type classification? Int. J. Appl. Earth Obs. Geoinf. 2018, 72, 122–130.
  2. Zhang, L.; Wu, X. An edge-guided image interpolation algorithm via directional filtering and data fusion. IEEE Trans. Image Process. 2006, 15, 2226–2238.
  3. Vivone, G.; Alparone, L.; Chanussot, J.; Dalla Mura, M.; Garzelli, A.; Licciardi, G.A.; Restaino, R.; Wald, L. A Critical Comparison Among Pansharpening Algorithms. IEEE Trans. Geosci. Remote Sens. 2015, 53, 2565–2586.
  4. Yang, J.; Wright, J.; Huang, T.S.; Ma, Y. Image Super-Resolution Via Sparse Representation. IEEE Trans. Image Process. 2010, 19, 2861–2873.
  5. Lei, J.; Zhang, S.; Luo, L.; Xiao, J.; Wang, H. Super-resolution enhancement of UAV images based on fractional calculus and POCS. Geo-Spat. Inf. Sci. 2018, 21, 56–66.
  6. Anna, H.; Rui, L.; Liang, W.; Jin, Z.; Yongyang, X.; Siqiong, C. Super-resolution reconstruction method for remote sensing images considering global features and texture features. Acta Geod. Cartogr. Sin. 2023, 52, 648.
  7. Zhu, Y.; Geiß, C.; So, E. Image super-resolution with dense-sampling residual channel-spatial attention networks for multi-temporal remote sensing image classification. Int. J. Appl. Earth Obs. Geoinf. 2021, 104, 102543.
  8. Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a Deep Convolutional Network for Image Super-Resolution. In Proceedings of the Computer Vision—ECCV 2014, Zurich, Switzerland, 6–12 September 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; pp. 184–199.
  9. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385.
  10. Kim, J.; Lee, J.K.; Lee, K.M. Accurate Image Super-Resolution Using Very Deep Convolutional Networks. arXiv 2016, arXiv:1511.04587.
  11. Dong, C.; Loy, C.C.; Tang, X. Accelerating the Super-Resolution Convolutional Neural Network. arXiv 2016, arXiv:1608.00367.
  12. Lim, B.; Son, S.; Kim, H.; Nah, S.; Lee, K.M. Enhanced Deep Residual Networks for Single Image Super-Resolution. arXiv 2017, arXiv:1707.02921.
  13. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image Super-Resolution Using Very Deep Residual Channel Attention Networks. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; pp. 294–310.
  14. Liebel, L.; Körner, M. Single-Image Super Resolution for Multispectral Remote Sensing Data Using Convolutional Neural Networks. ISPRS—Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, 41B3, 883–890.
  15. Lei, S.; Shi, Z.; Zou, Z. Super-Resolution for Remote Sensing Images via Local–Global Combined Network. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1243–1247.
  16. Zhang, D.; Shao, J.; Li, X.; Shen, H.T. Remote Sensing Image Super-Resolution via Mixed High-Order Attention Network. IEEE Trans. Geosci. Remote Sens. 2021, 59, 5183–5196.
  17. Lei, S.; Shi, Z. Hybrid-Scale Self-Similarity Exploitation for Remote Sensing Image Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–10.
  18. Dong, X.; Wang, L.; Sun, X.; Jia, X.; Gao, L.; Zhang, B. Remote Sensing Image Super-Resolution Using Second-Order Multi-Scale Networks. IEEE Trans. Geosci. Remote Sens. 2021, 59, 3473–3485.
  19. Huang, B.; He, B.; Wu, L.; Guo, Z. Deep Residual Dual-Attention Network for Super-Resolution Reconstruction of Remote Sensing Images. Remote Sens. 2021, 13, 2784.
  20. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2023, arXiv:1706.03762.
  21. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
  22. Lei, S.; Shi, Z.; Mo, W. Transformer-Based Multistage Enhancement for Remote Sensing Image Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5615611.
  23. Conde, M.V.; Choi, U.J.; Burchi, M.; Timofte, R. Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration. In Proceedings of the Computer Vision—ECCV 2022 Workshops, Tel Aviv, Israel, 23–27 October 2022; Karlinsky, L., Michaeli, T., Nishino, K., Eds.; pp. 669–687.
  24. Lu, Z.; Li, J.; Liu, H.; Huang, C.; Zhang, L.; Zeng, T. Transformer for Single Image Super-Resolution. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA, 19–20 June 2022; pp. 456–465.
  25. Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Gool, L.V.; Timofte, R. SwinIR: Image Restoration Using Swin Transformer. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, BC, Canada, 11–17 October 2021; pp. 1833–1844.
  26. Zheng, L.; Zhu, J.; Shi, J.; Weng, S. Efficient Mixed Transformer for Single Image Super-Resolution. arXiv 2023, arXiv:2305.11403.
  27. Shang, J.; Gao, M.; Li, Q.; Pan, J.; Zou, G.; Jeon, G. Hybrid-Scale Hierarchical Transformer for Remote Sensing Image Super-Resolution. Remote Sens. 2023, 15, 3442.
  28. Lee, Y.; Kim, J.; Willette, J.; Hwang, S.J. MPViT: Multi-Path Vision Transformer for Dense Prediction. arXiv 2021, arXiv:2112.11010.
  29. Wang, W.; Yao, L.; Chen, L.; Lin, B.; Cai, D.; He, X.; Liu, W. CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention. arXiv 2021, arXiv:2108.00154.
  30. Fan, H.; Xiong, B.; Mangalam, K.; Li, Y.; Yan, Z.; Malik, J.; Feichtenhofer, C. Multiscale Vision Transformers. arXiv 2021, arXiv:2104.11227.
  31. Ren, S.; Zhou, D.; He, S.; Feng, J.; Wang, X. Shunted Self-Attention via Multi-Scale Token Aggregation. arXiv 2022, arXiv:2111.15193.
  32. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. arXiv 2021, arXiv:2103.14030.
  33. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. arXiv 2014, arXiv:1406.2661.
  34. Yu, Y.; Gong, Z.; Zhong, P.; Shan, J. Unsupervised Representation Learning with Deep Convolutional Neural Network for Remote Sensing Images. In Proceedings of the Image and Graphics, Shanghai, China, 13–15 September 2017; pp. 97–108.
  35. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232.
  36. Ledig, C.; Theis, L.; Huszar, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. arXiv 2017, arXiv:1609.04802.
  37. Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Loy, C.C.; Qiao, Y.; Tang, X. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. arXiv 2018, arXiv:1809.00219.
  38. Wang, X.; Xie, L.; Dong, C.; Shan, Y. Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data. arXiv 2021, arXiv:2107.10833.
  39. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556.
  40. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv 2015, arXiv:1502.03167.
  41. Jia, S.; Wang, Z.; Li, Q.; Jia, X.; Xu, M. Multiattention Generative Adversarial Network for Remote Sensing Image Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5624715.
  42. Wang, C.; Zhang, X.; Yang, W.; Li, X.; Lu, B.; Wang, J. MSAGAN: A New Super-Resolution Algorithm for Multispectral Remote Sensing Image Based on a Multiscale Attention GAN Network. IEEE Geosci. Remote Sens. Lett. 2023, 20, 5001205.
  43. Jiang, K.; Wang, Z.; Yi, P.; Wang, G.; Lu, T.; Jiang, J. Edge-Enhanced GAN for Remote Sensing Image Superresolution. IEEE Trans. Geosci. Remote Sens. 2019, 57, 5799–5812.
  44. Lei, S.; Shi, Z.; Zou, Z. Coupled Adversarial Training for Remote Sensing Image Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3633–3643.
  45. Cariou, C.; Chehdi, K. A new k-nearest neighbor density-based clustering method and its application to hyperspectral images. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 6161–6164.
  46. Li, J.; Bioucas-Dias, J.M.; Plaza, A. Spectral–Spatial Hyperspectral Image Segmentation Using Subspace Multinomial Logistic Regression and Markov Random Fields. IEEE Trans. Geosci. Remote Sens. 2012, 50, 809–823.
  47. Melgani, F.; Bruzzone, L. Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1778–1790.
  48. Kulkarni, K.; Vijaya, P.A. NDBI Based Prediction of Land Use Land Cover Change. J. Indian Soc. Remote Sens. 2021, 49, 2523–2537.
  49. Huang, B.; Zhao, B.; Song, Y. Urban land-use mapping using a deep convolutional neural network with high spatial resolution multispectral remote sensing imagery. Remote Sens. Environ. 2018, 214, 73–86.
  50. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. arXiv 2015, arXiv:1411.4038.
  51. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597.
  52. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. arXiv 2016, arXiv:1412.7062.
  53. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
  54. Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017, arXiv:1706.05587.
  55. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. arXiv 2018, arXiv:1802.02611.
  56. Yu, F.; Koltun, V. Multi-Scale Context Aggregation by Dilated Convolutions. arXiv 2015, arXiv:1511.07122.
  57. Mei, S.; Song, C.; Ma, M.; Xu, F. Hyperspectral Image Classification Using Group-Aware Hierarchical Transformer. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5539014.
  58. Sun, L.; Zhao, G.; Zheng, Y.; Wu, Z. Spectral–Spatial Feature Tokenization Transformer for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5522214.
  59. Hong, D.; Han, Z.; Yao, J.; Gao, L.; Zhang, B.; Plaza, A.; Chanussot, J. SpectralFormer: Rethinking Hyperspectral Image Classification With Transformers. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5518615.
  60. Huang, X.; Dong, M.; Li, J.; Guo, X. A 3-D-Swin Transformer-Based Hierarchical Contrastive Learning Method for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5411415.
  61. Roy, S.K.; Deria, A.; Shah, C.; Haut, J.M.; Du, Q.; Plaza, A. Spectral–Spatial Morphological Attention Transformer for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5503615.
  62. Beltagy, I.; Peters, M.E.; Cohan, A. Longformer: The Long-Document Transformer. arXiv 2020, arXiv:2004.05150.
  63. Hassani, A.; Walton, S.; Li, J.; Li, S.; Shi, H. Neighborhood Attention Transformer. arXiv 2023, arXiv:2204.07143.
  64. Hassani, A.; Shi, H. Dilated Neighborhood Attention Transformer. arXiv 2023, arXiv:2209.15001.