Deep Learning Approaches for Wildfires Using Satellite Data

Wildland fires are among the most dangerous natural risks, causing significant economic damage and loss of lives worldwide. Every year, millions of hectares are lost, and experts warn that the frequency and severity of wildfires will increase in the coming years due to climate change. To mitigate these hazards, numerous deep learning models were developed to detect and map wildland fires, estimate their severity, and predict their spread. In this paper, we provide a comprehensive review of recent deep learning techniques for detecting, mapping, and predicting wildland fires using satellite remote sensing data. We begin by introducing remote sensing satellite systems and their use in wildfire monitoring. Next, we review the deep learning methods employed for these tasks, including fire detection and mapping, severity estimation, and spread prediction. We further present the popular datasets used in these studies. Finally, we address the challenges these models face in accurately predicting wildfire behaviors and suggest future directions for developing reliable and robust wildland fire models.

  • fire detection
  • fire mapping
  • fire spread
  • damage severity
  • smoke
  • wildfire
  • satellite
  • deep learning
 

1. Satellite Systems

Spaceborne systems use satellites to provide telecommunication and observation services. They cover a very large area and, compared to terrestrial systems, provide a secure connection that is not affected by physical or weather obstacles. They are employed for many applications, such as tracking the position of ships, sending and receiving data, collecting data about the earth’s surface, and monitoring and analyzing a variety of environmental changes.
Recently, satellite systems were adopted as a solution for detecting, monitoring, and mapping wildland fires, and for supporting firefighting on the earth’s surface in near-real-time. They use thermal, optical, and radar sensors to produce accurate information on temperature, humidity, vegetation, atmospheric conditions, meteorological and topographic data, historical fires, and human activity, along with the location and intensity of fires. Optical sensors can detect changes in vegetation and land cover that may indicate the presence of smoke and fire. Thermal sensors can detect the heat associated with smoke and flames and provide information on the temperature and intensity of fires. Radar sensors transmit and receive signals that see through smoke, clouds, and darkness, generating high-resolution images of the land surface even at night. These data are then processed using mathematical models or artificial intelligence techniques, such as ML and DL models, to detect and monitor potential wildland fire activity. In addition, the information obtained from satellite remote sensing systems can be employed to:
  • Support evacuation efforts by providing real-time information about the extent and location of a wildfire, which can be used to ensure the safety of nearby human populations.
  • Predict wildfire behaviors by estimating and tracking fire spread rates. This information helps to allocate firefighting resources and to develop efficient firefighting strategies.
  • Identify fire perimeters by detecting the boundaries of a wildfire. This information can be used to generate wildfire maps and to provide firefighting operations with better situational awareness.
  • Assess the impact of wildfires by determining the damage caused by a wildfire and estimating the severity of burned areas. This information can be used to plan post-fire recovery efforts and to protect ecosystems.
Based on their orbit, these systems can be divided into three categories: geostationary orbit (GEO), low-earth orbit (LEO), and polar sun-synchronous orbit (SSO).
  • Geostationary orbit (GEO) satellites circle the earth above the equator, following the earth’s rotation at an altitude of 35,786 km. A satellite in GEO appears stationary above a fixed point on the earth’s surface, thus providing continuous coverage of the same area and a high temporal resolution. The Geostationary Operational Environmental Satellites (GOES) [24] are a widely used example; they offer high spatial, temporal, and spectral resolution and provide accurate weather information and real-time weather monitoring. GEO data are often complemented by polar-orbiting satellites such as Landsat [25] and Sentinel [26], some of which have a lower spatial resolution and long revisit times, for example, eight days for Landsat-8 and 2 to 3 days for Sentinel-2B. GEO systems allow the detection of the size, intensity, and locations of wildfires. They also provide information on wind direction and speed, which can help in estimating the spread of wildfires and in firefighting operations.
  • Low-earth orbit (LEO) is an orbit centered on the earth with an altitude of 2000 km or less and an orbital period of less than one day. It is well suited to observation and communication because it is closer to the earth, providing high-resolution imagery, low communication latency, and high bandwidth. However, LEO satellites have a limited lifetime due to their low altitude. LEO systems can be used to detect wildland fires, along with their locations and behaviors, which helps firefighting operators develop accurate strategies for wildfire prevention.
  • Polar sun-synchronous orbit (SSO) is a near-polar orbit in which the satellite passes over a given point on the earth’s surface at the same local solar time, at an altitude of between 200 and 1000 km, which allows consistent coverage of a precise zone at the same time and place every day. Numerous sensors are flown on such satellites, including MODIS (Moderate Resolution Imaging Spectroradiometer) [27], AVHRR (Advanced Very-High-Resolution Radiometer) [28], and VIIRS (Visible Infrared Imaging Radiometer Suite) [29]. SSO satellites are used for monitoring the climate and forecasting the weather. They are also capable of detecting and monitoring wildland fires, providing the size, location, and intensity of wildfires, as well as their spread based on weather information. However, their lifetime is limited due to their low altitude.


2. Deep Learning-Based Approaches for Fire Detection Using Satellite Data

To detect and monitor fires in remote sensing satellite images, DL-based fire segmentation and detection methods have been developed in recent years. Both have shown promising results compared to traditional ML methods and are very useful for efficient fire management. Fire detection focuses on identifying the presence of fire (smoke, flame, or both) and classifying it (see Figure 1), while fire segmentation is the process of grouping similar pixels of smoke or flame in an input satellite image based on characteristics such as color, shape, and texture, and outputting the result as a mask (see Figure 2).
Figure 1. Fire detection based on DL model.
Figure 2. Fire segmentation based on DL model.
DL models were used to analyze smoke ignition and to automatically detect the presence of fires. They are capable of recognizing patterns in satellite imagery that correspond to smoke plumes and fires, and of using this information to identify fire instances as they appear. Numerous DL models were proposed to detect and segment fires using satellite data. The CNN (Convolutional Neural Network) is a popular approach for detecting smoke in satellite images. CNNs are designed to identify patterns in visual data and to recognize smoke plumes and other fire-related features. Kang et al. [30] developed a CNN, which consists of three 3 × 3 convolutional layers, each followed by a ReLU activation function and a max pooling layer, and two fully connected layers, to detect forest fires in geostationary satellite data. Using 2157 Himawari-8 images with 93,270 non-fire samples and 7795 fire samples, the proposed CNN showed superior performance over the random forest method, achieving an F1-score of 74%. Azami et al. [31] evaluated three CNN models (MiniVGGNet, ShallowNet, and LeNet) for detecting and recognizing wildfires in images collected by the KITSUNE CubeSat. Using 715 forest fire images, MiniVGGNet, ShallowNet, and LeNet achieved accuracies of 98%, 95%, and 97%, respectively. Kalaivani and Chanthiya [32] proposed a custom optimized CNN, which integrates an ALO (Antlion Optimization) method into a PReLU activation function, to detect forest fires. An accuracy of 60.87% was achieved using Landsat satellite images. Seydi et al. [33] presented a deep learning-based active forest fire detection method, called Fire-Net, which consists of residual, point/depth-wise convolutional, and multiscale convolutional blocks. Fire-Net was trained using 578 Landsat-8 images and tested on 144 images of the Australian forest, Central African forest, Brazilian forest, and Chernobyl areas, achieving F1-scores of 97.57%, 80.52%, 97.00%, and 97.24%, respectively. Palacio and Ian [34] used two deep learning models, MobileNet v2 [35] and ResNet v2 [36], pretrained on the ImageNet dataset [37], to predict wildfire smoke in satellite imagery of the California regions. Using fire perimeters, fire information (date, area, longitude, and altitude), and a historical fire map collected from Landsat 7 and 8, MobileNet v2 obtained the best accuracy of 73.3%.
Zhao et al. [38] investigated the impact of using the IR (infrared) band in detecting smoke. They proposed a lightweight CNN, called VIB_SD (Variant Input Bands for Smoke Detection), which integrates the inception structure, an attention method, and residual learning. VIB_SD was trained using 1836 multispectral images based on Landsat 8 OLI and Landsat 5 TM imagery, divided into three classes (“Clear”, “Smoke”, and “Other_aerosol”), with horizontal/vertical flips as data augmentation. The experimental results showed that adding an NIR band improved the performance by 5.96% compared to using only the RGB bands (an accuracy of 86.45%). Wang et al. [39] proposed a novel smoke detection method named DC-CMEDA (Deep Convolution and Correlated Manifold Embedded Distribution Alignment), consisting of a deep CNN (ResNet-50) and CMEDA, an improved MEDA (Manifold Embedded Distribution Alignment). First, ResNet-50 extracted the smoke features of the source and target domain data (satellite and video images). Then, CMEDA was employed to reduce the bias in the source domain and make it more similar to the target domain. Finally, the presence or absence of smoke was generated as the output. A total of 200 satellite images and 200 RGB images, each set including 100 smoke and 100 non-smoke images, were utilized to train DC-CMEDA. In transferring from satellite images to video images, DC-CMEDA achieved an accuracy of 93.0%, surpassing the state-of-the-art methods, Filonenko’s method [40] and DC-ILSTM [41], by 2.5% and 1.5%, respectively. In transferring from video images to satellite images, DC-CMEDA also reached a high accuracy of 89.5%, surpassing Filonenko’s method and DC-ILSTM by 6.5% and 4.0%, respectively.
Higa et al. [42] explored object detection methods such as PAA [43], VFNET [44], ATSS [45], SABL [46], RetinaNet [47], and Faster R-CNN [48] to detect and locate active fires and smoke in the Brazilian Pantanal regions. Using 775 images downloaded from the CBERS (China-Brazil Earth Resources Satellite) 04A WFI dataset [49], VFNET achieved the highest F1-score of 81%. Ba et al. [50] proposed a CNN-based smoke detection method, SmokeNet, using MODIS data. They presented a new dataset, called USTC_SmokeRS [51], comprising 6255 satellite images of smoke and various classes visually close to smoke, such as haze, clouds, and fog. SmokeNet is a modified AttentionNet that merges spatial and channel-wise attention with residual blocks. It achieved an accuracy of 92.75%, outperforming VGG [52], ResNet [36], DenseNet [53], AttentionNet [54], and SE-ResNet [55]. Phan et al. [56] proposed a 3D CNN model to locate wildfires using satellite images collected from the GOES satellite GOES-16, integrating spatial and spectral information at the same time. The 3D CNN contains three 3D convolutional layers, each followed by a ReLU activation function and batch normalization, a fully connected layer, and a softmax function. Imagery data and weather information were used as inputs to detect the presence of forest fires. Using 384 satellite images, an F1-score of 94% was achieved, outperforming baseline models such as MODIS-Terra [5], AVHRR-FIMMA [57], VIIRS-AFP [58], and GOES-AFP [59]. Hong et al. [60] designed a new CNN, FireCNN, to detect fires in Himawari-8 satellite images. FireCNN consists of three convolutional blocks and a fully connected layer; each convolutional block consists of two convolutional layers, each followed by a ReLU activation and a max pooling layer. FireCNN was tested on a dataset containing 3646 non-fire spots and 1823 fire spots [61], obtaining an accuracy of 99.9%, higher than traditional ML methods (thresholding, SVM (Support Vector Machine), and random forest). Wang et al. [62] employed the Swin transformer [63], which adopts an attention mechanism to model local and global dependencies, to detect smoke and flames. A set of 5773 images obtained from FASDD (Flame and Smoke Detection Dataset) [64] was used to train this model, obtaining a mAP (mean Average Precision) of 53.01%.
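To make the patch-classification setup concrete, the following is a minimal PyTorch sketch in the spirit of the small CNN described by Kang et al. [30] (three 3 × 3 convolutions, each followed by ReLU and max pooling, then two fully connected layers). The band count, patch size, and channel widths are illustrative assumptions, not the published configuration:

```python
# Minimal sketch of a small fire/no-fire patch classifier, assuming
# six-band 32x32 satellite patches (hypothetical input configuration).
import torch
import torch.nn as nn

class FirePatchCNN(nn.Module):
    def __init__(self, in_bands: int = 6, patch: int = 32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_bands, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        flat = 128 * (patch // 8) ** 2   # spatial size halves at each pooling stage
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(flat, 64), nn.ReLU(),
            nn.Linear(64, 2),            # logits for {no fire, fire}
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Example: a batch of 4 six-band 32x32 patches (stand-ins for real cutouts).
logits = FirePatchCNN()(torch.randn(4, 6, 32, 32))
print(logits.shape)  # torch.Size([4, 2])
```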
FCN (Fully Convolutional Network) and the encoder–decoder model are the most widely used types of CNNs for image segmentation tasks. FCN was the first CNN developed for pixel-level classification. It employs a series of convolutional and pooling layers to extract features from the input image, and then generates a binary mask as the output. Larsen et al. [65] proposed an FCN to predict smoke in near-real-time in satellite images acquired by Himawari-8 and the NOAA (National Oceanic and Atmospheric Administration) Visible Infrared Imaging Radiometer Suite [66]. The FCN consists of four convolutional layers, three max pooling layers, three transposed convolutional layers, batch normalization, and ReLU activation functions. It was trained on 975 satellite images, attaining an accuracy of 99.5%. The encoder–decoder architecture contains two parts (encoder and decoder). The encoder applies convolutional and pooling layers to extract high- and low-level features, while the decoder employs transposed convolutions, which upsample the compressed feature map to produce a segmentation mask as output. U-Net [67] is a popular encoder–decoder architecture used for image segmentation. It also employs skip connections that combine features from the encoder and the decoder to better capture fine details and produce a more accurate result.
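As an illustration of the encoder–decoder idea described above, here is a minimal sketch of a U-Net-style network with a single skip connection, assuming three input bands and a binary smoke/fire mask; published U-Nets are considerably deeper:

```python
# Toy encoder-decoder with one skip connection: downsample for context,
# upsample with a transposed convolution, and concatenate encoder features
# so the decoder can recover fine spatial detail. Channel sizes are assumed.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_bands=3, n_classes=1):
        super().__init__()
        self.enc1 = conv_block(in_bands, 32)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)  # upsample back
        self.dec = conv_block(64, 32)                      # 64 = 32 skip + 32 upsampled
        self.head = nn.Conv2d(32, n_classes, 1)            # per-pixel logits

    def forward(self, x):
        s = self.enc1(x)                    # full-resolution features (skip)
        b = self.enc2(self.pool(s))         # bottleneck at half resolution
        d = self.dec(torch.cat([self.up(b), s], dim=1))
        return self.head(d)                 # smoke/fire mask logits

mask_logits = TinyUNet()(torch.randn(1, 3, 64, 64))
print(mask_logits.shape)  # torch.Size([1, 1, 64, 64])
```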

3. Deep Learning-Based Approaches for Fire Mapping Using Satellite Data

Similarly to fire detection, fire mapping provides maps that visualize the intensity and location of detected wildland fires. It is the process of generating maps showing the locations and extents of wildland fires. These maps are used for a wide variety of purposes, such as fire damage reporting, planning firefighting and evacuation efforts, and wildland fire management. Many DL approaches were adopted for the fire mapping task.
Belenguer-Plomer et al. [88] investigated CNN performance in detecting and mapping burned areas using Sentinel-1 and/or Sentinel-2 data downloaded from the Copernicus Open Access Hub. The proposed CNN comprises two convolutional layers, each followed by a ReLU activation function and a max pooling layer, two fully connected layers, and a sigmoid function to predict the probabilities of burned or unburned areas. It reached Dice coefficients of 57% and 70% using Sentinel-1 and Sentinel-2 data, respectively. Abid et al. [89] developed an unsupervised deep learning model to map burnt forest zones in Sentinel-2 imagery of Australia. First, a pretrained VGG16 was used to extract features from the input data. Then, K-means clustering and thresholding methods were used to cluster inputs with similar features. This method achieved an F1-score of 87% using real-time Sentinel-2 data as the learning data. Hu et al. [90] explored numerous semantic segmentation networks, such as U-Net, Fast-SCNN [91], DeepLab v3+ [86], and HRNet [92], for mapping burned areas using multispectral images of the boreal forests in Sweden and Canada and the Mediterranean regions (Portugal, Spain, and Greece). Sentinel-2 and Landsat-8 images and data augmentation methods (resizing, mirroring, rotation, aspect, cropping, and color jitter) were used to train these DL models. The testing results demonstrated that the DL-based semantic segmentation models performed better than ML methods (LightGBM, KNN, and random forest) and thresholding methods based on the NBR (Normalized Burn Ratio) and dNBR (delta NBR). For example, with Sentinel-2 images, U-Net achieved the best Kappa coefficient of 90% on a Mediterranean fire site in Greece, and Fast-SCNN performed better, with a Kappa coefficient of 82%, on a boreal forest fire in Sweden. With Landsat-8 images, HRNet reached the best Kappa coefficient of 78% on a Swedish forest fire. Cho et al. [93] also employed U-Net as a semantic segmentation method to map burned areas. Their learning data included satellite images with a resolution of 3 m per pixel collected from the PlanetScope dataset [94] and their ground truth masks, the dissimilarity obtained from the GLCM (Gray-Level Co-occurrence Matrix), the NDVI (Normalized Difference Vegetation Index), and land cover map data, together with a topographic normalization of each image to reduce the effect of shadow and data augmentation techniques (rotation, mirroring, and horizontal/vertical flips). U-Net achieved F1-scores of 93.0%, 93.8%, and 91.8% in the Andong, Samcheok, and Goseong study areas, respectively. Brunt and Manandhar [95] also used U-Net to map burned areas in Sentinel-2 images of Indonesian and Central African regions. U-Net obtained F1-scores of 82%, 91%, and 92% using the Indonesia test data, the Central Africa test data, and the test data of both regions, respectively.
Seydi et al. [96] developed a DL method, Burnt-Net, to map burned areas from post-fire Sentinel-2 images. Burnt-Net is an encoder–decoder architecture, including convolutional layers, ReLU functions, max pooling layers, batch normalization layers, residual multi-scale blocks, morphological operators (dilation and erosion), and transposed convolutional layers. Burnt-Net was trained with Sentinel-2 images of wildland fires in Spain, France, and Greece, and tested on wildland fires located in Portugal, Turkey, Cyprus, and Greece, obtaining accuracies of 98.08%, 97.38%, 95.68%, and 95.51%, respectively, superior to U-Net and the Landsat burned area product. Prabowo et al. [97] also employed U-Net to map burned areas. They collected a new dataset comprising 227 satellite images with a resolution of 512 × 512 pixels, acquired by the Landsat-8 satellite over Indonesian regions, together with their corresponding binary masks. Using a data augmentation method (rotation), U-Net obtained a Jaccard index of 93%. U-Net was also evaluated by Colomba et al. [98] to map burned areas. It was trained and evaluated with 73 images downloaded from the satellite burned area dataset [98,99] and data augmentation techniques (rotation, shear, and vertical/horizontal flips), obtaining an accuracy of 94.3%. Zhang et al. [100] applied a deep residual U-Net, which adopts the ResNet model as a feature extractor, to map wildfires using Sentinel-2 MSI time series and Sentinel-1 SAR data. The deep residual U-Net reached F1-scores of 78.07% and 84.23% on the Chuckegg Creek fire data acquired in 2019 and the Sydney fire data collected between 2019 and 2020, respectively. Pinto et al. [101] proposed a deep learning approach, BA-Net (Burned Areas Neural Network), based on daily sequences of multi-spectral images, to identify and map burned areas. BA-Net is an encoder–decoder with five connections between the encoder and decoder. The encoder comprises ST-Conv3 modules and spatial convolutions; each ST-Conv3 consists of two 3D convolution layers, followed by batch normalization with a ReLU activation function and a 3D dropout layer. The decoder contains UpST-Conv3 modules and 3D transposed convolutions; each UpST-Conv3 module includes two 3D transposed convolution layers, followed by batch normalization with the ReLU activation function and a 3D dropout layer. Several datasets covering five regions (California, Brazil, Mozambique, Portugal, and Australia) were used to train and test this approach: VIIRS Level 1B data [102], VIIRS Active Fires data [58], MCD64A1C6 [103], FireCCI51 [104], 53 Landsat-8 scenes [105], the FireCCISFD11 dataset [106], the MTBS dataset [107], TERN AusCover data [108], and ICNF Burned Areas [109]. BA-Net showed an excellent result (a Dice coefficient of 92%) in dating and mapping burned areas, outperforming the FireCCI51 product and confirming its ability to capture the spatiotemporal relations between active fires and daily surface reflectances. Seydi et al. [110] presented a new method, DSMNN-Net (Deep Siamese Morphological Neural Network), to map burned areas using PRISMA (PRecursore IperSpettrale della Missione Applicativa) and Sentinel-2 multispectral images of the Australian continent. Two deep feature extractors, which adopt 3D/2D convolutional layers and a morphological layer based on dilation and erosion, were employed to extract features from the pre-fire and post-fire datasets. Two scenarios were investigated: in the first, pre- and post-fire datasets were collected from Sentinel-2; in the second, pre-fire datasets were downloaded from Sentinel-2 and post-fire datasets were obtained from PRISMA. The numerical results showed that DSMNN-Net achieved accuracies of 90.24% and 97.46% in the first and second scenarios, respectively, outperforming existing state-of-the-art methods such as the CNN proposed by Belenguer-Plomer et al. [88], random forest [111,112], and SVM [113].
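Since several of the studies above benchmark DL models against NBR/dNBR thresholding, a short NumPy sketch of that baseline may be helpful. NBR is computed from the near-infrared and shortwave-infrared bands, and dNBR is the pre-fire minus post-fire difference; the 0.1 burned/unburned cutoff used here is an illustrative assumption, as operational thresholds vary by region and sensor:

```python
# NBR = (NIR - SWIR) / (NIR + SWIR); dNBR = NBR_prefire - NBR_postfire.
# Higher dNBR indicates more severe burning.
import numpy as np

def nbr(nir: np.ndarray, swir: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    return (nir - swir) / (nir + swir + eps)

def burned_mask(nir_pre, swir_pre, nir_post, swir_post, threshold=0.1):
    dnbr = nbr(nir_pre, swir_pre) - nbr(nir_post, swir_post)
    return dnbr > threshold   # boolean per-pixel burned/unburned map

# Example with random reflectance rasters standing in for pre/post-fire bands
# (e.g., Landsat-8 band 5 for NIR and band 7 for SWIR2).
rng = np.random.default_rng(0)
bands = [rng.uniform(0, 1, (256, 256)) for _ in range(4)]
print(burned_mask(*bands).mean())  # fraction of pixels flagged as burned
```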

4. Deep Learning-Based Approaches for Fire Susceptibility Using Satellite Data

Deep learning models were applied to estimate fire severity and susceptibility using vegetation data, meteorological data, topographic data, historical fires, and human activity data, providing information on the locations and intensities of fires, as shown in Figure 3. The severity level refers to the degree of vegetation damage caused by the wildland fire, and can be classified as very low, low, moderate, high, or very high, according to the severity of the damage.
Figure 3. Fire severity damage prediction based on DL model.
Zhang et al. [114] proposed a spatial prediction model based on a CNN for wildfire susceptibility modeling in China (Yunnan province). This CNN includes three convolutional layers, ReLU activation functions, two pooling layers, and three fully connected layers. The authors used numerous data sources, including the interpretation of satellite images and historical fire reports from 2002 to 2010 (7675 fires), to generate a wildfire event map as the output. They also used fourteen fire influencing factors, divided into three categories: vegetation-related (distance to roads, distance to rivers, NDVI, and forest coverage ratio), climate-related (average temperature, specific humidity, average precipitation, average wind speed, precipitation rate, and maximum temperature) [115], and topography-related (aspect, slope, and elevation) [116]. A high accuracy of 95.81% was achieved, outperforming four benchmark models: multilayer perceptron neural networks, random forest, kernel logistic regression, and SVM. Prapas et al. [117] proposed a DL method, named ConvLSTM, for forest fire danger forecasting in the regions of Greece. The Datacube dataset [118] was used to train and test this model. It contains burned areas and fire information (climate data, human activity, and vegetation information) between 2009 and 2020; daily weather data; satellite data collected from MODIS (LAI (Leaf Area Index), Fpar (Fraction of Photosynthetically Active Radiation), NDVI, EVI (Enhanced Vegetation Index), and day/night LST (Land Surface Temperature)); road density; land cover information; and topography data (aspect, slope, and elevation). ConvLSTM reached a precision of 83.2%, better than random forest and LSTM (Long Short-Term Memory). Zhang et al. [119] studied the ability of CNNs to predict forest fire susceptibility maps divided into five levels (very high, high, moderate, low, and very low). Based on the processing cell type (grid or pixel), they proposed two CNNs: CNN-1D, based on pixel cells, and CNN-2D, based on grid cells. CNN-1D consists of two 1D convolutional layers, three fully connected layers, ReLU activation functions, and a sigmoid function. CNN-2D contains two 2D convolutional layers, each followed by a ReLU activation and a max pooling layer, and three fully connected layers, the first two of which are followed by the ReLU activation function and the last by a sigmoid function. Various data were employed in training the CNNs: daily dynamic fire behaviors, individual fire characteristics, and estimated day-of-burn information, collected from the GFA (Global Fire Atlas) product from 2003 to 2016; meteorology data obtained from the GLDAS (NASA Global Land Data Assimilation System), including average temperature, specific humidity, accumulated precipitation, soil moisture, soil temperature, and the standardized precipitation index; and vegetation data (LAI and NDVI) downloaded from the USGS (United States Geological Survey) website. Testing results showed that CNN-2D achieved accuracies of 91.08%, 89.61%, 93.18%, and 94.88% for the four seasons (March–May, June–August, September–November, and December–February, respectively), surpassing CNN-1D and multilayer perceptron models.
Le et al. [120] developed a deep neural computing model, Deep-NC, which includes three ReLU activation functions and a sigmoid function, to predict wildfire danger in Gia Lai province, Vietnam. In total, 2530 historical fire locations from 2007 to 2016 and 2530 non-fire data points were used, together with slope, land use, NDWI (Normalized Difference Water Index), aspect, elevation, NDMI (Normalized Difference Moisture Index), curvature, NDVI, wind speed, relative humidity, temperature, and rainfall information, which were assessed to remove noise before being used as input [121]. In the training step, four optimizers (SGD (Stochastic Gradient Descent), Adam (Adaptive Moment Estimation), RMSProp (Root Mean Square Propagation), and Adadelta) were employed to optimize Deep-NC’s weights. Deep-NC with the Adam optimizer reached an accuracy of 81.50%. Omar et al. [122] employed a DL method consisting of LSTM, fully connected layers, dropout, and a regression function to predict forest fires. In total, 536 instances and twelve features, including temperature, relative humidity, rain, wind, ISI (Initial Spread Index), DMC (Duff Moisture Code), FWI (Fire Weather Index), FFMC (Fine Fuel Moisture Code), and BUI (Buildup Index), were used to train the proposed model, obtaining an RMSE (Root Mean Squared Error) of 0.021 and surpassing machine learning methods (decision tree, linear regression, SVM, and a neural network). Zhang et al. [123] developed a hybrid deep neural network (CNN2D-LSTM) to predict the global burned areas of wildfires based on satellite burned area products and historical time series predictors. CNN2D-LSTM includes two convolutional layers, three fully connected layers, ReLU functions, two max pooling layers, and two LSTM layers. An RMSE of 4.74 was obtained using monthly burned area data between 1997 and 2016, collected from the GFED dataset, and temporally dynamic predictors (monthly maximum/minimum temperatures, average temperature, specific humidity, accumulated precipitation, soil moisture, soil temperature, standardized precipitation index, LAI, and NDVI) that affect forest fires. Shoa et al. [124] proposed a DL model, which includes linear layers, batch normalization layers, and LeakyReLU activation functions, to predict the occurrence of wildfires in China. To train the proposed model, they used historical fire data (96,594 wildfire points collected from MODIS from 2001 to 2019) for China’s regions, climatic data (daily maximum temperature, average temperature, daily average ground surface temperature, average air pressure, maximum wind speed, sunshine hours, daily average relative humidity, and average wind speed), vegetation data (fractional vegetation cover), topographic data (slope, aspect, and elevation), and human activity data (population, gross domestic product, special holidays, residential areas, and roads). The testing results showed that the proposed DL model reached an accuracy of 87.4%. Shams-Eddin et al. [125] proposed a 2D/3D CNN method to predict wildfire danger. The 2D CNN processes static inputs, such as a digital elevation model, slope, distance to roads, population density, and distance to waterways, while the 3D CNN processes dynamic inputs, including day/night land surface temperature, soil moisture index, relative humidity, wind speed, 2 m temperature, NDVI, surface pressure, 2 m dewpoint temperature, and total precipitation. Two LOADE (Location-Aware Adaptive Denormalization) blocks were also integrated into the 3D CNN to modulate dynamic features based on static features.
Using the FireCube [126] and NDWS (Next Day Wildfire Spread) [79] datasets, the 2D/3D CNN showed a high performance (an accuracy of 96.48%), better than baseline methods such as random forest, XGBoost, LSTM, and ConvLSTM. Jamshed et al. [127] adopted the LSTM method to predict the occurrence of wildland fires from 2022 to 2025. Historical wildfire and burned-area data from Pakistan between 2012 and 2021 were used as training data, yielding an accuracy of 95%. Naderpour et al. [128] designed a spatial method for wildfire risk assessment in the Northern Beaches region of Sydney. This method consists of two steps. In the first step, twelve influential wildfire factors (NDVI, slope, precipitation, temperature, land use, elevation, road density, distance to rivers, distance to roads, wind speed, humidity, and annual temperature) [129,130] were fed into a deep NN (Neural Network) as a susceptibility model, which included more than three hidden layers to determine the weight of each factor; an FbSP (supervised fuzzy logic) method was then used to optimize the results generated by the deep NN. In the second step, the AHP (Analytic Hierarchy Process) method was adopted as the vulnerability model to generate physical and social vulnerability indices using social and physical parameters such as population density, age, employment rate, housing, land use, high density, and high value [131,132]. Finally, a risk function was used to calculate the wildfire risk map, giving the inventory of fire events (very low, low, medium, high, and very high). The proposed method obtained a Kappa coefficient of 94.3%. Nur et al. [133] proposed the hybrid models CNN-ICA and CNN-GWO, which combine a CNN with a metaheuristic method (ICA: Imperialist Competitive Algorithm [134]; GWO: Grey Wolf Optimization [135]), to assess wildland fire susceptibility divided into five classes (very low, low, moderate, high, and very high). First, the DPM (Damage Proxy Map) method was adopted to identify burned forest areas in Sentinel-1 SAR (Synthetic Aperture Radar) images from 2016 to 2020 in the Plumas National Forest regions and to generate an inventory dataset. Then, the inventory data and 16 wildfire conditioning factors, including topographic factors (aspect, altitude, slope, and plan curvature), meteorological factors (precipitation, maximum temperature, solar radiance, and wind speed), environmental factors (distance to streams, drought index, soil moisture, NDVI, and TWI (Topographic Wetness Index)), and anthropological factors (land use, distance use, and distance to settlements), were used to train and test the models. Finally, the CNN hyperparameters were optimized using the ICA and GWO methods, and forest fire likelihoods were produced. The results indicated that CNN-ICA performed better than CNN-GWO (RMSEs of 0.351 and 0.334, respectively). Bjånes et al. [136] designed an ensemble learning model based on two CNN architectures, namely CNN-1 and a multi-input CNN, to predict forest fire susceptibility split into five classes (very low, low, moderate, high, and very high) using satellite data from the Biobío and Ñuble regions. CNN-1 is Zhang’s CNN [114] modified by adding batch normalization to the first convolutional layer and dropout to the fully connected layers. The multi-input CNN is a simple CNN proposed by San et al. for flower grading [136].
A large dataset was used as learning data, consisting of fifteen fire influencing factors and fire history data from 2013 to 2019 (including 18,734 fires). The fire influencing factors were grouped into four categories: climatic data collected from the TerraClimate dataset [137] (minimum/maximum temperature, precipitation, wind speed, climatic water deficit, and actual evapotranspiration), anthropogenic data (distance to urban zones and distance to roads), vegetation data (NDVI, distance to rivers, and land cover type), and topographic data (aspect, surface roughness, slope, and elevation). The proposed model showed an F1-score of 88.77%, surpassing CNN-1, the multi-input CNN, and traditional methods such as XGBoost and SVM.
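Most of the susceptibility models above share a common formulation: each location is described by a vector of stacked conditioning factors (topography, weather, vegetation, human activity) that a small network maps to one of five susceptibility classes. The following PyTorch sketch illustrates that setup; the factor list, layer sizes, and synthetic data are illustrative assumptions rather than any specific published configuration:

```python
# Minimal per-location susceptibility classifier over stacked covariates.
import torch
import torch.nn as nn

FACTORS = ["slope", "aspect", "elevation", "ndvi", "temperature",
           "humidity", "wind_speed", "precipitation", "dist_to_road"]
CLASSES = ["very_low", "low", "moderate", "high", "very_high"]

model = nn.Sequential(
    nn.Linear(len(FACTORS), 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, len(CLASSES)),           # logits over susceptibility classes
)

# One gradient step on synthetic samples (stand-ins for real covariates).
x = torch.randn(128, len(FACTORS))
y = torch.randint(0, len(CLASSES), (128,))
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()
print(float(loss))
```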
Deep learning methods were also employed to map burn severity as a multi-class semantic segmentation task. Huot et al. [138] studied four deep learning models: a convolutional autoencoder, residual U-Net, convolutional autoencoder with convolutional LSTM, and residual U-Net with convolutional LSTM, to predict wildfires. To train the models, several datasets were used: historical wildfire data [139] since 2000, collected from the MOD14A1 V6 daily fire mask composites at 1 km resolution; vegetation data [140] obtained from the Suomi National Polar-Orbiting Partnership (S-NPP) NASA VIIRS Vegetation Indices (VNP13A1) dataset, which contains vegetation indices since 2012 sampled at 500 m resolution; topography data [141] obtained from the SRTM (Shuttle Radar Topography Mission), sampled at 30 m resolution; drought data [142]; and weather data (temperature, humidity, and wind) [143] collected from GRIDMET (Gridded Surface Meteorological data) at 4 km resolution since 1979. Residual U-Net achieved the best accuracy of 83%, showing a strong ability to detect zones of high fire likelihood. Farasin et al. [144] proposed a novel supervised learning method, called Double-Step U-Net, to estimate the severity level of affected areas after wildfires from Sentinel-2 satellite data, giving each sub-area of the wildland fire area a numerical severity level between 0 and 4, where 0, 1, 2, 3, and 4 represent an unburned area, negligible damage, moderate damage, high damage, and areas destroyed by fire, respectively. First, a binary classification U-Net was employed to identify each sub-area as unburned or burned. Then, a regression U-Net was applied to determine the severity level of the burned areas only. Two sources of information were used: Copernicus EMS (Emergency Management Service), which offers damage severity maps of five burned regions (Spain, France, Portugal, Sweden, and Italy) affected by past fires, used as ground truth maps; and Sentinel-2, which offers satellite imagery. Using data augmentation techniques (horizontal/vertical flips, rotation, and shear), Double-Step U-Net achieved an F1-score of 95% for the binary classification U-Net and a low RMSE for the regression U-Net, outperforming the U-Net and dNBR (delta Normalized Burn Ratio) [145] methods. Monaco et al. [146] also studied the ability of Double-Step U-Net with varying loss functions (Binary Cross Entropy (BCE) and Intersection over Union loss) to generate the damage severity map on manually labeled data collected by Copernicus EMS. The experimental results showed that Double-Step U-Net with BCE loss achieved the best MSE of 0.54. Monaco et al. [147] also developed a two-step CNN solution to detect burned areas and predict their damage on satellite data. First, a binary semantic segmentation CNN was used to detect burned areas, and then a regression CNN was applied to predict the damage severity, between 0 (no damage) and 4 (completely destroyed). Four semantic segmentation methods (U-Net, U-Net++, SegU-Net, and attention U-Net [148]) were employed as backbones to extract wildfire features. Using satellite images collected from Copernicus EMS, the DS-UNet and DS-UNet++ models with BCE loss showed higher IoUs of 75% and 74%, respectively, in delineating the burnt areas, compared to DS-AttU and DS-SegU; DS-AttU, DS-UNet, and DS-UNet++ performed better in predicting the damage severity levels of burned areas, obtaining RMSEs of 2.429, 1.857, and 1.857, respectively. Monaco et al. [149] also used DS-UNet to detect wildfires and to predict the damage severity level, from 0 (no damage) to 5 (completely destroyed), on Sentinel-2 images. DS-UNet achieved an average RMSE of 1.08, outperforming baseline methods such as DS-UNet++, DS-SegU, UNet++, PSPNet, and SegU-Net. Hu et al. [150] also investigated various deep learning-based multi-class semantic segmentation methods, such as U-Net, U2-Net [151], UNet++, UNet3+ [152], attention U-Net, Deeplab v3 [153], Deeplab v3+, SegNet, and PSPNet, for mapping burn severity into five classes: unburned, low, moderate, high, and non-processing area/cloud. A large-scale dataset, named MTBS (Monitoring Trends in Burn Severity), was developed to train these models. It includes post-fire and pre-fire top-of-atmosphere Landsat images, dNBR images, perimeter masks, RdNBR (relative dNBR) images, and thematic burn severity from 2010 to 2019 (more than 7000 fires). Five loss functions (cross-entropy, Focal, Dice, Lovasz softmax, and OHEM loss) and data augmentation techniques (vertical/horizontal flips) were used to evaluate these models. Attention U-Net achieved the best Kappa coefficient of 88.63%. Ding et al. [154] designed a deep learning method based on U-Net, called WLF-UNet, to identify wildfire location and intensity (no fire, low intensity, and high intensity) in Himawari-8 satellite data. More than 5000 images captured by the Himawari-8 satellite between November 2019 and February 2020 over the Australian regions were employed as training data, achieving an accuracy greater than 80%. Prapas et al. [155] also applied U-Net++ as a global wildfire forecasting method. Using the SeasFire cube dataset [156], which includes fire-related variables such as historical burned areas and fire emissions between 2001 and 2021, climate, vegetation, oceanic indices, and human-related data, U-Net++ reached an F1-score of 50.7%.
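The two-step "detect, then grade" scheme used by Double-Step U-Net and its variants can be summarized in a few lines. In this hedged PyTorch sketch, single convolutional layers stand in for the full segmentation and regression U-Nets, and the 12-band input and 0.5 mask threshold are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Stand-in per-pixel networks; in the papers these are full U-Net variants.
seg_net = nn.Conv2d(12, 1, kernel_size=3, padding=1)  # burned/unburned logits
reg_net = nn.Conv2d(12, 1, kernel_size=3, padding=1)  # severity score

def severity_map(image: torch.Tensor) -> torch.Tensor:
    burned = (torch.sigmoid(seg_net(image)) > 0.5).float()  # step 1: binary mask
    severity = 4.0 * torch.sigmoid(reg_net(image))          # step 2: scale to [0, 4]
    return severity * burned                                # unburned pixels stay 0

out = severity_map(torch.randn(1, 12, 64, 64))  # e.g., a 12-band Sentinel-2 tile
print(out.shape)  # torch.Size([1, 1, 64, 64])
```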
 

5. Deep Learning-Based Approaches for Fire Spread Prediction Using Satellite Data

The fire spread approach estimates fire danger by representing the variable and fixed factors that affect the rate of fire spread and the difficulty of controlling fires, thereby predicting how fires move and develop over time. Wildfire risk is mainly influenced by factors such as weather (e.g., wind and temperature), fuel information (e.g., fuel type and fuel moisture), topographic data (e.g., slope, elevation, and aspect), and fire behaviors. Several systems were developed to estimate fire spread, area, behavior, and perimeter, for example, the Canadian FFBP (Forest Fire Behavior Prediction) system [157]. Over the years, numerous research studies have been proposed to predict fire spread; this section reports only the methods based on deep learning. Stankevich [158] described an intelligent system for predicting wildfire spread that addresses common challenges such as low forecast performance, computational cost and time, and limited functionality under uncertain and unsteady conditions. Various data were used as inputs: fire propagation data obtained from the NASA FIRMS resource management system [159]; environmental data, including air temperature, wind speed, and humidity; forest vegetation data obtained from the European Space Agency Climate Change Initiative’s global annual Land Cover Map [160]; and weather data from Ventusky InMeteo [161]. Four CNNs, consisting of convolutional layers, ReLU activation functions, max pooling layers, and fully connected layers, were used. First, a simple CNN was adopted to recognize objects in the forest fire data. Then, three CNNs were employed to estimate the environmental data: air temperature 2 m above the ground, wind speed at a height of 10 m above the ground, and relative air humidity. Finally, an auto-encoder generated the fire forecast. Radke et al. [162] proposed a novel model, FireCast, which integrates two CNNs to predict fire growth. Each CNN includes convolutional layers, one average pooling layer, three dropout layers, ReLU activation functions, two max pooling layers, and a sigmoid function. Given an initial fire perimeter, atmospheric data, and location characteristics as inputs, FireCast predicts the areas around the current fire that are expected to burn over the next 24 h. Using geospatial information such as Landsat-8 satellite imagery [25], elevation data, the GeoMAC dataset for fire perimeters, and atmospheric and weather data collected from NOAA, it achieved an accuracy of 87.7%, outperforming the FARSITE simulator [163] and a random prediction method. Bergado et al. [164] proposed a deep learning method, AllConvNet, which includes convolutional layers, max pooling layers, and downsampling layers, to estimate the probability of wildland fire burn over the next seven days. A heterogeneous dataset [165,166,167,168] was used as input, consisting of historical forest fire burn data from the Victoria, Australia region between 2006 and 2017, topography data (slope, elevation, and aspect), weather data (rainfall, humidity, wind direction, wind speed, temperature, solar radiation, and lightning flash density), proximity to anthropogenic interfaces (distance to power lines and distance to roads), and fuel information (fuel moisture, fuel type, and emissivity). The experimental study reported that AllConvNet reached an accuracy of 58.23%, better than baseline methods such as SegNet (56.03%), logistic regression (51.54%), and multilayer perceptron (50.48%).
Hodges et al. [169] developed a DCIGN (Deep Convolutional Inverse Graphics Network) to determine the spread of wildland fire and to predict the burned zone up to 24 h ahead. The DCIGN consists of two convolutional layers, ReLU activation functions, two max pooling layers, one fully connected layer followed by TanH (hyperbolic tangent) activation functions, and one transposed convolutional layer. Various data were used as input, such as vegetation information (canopy height, canopy cover, and crown ratio), the fuel model, moisture information (100-h moisture, 10-h moisture, 1-h moisture, live woody moisture, and live herbaceous moisture), wind information (north wind and east wind), elevation, and the initial burn map. The DCIGN was trained to predict homogeneous and heterogeneous fire spread using 9000 and 2215 simulations, respectively, achieving an F1-score of 93%. Liang et al. [170] proposed an ensemble learning model, which includes a BPNN (Backpropagation Neural Network), LSTM, and RNN (Recurrent Neural Network), to predict the scale of forest fires. They used fire data for the Alberta region between 1990 and 2018, obtained from the Canada National Fire Database. They also employed eleven meteorological variables (minimum temperature, mean temperature, maximum temperature, cooling degree days, total rain, total precipitation, heating degree days, total snow, speed of maximum wind gust, snow on ground, and direction of maximum wind gust) as input. The testing results showed that this method is suitable for estimating the size of the burned area and the duration of the wildfire, with a high accuracy of 90.9%. Khennou et al. [171] developed FU-NetCast, a deep learning model based on U-Net, to determine wildfire spread over 24 h and to predict newly burned areas. FU-NetCast showed excellent potential in predicting forest fire spread using satellite images, atmospheric data, digital elevation models (DEMs) [66], fire perimeter data, and climate data (temperature, humidity, wind, pressure, etc.) [171]. Khennou and Akhloufi [172] also developed FU-NetCastV2 to predict the next burnt area over a 24-h horizon. Using GeoMAC data (400 fire perimeters) from 2013 to 2019, FU-NetCastV2 achieved a high accuracy of 94.60%, outperforming FU-NetCast by 1.87%. Allaire et al. [173] developed a deep learning model to identify fires and to determine their spread. This model is a deep CNN with two types of inputs: scalar variables and spatial fields describing the surrounding landscape. It consists of four convolutional layers followed by a batch normalization layer, the ReLU activation function, and an average pooling layer, and various dense layers followed by batch normalization and the ReLU activation function. A MAPE (Mean Absolute Percentage Error) of 32.8% was reached using large training data, which include a data map of Corsica (land cover field and elevation field) and various environmental data: fuel moisture content (FMC), wind speed, terrain slope, ignition point coordinates, heat of combustion perturbations, particle density perturbations, fuel load perturbations, fuel height perturbations, and surface-to-volume ratio perturbations, confirming the potential of this method for estimating fire spread in a wide range of environments. McCarthy et al. [174] presented a deep learning model based on U-Net to downscale GEO satellite multispectral imagery and to monitor and estimate fire progress. An excellent performance (a precision of 90%) was obtained, showing the effectiveness of this method in determining fire evolution at high spatiotemporal resolution (375 m) using quasi-static features (terrain, land, and vegetation information) and dynamic features obtained from GEO satellite imagery.
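As a final illustration, the next-day spread problem tackled by FireCast [162] and FU-NetCast [171] can be framed as image-to-image prediction: the current fire perimeter is stacked with terrain and weather rasters, and a network outputs a per-pixel probability of burning within the next 24 h. The channel list and network below are illustrative assumptions, not the published architectures:

```python
# Hedged sketch of next-day fire spread as per-pixel prediction over a
# co-registered raster stack (channel names are hypothetical).
import torch
import torch.nn as nn

CHANNELS = ["fire_mask_t", "elevation", "slope", "wind_u", "wind_v",
            "temperature", "humidity", "fuel_moisture"]

spread_net = nn.Sequential(
    nn.Conv2d(len(CHANNELS), 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 1),                       # logit: burns in next 24 h?
)

x = torch.randn(1, len(CHANNELS), 128, 128)    # one raster stack for a scene
p_burn_next_day = torch.sigmoid(spread_net(x)) # per-pixel spread probability
print(p_burn_next_day.shape)                   # torch.Size([1, 1, 128, 128])
```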