High false-alarm rates persist because of background complexity; the variability of smoke in size, intensity, and shape; and the presence of smoke-like objects, such as haze, dust, and clouds. These objects often have textures, colors, shapes, and spectral features very similar to those of smoke, leading to false detections. Therefore, this paper presents a novel ensemble learning method, namely BoucaNet, for recognizing smoke in remote sensing satellite images while addressing these limitations.
2. Deep Learning Methods for Smoke Recognition
Over the years, numerous DL methods have been developed to improve the performance of smoke classification in different fields of application. Among them, Tao et al. [15] suggested a simple CNN to recognize smoke in ground images, addressing challenging limitations such as varying smoke colors, shapes, and textures. The proposed CNN is a modified AlexNet [16] in which the order of the max pooling and normalization layers that follow the first and second convolutional layers is changed. The modified AlexNet was trained and evaluated using the Yuan dataset [17], resulting in an accuracy of 96.88%. Yin et al. [18] proposed a new deep normalization CNN, namely DNCNN, to improve smoke detection performance. DNCNN incorporates batch normalization into convolutional layers to deal with overfitting and gradient dispersion. Data augmentation techniques were also used to address the imbalance between smoke and non-smoke images. Test results showed that DNCNN achieved an accuracy of 98.08%, surpassing popular CNNs such as AlexNet, ZF-Net [19], and VGG-16 [20]. Khan et al. [21] studied three CNN models (AlexNet, VGG-16, and GoogleNet [22]) to identify smoke in normal and foggy IoT environments. Experimental tests were performed using a very large dataset. VGG-16 obtained the highest performance with an accuracy of 97.72%, demonstrating its ability to detect smoke in a foggy environment.
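The batch normalization step that DNCNN [18] adds to its convolutional layers can be illustrated with a minimal sketch. It assumes the standard training-mode formulation (normalize with the mini-batch mean and variance, then scale by gamma and shift by beta); the function name, the default parameters, and the sample activations are hypothetical and are not taken from the cited work.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature channel across a mini-batch (training-mode statistics)."""
    n = len(batch)
    mean = sum(batch) / n
    # Biased (population) variance of the mini-batch.
    var = sum((x - mean) ** 2 for x in batch) / n
    # Standardize, then apply the learnable scale (gamma) and shift (beta).
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

# Hypothetical activations of a single channel for a mini-batch of four images.
activations = [0.2, 1.5, -0.7, 3.1]
normalized = batch_norm(activations)
print([round(v, 3) for v in normalized])
```

With gamma = 1 and beta = 0, the output has approximately zero mean and unit variance, which is the property that helps stabilize gradients during training.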
Peng and Wang [23] proposed a video smoke detection method to recognize smoke in complex environments. First, a GMM (Gaussian Mixture Model) [24] was employed as an image processing method to extract the suspected smoke areas from images collected from surveillance cameras. Then, the SqueezeNet model [25] was adopted to detect the presence of smoke. Using a large dataset, this method achieved an accuracy of 97.12% and a fast prediction time compared with existing models such as AlexNet, ShuffleNet [26], Xception [27], and MobileNet [28]. Gu et al. [29] developed a DCNN (Deep Dual-Channel Neural Network) as a smoke recognition method. The DCNN is composed of two deep subnet channels, SBNN (Selective-based Batch Normalization Network) and SCNN (Skip Connection-based Neural Network). DCNN was trained on a large public dataset [17] with data augmentation techniques. It achieved an accuracy of 99.5%, higher than hand-crafted methods and state-of-the-art DL methods such as DNCNN [18], AlexNet, VGG, GoogLeNet, Xception, and ResNet. Zhang et al. [30] presented a DL method, called DC-CNN (Dual-Channel Convolutional Neural Network), for detecting smoke. Extensive studies were conducted using learning data, including 9794 smoke and 9794 non-smoke images, to handle the challenges related to smoke features, such as transparency, homogeneity, and visual similarity to clouds, steam, haze, and fog. DC-CNN obtained the highest accuracy, 99.33%, compared with baseline DL models. Jia et al. [31] designed a new method for detecting smoke in videos. First, GMM-based domain knowledge of smoke was adopted to segment the suspected areas of smoke. Then, three pretrained deep learning models (AlexNet, Inception v3, and ResNet50 [32]) were used to recognize smoke. Using 138 smoke videos as testing data, ResNet50 with GMM performed best among these models, with an F1-score of 99.32%. He et al. [33] proposed a DL method for smoke detection in a foggy environment. This method combines VGG-16 as a backbone to extract smoke features with an attention module, consisting of channel attention and spatial attention, to improve the detection of small smoke areas. It was trained and evaluated using 33,666 images and achieved an F1-score of 99.97%, outperforming the AlexNet, VGG-16, and SqueezeNet methods. Zhang et al. [34] developed an end-to-end CNN method to identify smoke. Two CNNs (a spatial stream and a temporal stream) were adopted to extract the spatial and temporal features of smoke. This method achieved an accuracy of 96.8%, better than state-of-the-art methods. Cheng et al. [35] presented a deep convolutional network, namely PACNN, to improve the robustness of smoke recognition tasks. Testing results on the Yuan dataset showed that PACNN reached a high accuracy of 98.91% compared with popular CNNs and vision transformers.
Deep learning methods have achieved high performance in recognizing smoke. However, several challenging limitations persist, including the complexity and dynamics of the background; the visual similarity between smoke, clouds, dust, and haze; the varying characteristics of smoke regarding its air concentration, flow pattern, and color; and the detection of small smoke zones.
3. Materials and Methods
3.1. Proposed Method for Smoke Classification
A new ensemble learning approach, namely BoucaNet, is introduced for recognizing smoke in satellite images. BoucaNet combines the deep CNN EfficientNet v2 (EfficientNetV2M) [13] and the vision transformer EfficientFormer v2 (EfficientFormerV2L) [12].
The preprocessing steps start with resizing the input satellite images to 224 × 224 pixels. Next, four data augmentation techniques (rotation, shearing, shifting, and zooming) are applied to diversify the learning data, improve the ability of BoucaNet to generalize to different real-world scenarios, and avoid overfitting. Then, the input satellite images and the generated images are simultaneously fed into the EfficientNet v2 and EfficientFormer v2 models to extract complex contextual features, comprising both smoke plume patterns and background contextual information, and to provide a comprehensive representation of various smoke scenarios. After the two feature maps generated by the EfficientNet v2 and EfficientFormer v2 models are concatenated, Gaussian dropout regularization with a rate of 0.3 is applied. Finally, a Softmax function generates a probability score ranging from 0 to 1 for each class (smoke, cloud, haze, dust, seaside, or land), and the class with the highest score is assigned to the input satellite image.
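The fusion stage described above (concatenation, Gaussian dropout, Softmax) can be sketched as follows. This is an illustrative toy under stated assumptions, not the BoucaNet implementation: the two backbones are replaced by short hypothetical feature vectors, a single linear layer with random weights is assumed between the dropout and the Softmax, the class order is arbitrary, and Gaussian dropout is assumed to follow the common formulation of multiplicative noise drawn from N(1, rate / (1 - rate)) that is active only during training.

```python
import math
import random

# Assumed class order; the source does not specify the label indexing.
CLASSES = ["cloud", "dust", "haze", "land", "seaside", "smoke"]

def gaussian_dropout(features, rate, rng, training=True):
    """Multiply each feature by noise ~ N(1, rate / (1 - rate)); identity at inference."""
    if not training:
        return list(features)
    stddev = math.sqrt(rate / (1.0 - rate))
    return [f * rng.gauss(1.0, stddev) for f in features]

def softmax(logits):
    """Numerically stable Softmax: probabilities in [0, 1] that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(cnn_features, transformer_features, weights, biases, rng, training=False):
    # 1) Concatenate the features from the two backbones.
    fused = list(cnn_features) + list(transformer_features)
    # 2) Gaussian dropout with a rate of 0.3 (only perturbs features in training mode).
    fused = gaussian_dropout(fused, 0.3, rng, training)
    # 3) Hypothetical linear layer producing one logit per class.
    logits = [sum(w * f for w, f in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    # 4) Softmax, then pick the most probable class.
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs

rng = random.Random(0)
cnn_features = [rng.random() for _ in range(4)]          # stand-in for EfficientNet v2 output
transformer_features = [rng.random() for _ in range(3)]  # stand-in for EfficientFormer v2 output
weights = [[rng.uniform(-1.0, 1.0) for _ in range(7)] for _ in CLASSES]
biases = [0.0] * len(CLASSES)

label, probs = classify(cnn_features, transformer_features, weights, biases, rng)
print(label, [round(p, 3) for p in probs])
```

Because the noise has a mean of 1, the expected value of each feature is unchanged, so no rescaling is needed when the dropout is switched off at inference time.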
3.2. Datasets
To train and test the proposed smoke recognition method, BoucaNet, the available satellite dataset USTC_SmokeRS [14] is utilized. This dataset was collected using MODIS (Moderate Resolution Imaging Spectroradiometer) and represents numerous smoke scenes captured through satellite remote sensing. It comprises a total of 6225 satellite images with dimensions of 256 × 256 pixels and a spatial resolution of 1 km.
4. Results and Discussion
BoucaNet was trained using the USTC_SmokeRS satellite dataset, which comprises a total of 6225 satellite images divided into six distinct classes. This diversity of classes and scenarios enabled BoucaNet to learn and recognize various aspects of smoke in satellite images.
BoucaNet showed a high performance during testing, achieving a loss of 0.2184, an accuracy of 93.67%, and an F1-score of 93.64%. This performance was obtained thanks to the diversity of feature maps extracted by the EfficientNet v2 and EfficientFormer v2 models, which enables BoucaNet to distinguish between smoke and complex backgrounds and to identify small areas of smoke. It demonstrated its potential to address challenging limitations related to recognizing smoke in satellite images. These challenges include complex backgrounds, comprising various land covers and geographical features, which can make it difficult to accurately identify smoke in input satellite images. Additionally, BoucaNet handled the varying and dynamic nature of smoke in terms of its shape, color, intensity, and flow pattern, as well as the visual characteristics of smoke (color, shape, and spectral features) that are often shared with clouds, dust, and haze. Moreover, BoucaNet achieved an inference time of 0.16 s, which shows its suitability for real-time processing of satellite images for smoke recognition while maintaining high performance.
In addition, BoucaNet achieved superior results with an F1-score of 95.58%, 91.00%, 90.82%, 95.01%, 98.76%, and 90.36% for recognizing cloud, dust, haze, land, seaside, and smoke classes, respectively, compared with CT-Fire, RegNetY-16GF, and EfficientFormer v2. It demonstrated its ability to accurately differentiate between cloud, smoke, haze, dust, land, and seaside features, thereby proving its capability to overcome challenges related to background complexity and visual similarities, including color, shape, and spectral characteristics, between smoke and other classes (cloud, dust, and haze).
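As a quick check on how the per-class scores relate to the overall score, the six reported F1-scores can be averaged directly. The sketch below computes their unweighted (macro) mean; whether the overall F1-score of 93.64% was obtained with macro or support-weighted averaging is not stated, so the averaging scheme used here is an assumption.

```python
# Per-class F1-scores (%) reported for BoucaNet on USTC_SmokeRS.
per_class_f1 = {
    "cloud": 95.58,
    "dust": 91.00,
    "haze": 90.82,
    "land": 95.01,
    "seaside": 98.76,
    "smoke": 90.36,
}

# Unweighted (macro) average: every class counts equally, regardless of its size.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Class with the lowest score, i.e., the hardest one to recognize.
hardest = min(per_class_f1, key=per_class_f1.get)

print(f"macro F1 = {macro_f1:.2f}%")  # macro F1 = 93.59%
print(f"lowest per-class F1: {hardest} ({per_class_f1[hardest]:.2f}%)")
```

The macro mean of about 93.59% is close to the reported overall F1-score of 93.64%; the small gap would be consistent with an average weighted by the number of test images per class, although this cannot be confirmed from the text. The lowest scores belong to smoke, haze, and dust, the classes that are visually most similar to one another.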
In conclusion, BoucaNet performed well in recognizing smoke in satellite images compared with baseline models. Notably, it demonstrated its potential to address challenging limitations, including complex backgrounds; the dynamic nature of smoke in terms of its shape, intensity, and color; the detection of small areas of smoke; and the visual similarities in color, shape, and spectral characteristics between smoke and other elements, including clouds, dust, and haze. Additionally, BoucaNet achieved an inference time of 0.16 s.
5. Conclusions
A novel ensemble learning method, namely BoucaNet, was presented for recognizing smoke in satellite images while addressing the associated challenges. BoucaNet combines the strengths of EfficientNet v2 and EfficientFormer v2 to extract rich and diverse feature maps for the smoke, cloud, haze, dust, land, and seaside classes. It achieved an accuracy of 93.67% and an F1-score of 93.64% with an inference time of 0.16 s. Additionally, BoucaNet demonstrated its potential as a robust solution to the challenges of recognizing smoke in satellite images, including complex backgrounds; the dynamic nature of smoke, which can vary in shape, intensity, and color; the detection of small areas of smoke; and visual similarities between smoke and other elements, such as clouds, dust, and haze.