Each year, the number of fire-related disasters increases, resulting in more human deaths. In addition to human and material losses, fires frequently cause extensive economic harm. Both natural and anthropogenic forces are significant contributors: factors such as dryness, wind, heating appliances, chemical accidents, and cooking are conducive to fire ignition. Accidental fires can start with alarming randomness and rapidly spread out of control. To prevent unforeseen fires and ensure the safety of individuals, potential threats must be evaluated promptly and mitigated quickly. According to the State Fire Agency of Korea, there were 40,030 fires in the country in 2019, resulting in 284 deaths and 2219 hospitalizations [1]. These fires caused record-breaking levels of property damage. Consequently, numerous research organizations have developed techniques for identifying fires. Fire alarm systems, sensor-based frameworks, and other sensing technologies are just a few examples of the warning and identification systems adopted over the past several decades to detect specific fire and flame characteristics; however, numerous issues remain unresolved [2]. Recent research has demonstrated the effectiveness of computer vision and deep learning-based methods for fire detection. Computer vision and artificial intelligence (AI) based approaches, such as static and dynamic texture analysis [3,4], convolutional neural networks (CNNs), and 360-degree sensors [5,6,7], are widely used in the field of fire detection.
2. Fire Detection Strategies Based on Image Processing and Computer Vision
Toulouse et al. [8] proposed a novel approach to identify geometrical features of flames, such as their location, rate of spread, length, and surface. Pixels depicting fire were sorted into categories based on their color, while non-refractive pixels used to detect smoke were sorted based on their average intensity. The edge computing framework for early fire detection developed by Avgeris et al. [9] is a multi-step process that significantly facilitates flame-boundary identification. However, these computer-vision-based frameworks have only been applied to relatively static images of fire. Other researchers have utilized recently developed techniques based on the fast Fourier transform (FFT) and wavelet analysis to study the boundaries of wildfires in videos [10]. Studies have demonstrated that these methods are effective only in specific scenarios.
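As a rough illustration of this family of frequency-domain methods, the sketch below computes the FFT magnitude spectrum of a hypothetical one-dimensional contour signature (the radial distance from a region's centroid to its boundary, sampled at equal angles). The function name and signature representation are assumptions for illustration, not the specific method of [10].

```python
import numpy as np

def boundary_spectrum(signature):
    """Magnitude spectrum of a 1-D contour signature via the FFT.
    Irregular, flickering fire boundaries tend to spread energy into
    higher harmonics, unlike smooth artificial shapes."""
    centered = signature - np.mean(signature)  # remove the DC component
    return np.abs(np.fft.rfft(centered))
```

A signature dominated by a single harmonic produces a single spectral peak at that harmonic's index, which is the kind of cue such boundary analyses threshold on.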
Color pixel statistics from both foreground and background images have been used to search for signs of fire. By fusing color information with recorded foreground and background frames, Turgay [11] created a real-time fire detection system. Fire color data are derived from statistical assessments of representative fire photographs, and the pixel color values in each channel are modeled using three Gaussian distributions. This technique suits simple adaptive data scenarios. Despite the widespread use of color in flame and smoke recognition, such methods are often unreliable due to the influence of environmental factors such as lighting conditions, shadows, and other distractions. Moreover, because fire exhibits long-term dynamic movement, purely color-based approaches are inferior to newer dynamics-based methods for fire and smoke detection.
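A minimal sketch of the Gaussian color-modeling idea follows; the per-channel parameters, threshold, and function names are illustrative assumptions, not Turgay's actual statistics.

```python
import numpy as np

# Hypothetical per-channel Gaussian parameters (mean, std) estimated
# from representative fire images; the numbers are illustrative only.
FIRE_STATS = {"R": (220.0, 25.0), "G": (150.0, 40.0), "B": (60.0, 35.0)}

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density, evaluated element-wise."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def fire_mask(rgb, threshold=1e-6):
    """Flag pixels whose joint per-channel likelihood under the fire
    color model exceeds a threshold (channels treated as independent)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    likelihood = (gaussian_pdf(r, *FIRE_STATS["R"]) *
                  gaussian_pdf(g, *FIRE_STATS["G"]) *
                  gaussian_pdf(b, *FIRE_STATS["B"]))
    return likelihood > threshold
```

The sensitivity of such a mask to the chosen means and variances is precisely why lighting changes and shadows degrade color-only detectors.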
By analyzing the motion of smoke and flames with linear dynamic systems (LDSs), researchers in [3] created a method for detecting fires. They discovered that by including color, motion, and spatio-temporal features in their model, they could achieve both high detection rates and a considerable reduction in false alarms. We aim to enhance the efficiency of current fire detection monitoring systems, which issue early warning alerts, by employing two different support vector classifier methodologies. To locate forest fires, researchers have also analyzed the fire's spatial and temporal dynamic textures [12]. In a static texture investigation, hybrid surface descriptors were employed to generate a significant feature vector that could differentiate flames from distortions without using conventional texture descriptors. These approaches rely heavily on easily discernible cues, such as the presence of visible flames in images. The appearance of fire is affected by several factors, including its color, movement speed, surroundings, size, and borders. Poor image and video quality, adverse weather, and overcast skies pose further challenges to such methods. Therefore, modern supplemental methods must be implemented to enhance current approaches.
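As a generic sketch of support-vector-based fire classification, the example below trains an RBF support vector classifier on synthetic stand-in descriptors; the feature values, cluster centers, and labels are fabricated for illustration and do not correspond to any of the specific methodologies cited above.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in features: each row is a hypothetical descriptor
# combining color statistics, motion energy, and texture contrast.
rng = np.random.default_rng(0)
fire_feats = rng.normal(loc=[0.8, 0.7, 0.6], scale=0.05, size=(50, 3))
nonfire_feats = rng.normal(loc=[0.2, 0.1, 0.3], scale=0.05, size=(50, 3))

X = np.vstack([fire_feats, nonfire_feats])
y = np.array([1] * 50 + [0] * 50)  # 1 = fire, 0 = no fire

# RBF kernel handles the nonlinear boundary between the two clusters.
clf = SVC(kernel="rbf").fit(X, y)
```

In practice the two classifier stages would operate on real descriptors extracted per frame, with the second stage suppressing false alarms from the first.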
3. Techniques for Fire Detection Based on Deep Learning Approaches
Recently, several deep learning (DL) techniques have been effectively applied in various fields of fire and face detection research [13,14,15]. In contrast to the hand-crafted features of the techniques discussed above, DL methods can automate feature selection and extraction. Automatic feature extraction from learned data is another area where deep neural networks (DNNs) have proven useful [16,17]. Rather than spending time manually engineering features, developers can instead focus on building a solid dataset and a well-designed neural network.
We have previously presented a novel DL-based technique for fire detection that uses a CNN with dilated convolutions [4]. To evaluate the efficacy of our approach, we trained and tested it on a dataset we created, containing photographs of fire collected from the web and manually labeled. The proposed methodology is contactless and applicable to previously unseen data; therefore, it generalizes well and eliminates false positives. Our contributions to the proposed fire detection approach are fourfold: a custom-built dataset, a small number of layers, small kernel sizes, and dilation filters were all used in our experiments. Researchers may find this collection a valuable resource of fire and smoke images.
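To illustrate why dilation enlarges the receptive field without adding parameters, the following minimal NumPy sketch implements a single dilated 2-D convolution (cross-correlation, as used in CNNs). It demonstrates the operation in isolation and is not our network's implementation.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=1):
    """Valid-mode 2-D cross-correlation with a dilated kernel.
    The kernel's effective extent grows with the dilation rate while
    its parameter count stays fixed."""
    kh, kw = kernel.shape
    eh = (kh - 1) * dilation + 1  # effective kernel height
    ew = (kw - 1) * dilation + 1  # effective kernel width
    H, W = x.shape
    out = np.zeros((H - eh + 1, W - ew + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Sample the input with gaps of size `dilation`.
            patch = x[i:i + eh:dilation, j:j + ew:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out
```

With a 3x3 kernel and dilation 2, one output value already aggregates a 5x5 neighborhood, which is how a shallow network with small kernels can still cover large flame regions.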
To improve feature representations for visual classification, Ba et al. [2] created a novel CNN model called SmokeNet that incorporates spatial and channel-wise attention. Luo et al. [18] proposed an approach for identifying flames that uses a CNN together with the kinetic characteristics of smoke. Initially, they separated candidate regions into two groups, one using dynamic frame references from the background and the other from the foreground. A CNN with five convolutional layers and three fully connected layers then automatically extracted features from the candidate pixels. Deep convolutional segmentation networks have also been developed for analyzing fire emergency scenes, specifically for identifying and classifying objects in an image based on characteristics such as color, a relatively bright intensity compared to the surroundings, frequent changes in shape and size, and the objects' propensity to catch fire [19].
The CNN models proposed in [20] enabled a unique image-based fire detection system to achieve a maximum accuracy of 83.7%. In addition, CNN techniques have been utilized to improve the performance of image fire detection software [21,22,23,24]. Algorithms based on DL require large amounts of data for training, validation, and testing. Furthermore, CNNs are prone to spurious detections and are computationally expensive due to the large datasets required for training. We compiled a large dataset to address these issues, and the associated image collections will soon be made publicly accessible.
4. Fire Detection Approaches Based on YOLO (You Only Look Once) Networks
YOLO, introduced in 2016 by Joseph Redmon et al. [25], is an object detection system. Built on CNNs, it was designed to be fast, precise, and adaptable. The system divides an input image into a grid of cells, with each cell representing a different area of the image; each cell predicts a set of bounding boxes for objects within its region, and each box is assigned one of a collection of predefined classes. Once an object has been recognized, the system constructs a bounding box around it and assigns a class to the box; with the object's identity established, the system can calculate its coordinates and dimensions. Park et al. suggested a fire detection approach for urban areas at night using ELASTIC-YOLOv3 [26]. They recommended ELASTIC-YOLOv3, an improvement on YOLOv2 (which is effective only for detecting small objects), because it can boost detection performance without adding parameters at the initial stage of the algorithm. They also proposed a method of constructing a movable fire tube that considers the particularities of flame motion. Traditional nocturnal fire flame recognition algorithms, however, suffer from several issues: a lack of color information, a relatively high brightness intensity compared to the surroundings, changes in the shape and size of flames due to light blur, and movement in all directions. Improved real-time fire warning systems based on advanced technologies and YOLO versions (v3, v4, and v5) for early fire detection have been previously proposed [27,28,29,30].
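To make the grid-based prediction concrete, the sketch below decodes a single cell's output into an absolute bounding box, following a simplified YOLOv1-style parameterization (cell-relative center offsets, image-relative width and height). The function name and the 7x7 grid / 448-pixel configuration are illustrative assumptions, not the exact formulation of any specific YOLO version cited above.

```python
def decode_yolo_cell(pred, row, col, grid_size, img_size):
    """Convert one grid cell's prediction (tx, ty, w, h, confidence)
    into an absolute (x1, y1, x2, y2, confidence) box.
    tx, ty are offsets inside the cell in [0, 1]; w, h are fractions
    of the whole image (simplified YOLOv1-style decoding)."""
    tx, ty, w, h, conf = pred
    cell = img_size / grid_size          # side length of one cell, pixels
    cx = (col + tx) * cell               # box center, absolute pixels
    cy = (row + ty) * cell
    bw, bh = w * img_size, h * img_size  # box width/height, absolute pixels
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2, conf)
```

A full detector runs this decoding for every cell and box, then filters the results by confidence and non-maximum suppression before reporting fire regions.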