Article

SmokeFireNet: A Lightweight Network for Joint Detection of Forest Fire and Smoke

1
School of Arts and Design, Nanjing Vocational University of Industry Technology, Nanjing 210023, China
2
College of Electronic Engineering, Nanjing XiaoZhuang University, Nanjing 211171, China
*
Author to whom correspondence should be addressed.
Forests 2024, 15(9), 1489; https://doi.org/10.3390/f15091489
Submission received: 30 July 2024 / Revised: 15 August 2024 / Accepted: 23 August 2024 / Published: 25 August 2024
(This article belongs to the Special Issue Artificial Intelligence and Machine Learning Applications in Forestry)

Abstract

In recent years, forest fires have occurred frequently around the globe, driven by extreme weather and dry climates, causing serious economic losses and environmental pollution. In this context, timely detection of forest fire smoke is crucial for real-time early warning of fires. However, fire and smoke from forest fires can spread over large areas and affect distant regions. In this paper, a lightweight joint forest fire and smoke detection network, SmokeFireNet, is proposed, which employs ShuffleNetV2 as the backbone for efficient feature extraction, effectively addressing the computational efficiency challenges of traditional methods. To integrate multi-scale information and enhance semantic feature extraction, a feature pyramid network (FPN) and path aggregation network (PAN) are introduced. In addition, the FPN is optimized with the lightweight DySample upsampling operator. The model also incorporates efficient channel attention (ECA), which attends more closely to fire and smoke regions while suppressing irrelevant features. Finally, by embedding the receptive field block (RFB), the model further improves its ability to understand contextual information and capture detailed features of fire and smoke, thus improving overall detection accuracy. The experimental results show that SmokeFireNet outperforms other mainstream object detection algorithms, achieving an APall of 86.2%, an FPS of 114, and 8.4 GFLOPs, providing effective technical support for forest fire prevention in terms of average precision, frame rate, and computational complexity. In the future, the SmokeFireNet model is expected to play a greater role in forest fire prevention and contribute to the protection of forest resources and the ecological environment.

1. Introduction

Forests are the cornerstone of national economic development and are crucial for national economic construction and the sustainable development of agroforestry. Nevertheless, the occurrence of forest fires globally has been on the rise due to extreme weather and dry climates in recent years, with severe consequences in China as well [1,2]. According to reports, a forest fire broke out on 30 March 2020 in Xichang, Liangshan Prefecture, destroying 792 hectares of forest and causing a direct economic loss of CNY 97.31 million, with 19 lives lost and three people injured. Forest fires also produce significant amounts of smoke and dust, which severely pollute the environment. Hence, establishing an efficient forest fire detection system is of utmost importance: such a system can detect fires early, allowing for rapid response and suppression and reducing harm to life and property. At the same time, timely detection and control of fires protects forest resources, maintains the sustainable development of agroforestry, and promotes a healthy and sustainable ecological environment [3].
With the progression of computer vision technology, computer vision-based image processing and pattern recognition have found important applications in forest fire smoke detection. Avudaiammal et al. [4] proposed extracting perceptual dynamic features using a color model of forest fires together with features such as wavelet energy and grayscale covariance matrices, which were then used to train machine learning classifiers. Sheng et al. [5] proposed a deep belief network (DBN) fire detection method based on statistical image features to address the challenges of different fire stages in complex environments; it effectively extracts fire features in the time, frequency, and time–frequency domains and classifies them using a DBN. Bakri et al. [6] proposed a color pixel classification algorithm to separate fire pixels from the background and detect fire using image enhancement techniques and color models. To improve the early detection of forest fire smoke, Han et al. [7] proposed a semantic segmentation method with multi-color spatial feature fusion that extracts complementary smoke features by combining multi-scale features and attention mechanisms. However, despite the progress made by these feature extraction and classification-based techniques in forest fire smoke recognition, challenges remain: the color, contour, and dynamic texture of the detected objects are difficult to characterize, making it hard to accurately extract fire and smoke features and reducing the accuracy of fire–smoke recognition.
With the swift progress of deep learning in forest fire and smoke detection, researchers have begun to actively explore the application of various deep learning models in this scenario [8,9]. Convolutional neural network (CNN)-based forest fire smoke detection methods have a significant advantage in that they use more powerful feature extraction networks, which can extract richer, more advanced, and more abstract semantic features, thus effectively improving detection performance [10,11]. Zhang et al. [12] developed a multi-scale feature extraction model (MS-FRCNN) suitable for detecting small targets in forest fires. Huang et al. [13] proposed GXLD, a lightweight forest fire detection technique that relies on YOLOX-L and defogging algorithms to address challenges posed by fog-related disruptions. Avula et al. [14] improved fire detection efficacy and reduced false detections by introducing fuzzy entropy optimized thresholding and spatial transformations into CNNs. Xue et al. [15] proposed a small-target forest fire detection model that addresses the inability of models to learn effective information from forest fire images captured at long distances. Because smoke has a complex texture and the forest environment contains many interfering factors, recognition remains difficult; Li et al. [16] therefore proposed a high-precision edge-focused detection network. The network enhances the extraction of global texture features by introducing a Swin multidimensional window extractor (SMWE), reduces redundant information using a guillotine feature pyramid network (GFPN), and reduces boundary blurring using a contour-adaptive loss function. Chen et al. [17] proposed a lightweight forest fire and smoke early detection method based on GS-YOLOv5, which effectively reduces the model parameters and false-alarm rate and improves detection accuracy by introducing the Super-SPPF structure, the C3Ghost module, and the coordinate attention module. Deep learning-based algorithms have superior feature extraction networks compared to traditional fire smoke detection algorithms, leading to more comprehensive semantic features and a significant enhancement in detection performance.
Previous research has tended to focus on a single fire or smoke as the basis for detection; however, there are several limitations to this approach. Given the intricate and diverse nature of forest surroundings, smoke can affect the visibility of fire, leading to false or missed alarms. In large wildfires, smoke can act as a visual barrier to fire, making it difficult to accurately detect fire. In the literature [18,19], authors have explored the limitations of single-fire or smoke detection, emphasizing that environmental disturbances such as light variations and meteorological conditions may affect the visibility of fire and smoke, which in turn reduces the accuracy of detection. To tackle these challenges, this paper adopts a joint fire and smoke detection strategy, aiming to detect both fire and smoke simultaneously to enhance the precision and dependability of detecting forest fire and smoke.
The above studies have some limitations, such as low detection rates, poor real-time performance, high computational complexity, and the ability to detect only fire or smoke rather than both jointly. To solve these problems, this paper proposes SmokeFireNet, a lightweight network for efficient, accurate, and real-time joint detection of forest fire and smoke, which adopts ShuffleNetV2 as the backbone and combines advanced techniques such as FPN, PAN, RFB, ECA, and DySample. Compared with existing research, SmokeFireNet has the following advantages. (1) Joint detection: SmokeFireNet detects fire and smoke simultaneously, fully considering the mutual influence between the two and improving detection accuracy and reliability. (2) Lightweight design: SmokeFireNet adopts the lightweight ShuffleNetV2 backbone and combines it with DySample and other techniques to reduce the model's computational complexity and parameter count, meeting the demands of real-time detection. (3) Multi-scale feature fusion: SmokeFireNet introduces the FPN and PAN structures to fuse features at different scales, better capturing the contextual information of fire and smoke and improving detection accuracy. (4) Attention mechanisms: SmokeFireNet introduces the RFB and ECA mechanisms to better capture the detailed features of fire and smoke while suppressing irrelevant features, further improving detection accuracy. The experimental results show that SmokeFireNet outperforms other mainstream object detection algorithms in terms of average precision, frame rate, and computational complexity, providing effective technical support for forest fire prevention and new ideas and methods for future research.
The rest of the paper is organized as follows. In Section 2, we describe in detail the construction of the forest fire and smoke datasets, the design of the SmokeFireNet model, and the model performance evaluation metrics. In Section 3, we compare the experimental results of different models, analyzing in detail the comparison of attention mechanisms, model performance at different resolutions, the results of the ablation experiments, the comparison of different data enhancement methods, and the detection performance of the model. In Section 4, we discuss the limitations of the current model and outline potential future research directions. In Section 5, we conclude the paper and summarize the key contributions of SmokeFireNet to the field of forest fire and smoke detection.

2. Materials and Methods

2.1. Forest Fire Smoke Dataset

The datasets utilized in this paper were obtained primarily from institutions such as the Machine Intelligence Laboratory of the University of Salerno in Italy, the CV&PR Lab of Keimyung University in South Korea, and the Yuan Fei Niu Team Lab. The collected video data were processed and filtered frame by frame, yielding 1776 images of forest fire smoke. The dataset constructed in this study is mainly used to detect smoke and fire in the spreading stage of a forest fire. A selection of sample images from the dataset is shown in Figure 1.

2.2. Data Enhancement

The forest fire smoke dataset obtained above is small and insufficient to support effective training and evaluation of deep learning models. To address this, this paper adopts three image enhancement methods tailored to the characteristics of forest fires. These methods generate more diverse image samples, improving the robustness and accuracy of the model across different scenarios [20].
(1)
Wind Dynamic Orientation Adjustment (WDOA)
During the construction of the aerial forest fire dataset, WDOA is used to simulate the changes in target objects in different orientations and angles in the images due to wind effects and different shooting angles. The irregular fire patterns and smoke distribution of forest fires due to wind effects lead to different visual characteristics of images taken at different angles and orientations. Therefore, by modeling the effect of wind on fire and smoke morphology in this way, the diversity of the dataset is increased to enable the model to better understand forest fires at different angles and orientations.
(2)
Illumination Condition Modulation Simulation (ICMS)
During the construction of the aerial forest fire dataset, there may be changes in lighting conditions due to images taken at different times, resulting in different brightness and contrast of the images. The simulation of light condition modulation can make the images in the dataset more diverse and allow the model to learn the characteristics of the fire under different light conditions. As a result, the model trained in this way will be more robust and better able to adapt to different lighting conditions that may occur in practical applications.
(3)
Photographic Equipment Vibration Simulation (PEVS)
During the construction of the aerial forest fire dataset, image blurring can be caused by UAV motion or instability during shooting. Simulating this blurring helps the model better cope with problems that may be encountered in the real world and improves its robustness and generalization ability in real scenarios.
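As an illustration, the three augmentations above could be realized with standard torchvision transforms, as in the sketch below. The parameter ranges (rotation angle, brightness/contrast factors, blur kernel) are not reported in the paper and are purely illustrative assumptions.

```python
# Illustrative sketch of the three augmentations (WDOA, ICMS, PEVS) using
# torchvision transforms. Parameter values are assumptions for demonstration.
import torchvision.transforms as T

wdoa = T.Compose([                 # Wind Dynamic Orientation Adjustment:
    T.RandomHorizontalFlip(),      # simulate different shooting orientations
    T.RandomRotation(degrees=15),  # simulate wind-driven angle changes (assumed range)
])
icms = T.ColorJitter(              # Illumination Condition Modulation Simulation:
    brightness=0.4, contrast=0.4)  # vary brightness/contrast as if shot at different times
pevs = T.GaussianBlur(             # Photographic Equipment Vibration Simulation:
    kernel_size=5, sigma=(0.1, 2.0))  # approximate motion/vibration blur

augment = T.RandomChoice([wdoa, icms, pevs])  # apply one augmentation per sample
```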
The example images after data augmentation are shown in Figure 2. These images cover a wide range of states and scenarios of forest fires and smoke, including fire and smoke situations with different lighting conditions and different angles. Such a dataset enriches the diversity of samples, which helps the training model to better understand and adapt to forest fire and smoke situations in different environments and improves the generalization ability and detection accuracy of the model.
After data augmentation, we obtained a dataset containing 4000 images and labeled the images, in which the number of labels for “fire” and “smoke” is 2946 and 2624, respectively. In the experiments, the dataset is divided into training and validation sets in the ratio of 8:2. Table 1 shows the number of images and labels in the training and validation sets.

2.3. SmokeFireNet

In this paper, we present SmokeFireNet, a network designed to detect forest fire smoke, which is shown in Figure 3. SmokeFireNet consists of input, backbone, neck, and head. To reduce the computational complexity of the network and the number of model parameters, SmokeFireNet adopts the lightweight ShuffleNetV2 as its feature extraction backbone, which meets the requirements of devices with limited computational resources while maintaining efficient performance. The feature pyramid network (FPN) [21] and path aggregation network (PAN) [22] structures are introduced in the neck to fuse features from different layers, together with ECA, RFB, and DySample. Finally, the head performs classification and regression on the multi-scale feature layers by adapting the channel counts, enabling precise detection and localization of fire and smoke targets.

2.3.1. Backbone

ShuffleNet [24] is a lightweight convolutional neural network for computationally limited devices. It uses pointwise group convolution to reduce the amount of computation and the number of parameters, and a channel shuffle operation to solve the problem that group convolution prevents feature information from being exchanged between groups, thereby enabling cross-channel information exchange. However, ShuffleNetV1 [24] violates several of the four efficient network design guidelines proposed in [23]; therefore, Ma et al. [23] proposed the ShuffleNetV2 network, which uses a channel splitting operation. In Figure 4a, the basic unit uses a stride of 1. The feature channels are divided in two by a channel split: one branch retains the original features, and the other extracts features through 1 × 1 and 3 × 3 convolutions. Through this channel split, the two branches exchange information. In Figure 4b, the spatial downsampling unit with a stride of 2 does not use a channel split; instead, downsampling is performed by depthwise convolution with a stride of 2 in both branches, which are then concatenated, halving the spatial size of the features while doubling the number of channels.
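To make the two units concrete, the following is a minimal PyTorch sketch mirroring the description above (channel split, depthwise 3 × 3 convolution, channel shuffle). It follows Figure 4 schematically and is not the authors' exact implementation; layer hyperparameters are assumptions.

```python
# Sketch of the two ShuffleNetV2 units described in Figure 4.
import torch
import torch.nn as nn

def channel_shuffle(x, groups=2):
    n, c, h, w = x.size()
    # reshape -> transpose -> flatten mixes channels across the two branches
    return x.view(n, groups, c // groups, h, w).transpose(1, 2).reshape(n, c, h, w)

class ShuffleUnit(nn.Module):
    def __init__(self, channels, stride=1):
        super().__init__()
        self.stride = stride
        branch_c = channels // 2 if stride == 1 else channels
        self.branch = nn.Sequential(  # 1x1 conv -> 3x3 depthwise conv -> 1x1 conv
            nn.Conv2d(branch_c, branch_c, 1, bias=False),
            nn.BatchNorm2d(branch_c), nn.ReLU(inplace=True),
            nn.Conv2d(branch_c, branch_c, 3, stride, 1, groups=branch_c, bias=False),
            nn.BatchNorm2d(branch_c),
            nn.Conv2d(branch_c, branch_c, 1, bias=False),
            nn.BatchNorm2d(branch_c), nn.ReLU(inplace=True),
        )
        if stride == 2:  # downsampling unit: the shortcut also convolves
            self.shortcut = nn.Sequential(
                nn.Conv2d(branch_c, branch_c, 3, stride, 1, groups=branch_c, bias=False),
                nn.BatchNorm2d(branch_c),
                nn.Conv2d(branch_c, branch_c, 1, bias=False),
                nn.BatchNorm2d(branch_c), nn.ReLU(inplace=True),
            )

    def forward(self, x):
        if self.stride == 1:                  # basic unit: split channels in two
            x1, x2 = x.chunk(2, dim=1)
            out = torch.cat([x1, self.branch(x2)], dim=1)
        else:                                 # downsampling unit: no split,
            out = torch.cat([self.shortcut(x), self.branch(x)], dim=1)  # channels double
        return channel_shuffle(out)           # exchange information between branches
```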
The structure of the backbone network is shown in Figure 5. It begins by downsampling the input image with a standard 3 × 3 convolution and max pooling to reduce the image resolution and extract initial features. The network then enters three stages, each containing a repeated stack of ShuffleNet basic and downsampling units. A unit with a stride of 2 is a downsampling unit that further extracts features while reducing the output size; a unit with a stride of 1 is a basic unit that extracts features without downsampling. After the three stages of feature extraction, the network adjusts the channel counts of the outputs of stages 1 to 3 via convolution, finally obtaining three effective feature layers of sizes (80 × 80 × 64), (40 × 40 × 128), and (20 × 20 × 256), labeled C1, C2, and C3, respectively.

2.3.2. Neck

In convolutional neural networks, feature map dimensions shrink progressively as the number of network layers increases, which can blur the position information of small fire targets and lead to missed detections. To tackle this challenge, this study adopts a multi-scale fusion strategy in the neck network and introduces the feature pyramid structure shown in Figure 6. The backbone network generates multi-scale features (denoted C1, C2, C3) at different network layers. The FPN fuses higher-level features with lower-level features in a top-down manner, through upsampling and lateral connections, to compensate for the lack of semantic information in the lower-level features. On this basis, the path aggregation network (PAN) introduces bottom-up paths that pass shallow features to higher layers, enhancing the network's localization ability at different scales and enabling it to better capture the location information of targets at different scales, especially small targets.
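The following is a compact PyTorch sketch of this two-pass fusion over C1–C3 (with the channel widths from Figure 5). The lateral 1 × 1 projections to a common width are an assumption, bilinear upsampling stands in for the DySample operator described next, and the RFB/ECA insertions are omitted for brevity.

```python
# Sketch of the neck: FPN top-down pass followed by a PAN bottom-up pass.
import torch.nn as nn
import torch.nn.functional as F

class FPNPAN(nn.Module):
    def __init__(self, in_chs=(64, 128, 256), width=128):
        super().__init__()
        # lateral 1x1 convs project C1/C2/C3 to a common width (assumed design)
        self.lateral = nn.ModuleList(nn.Conv2d(c, width, 1) for c in in_chs)
        # stride-2 convs for the bottom-up (PAN) path
        self.down = nn.ModuleList(nn.Conv2d(width, width, 3, stride=2, padding=1)
                                  for _ in range(2))

    def forward(self, c1, c2, c3):
        l1, l2, l3 = (lat(c) for lat, c in zip(self.lateral, (c1, c2, c3)))
        # FPN: top-down semantic enrichment (bilinear stands in for DySample)
        p2 = l2 + F.interpolate(l3, scale_factor=2, mode="bilinear", align_corners=False)
        p1 = l1 + F.interpolate(p2, scale_factor=2, mode="bilinear", align_corners=False)
        # PAN: bottom-up path that propagates shallow localization cues upward
        n2 = p2 + self.down[0](p1)
        n3 = l3 + self.down[1](n2)
        return p1, n2, n3   # multi-scale maps fed to the detection head
```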
DySample [11] is a novel dynamic upsampler built around point sampling; it achieves an efficient and lightweight upsampling process through content-aware sample point generation and offset range control. The biggest difference between DySample and other upsampling operators is this point-sampling design, which gives it significant advantages in lightness, efficiency, and ease of use. Compared to kernel-based dynamic upsamplers, DySample eliminates the need for complex kernel generation and convolution operations, has fewer parameters, a lower computational and memory footprint, and an inference speed close to that of bilinear interpolation. At the same time, DySample only needs a low-resolution feature map as input, without requiring high-resolution guidance features, making it more flexible and easier to use.
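As an illustration of the point-sampling idea, here is a minimal PyTorch sketch of a DySample-style upsampler: a 1 × 1 convolution predicts content-aware offsets that perturb a regular sampling grid, and grid_sample reads the low-resolution map at those points. The offset scaling factor (0.25) and layer shapes are assumptions for demonstration, not the official implementation.

```python
# Sketch of a DySample-style point-sampling upsampler.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DySampleSketch(nn.Module):
    def __init__(self, channels, scale=2):
        super().__init__()
        self.scale = scale
        # predict (x, y) offsets for each of the scale^2 output sub-positions
        self.offset = nn.Conv2d(channels, 2 * scale * scale, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        s = self.scale
        offsets = self.offset(x) * 0.25            # restrain the offset range (assumed factor)
        offsets = F.pixel_shuffle(offsets, s)      # -> (n, 2, h*s, w*s)
        # regular upsampling grid in normalized [-1, 1] coordinates
        ys = torch.linspace(-1, 1, h * s, device=x.device)
        xs = torch.linspace(-1, 1, w * s, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack((gx, gy), dim=-1).expand(n, -1, -1, -1)
        # convert pixel-unit offsets to normalized units and perturb the grid
        grid = grid + offsets.permute(0, 2, 3, 1) * 2 / torch.tensor(
            [w, h], device=x.device)
        return F.grid_sample(x, grid, align_corners=False)  # point-sample the low-res map
```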
In order to enhance the network's ability to recognize forest fires and smoke and to gain a deeper understanding of their intrinsic contextual properties and overall attributes, this study introduces the receptive field block (RFB) [25] after feature C3 to broaden the network's receptive field. As shown in Figure 7, the RFB module mainly consists of multi-branched small convolutional kernel layers and dilated convolutional layers. These small convolutional kernel layers include 3 × 3, 1 × 3, and 3 × 1 convolutions, which help to reduce the number of parameters and the computational burden of the model. In addition, to improve the resolution of the features, each standard convolutional branch is combined with a dilated convolutional branch, and each dilated convolutional layer is set to a different dilation rate. The role of the dilated convolutional layer is to mimic the eccentricity effect of human vision so that the network can capture the complex details in the image more comprehensively. The RFB module ultimately splices the features from all the branches and integrates them into a convolutional feature set. Through this feature splicing and integration, the RFB module can extract image features more efficiently, providing a more accurate and efficient feature description for forest fire smoke detection.
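To make the multi-branch structure concrete, the following is a simplified PyTorch sketch of an RFB-style block consistent with the description above; the branch widths and exact dilation rates (1, 3, 3, 5) are illustrative assumptions rather than the authors' configuration.

```python
# Sketch of a simplified RFB block: small kernels combined with dilated convs.
import torch
import torch.nn as nn

class RFBSketch(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        mid = out_ch // 4
        self.b0 = nn.Sequential(                      # plain 3x3, dilation 1
            nn.Conv2d(in_ch, mid, 1),
            nn.Conv2d(mid, mid, 3, padding=1, dilation=1))
        self.b1 = nn.Sequential(                      # 1x3 then dilated 3x3 (rate 3)
            nn.Conv2d(in_ch, mid, 1),
            nn.Conv2d(mid, mid, (1, 3), padding=(0, 1)),
            nn.Conv2d(mid, mid, 3, padding=3, dilation=3))
        self.b2 = nn.Sequential(                      # 3x1 then dilated 3x3 (rate 3)
            nn.Conv2d(in_ch, mid, 1),
            nn.Conv2d(mid, mid, (3, 1), padding=(1, 0)),
            nn.Conv2d(mid, mid, 3, padding=3, dilation=3))
        self.b3 = nn.Sequential(                      # 3x3 then dilated 3x3 (rate 5)
            nn.Conv2d(in_ch, mid, 1),
            nn.Conv2d(mid, mid, 3, padding=1),
            nn.Conv2d(mid, mid, 3, padding=5, dilation=5))
        self.fuse = nn.Conv2d(4 * mid, out_ch, 1)     # splice branches, fuse channels
        self.shortcut = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        feats = torch.cat([self.b0(x), self.b1(x), self.b2(x), self.b3(x)], dim=1)
        return torch.relu(self.fuse(feats) + self.shortcut(x))
```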
Small fire and smoke targets occupy only a few pixels in forest imagery, and their information is easily lost; in such cases, the network can easily overlook these small targets, resulting in missed and false detections. To solve this problem, the network's ability to perceive fire and smoke targets must be enhanced, with particular focus on small targets. As shown in Figure 6, this paper adds the ECA mechanism [26] to the output features C2 and C3 of the backbone network and to the PAN, respectively, which enables the network to focus more on the important feature channels by adaptively weighting the feature responses along the channel dimension, improving the perception of forest fire and smoke.
A schematic depiction of the ECA mechanism is given in Figure 8. The ECA mechanism first distills the global representation of each channel through global average pooling to capture the core information of the channel. Then, the size of the 1D convolution kernel is adaptively determined according to the number of channels to accurately capture the inter-channel dependencies without adding additional parameters. Next, the convolution kernel is applied to the pooled features to achieve localized modeling of inter-channel relationships. After a sigmoid activation function, these features are transformed into weights between 0 and 1, reflecting the importance of each channel. Finally, these weights are multiplied with the original feature maps to complete the dynamic recalibration of the feature maps, retaining the important feature information and suppressing the noise. The ECA mechanism is concise and efficient, which enhances the network’s focus on the key features and improves the accuracy and efficiency of the feature extraction.
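As a concrete illustration of this mechanism, below is a short PyTorch sketch of ECA following the description above, with the adaptive kernel-size rule from the ECA paper [26]. It is a generic ECA implementation, not the authors' exact code.

```python
# Sketch of ECA: global average pooling, adaptive-size 1D conv, sigmoid gating.
import math
import torch
import torch.nn as nn

class ECASketch(nn.Module):
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        # kernel size k adapts to channel count C: k ~ |log2(C)/gamma + b/gamma|
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1                    # kernel size must be odd
        self.conv = nn.Conv1d(1, 1, k, padding=k // 2, bias=False)

    def forward(self, x):
        n, c, _, _ = x.shape
        y = x.mean(dim=(2, 3))                       # global average pooling -> (n, c)
        y = self.conv(y.unsqueeze(1)).squeeze(1)     # local cross-channel interaction
        w = torch.sigmoid(y).view(n, c, 1, 1)        # per-channel weights in (0, 1)
        return x * w                                 # recalibrate the feature map
```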

2.3.3. Loss Function

Designing a loss function is integral to this target detection algorithm, as it measures the discrepancy between the predicted and labeled fire and improves the model parameters by minimizing the loss function, hence enhancing the accuracy of the detected targets. The loss function of this model consists of three components: bounding box regression loss, confidence regression loss, and classification loss.
(1)
Bounding Box Prediction
This paper calculates the bounding box loss using SIoU_Loss [27], a loss function that considers not only the overlapping area, center distance, and aspect ratio but also the vector angle between the ground-truth and predicted boxes. Compared to loss functions such as CIoU, DIoU [28], or GIoU [29], SIoU_Loss redefines the penalty by dividing it into four parts: angle loss, distance loss, shape loss, and IoU loss. A schematic of the bounding box losses is shown in Figure 9.
Angle loss:

$$\Lambda = 1 - 2\sin^2\!\left(\arcsin(x) - \frac{\pi}{4}\right),$$

$$x = \frac{c_h}{\sigma} = \sin\alpha,$$

$$\sigma = \sqrt{\left(b_{c_x}^{gt} - b_{c_x}\right)^2 + \left(b_{c_y}^{gt} - b_{c_y}\right)^2},$$

$$c_h = \max\!\left(b_{c_y}^{gt}, b_{c_y}\right) - \min\!\left(b_{c_y}^{gt}, b_{c_y}\right),$$

where $\sigma$ is the distance between the centers of the ground-truth and predicted boxes and $c_h$ is their vertical offset.

Distance loss:

$$\Delta = \sum_{t=x,y}\left(1 - e^{-\gamma\rho_t}\right),$$

$$\rho_x = \left(\frac{b_{c_x}^{gt} - b_{c_x}}{c_w}\right)^2,\qquad \rho_y = \left(\frac{b_{c_y}^{gt} - b_{c_y}}{c_h}\right)^2,\qquad \gamma = 2 - \Lambda,$$

where $c_w$ and $c_h$ here denote the width and height of the smallest box enclosing the two boxes.

Shape loss:

$$\Omega = \sum_{t=w,h}\left(1 - e^{-\omega_t}\right)^\theta,$$

$$\omega_w = \frac{\left|w - w^{gt}\right|}{\max\!\left(w, w^{gt}\right)},\qquad \omega_h = \frac{\left|h - h^{gt}\right|}{\max\!\left(h, h^{gt}\right)}.$$

The overall bounding box loss is

$$L_{SIoU} = 1 - IoU + \frac{\Delta + \Omega}{2}.$$
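For readers implementing the loss, the following is a hedged PyTorch sketch of the SIoU computation above for boxes given in (cx, cy, w, h) format. The shape-loss exponent θ is set to 4 as an illustrative choice, since the paper does not report its value; this is not the authors' code.

```python
# Numeric sketch of SIoU for axis-aligned boxes (..., 4) in (cx, cy, w, h) form.
import torch

def siou_loss(pred, gt, theta=4.0, eps=1e-7):
    cx, cy, w, h = pred.unbind(-1)
    gcx, gcy, gw, gh = gt.unbind(-1)
    # IoU of the two boxes
    x1, y1, x2, y2 = cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2
    gx1, gy1, gx2, gy2 = gcx - gw / 2, gcy - gh / 2, gcx + gw / 2, gcy + gh / 2
    inter = (torch.min(x2, gx2) - torch.max(x1, gx1)).clamp(0) * \
            (torch.min(y2, gy2) - torch.max(y1, gy1)).clamp(0)
    iou = inter / (w * h + gw * gh - inter + eps)
    # angle cost: Lambda = 1 - 2 sin^2(arcsin(x) - pi/4), x = c_h / sigma
    sigma = torch.sqrt((gcx - cx) ** 2 + (gcy - cy) ** 2) + eps
    ch = torch.max(gcy, cy) - torch.min(gcy, cy)
    lam = 1 - 2 * torch.sin(torch.arcsin((ch / sigma).clamp(-1, 1)) - torch.pi / 4) ** 2
    # distance cost, relative to the smallest enclosing box (cw, chh)
    cw = torch.max(x2, gx2) - torch.min(x1, gx1)
    chh = torch.max(y2, gy2) - torch.min(y1, gy1)
    gamma = 2 - lam
    delta = (1 - torch.exp(-gamma * ((gcx - cx) / cw) ** 2)) + \
            (1 - torch.exp(-gamma * ((gcy - cy) / chh) ** 2))
    # shape cost
    ow = (w - gw).abs() / torch.max(w, gw)
    oh = (h - gh).abs() / torch.max(h, gh)
    omega = (1 - torch.exp(-ow)) ** theta + (1 - torch.exp(-oh)) ** theta
    return 1 - iou + (delta + omega) / 2
```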
(2)
Confidence Prediction
In this paper, the detection object is only two categories of forest fire and smoke, so the confidence regression loss adopts the binary cross-entropy loss function, which is defined as follows:
$$obj\_loss = -\sum_{i=0}^{S^2}\sum_{j=0}^{N} I_{ij}^{obj}\left[\hat{C}_i\log(C_i) + \left(1-\hat{C}_i\right)\log(1-C_i)\right] - \sum_{i=0}^{S^2}\sum_{j=0}^{N} I_{ij}^{noobj}\left[\hat{C}_i\log(C_i) + \left(1-\hat{C}_i\right)\log(1-C_i)\right],$$

where $C_i$ and $\hat{C}_i$ denote the predicted forest fire smoke confidence and the true label, respectively; $S^2$ is the number of grid cells; $N$ denotes the number of a priori boxes corresponding to each grid point; and $I_{ij}^{noobj}$ indicates that no forest fire smoke target exists in the $j$-th a priori box of grid cell $i$.
(3)
Class Prediction
The classification loss also uses the binary cross-entropy loss function, which is defined as follows:
$$cls\_loss = -\sum_{i=0}^{S^2} I_{ij}^{obj}\sum_{c\in classes}\left[\hat{P}_c\log(P_c) + \left(1-\hat{P}_c\right)\log(1-P_c)\right],$$

where $P_c$ denotes the predicted category probability of the prior box, $\hat{P}_c$ denotes the category label of the annotated box, and $I_{ij}^{obj}$ indicates whether the prior box at coordinates $(i, j)$ is a positive sample (1 if it is, 0 otherwise).
The overall loss is the sum of the three components:

$$Loss = L_{SIoU} + obj\_loss + cls\_loss.$$
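Putting the three terms together, a hedged sketch of the overall objective (reusing the siou_loss function sketched above) might look as follows; the paper sums the terms without reported weights, so no weighting is applied here.

```python
# Sketch of the overall objective: SIoU box loss plus BCE objectness and
# classification (fire/smoke) terms, summed unweighted as in the equation above.
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def total_loss(box_pred, box_gt, obj_logit, obj_gt, cls_logit, cls_gt):
    return (siou_loss(box_pred, box_gt).mean()   # bounding-box regression
            + bce(obj_logit, obj_gt)             # confidence (objectness)
            + bce(cls_logit, cls_gt))            # class: fire vs. smoke
```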

2.4. Training

In the experimental phase, to ensure the reproducibility and reliability of the results, this study was conducted in the following environment: Python 3.9 on a Windows 11 operating system, with model training and evaluation performed in the PyTorch 2.0.1 deep learning framework, using an NVIDIA GeForce RTX 3060 GPU and a 12th-generation Intel Core i3-12100F CPU (Intel Corporation, Santa Clara, CA, USA).
To train efficient forest fire and smoke detection models, a set of experimental parameters was selected. The model is trained for 300 epochs with a batch size of 32, which balances training efficiency and memory requirements. The image size is 640 × 640 to ensure that the model adapts to different target sizes. The initial learning rate is 0.01 and the SGD optimizer is used to ensure that the model converges quickly and achieves a high level of performance. During training, we use data augmentation and regularization techniques to improve the generalization ability and robustness of the model. Data augmentation increases the diversity of the training data by simulating variations in light, color, angle, position, and size, enabling the model to better adapt to various scenarios. Regularization techniques prevent overfitting by limiting model complexity and adjusting the training data distribution: for example, weight decay prevents model overfitting, and label smoothing reduces the model's dependence on training data labels. Setting these parameters appropriately effectively enhances the performance of the model in real applications.
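In code form, the reported configuration might be set up as below. Momentum, weight-decay, and label-smoothing values are not given in the paper and are marked as assumptions.

```python
# Sketch of the reported training configuration (300 epochs, batch 32, 640x640,
# SGD with initial learning rate 0.01). Unreported values are assumptions.
import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, 3)   # placeholder; substitute the SmokeFireNet model here

EPOCHS, BATCH_SIZE, IMG_SIZE = 300, 32, 640
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.01,            # initial learning rate (reported)
                            momentum=0.9,       # assumed value
                            weight_decay=5e-4)  # assumed value
LABEL_SMOOTHING = 0.05        # assumed value, applied to the class targets
```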
In addition, to ensure the performance of the model under different input image sizes, its adaptability and robustness to inputs of different sizes can be determined by testing multiple resolutions. Therefore, in our experiments we set up five different resolutions: 224 × 224, 416 × 416, 640 × 640, 800 × 800, and 1024 × 1024. The performance of the model under different conditions is evaluated using images at these resolutions to select the model and input resolution best suited to a particular application scenario.

2.5. Evaluation Metrics

In this paper, the performance of the model is measured using mAP@0.5, the mean average precision at an IoU threshold of 0.5 across the target categories. In other words, it evaluates how accurately the model localizes and classifies targets in the image; higher mAP@0.5 values indicate better detection performance. We denote the mAP@0.5 for forest fire and smoke as APfire and APsmoke, respectively, and the overall mAP@0.5 of the model as APall.
To meet the real-time and accuracy requirements of forest fire and smoke detection, we use the model's GFLOPs and FPS as additional evaluation metrics to assess its computational efficiency and processing speed in real applications. GFLOPs measure the computational complexity of the model, while FPS indicates the number of image frames the model can process per second. Higher GFLOPs mean the model is more computationally intensive and takes longer to perform inference. Therefore, in resource-constrained situations, the GFLOPs and FPS of a model must be considered together to ensure a balance between performance and efficiency.
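For reference, FPS can be measured as in the sketch below (warm-up passes followed by timed inference on a 640 × 640 input); GFLOPs are typically obtained with a model profiler such as the thop package. This is a generic measurement routine, not the authors' benchmarking code.

```python
# Sketch of FPS measurement: warm-up, then timed forward passes.
import time
import torch
import torch.nn as nn

@torch.no_grad()
def measure_fps(model: nn.Module, iters=100, size=640, device="cuda"):
    model.eval().to(device)
    x = torch.randn(1, 3, size, size, device=device)
    for _ in range(10):                    # warm-up: stabilize clocks and caches
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()           # wait for queued GPU work before timing
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)   # frames per second
```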

3. Results

3.1. Comparison of Different Detection Models

In order to verify the effectiveness of SmokeFireNet proposed in this paper, we compare it with the current mainstream object detection algorithms Faster R-CNN [30], SSD [31], YOLOv5, YOLOv7-tiny [32], ShuffleNetv2 [23], and MobileNetv3 [33]. The results of the comparison experiments are shown in Table 2.
From the experimental data, SmokeFireNet performs well on all indicators, especially average precision (AP): its APall reaches 86.2, APfire 82.3, and APsmoke 90, the highest values among the compared models. In addition, SmokeFireNet achieves a frame rate of 114 FPS, second only to ShuffleNetv2's 121, while maintaining a low 8.4 GFLOPs, striking a balance between performance and efficiency. In contrast, Faster R-CNN performs well in average precision but has the lowest FPS (98) and the highest computational complexity (24.5 GFLOPs), making it unsuitable for real-time applications. YOLOv5, YOLOv7-tiny, and MobileNetv3 also perform well on some metrics, but their overall trade-off among accuracy, frame rate, and computational complexity falls short of SmokeFireNet's. Considering accuracy, frame rate, and computational complexity together, SmokeFireNet is the best choice.

3.2. Performance Comparison of Different Attention Mechanism Models

Different attentional mechanisms have their own unique characteristics and performance, so comparing them can help us identify the best-performing mechanism in the forest fire detection task. By evaluating their impact on model performance, we can select the most suitable attention mechanism for our task. To validate the impact of the attention mechanisms cited in this paper on the model, this section introduces several different attention methods, including GAM (global attention mechanism) [34], CBAM (convolutional block attention module) [35], SimAM (simple attention mechanism) [36], and ECA (efficient channel attention) [26], generates the corresponding feature maps, and visualizes them as heat maps. The visualization results are shown in Figure 10.
With these visualization results, we observe that the model focuses more on the to-be-detected region after the introduction of the ECA mechanism, whereas in the baseline network without the introduction of the attention mechanism, the model also focuses on other irrelevant regions. At the same time, we find that other attention mechanisms do not fully focus on the to-be-detected region and are not as effective as they should be for the tasks in this paper. Therefore, ECA is selected as the attention mechanism in this paper.

3.3. Experimental Comparison of Different Resolutions

To further analyze the effect of different input sizes on the model, this section tests the model with five different input image resolutions: 224 × 224, 416 × 416, 640 × 640, 800 × 800, and 1024 × 1024. Higher-resolution images usually contain more detailed information, so the model can obtain richer visual features from them. In contrast, low-resolution images may lose some subtle features, leading to a decrease in model performance. By evaluating these images at different resolutions, we can gain insight into the model's performance under different input conditions and determine its adaptability and robustness to images of different sizes. This analysis helps to optimize the design of the model and improve its performance and reliability in various application scenarios. The specific experimental results are shown in Table 3.
By analyzing the experimental results in Table 3, we find that the average precision (AP) of the model grows as the resolution of the input image increases. Higher-resolution images provide more detail and clarity, allowing the model to capture the features and shape of the target more accurately; therefore, ideally, higher resolution usually leads to higher detection accuracy. However, increased resolution is accompanied by increased computational cost, including model inference time and required computational resources. Conversely, at lower resolutions the computational load is reduced and inference is faster, but the loss of image detail degrades the model's detection accuracy for fire and smoke. Nevertheless, since smoke occupies more pixels in low-resolution images while forest fires usually contain many small targets, the decrease in smoke accuracy is smaller than that for fire. In summary, low-resolution images are more suitable for embedded devices and resource-constrained environments. To balance detection accuracy against detection speed, an input image resolution of 640 × 640 is selected for training in this paper, maintaining real-time performance alongside high detection accuracy.

3.4. Ablation Experiments

To validate the importance of ShuffleNetv2, FPN, PAN, ECA, RFB, and DySample on the forest fire smoke detection task and to explore the impact of these modules on the model performance, this section evaluates the model performance by adding step-by-step ShuffleNetv2, FPN, PAN, RFB, DySample, and ECA modules. By introducing these modules step-by-step, we can analyze their contribution to the model performance and understand their role in improving the detection results. This step-by-step evaluation approach helps us to gain a deeper understanding of the impact of each module on the model performance and provides useful guidance for further optimization and improvement of the model. The ablation experiment results are shown in Table 4.
As can be seen from the ablation data in the table, the individual performance metrics (APall, APfire, and APsmoke) improve significantly as the model structure is refined, while GFLOPs increase accordingly. The base ShuffleNetv2 model performs lowest, with APall, APfire, and APsmoke of 80.7, 76.8, and 84.6, respectively, at 6.6 GFLOPs. Adding FPN and PAN improves performance, with APall rising to 81.9 and APfire and APsmoke to 77.7 and 86.1, while computational complexity increases to 7.3 GFLOPs. Further adding the RFB module raises APall to 83.9 and APfire and APsmoke to 78.4 and 89.4, at 7.9 GFLOPs. The DySample module brings a notable gain, with APall reaching 85.6, APfire 81.6, and APsmoke rising slightly to 89.6, at 8.3 GFLOPs. Replacing the box loss with SIoU lifts APall further to 85.9. The final addition of the ECA module yields the highest performance, with APall, APfire, and APsmoke at 86.2, 82.3, and 90, respectively, with a slight increase in complexity to 8.4 GFLOPs. Each module improves performance to a different degree, and the final combination achieves the best results, indicating that these modules contribute significantly to overall model performance.
These ablation experimental results show that the gradual introduction of new components and techniques can effectively improve the performance of the forest fire smoke detection model, and the final model combines a variety of advanced techniques to provide a reliable solution for real application scenarios.

3.5. Detection Effect under Different Data Augmentation

The dataset used in this paper was processed with three data enhancement methods: WDOA, ICMS, and PEVS. WDOA simulates the effects of wind on the target object and the changes in the target's orientation and angle in the image caused by different shooting angles. ICMS simulates fire scenes under different lighting conditions so that the model can better adapt to different lighting environments. PEVS simulates the image blurring caused by motion or instability during UAV shooting to increase the robustness of the model to motion blur. Through these data enhancement methods, we improve the generalization ability of the model across diverse scenarios, making its application in real environments more robust and reliable. Figure 11 shows the forest fire smoke detection results of the model trained with data enhancement in the three scenarios above, further validating the effectiveness of data augmentation and the generalization ability of the model.

3.6. Forest Fire and Smoke Detection Performance Analysis

In analyzing the experimental results, we observed the detection performance under different scenarios. In Figure 12a, the situation of smoke-only detection is shown, and we can observe the model’s effective detection of smoke regions. However, if we focus only on fire detection, it might be difficult to detect forest fires in this scenario. In the context of Figure 12b, smoke could potentially act as a visual barrier to the fire, making it difficult to detect fire. If a single-fire detection strategy is used, there is a possibility that fire incidents could be missed. In addition, we considered the detection performance of small targets. Figure 12c,d illustrates the detection results of tiny fires, with the model accurately pinpointing the positions of these small fires. The experimental results unequivocally showcase the outstanding detection capabilities of our proposed model across diverse scenarios. It can skillfully discriminate between smoke and fire and can achieve accurate detection even in cases involving small target objects.

4. Discussion

In this study, we present SmokeFireNet, an efficient and accurate network for forest fire smoke detection. By adopting the lightweight ShuffleNetV2 backbone network, multi-scale feature fusion, RFB module, ECA mechanism and DySample up-sampling operation, SmokeFireNet achieves accurate identification of fire and smoke, while considering the lightweight design of the model to satisfy the real-time detection requirements. In addition, the application of data enhancement methods further improves the robustness and adaptability of the model.
The SmokeFireNet model proposed in this paper performs well in forest fire and smoke detection and has clear advantages over other mainstream object detection algorithms in terms of AP, FPS, and GFLOPs. SmokeFireNet achieves the highest values in APall, APfire, and APsmoke, at 86.2%, 82.3%, and 90%, respectively, proving its accuracy in fire and smoke detection. In addition, SmokeFireNet reaches 114 FPS, second only to ShuffleNetv2 and much higher than the other models, meeting the demands of real-time detection. The model's 8.4 GFLOPs is much lower than Faster R-CNN's and comparable to ShuffleNetv2's and MobileNetv3's, ensuring high performance while preserving the lightweight design.
The SmokeFireNet model has achieved remarkable results in forest fire and smoke detection, but still has some limitations. Firstly, the model relies on high-quality datasets and its adaptability in complex scenarios needs to be improved. In addition, the model still has room for optimization in terms of weight and real-time performance. To further enhance the usefulness of SmokeFireNet in the field of forest fire prevention, future improvements can be made in the following areas. Firstly, the performance of the model in different geographical regions and fire conditions is crucial for its practical application. Although our results show high accuracy, future research should focus on the robustness of the model in various environments, such as dense forests, arid areas, and different climatic conditions. This may require testing on datasets for these specific conditions. Secondly, the adaptability of SmokeFireNet in complex scenarios, such as nighttime detection, detection in dense fog conditions, or smoke recognition in the presence of tree cover, still needs to be improved. This can be achieved by employing more advanced data enhancement techniques and introducing domain adaptation methods to fine-tune the model for specific environmental conditions.

5. Conclusions

In this paper, we propose a model for the joint detection of forest fire and smoke. First, a joint detection strategy is used to detect fire and smoke simultaneously, improving the accuracy and reliability of forest fire smoke detection. Second, ShuffleNetV2 is used as the backbone network to realize a lightweight model design. Third, the RFB module is introduced to strengthen the integration of multi-scale features within the network, improving the model's accuracy in forest fire and smoke detection through its multi-branch receptive field expansion and multi-scale feature fusion. Finally, we introduce the ECA mechanism to enhance information interaction between different channels of the feature map, improving the model's ability to discriminate target features.
The experimental results show that this research provides an efficient and accurate solution in the field of forest fire and smoke detection. The SmokeFireNet network architecture and the adopted technological strategies bring new ideas and methods for realizing real-time and efficient target detection tasks. In order to improve the practical application value of the model, future research should focus on the following areas: exploring in-depth data enhancement techniques to improve the model generation capability; adopting domain adaptation strategies to improve the performance of the model in new environments; continually optimizing the model architecture to achieve a more efficient and lightweight design; exploring the potential of multi-task learning, such as fire severity and smoke density assessment; conducting extensive field testing to validate the model’s performance in real-world operational environments and collect data to facilitate continuous model improvement. Through these efforts, we aim to establish SmokeFireNet as a key technology tool in the field of forest fire prevention and management.

Author Contributions

Y.C. was responsible for program design and drafting the initial manuscript. F.W. designed the project and revised the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (32101535) and the Start-up Fund for New Talented Researchers of Nanjing Vocational University of Industry Technology (Grant No. YK22-05-01).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Xu, X.; Li, F.; Lin, Z.; Song, X. Holocene fire history in China: Responses to climate change and human activities. Sci. Total Environ. 2021, 753, 142019. [Google Scholar] [CrossRef] [PubMed]
  2. Yuan, D.; Chang, X.; Huang, P.-Y.; Liu, Q.; He, Z. Self-supervised deep correlation tracking. IEEE Trans. Image Process. 2020, 30, 976–985. [Google Scholar] [CrossRef] [PubMed]
  3. Verma, S.; Purswani, E.; Khan, M.L. Collaborative Governance and Nonmonetary Compensation Mechanisms for Sustainable Forest Management and Forest Fire Mitigation. In Anthropogenic Environmental Hazards: Compensation and Mitigation; Springer: Berlin/Heidelberg, Germany, 2023; pp. 223–244. [Google Scholar]
  4. Avudaiammal, R.; Rajangam, V.; Durai Raji, V.; Senthil Kumar, S. Color Models Aware Dynamic Feature Extraction for Forest Fire Detection Using Machine Learning Classifiers. Autom. Control Comput. Sci. 2023, 57, 627–637. [Google Scholar] [CrossRef]
  5. Sheng, D.; Deng, J.; Zhang, W.; Cai, J.; Zhao, W.; Xiang, J. A statistical image feature-based deep belief network for fire detection. Complexity 2021, 2021, 5554316. [Google Scholar] [CrossRef]
  6. Bakri, N.S.; Adnan, R.; Ruslan, F.A. A methodology for fire detection using colour pixel classification. In Proceedings of the 2018 IEEE 14th International Colloquium on Signal Processing & Its Applications (CSPA), Pulau Pinang, Malaysia, 9–10 March 2018; pp. 94–98. [Google Scholar]
  7. Han, Z.; Tian, Y.; Zheng, C.; Zhao, F. Forest Fire Smoke Detection Based on Multiple Color Spaces Deep Feature Fusion. Forests 2024, 15, 689. [Google Scholar] [CrossRef]
  8. Chen, Z.; Zhou, H.; Lin, H.; Bai, D. TeaViTNet: Tea Disease and Pest Detection Model Based on Fused Multiscale Attention. Agronomy 2024, 14, 633. [Google Scholar] [CrossRef]
  9. Yao, X.; Lin, H.; Bai, D.; Zhou, H. A Small Target Tea Leaf Disease Detection Model Combined with Transfer Learning. Forests 2024, 15, 591. [Google Scholar] [CrossRef]
  10. Wang, G.; Bai, D.; Lin, H.; Zhou, H.; Qian, J. FireViTNet: A hybrid model integrating ViT and CNNs for forest fire segmentation. Comput. Electron. Agric. 2024, 218, 108722. [Google Scholar] [CrossRef]
  11. Lin, H.; Qian, J.; Di, B. Learning for Adaptive Multi-Copy Relaying in Vehicular Delay Tolerant Network. IEEE Trans. Intell. Transp. Syst. 2023, 25, 3054–3063. [Google Scholar] [CrossRef]
  12. Zhang, L.; Wang, M.; Ding, Y.; Bu, X. MS-FRCNN: A Multi-Scale Faster RCNN Model for Small Target Forest Fire Detection. Forests 2023, 14, 616. [Google Scholar] [CrossRef]
  13. Huang, J.; He, Z.; Guan, Y.; Zhang, H. Real-time forest fire detection by ensemble lightweight YOLOX-L and defogging method. Sensors 2023, 23, 1894. [Google Scholar] [CrossRef] [PubMed]
  14. Avula, S.B.; Badri, S.J.; Reddy, G. A novel forest fire detection system using fuzzy entropy optimized thresholding and STN-based CNN. In Proceedings of the 2020 International Conference on Communication Systems & Networks (COMSNETS), Bengaluru, India, 7–11 January 2020; pp. 750–755. [Google Scholar]
  15. Xue, Z.; Lin, H.; Wang, F. A small target forest fire detection model based on YOLOv5 improvement. Forests 2022, 13, 1332. [Google Scholar] [CrossRef]
  16. Li, R.; Hu, Y.; Li, L.; Guan, R.; Yang, R.; Zhan, J.; Cai, W.; Wang, Y.; Xu, H.; Li, L. SMWE-GFPNNet: A high-precision and robust method for forest fire smoke detection. Knowl.-Based Syst. 2024, 289, 111528. [Google Scholar] [CrossRef]
  17. Chen, Y.; Li, J.; Sun, K.; Zhang, Y. A lightweight early forest fire and smoke detection method. J. Supercomput. 2024, 80, 9870–9893. [Google Scholar] [CrossRef]
  18. Bahhar, C.; Ksibi, A.; Ayadi, M.; Jamjoom, M.M.; Ullah, Z.; Soufiene, B.O.; Sakli, H. Wildfire and Smoke Detection Using Staged YOLO Model and Ensemble CNN. Electronics 2023, 12, 228. [Google Scholar] [CrossRef]
  19. Sathishkumar, V.E.; Cho, J.; Subramanian, M.; Naren, O.S. Forest fire and smoke detection using deep learning-based learning without forgetting. Fire Ecol. 2023, 19, 9. [Google Scholar] [CrossRef]
  20. Zhang, W.; Kinoshita, Y.; Kiya, H. Image-enhancement-based data augmentation for improving deep learning in image classification problem. In Proceedings of the 2020 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-Taiwan), Taoyuan, Taiwan, 28–30 September 2020; pp. 1–2. [Google Scholar]
  21. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  22. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
  23. Ma, N.; Zhang, X.; Zheng, H.-T.; Sun, J. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131. [Google Scholar]
  24. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856. [Google Scholar]
  25. Liu, S.; Huang, D. Receptive field block net for accurate and fast object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 385–400. [Google Scholar]
  26. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542. [Google Scholar]
  27. Gevorgyan, Z. SIoU loss: More powerful learning for bounding box regression. arXiv 2022, arXiv:2205.12740. [Google Scholar]
  28. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 12993–13000. [Google Scholar]
  29. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 658–666. [Google Scholar]
  30. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  31. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. pp. 21–37. [Google Scholar]
  32. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  33. Koonce, B. MobileNetV3. In Convolutional Neural Networks with Swift for Tensorflow: Image Recognition and Dataset Categorization; Apress: New York, NY, USA, 2021; pp. 125–144. [Google Scholar]
  34. Liu, Y.; Shao, Z.; Hoffmann, N. Global attention mechanism: Retain information to enhance channel-spatial interactions. arXiv 2021, arXiv:2112.05561. [Google Scholar]
  35. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  36. Yang, L.; Zhang, R.-Y.; Li, L.; Xie, X. Simam: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Online, 18–24 July 2021; pp. 11863–11874. [Google Scholar]
Figure 1. Representative images of forest fire smoke dataset.
Figure 2. Images after data augmentation: (a) original image; (b) wind dynamic orientation adjustment; (c) illumination condition modulation simulation; (d) photographic equipment vibration simulation.
Figure 3. Structure of SmokeFireNet model.
Figure 4. (a) The basic ShuffleNet unit; (b) the ShuffleNet unit for spatial downsampling.
Figure 5. Structure of backbone.
Figure 6. Structure of neck.
Figure 7. Structure of RFB.
Figure 8. Diagram of ECA module.
Figure 9. Boundary box loss schematic.
Figure 10. Visualization of attention mechanism feature map.
Figure 11. Visualization of different data augmentation. The figure uses red boxes for fire and blue boxes for smoke.
Figure 12. Detection results in different scenarios. The figure uses red boxes for fire and blue boxes for smoke. (a) shows the detection of smoke only, (b) shows the coexistence of flame and smoke, and (c,d) illustrates the results of small fire.
Table 1. Target numbers in dataset.
| Distribution | Training | Validation |
|---|---|---|
| Images | 3200 | 800 |
| Fire | 2357 | 589 |
| Smoke | 2113 | 511 |
Table 2. Experimental comparison of different models.
| Model | APall | APfire | APsmoke | FPS | GFLOPs |
|---|---|---|---|---|---|
| Faster R-CNN | 85.2 | 82.1 | 88.3 | 98 | 24.5 |
| YOLOv5 | 84.4 | 81.5 | 87.3 | 109 | 17.16 |
| YOLOv7-tiny | 83.5 | 80.7 | 86.3 | 112 | 13.2 |
| ShuffleNetv2 | 84.2 | 81.8 | 86.6 | 121 | 5.6 |
| MobileNetv3 | 84.1 | 81.7 | 86.5 | 106 | 18.22 |
| SmokeFireNet (Ours) | 86.2 | 82.3 | 90 | 114 | 8.4 |
AP: Measures the model’s accuracy in localization and classification across different target classes. FPS: Represents the model’s inference speed. GFLOPs: Measures the computational complexity of the model.
Table 3. Experimental results at different resolutions.
| Input Resolution | APall | APfire | APsmoke | FPS | GFLOPs |
|---|---|---|---|---|---|
| 224 × 224 | 81.1 | 74.3 | 87.8 | 168 | 1.6 |
| 416 × 416 | 84.9 | 79.8 | 90 | 121 | 6.4 |
| 640 × 640 (ours) | 86.2 | 82.3 | 90 | 114 | 8.4 |
| 800 × 800 | 86.8 | 83.4 | 90.2 | 97 | 21.3 |
| 1024 × 1024 | 86.9 | 83.6 | 90.1 | 63 | 36.3 |
AP: Measures the model’s accuracy in localization and classification across different target classes. FPS: Represents the model’s inference speed. GFLOPs: Measures the computational complexity of the model.
Table 4. Results of ablation experiments.
| Model | APall | APfire | APsmoke | GFLOPs |
|---|---|---|---|---|
| ShuffleNetv2 (Baseline) | 80.7 | 76.8 | 84.6 | 6.6 |
| Baseline + FPN + PAN | 81.9 | 77.7 | 86.1 | 7.3 |
| Baseline + FPN + PAN + RFB | 83.9 | 78.4 | 89.4 | 7.9 |
| Baseline + FPN + PAN + RFB + DySample | 85.6 | 81.6 | 89.6 | 8.3 |
| Baseline + FPN + PAN + RFB + DySample + SIoU | 85.9 | 82.2 | 89.6 | 8.4 |
| Baseline + FPN + PAN + RFB + DySample + SIoU + ECA (Ours) | 86.2 | 82.3 | 90 | 8.4 |
AP: Measures the model’s accuracy in localization and classification across different target classes. GFLOPs: Measures the computational complexity of the model.

