Article

Fire-Net: Rapid Recognition of Forest Fires in UAV Remote Sensing Imagery Using Embedded Devices

School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(15), 2846; https://doi.org/10.3390/rs16152846
Submission received: 27 June 2024 / Revised: 27 July 2024 / Accepted: 30 July 2024 / Published: 2 August 2024

Abstract

Forest fires pose a catastrophic threat to Earth’s ecology as well as to human beings. Timely and accurate monitoring of forest fires can significantly reduce potential casualties and property damage. To address this problem, this paper proposes Fire-Net, a lightweight forest fire recognition model for unmanned aerial vehicles (UAVs), which has a multi-stage structure and incorporates cross-channel attention following the fifth stage. This enables the model to perceive features at various scales, particularly small-scale fire sources in wild forest scenes. Through training and testing on a real-world dataset, various lightweight convolutional neural networks were evaluated on embedded devices. The experimental outcomes indicate that Fire-Net attained an accuracy of 98.18%, a precision of 99.14%, and a recall of 98.01%, surpassing the current leading methods. Furthermore, the model achieves an average inference time of 10 milliseconds per image and operates at 86 frames per second (FPS) on embedded devices.

1. Introduction

Forests are essential to Earth’s ecosystem [1,2], and have experienced substantial damage from the increasing frequency of wildfires [3,4]. This damage includes vegetation loss [5], reduced biodiversity [6], and soil and water degradation [7]. Studies indicate that wildfires destroy habitats and impede ecosystem recovery by altering soil structures and disrupting water cycles [8,9]. In 2023, wildfires in Canada resulted in billions of dollars in property damage [10]. Atmospheric circulation transported smoke and dust from these fires to China and Europe, significantly impacting the global ecosystem [11,12]. According to the United Nations Environment Programme (UNEP), wildfires are intensifying and expanding worldwide, causing severe damage to the environment, wildlife, and human health [13]. The best tactic to control wildfires is early detection and rapid response [14]. However, delayed wildfire detection poses significant challenges to firefighting efforts and can result in unforeseen disasters [15]. As a result, the prompt and precise identification of fire outbreaks is essential for mitigating the risks associated with forest fires.
The causes of forest fires are complex and unpredictable. Typically manifesting in remote areas with intricate terrains [16], the challenging landscapes and dangerous conditions during a fire pose hindrances to timely detection [17]. Fortunately, the advent of drone technology provides a potentially effective monitoring method [18]. Firstly, drones offer a more cost-effective and safer alternative compared to manual surveillance and helicopter patrols, and they do not expose personnel to danger. Secondly, drones have versatile flight capabilities, enabling effective operations across diverse terrains [19] and adverse weather conditions, thereby facilitating efficient monitoring of expansive forest regions. Additionally, drones equipped with fire-extinguishing devices may be deployed to extinguish early-stage forest fires.
The advent of artificial neural networks has noticeably advanced the field of image processing. Notable strides have been achieved through the comprehensive analysis of forest fire images taken by drones utilizing convolutional neural networks [20]. Zhang et al. [21] employed the classical large-scale ResNet 50 to detect forest fires; by introducing transfer learning, they reached 79.48% accuracy on the test set. Guan et al. [22] focused on a method for the semantic segmentation of forest fires utilizing a MaskSU R-CNN model, which delineates the fire region within UAV images. Zhang et al. [23] proposed MS-FRCNN for discerning small targets in forests, achieving 82.9% accuracy on FLAME [24]. Rui et al. [25] developed a forest wildfire detection method using two parallel decoders that combine RGB and thermal imaging (RGB-Thermal), enabling effective day and night detection; however, its high computational load prevents real-time inference on drone-embedded devices. Barmpoutis et al. [26] created the FIRE-mDT system, in which drones were equipped with 360-degree cameras, ResNet50, and a multi-head attention mechanism for fire monitoring, achieving an F1 score of 91.6 during validation; however, the operational altitude range of the 360-degree cameras (18–28 m) poses a risk of drone damage from flames. Lin et al. [27] utilized the Swin Transformer backbone network and PAFPN to distinguish targets at various scales. Other works, such as YOLOv5s-CCAB [28] and an improved deformable Transformer [29], also adopt Vision Transformer or similar structures [30,31] for their excellent recognition ability. Such models have greatly extended the sensing capability of UAV remote sensing. However, most current neural networks focus on task accuracy while overlooking the limited computing resources of embedded devices.
To address the limitations mentioned above, we propose Fire-Net. This model ensures high accuracy and real-time forest fire detection through the re-parameterizable MobileOne block and a cross-channel attention module. Specifically, we first incorporate the cross-channel attention (CCA) module so that the model focuses on valuable information and suppresses irrelevant features. Then, we employ the cut-out technique during training to simulate flames occluded by vegetation, strengthening the model’s generalization ability. Because inference is executed on embedded devices, these strategies are designed not to introduce a significant increase in computational complexity. Finally, to evaluate the model’s real-time inference capabilities on embedded devices, we deploy Fire-Net, along with several currently advanced lightweight convolutional neural networks, on the Jetson TX2 NX using TensorRT [32], and conduct a comparative analysis. As shown in Figure 1, Fire-Net significantly surpasses other models in both recognition accuracy and inference speed, making it especially suitable for monitoring forest fires.
Our paper’s contributions are outlined below:
(1)
A real-world forest fire image dataset is enriched in this work. We include various types of fire images as well as images that are easily confused with fires, which helps the network learn more subtle fire features and enhances the model’s credibility in practical applications;
(2)
In this paper, a novel Fire-Net is introduced for forest fire recognition, which uses the CCA module to perceive more fire-related information while ignoring irrelevant features. In addition, the model is accelerated via TensorRT for embedded deployment, enabling the UAV monitoring system to detect forest fires in their early stages.
The remainder of the paper is structured as follows: Section 2 details the dataset, the construction of Fire-Net, and the lightweight techniques used for embedded deployment. Section 3 presents the experimental results and performance analysis. Section 4 offers a discussion, and Section 5 presents the paper’s conclusion.

2. Materials and Methods

2.1. Datasets

High-quality training datasets are the foundation of effective pattern recognition with deep learning. In the context of wildfire management, the lack of standard datasets is one of the most significant challenges. Therefore, we made several improvements to the FLAME aerial forest fire dataset [24]. First, redundant fire images were removed through interval sampling. Second, different types of fire images and some easily ‘confusable’ images were downloaded from the internet (ensuring no copyright issues) and integrated into the dataset, thereby enhancing the diversity of the flame samples.
The resulting dataset is therefore more comprehensive and includes easily confused images that encourage the model to learn intricate features. Several sample images are presented in Figure 2. Ultimately, our dataset consists of 7666 original images, with a 2:3 ratio of positive to negative samples, and is partitioned into training, validation, and test sets at a proportion of 8:1:1.
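As a concrete illustration, the 8:1:1 partition can be produced with PyTorch’s random_split; the sketch below assumes a hypothetical class-folder layout (data/fire, data/no_fire) rather than the authors’ actual directory structure.

```python
import torch
from torchvision import datasets, transforms

# Hypothetical layout: data/fire/*.jpg and data/no_fire/*.jpg
dataset = datasets.ImageFolder(
    "data",
    transform=transforms.Compose([
        transforms.Resize((224, 224)),  # Fire-Net input resolution
        transforms.ToTensor(),
    ]),
)

# 8:1:1 split into training, validation, and test sets
n_total = len(dataset)
n_train, n_val = int(0.8 * n_total), int(0.1 * n_total)
n_test = n_total - n_train - n_val
train_set, val_set, test_set = torch.utils.data.random_split(
    dataset, [n_train, n_val, n_test],
    generator=torch.Generator().manual_seed(0),  # fixed seed for a reproducible split
)
```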

2.2. Fire-Net Detail

To ensure precise forest fire recognition, we developed a multi-stage model structure for classifying aerial images of fires, as illustrated in Figure 3. The model is primarily divided into nine stages, with its main components being the MobileOne block and the cross-channel attention mechanism. The main advantage of this structure is its hierarchical feature extraction ability, which systematically captures both lower-level and higher-level features from input images across various stages. This approach effectively captures intricate patterns present in fire images, thereby enhancing classification accuracy. Moreover, the multi-stage design simplifies model tuning and optimization processes.
Our model exhibits significant innovations in several areas: Firstly, by sequentially extracting low-, mid-, and high-level features, the model progressively captures intricate patterns and structures in the images, thereby improving classification accuracy and robustness. Secondly, we introduced a cross-channel attention mechanism in the sixth stage, effectively integrating inter-channel information, thereby improving fire image recognition capabilities. Finally, in the classification stage, we adopt AvgPool to reduce computational complexity and retain critical feature information. Moreover, the multi-stage design allows for flexible adjustments and optimizations of each stage’s structure and parameters, enhancing the model’s adaptability and scalability.
Table 1 provides detailed components of each stage in the Fire-Net model. During the model’s design phase, we segment the overall process into nine stages. Stages 1–2 focus on primary feature extraction with minimal convolutional operations, aimed at extracting fundamental features like edges, corners, and textures from the input images to provide rich details for subsequent stages. Stages 3–5 are dedicated to intermediate feature extraction, utilizing more convolutional layers to capture complex and abstract features such as shapes, patterns, and regions, thereby enhancing the model’s understanding and recognition capabilities of fire images. Stages 6–7 represent the deep feature extraction phase, where a channel fusion attention mechanism is designed to integrate multi-scale inter-channel information, forming highly abstract feature representations to identify key features within images. Finally, in the classification stage, to reduce computational complexity, we perform global average pooling on feature vectors followed by classification using linear layers. Through these innovative designs, our model demonstrates significant advantages in accuracy, efficiency, and robustness in fire detection, providing an efficient and reliable solution for drone-based aerial fire classification.
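To make the stage layout concrete, the following PyTorch skeleton mirrors Table 1. The class name FireNet, the factory parameters mobileone_block and cca_module, and the dummy_stage smoke test are illustrative stand-ins, not the authors’ released implementation.

```python
import torch
import torch.nn as nn

class FireNet(nn.Module):
    """Illustrative nine-stage skeleton following Table 1."""
    def __init__(self, mobileone_block, cca_module, num_classes=1):
        super().__init__()
        # (num_blocks, stride, in_channels, out_channels) for the MobileOne stages
        cfg = [
            (1, 2, 3, 96),     # stage 1: 224x224 -> 112x112
            (2, 2, 96, 96),    # stage 2: 112x112 -> 56x56
            (8, 2, 96, 192),   # stage 3: 56x56 -> 28x28
            (5, 2, 192, 512),  # stage 4: 28x28 -> 14x14
            (5, 1, 512, 512),  # stage 5: keeps the 14x14 resolution
        ]
        stages = [mobileone_block(n, s, cin, cout) for n, s, cin, cout in cfg]
        stages.append(cca_module(512))                   # stage 6: cross-channel attention
        stages.append(mobileone_block(1, 2, 512, 1280))  # stage 7: 14x14 -> 7x7
        self.features = nn.Sequential(*stages)
        self.pool = nn.AdaptiveAvgPool2d(1)              # stage 8: global average pooling
        self.classifier = nn.Linear(1280, num_classes)   # stage 9: fire / non-fire logit

    def forward(self, x):
        x = self.features(x)
        return self.classifier(self.pool(x).flatten(1))

# Smoke test with stand-in factories; a real build would pass the MobileOne
# block and CCA constructors from Sections 2.2.1 and 2.2.2.
def dummy_stage(n, s, cin, cout):
    # the n repeated blocks are collapsed into a single conv here for brevity
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=s, padding=1), nn.ReLU())

net = FireNet(dummy_stage, lambda c: nn.Identity())
print(net(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1])
```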

2.2.1. MobileOne Block

The MobileOne block, introduced by Vasu et al. [33] in 2023, is a novel lightweight convolutional module with a reparameterized skip connection and batch normalization. These operations enhance the module’s feature extraction capabilities by increasing the model’s complexity with minimal additional parameters. The incorporation of branches with batch normalization and repetitive structures is illustrated in Figure 4. The block also integrates 1 × 1 and 3 × 3 depthwise convolutions, followed by pointwise convolutions, similar to the structure of depthwise separable convolutions [34].
During inference, the convolution and batch normalization (BN) computations within the multiple branches on the left-hand side of Figure 4 are first combined into a single convolution through a linear transformation. Subsequently, we add a convolutional layer at the BN skip connection point, which does not alter the output. Then, we combine the two into a single convolutional operation. Finally, multiple convolution branches are unified into a single branch structure [35,36,37]. As shown in Figure 4, this approach compacts multiple computations into a single operation, thereby elevating the model’s inference speed.
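A minimal sketch of the convolution–batch-normalization fusion step described above is given below; fuse_conv_bn is a hypothetical helper that handles a single branch and omits the subsequent merging of the parallel branches into one kernel.

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a BatchNorm layer into the preceding convolution (the first step
    of the inference-time re-parameterization used by the MobileOne block)."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, groups=conv.groups, bias=True)
    std = torch.sqrt(bn.running_var + bn.eps)
    scale = bn.weight / std                               # per-channel gamma / sigma
    fused.weight.data = conv.weight * scale.reshape(-1, 1, 1, 1)
    bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.data = (bias - bn.running_mean) * scale + bn.bias
    return fused

# Numerical check: the fused convolution matches conv + BN in eval mode.
conv, bn = nn.Conv2d(8, 8, 3, padding=1, bias=False), nn.BatchNorm2d(8)
bn.running_mean.uniform_(-1, 1), bn.running_var.uniform_(0.5, 2.0)
bn.eval()
x = torch.randn(1, 8, 14, 14)
assert torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-5)
```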

2.2.2. Cross-Channel Attention Mechanism

We introduce a cross-channel attention mechanism to extract essential feature information from forest fire images, building upon the ECA (efficient channel attention) [38], and integrate it into our Fire-Net.
As illustrated in Figure 5, the input feature passes through global average pooling to generate a tensor of size 1 × 1 × C. Two parallel one-dimensional convolutions then capture inter-channel interactions, and their outputs are aggregated. The channel attention weights are generated from the aggregated output using a Sigmoid function. Finally, these weights are applied through element-wise multiplication with the original feature tensor, so that the model effectively captures the critical information within each channel.
Compared to the traditional ECA module, our designed cross-channel attention mechanism employs two branches with convolution kernel sizes of 3 and 5. The reason for selecting these kernel sizes is that they capture feature information at different scales. The small branch with a kernel size of 3 primarily captures local details, while the large branch with a kernel size of 5 captures features over a larger area. This design allows our mechanism to integrate multi-scale inter-channel information, leading to a more comprehensive understanding and representation of features across different channels. This multi-scale information capture ability is especially crucial in forest fire detection tasks, as fire features such as flames and smoke exhibit different scales across channels. Through this multi-branch structure, we can more effectively extract and integrate these features, enhancing the model’s accuracy in detecting fires.
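A minimal PyTorch sketch of this module is given below. It assumes the two 1-D branches are aggregated by summation (the aggregation operator is not specified above) and follows the ECA convention of applying the 1-D convolutions across the channel axis; the class name CrossChannelAttention is illustrative.

```python
import torch
import torch.nn as nn

class CrossChannelAttention(nn.Module):
    """Sketch of the CCA module: ECA-style channel attention with two parallel
    1-D convolutions (kernel sizes 3 and 5) over the channel descriptor.
    Summing the two branches is an assumption."""
    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv3 = nn.Conv1d(1, 1, kernel_size=3, padding=1, bias=False)
        self.conv5 = nn.Conv1d(1, 1, kernel_size=5, padding=2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        y = self.pool(x).view(b, 1, c)        # B x 1 x C channel descriptor
        y = self.conv3(y) + self.conv5(y)     # multi-scale inter-channel interaction
        w = self.sigmoid(y).view(b, c, 1, 1)  # per-channel attention weights
        return x * w                          # re-weight the input feature map

# Example: stage-6 feature map of Fire-Net (512 channels, 14 x 14)
feat = torch.randn(2, 512, 14, 14)
print(CrossChannelAttention(512)(feat).shape)  # torch.Size([2, 512, 14, 14])
```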

2.3. Data Processing

To improve the network’s adaptability, we employ various data augmentation techniques, including random rotation, horizontal flipping, and color enhancement, as illustrated in Figure 6. Additionally, we adopt cut-out [39] to simulate the scene in which flames are overshadowed by plants in the forest, as shown in Figure 7. These methods can further improve the model’s recognition capabilities [40].
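For reference, cut-out [39] can be implemented in a few lines. In the sketch below, the 56-pixel patch size and the zero fill value are illustrative choices, not the exact settings used in our experiments.

```python
import torch

def cutout(img: torch.Tensor, size: int = 56) -> torch.Tensor:
    """Zero out a random square patch to mimic flames partially occluded by
    vegetation (applied to a C x H x W tensor after ToTensor)."""
    _, h, w = img.shape
    cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
    y1, y2 = max(cy - size // 2, 0), min(cy + size // 2, h)
    x1, x2 = max(cx - size // 2, 0), min(cx + size // 2, w)
    img = img.clone()
    img[:, y1:y2, x1:x2] = 0.0
    return img

# Usage inside a torchvision pipeline:
# transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(cutout)])
```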

2.4. Embedded Device

For the experiment, we chose a high-performance embedded device, the Jetson TX2 NX, an edge-computing module launched by NVIDIA in 2021; Table 2 presents detailed information about it [41]. The Jetson TX2 NX was used with its corresponding developer kit, running Ubuntu 18.04, Python 3.6, and PyTorch 1.9.

2.5. Model Quantization

In the training phase, which does not require real-time performance, higher-precision data representations are preferred for gradient updates, making FP32 the typical choice. During the inference phase, however, the precision requirements for model outputs are relatively low, and half-precision (FP16) is sufficient.
Figure 8 illustrates the bit-level structures of the FP32 and FP16 data representations. When converting weights from FP32 to FP16, the first step is to copy the sign bit:
$S_{\mathrm{FP16}} = S_{\mathrm{FP32}}$
The subsequent step adjusts the exponent. Specifically, this entails subtracting an offset from the FP32 exponent and storing the computed result in the FP16 exponent, which can be described as follows:
$E_{\mathrm{FP16}} = E_{\mathrm{FP32}} - 127 + 15$
Then, the 23-bit mantissa of FP32 is truncated to 10 bits and rounded:
$M_{\mathrm{FP16}} = \mathrm{round}\left( \dfrac{M_{\mathrm{FP32}}}{2^{13}} \right)$
Finally, the sign bit, exponent, and mantissa obtained via the above process are concatenated into the FP16 format, as follows:
$\mathrm{FP16\_Result} = S_{\mathrm{FP16}} \,|\, E_{\mathrm{FP16}} \,|\, M_{\mathrm{FP16}}$
The merits of FP16 in deep learning encompass reduced memory consumption, faster computation speed, and lower power consumption. These make it well-suited for training and inference for large-scale models, particularly in resource-constrained hardware.
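The conversion steps above can be reproduced directly at the bit level. The sketch below uses a hypothetical helper, fp32_to_fp16_bits, that handles only normal, in-range values and applies round-half-up as a simplification; production converters such as TensorRT also deal with overflow, subnormals, and NaN/Inf, and typically round to nearest even.

```python
import struct
import numpy as np

def fp32_to_fp16_bits(value: float) -> int:
    """Bit-level FP32 -> FP16 conversion following the sign/exponent/mantissa
    steps above. Sketch for normal, in-range values only."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    s = (bits >> 31) & 0x1        # copy the sign bit
    e = (bits >> 23) & 0xFF       # 8-bit biased exponent
    m = bits & 0x7FFFFF           # 23-bit mantissa
    e16 = e - 127 + 15            # re-bias the exponent (127 -> 15)
    m16 = (m + (1 << 12)) >> 13   # keep 10 bits, rounding half up
    if m16 == 1 << 10:            # rounding overflowed into the exponent
        m16, e16 = 0, e16 + 1
    return (s << 15) | (e16 << 10) | m16

# Cross-check against NumPy's FP16 encoding (assumes a little-endian host).
x = 3.140625
ref = struct.unpack("<H", np.float16(x).tobytes())[0]
assert fp32_to_fp16_bits(x) == ref
```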

2.6. Evaluation Metrics

The performance metrics used in this study include accuracy, recall, precision, and F1 score. Accuracy is the ratio of correctly predicted instances to the total cases. Recall, also known as sensitivity, is the ratio of correctly predicted positive observations to all the observations in the actual class. Precision is the ratio of correctly predicted positive observations to the total predicted positive observations. The F1 score is the harmonic mean of precision and recall, providing a single metric that balances both concerns. These indicators are defined as follows:
$\mathrm{Accuracy} = \dfrac{TP + TN}{TP + FP + FN + TN}$
$\mathrm{Recall} = \dfrac{TP}{TP + FN}$
$\mathrm{Precision} = \dfrac{TP}{TP + FP}$
$\mathrm{F1\ score} = \dfrac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
where $TP$ (true positive) denotes a fire correctly identified as a fire; $TN$ (true negative) denotes a non-fire scene correctly identified as non-fire; $FP$ (false positive) denotes a non-fire scene erroneously identified as a fire; and $FN$ (false negative) denotes a fire that is not recognized when one actually occurs.
Moreover, for time-sensitive applications, frames per second (FPS) is a fundamental indicator of a recognition system. It reflects how quickly the model processes video frames and depends heavily on the number of model parameters and the computational load (FLOPs, floating-point operations).
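These definitions translate directly into code; the sketch below uses hypothetical confusion-matrix counts purely for illustration.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, recall, precision, and F1 score from confusion-matrix counts,
    as defined by the formulas above."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}

# Hypothetical counts for a binary fire / non-fire test set
print(classification_metrics(tp=290, tn=460, fp=3, fn=6))
```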

2.7. Training Environment

The training process was executed on a computer running Windows 10, using Python 3.8 and PyTorch 1.9, with an NVIDIA GeForce GTX 1050 Ti GPU and CUDA version 11.6. Table 3 presents the detailed parameter settings used for training the models.
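A minimal training loop consistent with Table 3 is sketched below. The use of BCEWithLogitsLoss on a single output logit is an assumption (the paper specifies binary cross-entropy but not where the sigmoid is applied), and model and train_loader are placeholders.

```python
import torch
import torch.nn as nn

def train(model, train_loader, epochs=100, lr=1e-3, device="cuda"):
    """Sketch of the Table 3 setup: Adam, learning rate 0.001, 100 epochs,
    binary cross-entropy loss; the loader is assumed to yield 224 x 224
    images in batches of 24."""
    model = model.to(device)
    criterion = nn.BCEWithLogitsLoss()  # binary cross-entropy on the fire logit
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            images = images.to(device)
            labels = labels.float().unsqueeze(1).to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```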

3. Experimental Results

3.1. Comparison of Experimental Results under Different Dataset Compositions

To appraise the impact of adding ambiguous images and various fire images to the dataset, as discussed in Section 2.1, we separately incorporated ambiguous images and diverse fire images into the original dataset and evaluated the resulting models on the same test set. For each dataset composition, we conducted multiple training runs without data augmentation and assessed the model performance. The experimental findings are presented in Figure 9.
Figure 9 shows the impact of different dataset compositions on the performance metrics of Fire-Net. The model attained an accuracy of 93.45% when trained on the original dataset. Incorporating ambiguous images raised the accuracy to 94.62%, including various fire images yielded 94.15%, and adding both types of images yielded 94.18%. Similar trends were noted for precision and recall, with the highest values achieved when both types of images were included. This indicates that a diverse training set, enriched with challenging and varied examples, enables the forest fire detection model to learn deeper fire-related features, thereby enhancing its performance across all metrics.

3.2. Comparison of Data Augmentation Techniques

To highlight the advantage of the cut-out augmentation method, MixUp [42] and CutMix [43] are separately applied to the proposed network as control groups. Figure 10 compares the methods in terms of accuracy, precision, and recall.
The network trained with cut-out reaches an accuracy of 98.18%, far above that of the others. Its precision reaches 99.14%, slightly higher than CutMix and clearly better than the remaining methods, and it achieves the highest recall at 98.01%. This superior performance is attributed to the cut-out technique, which effectively simulates the occlusion effect of trees blocking firelight during drone surveillance of forest fires. In contrast, MixUp and CutMix are less suitable for forest fire detection: they can introduce semantic confusion, degrade spatial details, distort features, and fail to capture the specific visual characteristics and complexities of fires. In this monitoring task, maintaining the original semantic integrity and spatial context of the images is crucial.
In addition, we employ Smooth Grad-CAM++ [44,45], a network visualization tool that highlights the regions of an input image the model attends to most. As shown in Figure 11, there is a significant difference in feature capture among models trained with the various augmentation methods. Although all of these techniques increase the model’s focus on the target object, the network trained with cut-out concentrates most closely on the principal features of a fire image, making cut-out the most suitable candidate for augmenting forest fire image data.

3.3. Attention Mechanisms

In this section, to demonstrate the advantage of the CCA module, four alternative modules—squeeze-and-excitation (SE) [46], triplet attention [47], the global attention mechanism (GAM) [48], and coordinate attention (CA) [49]—are inserted into the network for comparison. The accuracy, precision, and recall are measured. Furthermore, several experiments are executed on the Jetson TX2 NX to evaluate the influence of these modules on inference times. The workflow diagram for the embedded device is illustrated in Figure 12.
The networks with CCA and with each of the four alternative modules were trained on the same computer. The weight files were converted to a quantized .trt format using TensorRT 8.2.1 and imported into the PyTorch-based testing environment. The test set was then used to measure inference times on the Jetson TX2 NX. We evaluated both single-image and batch inputs, with a batch size of 3, and recorded the inference times. The test results on the Jetson TX2 NX are presented in Table 4.
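The deployment path we assume is PyTorch → ONNX → FP16 TensorRT engine. The sketch below uses a tiny stand-in module so the export call runs end to end; the file names fire_net.onnx and fire_net.trt are placeholders, and the trtexec invocation in the comment is one common way to build the quantized engine on the Jetson.

```python
import torch
import torch.nn as nn

# Stand-in module; in practice the trained Fire-Net weights are loaded here.
model = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2, padding=1),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))
model.eval()

dummy = torch.randn(1, 3, 224, 224)  # single 224 x 224 RGB frame
torch.onnx.export(model, dummy, "fire_net.onnx",
                  input_names=["image"], output_names=["logit"],
                  opset_version=11)

# On the Jetson TX2 NX, the ONNX graph can then be compiled into an FP16
# engine with TensorRT's trtexec tool, e.g.:
#   trtexec --onnx=fire_net.onnx --saveEngine=fire_net.trt --fp16
```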
The proposed Fire-Net (CCA-MobileOne in the table) outperforms the other benchmark models across the four accuracy-related metrics. It also achieves the best performance in both single-image and batch (video-frame) inference.

3.4. Comparison with Existing Models

In this section, we compare Fire-Net with state-of-the-art lightweight models. All experiments were executed on the Jetson TX2 NX. In addition to the metrics mentioned earlier, we generated receiver operating characteristic (ROC) [50] curves for the models. The experimental results are presented in Table 5.
Figure 13 displays the ROC curves of the nine models. Each curve depicts classifier performance at different thresholds by plotting the true positive rate (TPR) against the false positive rate (FPR); the area under the curve (AUC) summarizes this performance, with higher values indicating better classifiers. In this comparison, Fire-Net achieves the highest AUC, 0.98, and also excels in inference time. This indicates that, when balancing real-time responsiveness and accuracy, Fire-Net has a significant advantage in forest fire detection.
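For reference, the ROC curve and AUC for any of the evaluated classifiers can be computed from the test-set scores as sketched below; the labels and scores here are synthetic stand-ins, not our experimental data.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Synthetic example: y_true are ground-truth labels (1 = fire) and
# y_score the model's sigmoid outputs on a test set.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_score = np.clip(y_true * 0.7 + rng.normal(0.2, 0.25, size=200), 0, 1)

fpr, tpr, _ = roc_curve(y_true, y_score)  # TPR vs. FPR at each threshold
print(f"AUC = {auc(fpr, tpr):.3f}")
```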

3.5. Field Test Results

To gauge our model’s generalization capability, we deployed Fire-Net on a drone and conducted field tests in Lanzhou, China. During the testing period the weather was overcast, the temperature was between 19 and 24 °C, a light northwesterly wind (force 1) was blowing, and the relative humidity was 60%. The test was conducted at 10:30 a.m. China standard time. Due to regional regulations, we were unable to conduct ignition tests in an actual forest environment. To simulate early-stage forest fires, we ignited a small pile of combustible materials, ensuring the flames resembled those of early forest fires, with the drone flying at an altitude of 15 m. Additionally, we utilized Smooth Grad-CAM++ to visualize the final layer of the model and observe Fire-Net’s focus on key areas. Some experimental results are presented in Figure 14.
We conducted 50 tests using drones from different angles to examine the bonfires, successfully identifying 47 instances of fire, a recognition accuracy of 94% in outdoor testing. Furthermore, the Smooth Grad-CAM++ visualizations in Figure 14 show that the model’s focal areas (highlighted in red hues) prominently cover the bonfire locations and extend slightly beyond them. This behavior favors safety in forest fire detection, as the model tends to enlarge potential fire zones to avoid overlooking any possible flames. Although this strategy may lead to false positives in non-fire areas, it maximizes the accurate detection of actual fires, thereby reducing the risk of missed detections. Overall, the comprehensive testing results demonstrate that our rapid detection algorithm, Fire-Net, effectively and accurately detects forest fires, facilitating timely implementation of preventive measures to mitigate fire-related losses.

4. Discussion

Forests are integral components of the Earth’s ecosystems [2]. However, the rising frequency of forest fires has led to severe ecological damage [4], and timely, accurate identification is crucial for firefighting [14]. In recent years, UAVs have become indispensable tools for forest fire monitoring due to their ability to traverse difficult terrain and reduce the risk to human operators [56]. Researchers have therefore focused on using images taken by UAVs to detect forest fires. Unfortunately, most existing studies emphasize detection accuracy, often by significantly increasing model complexity, while neglecting the severe latency this complexity introduces on UAV-embedded devices. This oversight compromises real-time detection, which is essential for a prompt firefighting response.
In fact, the fire spreads dynamically rather than statically. Real-time monitoring of wildfires often encounters various scales of fires, smoke, and complex backgrounds. The proposed Fire-Net comprises MobileOne and CCA modules. The MobileOne module employs depthwise separable convolutions, utilizing 1 × 1 and 3 × 3 convolutional kernels to effectively capture fine details of flame edges and subtle variations in smoke while reducing the number of parameters and computational load. Additionally, it incorporates re-parameterizable skip-connection branches to mitigate the vanishing gradient problem, facilitating model training and enhancing the performance of deep networks. During inference, the MobileOne module transforms into a single-branch structure through reparameterization to ensure high efficiency. The CCA module leverages 1D convolutional branches with kernel sizes of 3 to capture local inter-channel relationships, aiding in the detection of fine variations and regional patterns in flame and smoke features, thereby increasing the model’s sensitivity to small-scale features. Conversely, 1D convolutional branches with a kernel size of 5 ensure comprehensive and accurate capture and integration of flame and smoke features at different scales. By combining 1D convolutional kernels of sizes 3 and 5, the CCA module can simultaneously capture local and global features, comprehensively understanding and representing multi-scale flame and smoke features across different channels, thereby enhancing the overall model accuracy in forest fire detection. Additionally, to enhance the generalization performance and avoid overfitting of the model, we employed various data augmentation techniques to guarantee flame recognition capabilities without significantly increasing computational load.
Although Fire-Net demonstrates robust performance in controlled environments, its efficacy in real-world applications may be constrained by substantial variations in environmental conditions and fire behaviors. Adaptive convolutional kernels show promise in addressing fires of varying scales and complexities through real-time adjustments. Nevertheless, current technical limitations, including insufficient computational resources in the embedded devices on drones and the lack of efficient real-time adaptive algorithms, impede their practical implementation. Future advancements in hardware acceleration technologies and the development of more sophisticated adaptive algorithms may facilitate the realization of adaptive convolutional kernels. Moreover, the limited endurance of current drones further restricts the monitoring range. Future research should focus on integrating climate features and environmental characteristics to predict and monitor key areas in real-time, thereby enhancing the effectiveness of fire monitoring.

5. Conclusions

In this paper, we propose Fire-Net for forest fire monitoring. We established a fire image dataset that includes various fire scenarios and those prone to be confused, then we leveraged the reparameterized MobileOne module for model architecture and introduced the cross-channel attention (CCA) mechanism to efficiently extract critical information. In the training phase, the utilization of the cut-out technique further elevated the model’s proficiency in identifying flames.
The experimental results show that Fire-Net achieved notable results, with an accuracy of 98.18%, a precision of 99.14%, and a recall of 98.01%. Notably, Fire-Net exhibits an average inference time of only 10 ms per image on embedded devices, making it capable of monitoring forest fires in real time. This research improves detection speed and reduces model complexity without compromising accuracy, providing new insights for research on lightweight networks. This advancement benefits the development of edge artificial intelligence and contributes to the creation of application devices for various fields, such as environmental monitoring, pest management, and agricultural assessment, offering valuable guidance for timely decision-making in critical scenarios.

Author Contributions

Z.Y. and S.L.: conceptualization; Z.Y. and J.H.: methodology, software; F.C. and R.M.: validation, visualization; Z.Y. and S.L.: formal analysis, investigation; J.H.: resources, data curation; Z.Y. and J.H.: writing—original draft preparation, Z.Y., J.H., S.Y., and S.L.: writing—review and editing; Z.Y. and S.L.: supervision; Z.Y.: project administration, funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by the Fundamental Research Funds for the Central Universities of China (no. lzujbky-2022-pd12), the Gansu Key Laboratory of Cloud Computing open program (no. 20240088), and by the Natural Science Foundation of Gansu Province, China (no. 22JR5RA492).

Data Availability Statement

The FLAME dataset used in this study can be found at http://ieee-dataport.org/open-access/flame-dataset-aerial-imagery-pile-burn-detection-using-drones-uavs, accessed on 16 April 2021.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Menut, L.; Cholakian, A.; Siour, G.; Lapere, R.; Pennel, R.; Mailler, S.; Bessagnet, B. Impact of Landes forest fires on air quality in France during the 2022 summer. Atmos. Chem. Phys. 2023, 23, 7281–7296. [Google Scholar] [CrossRef]
  2. Sun, Y.; Jiang, L.; Pan, J.; Sheng, S.; Hao, L. A satellite imagery smoke detection framework based on the Mahalanobis distance for early fire identification and positioning. Int. J. Appl. Earth Obs. Geoinf. 2023, 118, 103257. [Google Scholar] [CrossRef]
  3. Yandouzi, M.; Grari, M.; Idrissi, I.; Moussaoui, O.; Azizi, M.; Ghoumid, K.; Elmiad, A.K. Review on forest fires detection and prediction using deep learning and drones. J. Theor. Appl. Inf. Technol. 2022, 100, 4565–4576. [Google Scholar]
  4. Zahed, M.; Bączek-Kwinta, R. The Impact of Post-Fire Smoke on Plant Communities: A Global Approach. Plants 2023, 12, 3835. [Google Scholar] [CrossRef] [PubMed]
  5. Qin, Y.; Xiao, X.; Wigneron, J.P.; Ciais, P.; Canadell, J.G.; Brandt, M.; Li, X.; Fan, L.; Wu, X.; Tang, H.; et al. Large loss and rapid recovery of vegetation cover and aboveground biomass over forest areas in Australia during 2019–2020. Remote Sens. Environ. 2022, 278, 113087. [Google Scholar] [CrossRef]
  6. Feng, X.; Merow, C.; Liu, Z.; Park, D.S.; Roehrdanz, P.R.; Maitner, B.; Newman, E.A.; Boyle, B.L.; Lien, A.; Burger, J.R.; et al. How deregulation, drought and increasing fire impact Amazonian biodiversity. Nature 2021, 597, 516–521. [Google Scholar] [CrossRef] [PubMed]
  7. Mataix-Solera, J.; Cerdà, A.; Arcenegui, V.; Jordán, A.; Zavala, L. Fire effects on soil aggregation: A review. Earth-Sci. Rev. 2011, 109, 44–60. [Google Scholar] [CrossRef]
  8. Ferreira, A.; Coelho, C.d.O.; Ritsema, C.; Boulet, A.; Keizer, J. Soil and water degradation processes in burned areas: Lessons learned from a nested approach. Catena 2008, 74, 273–285. [Google Scholar] [CrossRef]
  9. Laurance, W.F. Habitat destruction: Death by a thousand cuts. Conserv. Biol. All 2010, 1, 73–88. [Google Scholar]
  10. MacCarthy, J.; Tyukavina, A.; Weisse, M.J.; Harris, N.; Glen, E. Extreme wildfires in Canada and their contribution to global loss in tree cover and carbon emissions in 2023. Glob. Change Biol. 2024, 30, e17392. [Google Scholar] [CrossRef]
  11. Wang, Z.; Wang, Z.; Zou, Z.; Chen, X.; Wu, H.; Wang, W.; Su, H.; Li, F.; Xu, W.; Liu, Z.; et al. Severe global environmental issues caused by Canada’s record-breaking wildfires in 2023. Adv. Atmos. Sci. 2024, 41, 565–571. [Google Scholar] [CrossRef]
  12. Pelletier, F.; Cardille, J.A.; Wulder, M.A.; White, J.C.; Hermosilla, T. Revisiting the 2023 wildfire season in Canada. Sci. Remote Sens. 2024, 10, 100145. [Google Scholar] [CrossRef]
  13. Kurvits, T.; Popescu, A.; Paulson, A.; Sullivan, A.; Ganz, D.; Burton, C.; Kelley, D.; Fernandes, P.; Wittenberg, L.; Baker, E.; et al. Spreading Like Wildfire: The Rising Threat of Extraordinary Landscape Fires; UNEP: Nairobi, Kenya, 2022. [Google Scholar]
  14. Li, R.; Hu, Y.; Li, L.; Guan, R.; Yang, R.; Zhan, J.; Cai, W.; Wang, Y.; Xu, H.; Li, L. SMWE-GFPNNet: A High-precision and Robust Method for Forest Fire Smoke Detection. Knowl.-Based Syst. 2024, 248, 111528. [Google Scholar] [CrossRef]
  15. Lucas-Borja, M.E.; Zema, D.A.; Carrà, B.G.; Cerdà, A.; Plaza-Alvarez, P.A.; Cózar, J.S.; Gonzalez-Romero, J.; Moya, D.; de las Heras, J. Short-term changes in infiltration between straw mulched and non-mulched soils after wildfire in Mediterranean forest ecosystems. Ecol. Eng. 2018, 122, 27–31. [Google Scholar] [CrossRef]
  16. Ertugrul, M.; Varol, T.; Ozel, H.B.; Cetin, M.; Sevik, H. Influence of climatic factor of changes in forest fire danger and fire season length in Turkey. Environ. Monit. Assess. 2021, 193, 28. [Google Scholar] [CrossRef] [PubMed]
  17. North, M.P.; Stephens, S.L.; Collins, B.M.; Agee, J.K.; Aplet, G.; Franklin, J.F.; Fulé, P.Z. Reform forest fire management. Science 2015, 349, 1280–1281. [Google Scholar] [CrossRef]
  18. Sudhakar, S.; Vijayakumar, V.; Sathiya Kumar, C.; Priya, V.; Ravi, L.; Subramaniyaswamy, V. Unmanned Aerial Vehicle (UAV) based Forest Fire Detection and monitoring for reducing false alarms in forest-fires. Comput. Commun. 2020, 149, 1–16. [Google Scholar] [CrossRef]
  19. Zhang, Y.; Fang, X.; Guo, J.; Wang, L.; Tian, H.; Yan, K.; Lan, Y. CURI-YOLOv7: A Lightweight YOLOv7tiny Target Detector for Citrus Trees from UAV Remote Sensing Imagery Based on Embedded Device. Remote Sens. 2023, 15, 4647. [Google Scholar] [CrossRef]
  20. Namburu, A.; Selvaraj, P.; Mohan, S.; Ragavanantham, S.; Eldin, E.T. Forest Fire Identification in UAV Imagery Using X-MobileNet. Electronics 2023, 12, 733. [Google Scholar] [CrossRef]
  21. Zhang, L.; Wang, M.; Fu, Y.; Ding, Y. A Forest Fire Recognition Method Using UAV Images Based on Transfer Learning. Forests 2022, 13, 975. [Google Scholar] [CrossRef]
  22. Guan, Z.; Miao, X.; Mu, Y.; Sun, Q.; Ye, Q.; Gao, D. Forest Fire Segmentation from Aerial Imagery Data Using an Improved Instance Segmentation Model. Remote Sens. 2022, 14, 3159. [Google Scholar] [CrossRef]
  23. Zhang, L.; Wang, M.; Ding, Y.; Bu, X. MS-FRCNN: A Multi-Scale Faster RCNN Model for Small Target Forest Fire Detection. Forests 2023, 14, 616. [Google Scholar] [CrossRef]
  24. Shamsoshoara, A.; Afghah, F.; Razi, A.; Zheng, L.; Fulé, P.Z.; Blasch, E. Aerial imagery pile burn detection using deep learning: The FLAME dataset. Comput. Netw. 2021, 193, 108001. [Google Scholar] [CrossRef]
  25. Rui, X.; Li, Z.; Zhang, X.; Li, Z.; Song, W. A RGB-Thermal based adaptive modality learning network for day–night wildfire identification. Int. J. Appl. Earth Obs. Geoinf. 2023, 125, 103554. [Google Scholar] [CrossRef]
  26. Barmpoutis, P.; Kastridis, A.; Stathaki, T.; Yuan, J.; Shi, M.; Grammalidis, N. Suburban Forest Fire Risk Assessment and Forest Surveillance Using 360-Degree Cameras and a Multiscale Deformable Transformer. Remote Sens. 2023, 15, 1995. [Google Scholar] [CrossRef]
  27. Lin, J.; Lin, H.; Wang, F. STPM_SAHI: A Small-Target forest fire detection model based on Swin Transformer and Slicing Aided Hyper inference. Forests 2022, 13, 1603. [Google Scholar] [CrossRef]
  28. Chen, G.; Zhou, H.; Li, Z.; Gao, Y.; Bai, D.; Xu, R.; Lin, H. Multi-Scale Forest Fire Recognition Model Based on Improved YOLOv5s. Forests 2023, 14, 315. [Google Scholar] [CrossRef]
  29. Huang, J.; Zhou, J.; Yang, H.; Liu, Y.; Liu, H. A Small-Target Forest Fire Smoke Detection Model Based on Deformable Transformer for End-to-End Object Detection. Forests 2023, 14, 162. [Google Scholar] [CrossRef]
  30. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  31. Touvron, H.; Cord, M.; Sablayrolles, A.; Synnaeve, G.; Jegou, H. Going deeper with Image Transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 32–42. [Google Scholar]
  32. Shafi, O.; Rai, C.; Sen, R.; Ananthanarayanan, G. Demystifying TensorRT: Characterizing Neural Network Inference Engine on Nvidia Edge Devices. In Proceedings of the 2021 IEEE International Symposium on Workload Characterization (IISWC), Storrs, CT, USA, 7–9 November 2021; pp. 226–237. [Google Scholar]
  33. Vasu, P.K.A.; Gabriel, J.; Zhu, J.; Tuzel, O.; Ranjan, A. MobileOne: An Improved One millisecond Mobile Backbone. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7907–7917. [Google Scholar]
  34. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  35. Ding, X.; Guo, Y.; Ding, G.; Han, J. ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1911–1920. [Google Scholar]
  36. Ding, X.; Zhang, X.; Han, J.; Ding, G. Diverse Branch Block: Building a Convolution as an Inception-like Unit. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 10886–10895. [Google Scholar]
  37. Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. RepVGG: Making VGG-style ConvNets Great Again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 13733–13742. [Google Scholar]
  38. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542. [Google Scholar]
  39. DeVries, T.; Taylor, G.W. Improved Regularization of Convolutional Neural Networks with Cutout. arXiv 2017, arXiv:1708.04552. [Google Scholar]
  40. Chen, L.; Li, S.; Bai, Q.; Yang, J.; Jiang, S.; Miao, Y. Review of Image Classification Algorithms Based on Convolutional Neural Networks. Remote Sens. 2021, 13, 4712. [Google Scholar] [CrossRef]
  41. Choe, C.; Choe, M.; Jung, S. Run Your 3D Object Detector on NVIDIA Jetson Platforms: A Benchmark Analysis. Sensors 2023, 23, 4005. [Google Scholar] [CrossRef] [PubMed]
  42. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. MixUp: Beyond empirical risk minimization. arXiv 2017, arXiv:1710.09412. [Google Scholar]
  43. Yun, S.; Han, D.; Chun, S.; Oh, S.J.; Yoo, Y.; Choe, J. CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6023–6032. [Google Scholar]
  44. Omeiza, D.; Speakman, S.; Cintas, C.; Weldermariam, K. Smooth Grad-CAM++: An Enhanced Inference Level Visualization Technique for Deep Convolutional Neural Network Models. arXiv 2019, arXiv:1908.01224. [Google Scholar]
  45. Fernandez, F.G. TorchCAM: Class Activation Explorer. 2020. Available online: https://github.com/frgfm/torch-cam (accessed on 24 March 2020).
  46. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  47. Zhou, H.; Li, J.; Peng, J.; Zhang, S.; Zhang, S. Triplet attention: Rethinking the similarity in transformers. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual, 14–18 August 2021; pp. 2378–2388. [Google Scholar]
  48. Liu, Y.; Shao, Z.; Hoffmann, N. Global attention mechanism: Retain information to enhance channel-spatial interactions. arXiv 2021, arXiv:2112.05561. [Google Scholar]
  49. Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722. [Google Scholar]
  50. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874. [Google Scholar] [CrossRef]
  51. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
  52. Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. ShuffleNet v2: Practical guidelines for efficient cnn architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131. [Google Scholar]
  53. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
  54. Zhou, D.; Hou, Q.; Chen, Y.; Feng, J.; Yan, S. Rethinking bottleneck structure for efficient mobile network design. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 680–697. [Google Scholar]
  55. Tan, M.; Le, Q.V. Mixconv: Mixed depthwise convolutional kernels. arXiv 2019, arXiv:1907.09595. [Google Scholar]
  56. De la Fuente, R.; Aguayo, M.M.; Contreras-Bolton, C. An optimization-based approach for an integrated forest fire monitoring system with multiple technologies and surveillance drones. Eur. J. Oper. Res. 2024, 313, 435–451. [Google Scholar] [CrossRef]
Figure 1. Accuracy vs. latency on Jetson TX2 NX.
Figure 2. Images (a–e) represent examples of forest fires captured by a drone, (f–h) depict non-fire conditions, (i,j) show various fire types included in the dataset, and (k,l) display examples of confusing samples.
Figure 3. Schematic representation of the overall architecture of Fire-Net.
Figure 4. The MobileOne block shows distinct structures in training and testing. It features over-parameterized branches during training, which are subsequently re-parameterized during inference.
Figure 5. Cross-channel attention module.
Figure 6. Illustrations depicting three image enhancement techniques: (a) original image; (b) random rotation; (c) horizontal flipping; (d) color enhancement.
Figure 7. Two examples of images with random occlusion using the cut-out technique.
Figure 8. Schematic representation of the FP32 and FP16 floating-point formats. (a) FP32 format with a 1-bit sign (S), 8-bit exponent (E), and 23-bit mantissa (M). (b) FP16 format with a 1-bit sign (S), 5-bit exponent (E), and 10-bit mantissa (M).
Figure 9. Comparison of experimental results with varied dataset compositions.
Figure 10. Comparison of different data augmentation methods.
Figure 11. Exploring the influence of augmentation methods: a perspective through Smooth Grad-CAM++.
Figure 12. Testing process flowchart for embedded devices.
Figure 13. Comparison of ROC curves across models.
Figure 14. Image inference and visualization of outdoor fire testing in Lanzhou, China.
Table 1. Detailed description of each stage in the Fire-Net architecture.

| Stage | Input Size | Blocks | Stride | Block Type | Input Channels | Output Channels |
|---|---|---|---|---|---|---|
| 1 | 224 × 224 | 1 | 2 | MobileOne block | 3 | 96 |
| 2 | 112 × 112 | 2 | 2 | MobileOne block | 96 | 96 |
| 3 | 56 × 56 | 8 | 2 | MobileOne block | 96 | 192 |
| 4 | 28 × 28 | 5 | 2 | MobileOne block | 192 | 512 |
| 5 | 14 × 14 | 5 | 1 | MobileOne block | 512 | 512 |
| 6 | 14 × 14 | 5 | 1 | CCA module | 512 | 512 |
| 7 | 14 × 14 | 1 | 2 | MobileOne block | 512 | 1280 |
| 8 | 7 × 7 | 1 | 1 | AvgPool | – | – |
| 9 | 1 × 1 | 1 | 1 | Linear | 1280 | 1 |
Table 2. Specifications for the Jetson TX2 NX.

| Computational Capability | GPU | CPU |
|---|---|---|
| 1.33 TFLOPS | NVIDIA Pascal™ architecture GPU with 256 CUDA cores | Dual-core 64-bit NVIDIA Denver 2 CPU and a quad-core ARM A57 complex |
Table 3. Training parameters.

| Parameter | Detail |
|---|---|
| Batch size | 24 |
| Epochs | 100 |
| Input size | 224 × 224 |
| Initial learning rate | 0.001 |
| Optimizer | Adam |
| Loss | Binary cross-entropy loss |
Table 4. Impact of various modules.

| Model | Accuracy | Precision | Recall | F1 Score | Weights (M) | Inference Time (ms) | Batch Inference Time (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| CCA-MobileOne | 98.18% | 99.14% | 98.01% | 0.9857 | 8.2 | 10.35 | 4.63 | 86 |
| SE-MobileOne | 97.22% | 98.25% | 95.26% | 0.9673 | 8.4 | 11.94 | 5.04 | 85 |
| CA-MobileOne | 97.21% | 98.84% | 96.87% | 0.9785 | 8.2 | 11.22 | 4.98 | 76 |
| Triplet-MobileOne | 96.32% | 96.27% | 95.27% | 0.9577 | 8.2 | 10.71 | 4.95 | 86 |
| GAM-MobileOne | 96.18% | 98.53% | 95.44% | 0.9696 | 21.3 | 15.23 | 5.25 | 72 |
Table 5. Comparison of experiment results.

| Model | Accuracy | Precision | Recall | F1 Score | AUC | FLOPs (M) | Params (M) | Inference Time (ms) | Batch Inference Time (ms) |
|---|---|---|---|---|---|---|---|---|---|
| Our Model | 98.18% | 99.14% | 98.01% | 0.9857 | 0.98 | 825 | 3.4 | 10 | 4.6 |
| MobileNet-V3-s [51] | 97.27% | 97.19% | 98.58% | 0.9788 | 0.97 | 56 | 2.6 | 14 | 6.2 |
| ShuffleNet-V2-1.0 [52] | 97.45% | 97.73% | 98.29% | 0.9802 | 0.97 | 146 | 2.3 | 16 | 8.6 |
| MobileNet V2 [53] | 97.27% | 99.70% | 96.58% | 0.9812 | 0.96 | 300 | 3.4 | 24 | 8.4 |
| MobileNeXt [54] | 96.91% | 98.83% | 96.30% | 0.9755 | 0.97 | 311 | 4.1 | 21 | 10.2 |
| MixNet [55] | 96.91% | 97.71% | 97.44% | 0.9757 | 0.97 | 256 | 3.4 | 14 | 10.5 |
| RepVGG-A0 [37] | 95.27% | 98.51% | 94.02% | 0.9621 | 0.96 | 1400 | 8.3 | 18 | 6.0 |
| RepVGG-A1 [37] | 95.09% | 97.65% | 94.59% | 0.9610 | 0.94 | 2400 | 12.8 | 20 | 7.2 |
| RepVGG-B0 [37] | 96.73% | 97.98% | 96.87% | 0.9742 | 0.96 | 3100 | 14.3 | 21 | 6.8 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, S.; Han, J.; Chen, F.; Min, R.; Yi, S.; Yang, Z. Fire-Net: Rapid Recognition of Forest Fires in UAV Remote Sensing Imagery Using Embedded Devices. Remote Sens. 2024, 16, 2846. https://doi.org/10.3390/rs16152846

