Article

A Lightweight Fire Detection Algorithm Based on the Improved YOLOv8 Model

1 Hubei Key Laboratory of Digital Textile Equipment, Wuhan Textile University, Wuhan 430200, China
2 School of Mechanical Engineering and Automation, Wuhan Textile University, Wuhan 430200, China
3 School of Economics, Wuhan Textile University, Wuhan 430200, China
4 School of Electronics and Electrical Engineering, Wuhan Textile University, Wuhan 430200, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(16), 6878; https://doi.org/10.3390/app14166878
Submission received: 9 July 2024 / Revised: 29 July 2024 / Accepted: 29 July 2024 / Published: 6 August 2024

Abstract

To address the problems that fire detection is easily affected by environmental factors and that the accuracy of flame and smoke detection remains relatively low at the incipient stage of a fire, a fire detection algorithm named GCM-YOLO is proposed. Firstly, GhostNet is introduced to optimize the backbone network, making the model lightweight without sacrificing accuracy. Secondly, the upsampling module is reorganized with content-aware feature reassembly to improve the model's detail capture and information fusion. Finally, the mixed local channel attention mechanism is incorporated in the neck to strengthen the model's handling of complex scenes. The experimental results show that, compared with the baseline YOLOv8n model, the GCM-YOLO model increases the mAP@0.5 in fire detection by 1.2%, while the number of parameters and the model size decrease by 38.3% and 34.9%, respectively. GCM-YOLO improves the accuracy of fire detection while reducing the computational burden and is suitable for deployment in practical application scenarios such as mobile terminals.

1. Introduction

Timely detection and intervention during the incipient stages of a fire are of paramount importance in restricting its propagation and minimizing casualties and property damage [1,2,3,4]. Early fire detection typically relies on smoke sensors, including photoelectric and ionization smoke detectors [5,6]. Nevertheless, these sensors are susceptible to environmental conditions and exhibit low monitoring efficiency, which limits their ability to detect faint smoke or concealed flames in the initial phase of a fire.
Advances in traditional image processing techniques have improved fire detection accuracy, particularly through the analysis of color, texture, and shape features [7]. For instance, Singh et al. [8] used the YCbCr color model to detect fire regions in video frames, distinguishing flame pixels from other high-intensity pixels with specific parameters in the YCbCr color space. Xiong et al. [9] proposed a novel superpixel synthesis algorithm and enhanced existing horizon detection algorithms by employing support vector machines for superpixel classification. However, these methods often struggle with variations in lighting conditions and background interference [10], limiting their effectiveness in complex environments. In recent years, object detection methods have emerged as promising approaches for fire and smoke detection. Farasin et al. [11] proposed a deep learning-based "Double-Step U-Net" method that combines classification and regression to predict the damage severity of sub-regions within affected areas from satellite images acquired after a single fire incident. Bahhar et al. [12] presented an effective wildfire and smoke detection model that integrates the YOLO architecture with a voting ensemble of convolutional neural network (CNN) architectures; the model operates in two stages for the classification and detection of fire and smoke. Kim et al. [13] proposed a domain-independent fire detection algorithm based on the YOLOv5 framework, integrating linear attention and a Gated Temporal Pool (GTP) to extract the spatial and temporal features of fires. Yang et al. [14] proposed a lightweight fire detection algorithm based on YOLOv5, incorporating Ghost modules and the CA attention mechanism, alongside feature fusion weight parameters derived from the Path Aggregation Network structure, to improve detection speed.
Currently, fire detection algorithms often require additional network layers to improve accuracy [15], which demands substantial computational resources and time, and their robustness and real-time performance in complex environments still need improvement. To address these problems, this paper proposes GCM-YOLO, a lightweight fire detection network based on an improved feature extraction network and attention mechanism. Firstly, GhostNet is introduced to optimize the backbone network, which reduces the number of parameters and the amount of computation without sacrificing accuracy. Secondly, the CARAFE upsampling module is added to enhance feature information extraction, so that the model can better adapt to targets of different scales. Finally, the mixed local channel attention mechanism is added at the neck to improve detection accuracy for targets in complex environments.

2. Model and Methods

2.1. Improved YOLOv8 Model

YOLOv8 (You Only Look Once version 8) is a real-time object detection model based on a deep convolutional neural network and developed by Ultralytics. Renowned for its performance, versatility, and efficiency, the model incorporates several key architectural features. Its backbone employs the C2f structure, derived from the CSPNet framework, which facilitates the extraction and prioritization of crucial information by mapping channel features to focal features. In the neck, YOLOv8 integrates the PANet structure, transferring and aggregating information through multiple pathways and effectively fusing global and local features. The detection head adopts an anchor-free design, predicting target location and size directly from the feature map without relying on fixed anchor box scales and aspect ratios. Furthermore, the loss function uses the Task-Aligned Assigner positive-sample allocation strategy together with Distribution Focal Loss [16], which effectively alleviates sample imbalance and category distribution disparities.
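To make the Distribution Focal Loss idea concrete, the minimal PyTorch sketch below regresses a box edge as a discrete distribution over integer bins; the bin count and the toy tensors are illustrative assumptions and do not reproduce the exact Ultralytics implementation.

```python
import torch
import torch.nn.functional as F

def distribution_focal_loss(pred_logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Illustrative DFL: pred_logits has shape (N, reg_max + 1) of bin logits,
    target holds continuous distances in [0, reg_max]. The loss is the
    cross-entropy against the two integer bins bracketing each target,
    weighted by how close the target lies to each bin."""
    target_left = target.long()                  # lower integer bin
    target_right = target_left + 1               # upper integer bin
    weight_left = target_right.float() - target  # closer target -> larger weight
    weight_right = 1.0 - weight_left
    loss = (
        F.cross_entropy(pred_logits, target_left, reduction="none") * weight_left
        + F.cross_entropy(pred_logits, target_right, reduction="none") * weight_right
    )
    return loss.mean()

# Toy usage: 4 predictions over 16 + 1 bins, continuous distance targets.
logits = torch.randn(4, 17)
targets = torch.tensor([3.2, 7.9, 0.4, 15.1])
print(distribution_focal_loss(logits, targets))
```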
Currently, fire detection algorithms typically demand substantial computational resources to process complex image data and model structures, relying on high-performance devices such as GPUs with robust computing power. Therefore, this paper puts forward a lightweight fire detection network based on an improved YOLOv8 model, named GCM-YOLO after the modifications. The structure of the improved model is shown in Figure 1.

2.2. GhostNet

Traditional convolutional modules use fixed-size kernels in each layer, which leads to a fixed receptive field and a large number of parameters and reduces model efficiency. To reduce training time and parameter count while maintaining accuracy and improving performance on low-computation edge devices, this paper incorporates Ghost Convolution (GhostConv) and the Ghost Bottleneck from GhostNet into YOLOv8 [17]. These modifications aim to improve computational efficiency and detection accuracy.
The core innovation of GhostNet is GhostConv. Traditional residual blocks produce feature maps with many similar features, which do not all need to be obtained through convolutions. These similar features can be generated through more efficient operations. GhostConv compresses channels using standard convolutions to obtain intrinsic feature maps, requiring fewer convolution kernels than conventional convolutions, thereby reducing computational cost. Then, it generates Ghost feature maps through grouped convolutions for each feature map. Finally, the feature maps from the standard convolutions and the grouped convolutions are concatenated to form the output feature map. The structure of GhostConv is shown in Figure 2.
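A minimal PyTorch sketch of the GhostConv idea described above is given below; the channel split ratio, kernel sizes, and activation choice are illustrative assumptions rather than the exact GhostNet configuration.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Minimal GhostConv sketch: a cheap standard convolution produces the
    intrinsic feature maps, and a depthwise (grouped) convolution generates
    the remaining 'ghost' maps, which are concatenated with the intrinsic ones."""
    def __init__(self, in_ch, out_ch, kernel_size=1, ratio=2, dw_size=5):
        super().__init__()
        init_ch = out_ch // ratio                  # intrinsic channels
        ghost_ch = out_ch - init_ch                # ghost channels
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, kernel_size, padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(init_ch), nn.SiLU())
        self.cheap = nn.Sequential(                # depthwise conv = cheap linear operation
            nn.Conv2d(init_ch, ghost_ch, dw_size, padding=dw_size // 2,
                      groups=init_ch, bias=False),
            nn.BatchNorm2d(ghost_ch), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

x = torch.randn(1, 64, 80, 80)
print(GhostConv(64, 128)(x).shape)   # torch.Size([1, 128, 80, 80])
```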
The Ghost Bottleneck, constructed by stacking GhostConv layers in a ResNet-like fashion, replaces the C2f module in YOLOv8. The Ghost Bottleneck comprises mainly an expansion layer and a compression layer. The first Ghost module functions as the expansion layer, generating some feature maps using standard convolution, and then expanding these into more Ghost feature maps through linear transformations, thus increasing the feature dimensions. The second Ghost module is responsible for reducing the feature dimensions. Additionally, a downsampling shortcut path, encompassing depthwise separable convolutions and standard convolution layers, is introduced. This constitutes the complete Ghost Bottleneck structure. The structure of Ghost Bottleneck is depicted in Figure 3.
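Building on the GhostConv sketch above, a simplified Ghost Bottleneck can be assembled from an expansion GhostConv, an optional stride-2 depthwise convolution, a projection GhostConv, and a shortcut path; the layer ordering follows the description in this subsection, while the exact normalization and activation placement are assumptions.

```python
import torch.nn as nn  # GhostConv is the sketch defined in the previous block

class GhostBottleneck(nn.Module):
    """Simplified Ghost Bottleneck sketch: expansion layer, optional
    downsampling, compression layer, plus a (downsampling) shortcut path."""
    def __init__(self, in_ch, mid_ch, out_ch, stride=1):
        super().__init__()
        layers = [GhostConv(in_ch, mid_ch)]              # expansion layer
        if stride == 2:                                  # spatial downsampling
            layers += [nn.Conv2d(mid_ch, mid_ch, 3, stride=2, padding=1,
                                 groups=mid_ch, bias=False),
                       nn.BatchNorm2d(mid_ch)]
        layers.append(GhostConv(mid_ch, out_ch))         # compression layer
        self.main = nn.Sequential(*layers)
        if stride == 1 and in_ch == out_ch:
            self.shortcut = nn.Identity()
        else:                                            # downsampling shortcut path
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                          groups=in_ch, bias=False),
                nn.BatchNorm2d(in_ch),
                nn.Conv2d(in_ch, out_ch, 1, bias=False),
                nn.BatchNorm2d(out_ch))

    def forward(self, x):
        return self.main(x) + self.shortcut(x)

# Example: GhostBottleneck(64, 128, 64) keeps 64 channels with a residual shortcut.
```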

2.3. The Content-Aware ReAssembly of Features Module

The CARAFE (Content-Aware ReAssembly of Features) module, whose structure is depicted in Figure 4, addresses the deficiencies of traditional upsampling methods in preserving fine details and reconstructing semantic information. By reassembling low-resolution feature maps to generate their high-resolution counterparts, CARAFE effectively mitigates these shortcomings. Its content-adaptive kernel mechanism enables flexible adaptation to various scenes and scales for feature reconstruction, keeping the model lightweight while enhancing its generalization capability.
The CARAFE upsampling process consists of two modules: the upsampling kernel prediction module and the content-aware feature reassembly module. In the kernel prediction module, the input low-resolution feature map of size $H \times W \times C$ is first compressed along the channel dimension to $C_m$ channels to reduce the parameter count. A content encoder with a $k_{encoder} \times k_{encoder}$ convolution then processes the compressed feature map, taking $C_m$ input channels and producing $\sigma^2 k_{up}^2$ output channels, where $\sigma$ is the upsampling ratio; after rearrangement, the predicted upsampling kernels have the size $\sigma H \times \sigma W \times k_{up}^2$. To ensure balance and stability during feature reassembly, the softmax function normalizes the weights of each upsampling kernel at each position, converting the raw scores to non-negative values that sum to 1. In the feature reassembly module, for each target position in the output feature map, a $k_{up} \times k_{up}$ region centered at the corresponding input position is extracted and combined with the predicted upsampling kernel at that point through a dot product. The resulting features form an output feature map of size $\sigma H \times \sigma W \times C$.
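The two-stage procedure can be summarized in the following PyTorch sketch, in which the compressed channel count $C_m$, $k_{encoder}$, and $k_{up}$ are illustrative values; it is a minimal reconstruction of the mechanism described above rather than the exact implementation used in GCM-YOLO.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CARAFE(nn.Module):
    """Sketch of CARAFE: a kernel-prediction branch produces a content-aware
    k_up x k_up reassembly kernel for every output location, and the feature
    reassembly step applies it to the corresponding input neighborhood."""
    def __init__(self, channels, c_mid=64, scale=2, k_up=5, k_encoder=3):
        super().__init__()
        self.scale, self.k_up = scale, k_up
        self.compressor = nn.Conv2d(channels, c_mid, 1)       # channel compression to C_m
        self.encoder = nn.Conv2d(c_mid, scale**2 * k_up**2,   # predicts sigma^2 * k_up^2 maps
                                 k_encoder, padding=k_encoder // 2)

    def forward(self, x):
        b, c, h, w = x.shape
        # --- kernel prediction ---
        kernels = self.encoder(self.compressor(x))             # B x (s^2 k^2) x H x W
        kernels = F.pixel_shuffle(kernels, self.scale)         # B x k^2 x sH x sW
        kernels = F.softmax(kernels, dim=1)                    # normalize each kernel
        # --- content-aware reassembly ---
        neigh = F.unfold(x, self.k_up, padding=self.k_up // 2) # B x (C k^2) x (H W)
        neigh = neigh.view(b, c * self.k_up**2, h, w)
        neigh = F.interpolate(neigh, scale_factor=self.scale,  # map each output pixel to its
                              mode="nearest")                  # source input neighborhood
        neigh = neigh.view(b, c, self.k_up**2, h * self.scale, w * self.scale)
        return (neigh * kernels.unsqueeze(1)).sum(dim=2)       # weighted sum -> B x C x sH x sW

print(CARAFE(128)(torch.randn(1, 128, 40, 40)).shape)  # torch.Size([1, 128, 80, 80])
```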
To address challenges such as insufficiently capturing fire and smoke details, narrow perception range, and underutilization of fire and smoke information in complex backgrounds during fire and smoke detection, CARAFE upsampling is introduced into the YOLOv8 model. This enhancement aims to improve the upsampling module in the original feature pyramid structure, thereby enhancing the capability of the model to extract fire and smoke features.

2.4. Mixed Local Channel Attention

This paper enhances the feature extraction capability of the model by introducing mixed local channel attention (MLCA) in the neck. MLCA integrates channel, spatial, local channel, and global channel information to balance detection accuracy, speed, and parameter count. Initially, the input feature maps are processed through local average pooling and global average pooling, generating feature maps with different spatial resolutions and information representations. At the local level, MLCA captures detailed information using a sliding-window approach and computes local spatial relationships to generate the corresponding attention weights. Global attention acquires global information by computing relationships between channels across the entire feature map. The features obtained from local and global pooling are then passed through convolutional layers for channel compression. The output features are rearranged and combined through multiplication and addition operations, which strengthens the focus on relevant features while incorporating global context. Finally, the feature maps processed by local and global attention are restored to their original resolution through unpooling. Consequently, MLCA preserves local detail features, suppresses irrelevant or noisy features, and captures global contextual information, significantly enhancing the model's ability to understand and manage complex scenes. The structure of MLCA is shown in Figure 5.
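As a rough illustration of how local and global pooled branches can be mixed into a channel attention weight, the simplified sketch below follows the description in this subsection; the local grid size, the shared 1-D convolution, and the equal mixing weights are assumptions, and the sketch is not the exact MLCA design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLCAStyleAttention(nn.Module):
    """Simplified mixed local/global channel attention in the spirit of MLCA:
    local average pooling keeps coarse spatial detail, global average pooling
    captures context, a shared 1-D convolution models channel interactions,
    and the two branches are mixed, unpooled, and applied as a gate."""
    def __init__(self, local_size=5, k=3, mix=0.5):
        super().__init__()
        self.local_size, self.mix = local_size, mix
        self.conv = nn.Conv1d(1, 1, k, padding=k // 2, bias=False)  # channel interaction

    def _channel_mix(self, pooled):
        # pooled: B x C x h x w -> apply the shared 1-D conv over the channel axis per location
        b, c, h, w = pooled.shape
        t = pooled.flatten(2).transpose(1, 2).reshape(b * h * w, 1, c)
        t = self.conv(t).reshape(b, h * w, c).transpose(1, 2).reshape(b, c, h, w)
        return t

    def forward(self, x):
        b, c, h, w = x.shape
        local = F.adaptive_avg_pool2d(x, self.local_size)       # local average pooling
        glob = F.adaptive_avg_pool2d(x, 1)                      # global average pooling
        att = self.mix * self._channel_mix(local) \
            + (1 - self.mix) * self._channel_mix(glob)          # mix local and global branches
        att = F.interpolate(att, size=(h, w), mode="nearest")   # "unpool" back to input size
        return x * torch.sigmoid(att)                           # gate the input features

print(MLCAStyleAttention()(torch.randn(1, 64, 40, 40)).shape)
```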

3. Experiment and Discussion

3.1. Materials

Due to the current absence of authoritative publicly available datasets on fire and smoke, this study collected on-site images of fire incidents to create its own experimental dataset. These images were sourced from publicly available datasets, internet images, and screenshots from internet videos. Using the LabelImg v1.8.1 tool for object detection annotation, previously unlabeled images were annotated, focusing mainly on two labels: fire and smoke. To enrich the dataset, various data augmentation techniques were employed, including rotation, grayscale conversion, random scaling, and the addition of Gaussian noise. The final dataset comprises a total of 8751 images depicting fire and smoke, capturing a wide range of environments such as indoor settings, forests, residential buildings, and low-light conditions. Selected examples from the dataset are shown in Figure 6.
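The four augmentations listed above can be sketched with OpenCV and NumPy as follows; the rotation range, scaling range, and noise level are illustrative assumptions, and bounding-box annotations would need to be transformed consistently in practice.

```python
import cv2
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply the augmentations used for the fire dataset (rotation, grayscale
    conversion, random scaling, Gaussian noise) to a BGR image. This sketch
    only covers the image side; box labels must be adjusted accordingly."""
    h, w = image.shape[:2]
    # Random rotation around the image center.
    angle = rng.uniform(-15, 15)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    image = cv2.warpAffine(image, M, (w, h))
    # Occasional grayscale conversion (kept as 3 channels so the input shape is unchanged).
    if rng.random() < 0.2:
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        image = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
    # Random scaling.
    scale = rng.uniform(0.8, 1.2)
    image = cv2.resize(image, (int(w * scale), int(h * scale)))
    # Additive Gaussian noise.
    noise = rng.normal(0, 8, image.shape)
    return np.clip(image.astype(np.float32) + noise, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
dummy = np.full((480, 640, 3), 127, dtype=np.uint8)  # placeholder for a dataset image
aug = augment(dummy, rng)
```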

3.2. Training Equipment and Parameter Setting

The experiment was conducted on a 64-bit Windows 10 version 22H2 operating system, using an NVIDIA Quadro P6000 GPU (NVIDIA, Santa Clara, CA, USA) with 24 GB of VRAM. Python 3.11.5 was the programming language used, and GPU acceleration was achieved with CUDA v11.8. Training was performed using the PyTorch 2.1.1 deep learning framework. In the experiment, the dataset was divided into training, validation, and test sets in an 8:1:1 ratio. Key model parameters are listed in Table 1.
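Assuming the standard Ultralytics training interface, a training run with the Table 1 settings might look like the sketch below; the dataset configuration file name is a hypothetical placeholder, and the model YAML that adds the GhostNet/CARAFE/MLCA modifications is not shown.

```python
from ultralytics import YOLO

# "fire_smoke.yaml" is a hypothetical dataset config listing the train/val/test
# splits and the two classes ("fire", "smoke").
model = YOLO("yolov8n.yaml")   # baseline architecture built from scratch

model.train(
    data="fire_smoke.yaml",
    epochs=150,          # Table 1 settings
    imgsz=640,
    batch=64,
    lr0=0.01,
    optimizer="SGD",
    momentum=0.93,
    weight_decay=0.005,
)

metrics = model.val(split="test")   # mAP@0.5, recall, etc. on the test split
```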

3.3. Evaluation Indicators

To assess the detection performance of the GCM-YOLO model, two evaluation metrics are used: recall and mean average precision (mAP).
Recall is the proportion of samples that are actually positive and are correctly recognized as positive; it measures the model's ability to identify positive cases and is calculated as shown in Equation (1).
$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (1)$$
In the above formula, TP represents True Positive and FN stands for False Negative.
Average precision (AP) is the area under the precision–recall curve (PR curve) and reflects the average level of precision of the model under different recall rates. The precision–recall curve of each category is calculated based on the prediction results of the model. The area under the curve is then calculated to obtain the average precision of each category. Subsequently, these averages are combined to obtain an overall mAP value using the equations shown in Equations (2) and (3).
$$AP = \int_{0}^{1} P(R)\,\mathrm{d}R \qquad (2)$$
$$mAP = \frac{1}{n}\sum_{i=1}^{n} AP(i) \qquad (3)$$
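Equations (1)–(3) can be illustrated numerically with the short sketch below; the precision–recall points and the TP/FN counts are toy values used only to show how the metrics are combined.

```python
import numpy as np

def average_precision(recall: np.ndarray, precision: np.ndarray) -> float:
    """Area under the precision-recall curve (Equation (2)) via trapezoidal
    integration; assumes recall is sorted in ascending order."""
    return float(np.sum((recall[1:] - recall[:-1]) * (precision[1:] + precision[:-1]) / 2))

# Toy PR points for the two classes; real values come from the model's predictions.
ap_fire = average_precision(np.array([0.0, 0.5, 0.8, 1.0]),
                            np.array([1.0, 0.9, 0.7, 0.4]))
ap_smoke = average_precision(np.array([0.0, 0.4, 0.9, 1.0]),
                             np.array([1.0, 0.8, 0.6, 0.3]))
map50 = np.mean([ap_fire, ap_smoke])   # Equation (3): mean over the n classes
recall = 60 / (60 + 20)                # Equation (1): TP / (TP + FN) with toy counts
print(round(float(map50), 3), recall)
```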
The evaluation metrics also include measures such as the amount of model computations (FLOPs), the number of model parameters (parameters), and model size. The amount of computation refers to the floating-point operations required for the forward propagation process within a given model; it serves as an indicator for assessing computational complexity. The number of parameters denotes the total trainable parameters within a given model, while model size refers to the storage space required for storing said models, typically measured in megabytes (MB).
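The parameter count and model size reported in this paper can be reproduced for any PyTorch module with a few lines; the fp32 storage estimate below is an approximation, and FLOPs would normally be measured with a separate profiler.

```python
import torch.nn as nn

def complexity_summary(model: nn.Module) -> dict:
    """Report two of the metrics used in this paper for a PyTorch module:
    trainable parameter count and an fp32 storage estimate in MB.
    (FLOPs are usually obtained from a profiler such as thop or fvcore.)"""
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    size_mb = n_params * 4 / (1024 ** 2)   # 4 bytes per fp32 weight, excluding metadata
    return {"parameters": n_params, "model_size_mb": round(size_mb, 2)}

toy = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.SiLU(), nn.Conv2d(16, 2, 1))
print(complexity_summary(toy))
```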

3.4. Comparison of Ablation Experiments

To systematically analyze and validate the influence of each constituent part of the model on its final performance, ablation experiments were conducted on a custom fire dataset. Each experiment within the groups maintained consistent training environments and related training parameters. Refer to Table 2 for detailed experimental results.
As shown in Table 2, incorporating the GhostConv and Ghost Bottleneck structures from GhostNet into the YOLOv8n baseline model optimizes the feature extraction network and significantly reduces the number of parameters and the computational load. Compared to the baseline, the parameters drop from about 3.0 M to 1.7 M and the FLOPs from 8.1 G to 5.0 G, while the mAP@0.5 (mean average precision at an IoU threshold of 0.5) decreases by only 0.1 percentage points. Adding the CARAFE upsampling module to the baseline model increases the attention paid to global feature information, improving the mAP@0.5 by 1.0%. Model 3 introduces the MLCA attention mechanism into the baseline model, enhancing the network's perception of both local and global information, which boosts the mAP@0.5 by 0.3% without additional computational cost. Model 4, which combines the GhostNet and CARAFE modules, reduces the computational load while enhancing feature resampling effectiveness, achieving an mAP@0.5 of 82.3%; it shows a 38.3% reduction in parameters and a 34.6% reduction in FLOPs compared to the baseline. Finally, the GCM-YOLO model integrates GhostNet, CARAFE, and MLCA, reaching an mAP@0.5 of 82.9% and a recall of 76.9% while keeping the parameter count and FLOPs low, demonstrating the best overall performance. The synergistic interaction between the three modules maximizes detection effectiveness while minimizing computational resource consumption.

3.5. Comparison Experiment

To validate the lightweight effectiveness of the GhostNet backbone, comparative experiments were conducted against the mainstream lightweight networks FasterNet and HGNetv2. The results are presented in Table 3.
As shown in the table, introducing FasterNet as the backbone significantly reduces the number of parameters and FLOPs. However, this reduction comes at the cost of weaker feature extraction than the baseline model, resulting in a 0.5 percentage point decrease in the mAP@0.5. Using HGNetv2 (Hierarchical Geometry Network version 2) as the backbone also reduces the parameters and FLOPs compared to the baseline, but the lightweighting gains are less pronounced than those achieved with GhostNet. Incorporating GhostNet as the backbone markedly reduces both the number of parameters and the computational complexity while incurring only a minimal loss in accuracy, demonstrating superior lightweight performance.
To further validate the effectiveness of the MLCA attention mechanism in fire detection, we conducted comparative experiments with other mainstream attention mechanisms on our custom fire dataset. The results are shown in Table 4.
As shown in Table 4, the YOLOv8n-MLCA model achieves the largest improvement in mean average precision (mAP) over the baseline. Introducing the SEAttention (Squeeze-and-Excitation Attention) module into the baseline improves the mAP@0.5 to 81.9%, but the recall decreases by 0.6%; SEAttention relies on global average pooling to capture global features, which weakens its ability to capture detailed features. The CBAM (Convolutional Block Attention Module) includes a dual attention mechanism that computes both channel and spatial attention, resulting in a more complex processing flow. The SimAM (Simple Attention Module) simplifies the attention mechanism by combining linear transformation and dot-product operations, avoiding the softmax operation typically used to compute attention weights; however, this simplification can lead to improper weight allocation when handling small targets. MLCA, in contrast, employs a multi-level feature fusion mechanism that integrates features at different scales more effectively, so it can capture and exploit feature information more comprehensively than single-level attention mechanisms. The comparison results indicate that the MLCA module performs best in improving detection accuracy and recall, meeting the requirements of practical applications.
To further validate the performance of the GCM-YOLO model in fire detection, comparative analyses were conducted using different versions of the YOLO model on the custom dataset. The results are shown in Table 5.
Table 5 indicates that both YOLOv8n and YOLOv8s achieve higher average precision than the YOLOv5 versions. Although YOLOv8s shows slightly better accuracy, its parameter count and model size are roughly three times those of YOLOv8n, limiting its suitability for real-time detection in resource-constrained environments. In contrast, the proposed GCM-YOLO model demonstrates clear advantages across all metrics, achieving an mAP@0.5 of 82.9% and a recall of 76.9% while maintaining the lowest parameter count (1.9 M), FLOPs (5.3 G), and model size (4.1 MB). These results indicate that the enhanced algorithm improves accuracy, detection efficiency, and compactness for fire detection tasks, making it well suited for deployment on devices with limited computational resources.

3.6. Results and Analysis

Comparing the GCM-YOLO model with the YOLOv8n baseline network, the mAP@0.5 improves by 1.2% and the recall by 1.1%, while the parameter count and model size are reduced by 38.3% and 34.9%, respectively. This indicates that the enhanced model effectively reduces resource consumption while maintaining high accuracy.
During training, as shown in Figure 7a, the mAP@0.5 of GCM-YOLO remains stable and consistently higher than that of the baseline YOLOv8n model after 100 epochs, demonstrating superior stability and robustness during training. The comparison of loss curves in Figure 7b shows that the loss of GCM-YOLO decreases rapidly in the initial stages of training and stabilizes quickly, indicating faster convergence; in the later stages, the loss remains consistently low and stable, highlighting the stability of the model and the efficiency of parameter updates throughout training.
The GCM-YOLO model demonstrates robust detection performance across diverse environments such as indoor, road, and complex backgrounds, as illustrated in Figure 8. Across various scenarios, it shows an improved ability to recognize flames and smoke, underscoring its adaptability and effectiveness in different settings. Figure 8a highlights the model’s capability to enhance smoke detection accuracy in situations where smoke blends with road-like backgrounds, and it effectively detects smaller flame targets. In Figure 8b, the model’s improvements are evident in reducing false detections, particularly in dimly lit conditions. Figure 8c further showcases GCM-YOLO’s enhanced detection accuracy for flames and smoke amidst complex backgrounds. As shown in Figure 8d, when the image contains artificial light sources such as lamps, the YOLOv8n model mistakenly identifies the light as flames. However, the improved model significantly reduces the false detection rate under artificial light interference, enhancing the overall robustness of the detection.

4. Conclusions

This study addresses the challenges of existing fire detection systems, which often lack robustness in complex environments and have difficulty accurately detecting fires at an early stage. Building on the YOLOv8n baseline model, we propose the GCM-YOLO lightweight fire detection algorithm, which integrates GhostNet into the backbone, introduces the CARAFE upsampling module, and incorporates the MLCA attention mechanism to improve model precision. GhostNet is a lightweight neural network architecture that reduces computational cost by using a small number of actual convolution kernels to generate intrinsic feature maps and then applying inexpensive linear operations to generate additional feature maps; compared with the FasterNet and HGNetv2 networks, it shows a clear advantage in model lightweighting. The CARAFE upsampling module generates reassembly weights from the content of the input feature maps, allowing the upsampled feature maps to retain more detailed information and thereby improving detection accuracy. The MLCA attention mechanism integrates channel attention, spatial attention, and local and global information, distributing attention resources more effectively, strengthening the weights of crucial features, and suppressing irrelevant or noisy features; compared with the SEAttention, CBAM, and SimAM attention mechanisms, it achieves the best accuracy. Experimental results show that GCM-YOLO attains an mAP@0.5 of 82.9%, a 1.2% increase in average accuracy over YOLOv8n, while reducing the model parameters and FLOPs by 38.3% and 34.6%, respectively. With its improved efficiency and accuracy, GCM-YOLO is well suited for deployment in resource-constrained environments and has considerable practical value in diverse scenarios.
This study contributes new knowledge to decision support in the supervision of security and fire protection systems, introducing feature extraction methods and attention mechanisms that are practical in resource-constrained environments. Nevertheless, although the model performs well under such constraints, real-time performance is a key metric in practical applications, and the inference speed of the model on different hardware platforms requires further assessment and optimization. Future research should also investigate the reliability and security of the developed algorithms and of the modernized detection system as a whole; comprehensive reliability tests and security evaluations will help ensure the robustness of GCM-YOLO in real-world applications.

Author Contributions

Conceptualization, S.M. and W.L.; methodology, S.M. and W.L.; software, W.L. and L.W.; validation, W.L. and G.Z.; formal analysis, W.L. and L.W.; investigation, W.L.; resources, S.M.; data curation, W.L.; writing—original draft preparation, W.L.; writing—review and editing, S.M.; visualization, W.L.; supervision, S.M.; project administration, G.Z.; funding acquisition, S.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, Project No. 62103309.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The full dataset cannot be publicly disclosed due to privacy concerns. However, we can provide a subset of the data containing the main features. The data, models, and codes that support the findings of this study can be accessed by contacting the corresponding author upon reasonable request. [https://aistudio.baidu.com/datasetdetail/286868 (accessed on 26 July 2024)].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Barmpoutis, P.; Papaioannou, P.; Dimitropoulos, K.; Grammalidis, N. A review on early forest fire detection systems using optical remote sensing. Sensors 2020, 20, 6442. [Google Scholar] [CrossRef] [PubMed]
  2. Kodur, V.; Kumar, P.; Rafi, M.M. Fire hazard in buildings: Review, assessment and strategies for improving fire safety. PSU Res. Rev. 2020, 4, 1–23. [Google Scholar] [CrossRef]
  3. Gielen, A.C.; Frattaroli, S.; Pollack, K.M.; Peek-Asa, C.; Yang, J.G. How the science of injury prevention contributes to advancing home fire safety in the USA: Successes and opportunities. Inj. Prev. 2018, 24 (Suppl. S1), i7–i13. [Google Scholar] [CrossRef] [PubMed]
  4. Ivanov, M.L.; Chow, W.K. Fire safety in modern indoor and built environment. Indoor Built Environ. 2023, 32, 3–8. [Google Scholar] [CrossRef]
  5. Khan, F.; Xu, Z.; Sun, J.; Khan, F.M.; Ahmed, A.; Zhao, Y. Recent advances in sensors for fire detection. Sensors 2022, 22, 3310. [Google Scholar] [CrossRef] [PubMed]
  6. El-afifi, M.I.; Team, S.S.A.F.R.; Elkelany, M.M. Development of Fire Detection Technologies. Nile J. Commun. Comput. Sci. 2024, 7, 58–66. [Google Scholar] [CrossRef]
  7. Borges, P.V.K.; Izquierdo, E. A probabilistic approach for vision-based fire detection in videos. IEEE Trans. Circuits Syst. Video Technol. 2010, 20, 721–731. [Google Scholar] [CrossRef]
  8. Singh, Y.K.; Deb, D. Detection of fire regions from a video image frames in YCbCr Color Model. Int. J. Recent Technol. Eng. 2019, 8, 6082–6086. [Google Scholar] [CrossRef]
  9. Xiong, D.; Yan, L. Early smoke detection of forest fires based on SVM image segmentation. J. For. Sci. 2019, 65, 150–159. [Google Scholar] [CrossRef]
  10. Alkhatib, A.A. A review on forest fire detection techniques. Int. J. Distrib. Sens. Netw. 2014, 10, 597368. [Google Scholar] [CrossRef]
  11. Farasin, A.; Colomba, L.; Garza, P. Double-step u-net: A deep learning-based approach for the estimation of wildfire damage severity through sentinel-2 satellite data. Appl. Sci. 2020, 10, 4332. [Google Scholar] [CrossRef]
  12. Bahhar, C.; Ksibi, A.; Ayadi, M.; Jamjoom, M.M.; Ullah, Z.; Soufiene, B.O.; Sakli, H. Wildfire and smoke detection using staged YOLO model and ensemble CNN. Electronics 2023, 12, 228. [Google Scholar] [CrossRef]
  13. Kim, S.; Jang, I.S.; Ko, B.C. Domain-free fire detection using the spatial–temporal attention transform of the YOLO backbone. Pattern Anal. Appl. 2024, 27, 45. [Google Scholar] [CrossRef]
  14. Yang, J.; Zhu, W.; Sun, T.; Ren, X.; Liu, F. Lightweight forest smoke and fire detection algorithm based on improved YOLOv5. PLoS ONE 2023, 18, e0291359. [Google Scholar] [CrossRef] [PubMed]
  15. Chai, E.; Ta, L.; Ma, Z.; Zhi, M. ERF-YOLO: A YOLO algorithm compatible with fewer parameters and higher accuracy. Image Vis. Comput. 2021, 116, 104317. [Google Scholar] [CrossRef]
  16. Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. Adv. Neural Inf. Process. Syst. 2020, 33, 21002–22101. [Google Scholar]
  17. Tang, Y.; Han, K.; Guo, J.; Xu, C.; Xu, C.; Wang, Y. GhostNetv2: Enhance Cheap Operation with Long-Range Attention. Adv. Neural Inf. Process. Syst. 2022, 35, 9969–9982. [Google Scholar]
  18. Guo, A.; Sun, K.; Zhang, Z. A lightweight YOLOv8 integrating FasterNet for real-time underwater object detection. J. Real-Time Image Process. 2024, 21, 49. [Google Scholar] [CrossRef]
  19. Zhang, L.; Zheng, J.; Li, C.; Xu, Z.; Yang, J.; Wei, Q.; Wu, X. CCDN-DETR: A Detection Transformer Based on Constrained Contrast Denoising for Multi-Class Synthetic Aperture Radar Object Detection. Sensors 2024, 24, 1793. [Google Scholar] [CrossRef]
  20. Wei, Z. Fire detection of YOLOv8 model based on integrated se attention mechanism. Front. Comput. Intell. Syst. 2023, 4, 28–30. [Google Scholar] [CrossRef]
  21. Yu, Q.; Liu, H.; Wu, Q. An Improved YOLO for Road and Vehicle Target Detection Model. J. ICT Stand. 2023, 11, 197–216. [Google Scholar] [CrossRef]
Figure 1. GCM-YOLO network architecture.
Figure 2. Structure of GhostConv.
Figure 3. Structure of Ghost Bottleneck.
Figure 4. Structure of the CARAFE upsampling.
Figure 5. Structure of MLCA.
Figure 6. Visualization of fire datasets. (a) Indoor Scene; (b) Forest Scene; (c) Residential Building Scene; (d) Dark Scene.
Figure 7. Comparison curve between YOLOv8n and GCM-YOLO. (a) Comparison Curve of mAP@0.5; (b) Comparison Curve of Training Loss.
Figure 8. Visual comparison of detection performance. (a) Road Background; (b) Dim Indoor Background; (c) Complex Outdoor Background; (d) Artificial Light Interference.
Table 1. Model parameters.

| Parameter | Learning Rate | Optimizer | Batch Size | Image Size | Epochs | Momentum Factor | Weight Decay |
|---|---|---|---|---|---|---|---|
| Value | 0.01 | SGD | 64 | 640 × 640 | 150 | 0.93 | 0.005 |
Table 2. Ablation experiment results.

| Network | GhostNet | CARAFE | MLCA | mAP@0.5/% | Recall/% | Parameters | FLOPs/G | Model Size/MB |
|---|---|---|---|---|---|---|---|---|
| YOLOv8n | - | - | - | 81.7 | 75.8 | 3,006,038 | 8.1 | 6.3 |
| Model 1 | √ | - | - | 81.6 | 75.8 | 1,714,466 | 5.0 | 3.8 |
| Model 2 | - | √ | - | 82.7 | 77.6 | 3,146,142 | 8.4 | 6.6 |
| Model 3 | - | - | √ | 82.0 | 76.8 | 3,006,064 | 8.1 | 6.3 |
| Model 4 | √ | √ | - | 82.3 | 76.0 | 1,854,570 | 5.3 | 4.1 |
| Model 5 | √ | - | √ | 82.2 | 76.0 | 1,714,476 | 5.0 | 3.8 |
| Model 6 | - | √ | √ | 82.6 | 76.3 | 3,146,168 | 8.4 | 6.6 |
| GCM-YOLO | √ | √ | √ | 82.9 | 76.9 | 1,854,596 | 5.3 | 4.1 |

Note. "√" means that the corresponding improvement module has been added to the model; "-" means that it has not.
Table 3. Results of comparison experiment with different lightweight backbone networks.

| Network | mAP@0.5/% | Recall/% | Parameters | FLOPs/G | Model Size/MB |
|---|---|---|---|---|---|
| YOLOv8n (Baseline) | 81.7 | 75.8 | 3,006,038 | 8.1 | 6.3 |
| YOLOv8n-FasterNet [18] | 81.2 | 76.1 | 2,300,838 | 6.3 | 4.9 |
| YOLOv8n-HGNetv2 [19] | 81.8 | 76.3 | 2,351,290 | 6.9 | 5.0 |
| YOLOv8n-GhostNet | 81.6 | 75.8 | 1,714,466 | 5.0 | 3.8 |
Table 4. Results of comparison experiment with different attention mechanisms.

| Network | mAP@0.5/% | Recall/% |
|---|---|---|
| YOLOv8n (Baseline) | 81.7 | 75.8 |
| YOLOv8n-SEAttention [20] | 81.9 | 75.2 |
| YOLOv8n-CBAM [21] | 81.8 | 75.3 |
| YOLOv8n-SimAM | 81.6 | 76.0 |
| YOLOv8n-MLCA | 82.0 | 76.8 |
Table 5. Results of comparison experiment with different networks.

| Network | mAP@0.5/% | Recall/% | Parameters | FLOPs/G | Model Size/MB |
|---|---|---|---|---|---|
| YOLOv5n | 81.5 | 74.7 | 2,503,334 | 7.1 | 5.3 |
| YOLOv5s | 82.1 | 76.7 | 9,112,310 | 23.8 | 18.5 |
| YOLOv8n | 81.7 | 75.8 | 3,006,038 | 8.1 | 6.3 |
| YOLOv8s | 82.2 | 76.7 | 11,126,358 | 28.4 | 22.5 |
| GCM-YOLO | 82.9 | 76.9 | 1,854,596 | 5.3 | 4.1 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
