Article

Evaluating Segmentation-Based Deep Learning Models for Real-Time Electric Vehicle Fire Detection

1 Department of Fire and Disaster Prevention, Semyung University, 65 Semyung-ro, Jecheon 27136, Republic of Korea
2 Department of Electrical Engineering, Inha University, 100 Inha-ro, Michuhol-gu, Incheon 22212, Republic of Korea
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work as first authors.
Submission received: 9 January 2025 / Revised: 28 January 2025 / Accepted: 31 January 2025 / Published: 6 February 2025

Abstract

The rapid expansion of the electric vehicle (EV) market has raised significant safety concerns, particularly regarding fires caused by the thermal runaway of lithium-ion batteries. To address this issue, this study investigates the real-time fire detection performance of segmentation-based object detection models for EVs. The evaluated models include YOLOv5-Seg, YOLOv8-Seg, YOLOv11-Seg, Mask R-CNN, and Cascade Mask R-CNN. Performance is analyzed using metrics such as precision, recall, F1-score, mAP50, and FPS. The experimental results reveal that the YOLO-based models outperform Mask R-CNN and Cascade Mask R-CNN across all evaluation metrics. In particular, YOLOv11-Seg demonstrates superior accuracy in delineating fire and smoke boundaries, achieving minimal false positives and high reliability under diverse fire scenarios. Additionally, its real-time processing speed of 136.99 FPS validates its capability for rapid detection and response, even in complex fire environments. Conversely, Mask R-CNN and Cascade Mask R-CNN exhibit suboptimal performance in terms of precision, recall, and FPS, limiting their applicability to real-time fire detection systems. This study establishes YOLO-based segmentation models, particularly the advanced YOLOv11-Seg, as highly effective candidates for EV fire detection and response systems.

1. Introduction

Internal combustion engine vehicles contribute to over 25% of global greenhouse gas emissions, driving the rapid adoption of electric vehicles (EVs) as sustainable alternatives to mitigate environmental challenges [1,2]. However, the increasing prevalence of EVs has introduced new safety concerns, particularly the risk of fire caused by the thermal runaway of lithium-ion batteries. This highlights the urgent need to develop advanced prevention and response technologies [3,4,5,6,7]. According to EV FireSafe data, 511 EV fire incidents were reported globally between 2010 and June 2024 [8]. Although the incidence of EV fires is lower than that of internal combustion engine fires, the frequency of these incidents is steadily increasing. As the EV market continues to grow, the likelihood of fires is expected to increase, necessitating the development of robust detection and response systems. The difficulty in extinguishing EV fires, often owing to thermal runaway and potential battery explosions, underscores the importance of technologies that enable rapid and accurate fire detection to minimize damage [9].
Recent advancements in artificial intelligence (AI)-based object detection have facilitated the development of sophisticated real-time fire detection models that can reduce false alarms and improve detection accuracy. Several well-established object detection models, including R-CNN, Cascade R-CNN, and YOLO [10,11,12,13], have demonstrated notable efficacy in various applications such as safety equipment recognition [14,15], hazardous material detection [16], and forest fire monitoring [17,18,19,20,21]. Despite these advancements, there is limited research on leveraging object detection techniques to enhance fire detection performance, particularly for identifying specific combustibles or targeting environments prone to fire hazards.
The unique challenges associated with fire detection stem from the ambiguous shapes, fluid dynamics, and real-time variability of flames and smoke, which are primary fire indicators. These characteristics render bounding-box-based object detection models insufficient for accurately detecting the spread of fire [22,23]. Segmentation techniques capable of pixel-level detection have been developed to overcome these limitations; however, their application in fire detection remains largely unexplored.
Despite the proliferation of research in the domain of fire detection technologies, the majority of studies have been confined to generic fire detection scenarios, such as forest fires or structural fires, rather than addressing specific challenges associated with emergent fire hazards, including electric vehicle fires. The thermal runaway phenomena unique to lithium-ion batteries present dynamic and unpredictable fire behaviors, including rapid flame spread and persistent combustion, which existing fire detection technologies fail to address effectively [24]. Moreover, the majority of object detection-based fire detection models rely on bounding box-level detection, which lacks the precision required for accurately identifying the spatial extent of a fire or analyzing its progression. This has created a significant knowledge gap in leveraging advanced segmentation-based deep learning approaches tailored for real-time electric vehicle fire detection.
This study employed a segmentation-based real-time detection approach to rapidly and accurately identify electric vehicle fires and analyze their spread. A dataset comprising 60 electric vehicle fire videos was collected and processed into 3000 frame-by-frame segmented images. Subsequently, the performance of the one-stage segmentation models (YOLOv5-Seg, YOLOv8-Seg, and YOLOv11-Seg) and multistage models (Mask R-CNN and Cascade Mask R-CNN) was evaluated through a comprehensive comparative analysis.
The main contributions to the field are as follows:
  • A dedicated dataset for electric vehicle (EV) fire detection was constructed and meticulously labeled. It can serve as foundational data for advancing object detection research.
  • This is the first study to apply segmentation techniques in real time for EV fire detection and conduct a comprehensive performance comparison.
  • The latest YOLO model, YOLOv11-Seg, was applied for the first time in deep learning-based fire detection research, representing a significant innovation in this domain.
This paper is organized as follows: Section 2 presents a comprehensive review of previous studies and identifies the major contributions of this research; it also examines the differences between bounding box and segmentation models and describes the segmentation models employed in this study. Section 3 outlines the experimental methods and conditions, including data preparation, model training, and evaluation settings. Section 4 presents the experimental results, accompanied by an in-depth analysis of the findings. Finally, Section 5 summarizes the key results and provides insights and recommendations for future research directions.

2. Literature Review

Conventional fire detection methodologies principally depend on sensor-based approaches, such as thermal, electrical, and gas sensors. These systems are typically installed on the ceilings of buildings and detect a fire once smoke or flames reach their sensing range. Although generally effective in certain scenarios, they are limited in environments requiring a rapid response, because the time taken for smoke and flames to travel upward to the sensor can delay detection [25].
This limitation is particularly evident in the case of electric vehicle (EV) fires. Lithium-ion batteries, which are commonly situated in the lower sections of EVs, generate flames and smoke that take longer to reach ceiling-mounted sensors. This delay increases the potential for greater damage before detection occurs.
AI-based fire detection models offer a solution to these challenges by leveraging CCTV systems and advanced algorithms to detect flames and smoke in real time [26]. In contrast to conventional methods, AI-based systems are able to analyze visual data and provide immediate alerts, rendering them indispensable for environments such as EVs, where fires can propagate rapidly due to thermal runaway. These capabilities underscore the critical role of AI-based models in advancing fire detection technologies and addressing the limitations of conventional approaches.

2.1. Existing Fire Detection Research Based on Object Detection

Previous studies on the development and performance improvement of object detection-based fire detection models have employed various strategies such as data augmentation, algorithmic structure modification, and optimization. For instance, Guede-Fernández et al. [27] conducted a comparative performance analysis of Faster R-CNN and RetinaNet models retrained using the Detectron2 platform, aiming to mitigate the spread of wildfires and minimize the associated damage through early detection of smoke plumes. Of the two models, the optimized Faster R-CNN demonstrated superior performance in detecting smoke plumes, achieving an F1-score of approximately 80%, a G-mean of 80%, and a detection rate of 90% on an independent test set. These results underscore the efficacy of this approach in improving detection accuracy and reliability compared to conventional models.
Similarly, Zhang [28] addressed the challenge of missed detections in long-range wildfire monitoring, which is typically caused by the small apparent size of distant fires. By modifying the backbone network of the YOLOv5 model, Zhang proposed two enhanced algorithms: DenseM-YOLOv5 and SimAM-YOLOv5. The former achieved notable improvements in accuracy (2.24%), recall (1.2%), and average precision (AP) (1.52%) compared with the original YOLOv5. In addition, SimAM-YOLOv5 exhibited enhanced accuracy (2.07%) and a slight improvement in AP (0.02%).
However, most existing fire detection studies have predominantly focused on wildfires or have employed general-purpose fire image datasets. Studies specifically aimed at improving detection efficacy in fire-prone locations, such as hazardous material storage facilities, factories, and electric vehicles, are relatively sparse [29,30,31,32]. Furthermore, although segmentation techniques capable of detecting objects on a pixel-by-pixel basis offer significant advantages for analyzing the spread path and extent of fires, current fire detection research primarily relies on bounding box methods. The adoption of real-time segmentation technology in fire detection applications is still in its nascent stages, necessitating further exploration and development.

2.2. One-Stage Object Detection Models

2.2.1. YOLOv5-Seg

YOLOv5-Seg is an extended version of YOLOv5 with segmentation capabilities. Its structural framework is illustrated in Figure 1.
The backbone of this model processes input images using convolutional (Conv) modules to generate initial feature maps. These maps are then refined using multiple Conv and C3 modules, resulting in high-dimensional feature maps that capture the unique characteristics of the input image. The extracted feature maps are subsequently transmitted to the neck, where object classification and localization are performed. In the neck, the convolutional module processes the feature maps; the maps then pass through an upsampling module that adjusts the resolution and object size. To enhance prediction accuracy, the Concat module integrates feature maps extracted from different layers, enabling the simultaneous prediction of object classes and locations.
The final output of YOLOv5-Seg includes precise pixel-level segmentation results, enabling the model to rapidly and accurately detect objects across input images of varying dimensions. This capability ensures that YOLOv5-Seg can effectively handle diverse object sizes and shapes while maintaining high detection performance [33,34].
Owing to its lightweight structure, YOLOv5-Seg offers excellent real-time performance and adaptability across a range of datasets; however, it struggles with complex scenes that contain overlapping objects. Compared with dedicated segmentation models such as Mask R-CNN, it may also exhibit inferior segmentation accuracy. Consequently, YOLOv5-Seg is particularly suited to applications requiring simple real-time object detection and segmentation, such as industrial automation or basic fire monitoring, although further research is needed to enhance its performance in more demanding scenarios.

2.2.2. YOLOv8-Seg

YOLOv8-Seg extends the YOLOv8 model by incorporating segmentation outputs. Its detailed structure is shown in Figure 2. This model comprises three main components: backbone, neck, and head.
The backbone extracts features from the input image frames. To enhance computational efficiency and reduce complexity, YOLOv8-Seg replaces the C3 module used in YOLOv5 with the C2f module. Furthermore, the initial 6 × 6 convolutional layer is replaced by a 3 × 3 convolutional layer, which significantly improves computational efficiency. The backbone generates feature maps at multiple scales, which are then passed to the neck. The neck integrates these feature maps and employs a spatial pyramid pooling-fast (SPPF) module to efficiently handle objects of varying scales. This process optimizes computational efficiency and enables the model to process diverse object sizes effectively. The head predicts object locations and classes using an anchor-free architecture, which simplifies the prediction process and enhances the generalization capability of the model. To further optimize object detection and segmentation performance, YOLOv8-Seg employs a non-maximum suppression (NMS) process, which facilitates the rapid and efficient processing of candidate detections.
This approach achieves high performance through its optimized design and efficient gradient propagation path. It is a robust model for a wide range of computer vision tasks, including real-time object detection and segmentation [35].
YOLOv8-Seg improves on YOLOv5-Seg through an updated backbone and a refined network architecture, yielding better detection and segmentation performance. Although it achieves a strong balance between speed and accuracy and performs well with complex object arrangements, its precision may still fall short of dedicated segmentation models. Furthermore, the enhanced structure requires greater hardware resources, which may pose challenges for deployment on resource-constrained devices. Notwithstanding these limitations, YOLOv8-Seg is well suited to more complex scenarios with diverse object types, and ongoing research focuses on further improving its capabilities.

2.2.3. YOLOv11-Seg

YOLOv11-Seg is an extension of the YOLOv11 model with enhanced segmentation capabilities. Its detailed structural framework is shown in Figure 3. The model comprises three key components: backbone, neck, and head. Each component works in tandem with the others to optimize the model’s overall performance [36].
The backbone extracts features from the input image frames. Unlike YOLOv8, which employs a C2f module, YOLOv11-Seg introduces the C3k2 module, an advanced iteration of the cross-stage partial (CSP) structure. This module utilizes smaller convolutional kernels (2 × 2) to enhance feature extraction efficiency, allowing the model to focus on finer details while reducing computational redundancy. The integration of two smaller kernels within the CSP structure enhances the model’s capability to capture intricate spatial details, which is particularly beneficial for detecting small-scale objects and resolving ambiguities in object boundaries.
Additionally, the C2PSA (Cross-Stage Partial with Spatial Attention) module is incorporated to further improve detection precision. This module employs a spatial attention mechanism that prioritizes specific regions of interest, enabling the model to effectively detect partially occluded or small objects. By independently processing spatial and channel interactions, the C2PSA module allocates computational resources to the most relevant features, significantly enhancing the detection accuracy in complex scenarios. For instance, in fire detection tasks, the module allows the model to discern subtle indicators such as smoke or partially visible flames, even in cluttered or obscured environments.
The neck integrates feature maps from multiple scales and transmits them to the head, where additional C3k2 modules are employed to enhance the effective capture and integration of multiscale information. The head predicts object locations and classes while maintaining computational efficiency through the use of a convolution–batchnorm–SiLU (CBS) module and a non-maximum suppression (NMS) process, ensuring accurate and reliable predictions.
In summary, YOLOv11-Seg demonstrates exceptional performance in object detection and segmentation tasks. Its innovative design, incorporating the C3k2 and C2PSA modules, addresses common challenges in object detection, such as occlusions and variations in object scale. These innovations establish YOLOv11-Seg as a highly effective model for real-time fire detection systems, balancing precision and efficiency while maintaining robustness across diverse environments. However, the increased computational complexity and hardware requirements necessitate optimization for deployment on resource-constrained edge devices [37].
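As a concrete illustration of how these one-stage segmentation models are typically invoked, the following sketch runs a pretrained Ultralytics segmentation checkpoint on a video stream. The weight file name, video path, and confidence threshold are placeholders, not the fine-tuned EV-fire weights or settings used in this study.

```python
# Minimal inference sketch for the YOLO-Seg family using the Ultralytics API.
# "yolo11n-seg.pt" and "ev_fire_clip.mp4" are placeholder names.
from ultralytics import YOLO

model = YOLO("yolo11n-seg.pt")  # pretrained segmentation checkpoint (assumed name)

# stream=True yields results frame by frame, keeping memory use bounded
for result in model.predict(source="ev_fire_clip.mp4", stream=True, conf=0.25):
    if result.masks is not None:
        # result.masks.data: (num_instances, H, W) binary masks
        # result.boxes.cls:  class indices (e.g., fire, smoke after fine-tuning)
        print(result.boxes.cls.tolist(), tuple(result.masks.data.shape))
```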

2.3. Multi-Stage Object Detection Models

2.3.1. Mask R-CNN

Mask R-CNN is a versatile model that can simultaneously perform object detection and segmentation. Based on the Faster R-CNN framework, this model not only predicts the location and class of objects but also generates pixel-level masks, significantly enhancing detection precision while enabling detailed shape prediction. Its structural framework is illustrated in Figure 4.
The model utilizes a backbone composed of ResNet-50 and a feature pyramid network (FPN) to transform input images into high-dimensional feature maps. The region proposal network (RPN) subsequently identifies candidate regions of interest (ROIs) that are likely to contain objects. The RoIAlign layer refines these RoIs by aligning them into fixed-size feature maps and preserving the pixel-level spatial information. This refinement addresses the quantization error associated with traditional RoI pooling, thereby improving the precision of mask predictions.
The network head performs three distinct tasks: object classification, bounding-box localization, and mask prediction. Classification determines the object category, bounding box localization adjusts the object position and size, and mask prediction generates pixel-level representations of the object shape. Each task is optimized using a multitask loss function. By outputting the class, bounding box, and pixel-level mask of each object in the image, Mask R-CNN achieves precise instance segmentation, facilitating effective detection and segmentation of objects of varying sizes.
Owing to its structural advantages, Mask R-CNN excels in handling diverse object sizes and performs accurate pixel-level instance segmentation, making it a robust option for applications requiring high precision [38].
Mask R-CNN is highly effective for tasks requiring precise pixel-level segmentation, particularly in complex scenes. Nevertheless, its substantial computational requirements render it ill-suited for real-time implementations. For use cases necessitating lightweight or resource-efficient models, such as edge-device deployment, its heavy architecture can present a significant constraint. Notwithstanding this limitation, the model’s precision and reliability make it a highly suitable option for scenarios where segmentation accuracy is paramount.
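For reference, a minimal sketch of pixel-level inference with a ResNet-50 + FPN Mask R-CNN is shown below, using torchvision’s COCO-pretrained model. This only illustrates the model family; the study’s Mask R-CNN was trained on the EV fire dataset (COCO/MMDetection format), and the image path here is hypothetical.

```python
# Instance-segmentation sketch with torchvision's Mask R-CNN (ResNet-50 + FPN).
# COCO-pretrained weights and a placeholder image path; not the study's model.
import torch
import torchvision
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = convert_image_dtype(read_image("ev_fire_frame.jpg"), torch.float)
with torch.no_grad():
    out = model([img])[0]   # dict with 'boxes', 'labels', 'scores', 'masks'

keep = out["scores"] > 0.5
masks = out["masks"][keep] > 0.5   # (N, 1, H, W) boolean instance masks
print(out["labels"][keep].tolist(), tuple(masks.shape))
```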

2.3.2. Cascade Mask R-CNN

Cascade Mask R-CNN is an advanced model designed to improve the performance of Mask R-CNN by organizing the detection head into multiple stages. It facilitates high-resolution image processing and the precise detection of multiple objects within complex images. The structural framework of Cascade Mask R-CNN is shown in Figure 5.
The key components of this model include the backbone, the region proposal network (RPN), the cascade detection head, and RoIAlign. The backbone and neck adopt the same structural configuration as Mask R-CNN. Cascade Mask R-CNN improves detection performance by progressively increasing the Intersection over Union (IoU) thresholds at each detection head stage. This staged approach refines the predictions of object classes, bounding boxes, and masks in each successive step, thereby enhancing the overall detection accuracy.
The cascade structure enables an incremental improvement in accuracy as each stage builds upon the predictions of the previous stage to generate more precise results. Consequently, the Cascade Mask R-CNN achieves higher precision in object detection and pixel-level mask segmentation than Mask R-CNN. Thus, this model is particularly effective in detecting objects of varying sizes within complex images, and for tasks requiring precise segmentation [39].
Cascade Mask R-CNN enhances the accuracy of detection through its multi-stage detection head and iterative refinement. However, this comes at the cost of increased computational overhead and slower processing times. These characteristics render it less suitable for real-time applications but highly suitable for scenarios requiring detailed segmentation and accurate detection in complex environments, such as fire detection in high-resolution imagery.

2.4. Bounding Box and Segmentation Labeling

Existing models, including Faster R-CNN, Cascade R-CNN, and YOLO, employ bounding boxes to represent the locations of detected objects. The upper-left corner of the bounding box corresponds to the (x_min, y_min) coordinates, and the lower-right corner to the (x_max, y_max) coordinates, defining the minimal box that contains an object and indicates its location. However, because flames and smoke often occur simultaneously, bounding box-based fire detection is susceptible to overlap among the boxes, which can only approximately indicate the path and extent of fire spread. In particular, smoke may spread diagonally or take an atypical shape depending on airflow, causing the bounding box to cover much of the frame and making it difficult to clearly identify the fire origin and spread path.
To address these limitations, this study employs a segmentation method that represents objects at the pixel level. This method is well suited to fire detection and path analysis because it can express the origin and spread path of a fire in greater detail. Figure 6 compares the labeling results obtained using bounding box (Figure 6a) and segmentation (Figure 6b) methods. Segmentation enhances the precision and spatial understanding of fire detection, enabling effective visualization of the fire origin, spread path, and extent [40].
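The gap between the two labeling styles can be quantified directly: the sketch below builds a synthetic diagonal "plume" mask and compares its pixel area with the area of its minimal enclosing bounding box. The shape and numbers are illustrative only, not derived from the dataset.

```python
# Toy comparison of mask area vs. enclosing bounding-box area for a diagonal
# plume-like region; illustrates how much non-object area a box can include.
import numpy as np

mask = np.zeros((100, 100), dtype=bool)
for i in range(100):                       # diagonal "plume" 5 pixels wide
    mask[i, max(0, i - 2):min(100, i + 3)] = True

ys, xs = np.nonzero(mask)
box_area = (ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1)
mask_area = int(mask.sum())

print(f"mask: {mask_area} px, box: {box_area} px, "
      f"only {mask_area / box_area:.1%} of the box is the object")
```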

3. Research Methods

Figure 7 presents the methodology employed in this study, which consists of the following steps:
  • Data collection: Videos of electric vehicle (EV) fires were collected from the Internet. A total of 60 videos encompassing the entire combustion process from ignition to the peak fire stage were investigated.
  • Frame extraction and preprocessing: Non-fire footage was manually removed from the videos, and the remaining footage was converted into frame-by-frame images. Consecutive or duplicate frames were subsequently excluded, yielding a final dataset of 3000 images.
  • Labeling for segmentation: The extracted images were labeled for segmentation using Roboflow.
  • Model application: The labeled images were processed using segmentation models, including YOLOv5-Seg, YOLOv8-Seg, YOLOv11-Seg, Mask R-CNN, and Cascade Mask R-CNN.
  • Training, validation, and testing: The models were trained and validated using the constructed dataset, and the inference results were analyzed using a designated test dataset.

3.1. Production of Datasets

A thorough review of existing datasets for electric vehicle (EV) fire detection revealed a lack of validated datasets specifically labeled for flames and smoke. Given the critical importance of dataset quality in ensuring the validity and robustness of deep learning models, a high-quality dataset tailored to EV fire detection was therefore constructed.
In this regard, 60 videos of EV fires were collected from publicly available Internet sources. They covered the entire combustion process from the initial ignition stage to the peak fire stage. Non-fire footage was manually removed and the videos were converted into individual frames. To further enhance the quality of the dataset and mitigate the risk of overfitting, consecutive and duplicate frames were excluded, resulting in a final dataset comprising 3000 unique images.
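A sketch of this frame-extraction and de-duplication step is given below, under stated assumptions: frames are sampled at a fixed stride with OpenCV, and a simple mean-absolute-difference test skips near-duplicate frames. The exact sampling rate and de-duplication criterion are not specified in the text, so both parameters are illustrative.

```python
# Frame extraction with naive near-duplicate filtering (assumed parameters).
import cv2
import numpy as np
from pathlib import Path

def extract_frames(video_path, out_dir, stride=10, diff_thresh=8.0):
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    prev, idx, saved = None, 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
            # keep the frame only if it differs enough from the last kept frame
            if prev is None or np.abs(gray - prev).mean() > diff_thresh:
                name = f"{Path(video_path).stem}_{idx:06d}.jpg"
                cv2.imwrite(str(out_dir / name), frame)
                prev, saved = gray, saved + 1
        idx += 1
    cap.release()
    return saved
```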
The dataset was designed to reflect diverse environmental conditions and vehicle types, thereby ensuring broad applicability. The images include scenes from roadways, surface parking lots (with and without charging stations), garages, underground parking lots (with and without charging stations), vacant lots, and other distinctive locations. The vehicles featured in the dataset included cars, buses, and trucks. Table 1 provides a detailed summary of the characteristics of the dataset and Figure 8 presents an illustrative example of the collected EV fire images.
To verify the robustness and generalizability of the proposed models, a second validation dataset was utilized. This dataset, provided by the National Fire Agency of South Korea, included images depicting various EV fire scenarios and suppression activities in diverse settings, such as highways, urban areas, parking garages, and charging stations. Unlike the initial dataset, these images were not used in any of the training, validation, or testing phases. By evaluating the models on this entirely unseen dataset, we confirmed their ability to generalize to novel environments and fire conditions, thereby ensuring the reliability and practical applicability of the proposed approach.
The inference performance of the models under real-world conditions was evaluated by sourcing supplementary images from experimental fire drills and simulations conducted by fire departments. These supplementary datasets include images from the Incheon Ganghwa Fire Department’s EV fire response drill, the Gyeongbuk Fire Department’s EV fire experiment, and the Seoul Fire Department’s EV fire simulation experiment. These additional datasets were instrumental in validating the generalization capabilities of the models and their performances in diverse and practical fire scenarios.

3.2. Data Preprocessing

The images of electric vehicle (EV) fires were subjected to segmentation and labeling using Roboflow’s Polygon Tool and Smart Polygon (AI Labeling) tool. The Polygon Tool enables users to manually delineate the boundaries of individual objects for labeling, a process that is time-intensive but ensures high accuracy. In contrast, the Smart Polygon tool leverages the Segment Anything Model (SAM) to automatically detect object boundaries and perform segmentation labeling with a simple click on the object. This automation significantly reduces the time and cost associated with labeling tasks. In addition, automatically generated masks can be further refined to improve precision [41].
Following the labeling process, the images were resized to 640 × 640 pixels to ensure compatibility with the model training pipeline. This resolution was selected to minimize the computational overhead while preserving sufficient detail, thereby enhancing both learning efficiency and inference speed.
The dataset was subsequently divided into three subsets to facilitate model training and performance evaluation. The training set comprised 2400 images (80% of the total dataset), whereas the validation and test sets each contained 300 images (10% each). These subsets were used in the model performance comparison experiments to assess segmentation accuracy and reliability.
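A minimal sketch of the resize and 80/10/10 split is shown below. The directory layout and random seed are assumptions; because YOLO-format polygon labels use normalized coordinates, the corresponding label files only need to be copied alongside their images.

```python
# 80/10/10 split with a 640x640 resize (assumed directory layout and seed).
import random
from pathlib import Path
import cv2

random.seed(0)
images = sorted(Path("dataset/images").glob("*.jpg"))   # placeholder path
random.shuffle(images)

n = len(images)
splits = {"train": images[:int(0.8 * n)],
          "val":   images[int(0.8 * n):int(0.9 * n)],
          "test":  images[int(0.9 * n):]}

for split, files in splits.items():
    out = Path("dataset") / split / "images"
    out.mkdir(parents=True, exist_ok=True)
    for f in files:
        img = cv2.imread(str(f))
        cv2.imwrite(str(out / f.name), cv2.resize(img, (640, 640)))
```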

3.3. Experimental Environment and Parameter Settings

The hardware environment and hyperparameter settings presented in Table 2 were employed to conduct a comparative performance analysis of the segmentation-based fire detection models. All the training and validation experiments were conducted in the same computational environment, which comprised a 32-core CPU, 96 GB of RAM, and an NVIDIA V100 GPU.
The experimental conditions were optimized according to the default parameter settings for each model to ensure consistency and reliability. In addition, the number of training epochs was set to 300 to enable a sufficient number of iterations for model convergence and performance optimization.
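For the YOLO-Seg models, a training run with the Table 2 settings (300 epochs, batch size 16, 640 × 640 input, SGD, initial learning rate 0.01) can be expressed through the Ultralytics trainer as sketched below; the dataset configuration file name is hypothetical.

```python
# Hedged training sketch with the hyperparameters listed in Table 2.
from ultralytics import YOLO

model = YOLO("yolo11n-seg.pt")   # pretrained segmentation weights (assumed name)
model.train(
    data="ev_fire.yaml",         # paths + classes fire, smoke (hypothetical file)
    epochs=300,
    batch=16,
    imgsz=640,
    optimizer="SGD",
    lr0=0.01,
)
metrics = model.val()            # evaluate on the validation split
```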

3.4. Model Evaluation Metrics

The performance of the models used in this study was assessed using a comprehensive set of evaluation metrics, including precision, recall, F1-score, mean average precision (mAP), and frames per second (FPS). These metrics are widely recognized in the field of object detection and provide a robust framework for evaluating the effectiveness and efficiency of models. Detailed explanations of these metrics are presented in Figure 9.
In this context, a true positive (TP) refers to instances in which a fire event occurred and was correctly identified by the model. Conversely, a false positive (FP) denotes the cases in which the model incorrectly identifies a fire in the absence of an actual fire. False negatives (FN) represent instances where a fire event occurs but the model fails to detect it.
Precision is defined as the proportion of fire events identified by a model that are actually fires. A high-precision value is critical for minimizing false fire alarms, which is particularly important in practical fire detection systems. Recall measures the proportion of actual fire events correctly identified by the model, ensuring that no fire incidents are missed. Together, precision and recall provide a general overview of the detection capability of the model.
The F1-score is the harmonic mean of precision and recall, offering a balanced assessment of the model’s accuracy and detection rate. This metric is particularly useful for evaluating the tradeoff between precision and recall.
Average precision (AP) is calculated as the area under the precision–recall curve, evaluating the model’s performance across varying detection thresholds. The mean average precision (mAP) is the mean of the AP values across all object classes and serves as a comprehensive indicator of the model’s ability to consistently detect fires of different sizes and shapes. mAP50, a commonly used variant, measures the AP at an Intersection over Union (IoU) threshold of 0.5 and is used to evaluate the performance of the model on the test dataset.
Finally, frames per second (FPS) measures the number of images that the model can process per second, providing a benchmark for its real-time processing capability. A high FPS value indicates that the model can rapidly analyze images or video frames, enabling efficient and timely fire detection in practical applications [42,43].
\mathrm{Precision} = \frac{TP}{TP + FP}

\mathrm{Recall} = \frac{TP}{TP + FN}

\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

\mathrm{AP} = \int_{0}^{1} p(r)\,dr

\mathrm{mAP} = \frac{1}{C} \sum_{i=1}^{C} \mathrm{AP}_i

\mathrm{FPS} = \frac{N}{T}
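The count-based metrics above reduce to a few lines of code; the sketch below uses illustrative TP/FP/FN counts and AP values rather than results from these experiments.

```python
# Direct implementations of the formulas above (illustrative numbers only).
def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0

def mean_ap(ap_per_class):
    return sum(ap_per_class) / len(ap_per_class)

def fps(num_frames, elapsed_seconds):
    return num_frames / elapsed_seconds

p, r = precision(tp=90, fp=10), recall(tp=90, fn=20)
print(round(p, 3), round(r, 3), round(f1_score(p, r), 3))   # 0.9 0.818 0.857
print(mean_ap([0.80, 0.70]), fps(500, 5.0))                 # 0.75 100.0
```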
The Intersection over Union (IoU) metric is of critical importance in the evaluation of object detection performance. It quantifies the extent to which the predicted bounding box aligns with the ground truth bounding box, defined as the ratio of the intersection area to the union area of the two bounding boxes. Mathematically, the IoU can be expressed as follows:
\mathrm{IoU} = \frac{\mathrm{Area\ of\ Intersection}}{\mathrm{Area\ of\ Union}}
An IoU value of 1 indicates perfect overlap, whereas a value of 0 indicates no overlap. In object detection tasks, the IoU is used to determine whether a predicted bounding box constitutes a true positive: a detection is counted as positive only if the IoU between the predicted box and the ground-truth box exceeds a predefined threshold, such as 0.5.
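For segmentation outputs, the same IoU definition applies to binary masks; a minimal sketch with synthetic masks is shown below, together with the 0.5-threshold true-positive test.

```python
# IoU between binary masks and the IoU > 0.5 true-positive test (synthetic data).
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter / union) if union else 0.0

pred = np.zeros((64, 64), dtype=bool); pred[10:40, 10:40] = True
gt   = np.zeros((64, 64), dtype=bool); gt[15:45, 15:45] = True

iou = mask_iou(pred, gt)
print(f"IoU = {iou:.3f}, true positive at 0.5 threshold: {iou > 0.5}")
```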
The selection of mAP50 as an evaluation metric reflects its practicality and widespread adoption in object detection. The “50” in mAP50 denotes that the IoU threshold for determining a true positive is set at 0.5. This threshold strikes a balance between strict precision and leniency, ensuring that the model is rewarded for detecting objects without being unduly penalized for minor deviations in bounding box placement. The mAP50 metric is particularly useful for evaluating fire detection models, as it provides a comprehensive assessment of the model’s ability to accurately localize and classify fires of varying shapes and sizes. By focusing on consistent performance across different scenarios, mAP50 enables a robust evaluation of the model’s generalization capability in real-world applications.

4. Experimental Analysis

4.1. Training and Validation Results of Electric Vehicle Fire Detection Model

In this study, real-time electric vehicle (EV) fire detection performance was evaluated using a series of object detection models, including YOLOv5-Seg, YOLOv8-Seg, YOLOv11-Seg, Mask R-CNN, and Cascade Mask R-CNN, all of which incorporate segmentation technology. A performance comparison was conducted using precision, recall, mAP50, and F1-score as the evaluation metrics. Figure 10 depicts the progression of these metrics across the epochs during the model training process.
The analysis revealed that the YOLOv11-Seg model achieved the best overall performance, with an F1-score of 0.7592 and an mAP50 of 0.7635. These results demonstrate that YOLOv11-Seg simultaneously achieves high precision and recall, rendering it the most suitable option for segmentation-based detection tasks on the EV fire dataset.
Following YOLOv11-Seg, YOLOv8-Seg and YOLOv5-Seg demonstrated satisfactory performance. Both models showed a rapid convergence of precision and recall during the early training stages and maintained consistent performance throughout the training process.
In contrast, the Mask R-CNN and Cascade Mask R-CNN models displayed suboptimal performance, with the precision and recall values converging at relatively low levels as training progressed. These findings suggest that these models may not be adequately optimized for the specific characteristics of the EV fire dataset or the associated segmentation task.
The performance of the models was further examined by conducting additional evaluations using the optimal weights identified during training. This facilitated a more comprehensive analysis of the unique features and limitations of each model and provided deeper insights into their respective capabilities for EV fire detection.

4.2. EV Fire Detection Model Test and Inference Results

Table 3 presents the performance evaluation results of the Mask R-CNN, Cascade Mask R-CNN, YOLOv5-Seg, YOLOv8-Seg, and YOLOv11-Seg models for the test dataset. The evaluation metrics included precision, recall, F1-score, mAP50, and FPS, with the performance analyzed for both fire and smoke classes.
In terms of precision, YOLOv8-Seg achieved the highest fire detection performance of 0.836, whereas YOLOv5-Seg had the highest smoke detection accuracy with a precision value of 0.811. For the overall precision across all classes, YOLOv8-Seg demonstrated the best performance, achieving a score of 0.818, indicating its effectiveness in minimizing false positives. In contrast, the Mask R-CNN and Cascade Mask R-CNN models exhibited relatively high false-positive rates, with precision values of 0.404 and 0.414, respectively, reflecting their limitations in terms of false alarms.
Regarding recall, YOLOv5-Seg achieved the highest performance in fire detection with a value of 0.727, whereas YOLOv11-Seg demonstrated the best recall in terms of smoke detection, with a value of 0.702. YOLOv5-Seg also achieved the highest overall recall value of 0.703, highlighting its ability to minimize undetected fire incidents. Conversely, Mask R-CNN and Cascade Mask R-CNN exhibited restricted fire detection performance, with recall values of 0.494 and 0.489, respectively. These results indicate a higher prevalence of false negatives in these models than in the YOLO models, which is a significant limitation in the context of fire detection.
The F1-score, which reflects the balance between precision and recall, revealed that YOLOv8-Seg achieved the best overall performance, with scores of 0.741 for fire detection and 0.748 for smoke detection. The model’s overall F1-score across all classes was 0.744, indicating a consistent and balanced detection performance for both fire and smoke. In contrast, Mask R-CNN and Cascade Mask R-CNN had substantially lower F1-scores of 0.445 and 0.448, respectively, demonstrating their inferior performance compared with the YOLO models.
For the mAP50 metric, YOLOv11-Seg achieved the highest fire detection score of 0.766, whereas YOLOv8-Seg exhibited superior performance in smoke detection with a score of 0.731. The average mAP50 values of YOLOv8-Seg and YOLOv11-Seg were both 0.744, indicating strong performance across both classes. Mask R-CNN and Cascade Mask R-CNN achieved comparable average mAP50 values of 0.730 and 0.745, respectively; however, their markedly lower precision and recall limit the practical value of this localization accuracy.
The frames-per-second (FPS) analysis, which evaluates a model’s capacity for real-time detection, yielded 136.99 FPS for YOLOv11-Seg, demonstrating that it is best suited for real-time applications. YOLOv8-Seg and YOLOv5-Seg also demonstrated real-time detection capabilities, with FPS values of 111.11 and 67.11, respectively. In contrast, Mask R-CNN and Cascade Mask R-CNN exhibited significantly lower FPS values of 29.1 and 20.1, respectively, rendering them unsuitable for environments that demand real-time responsiveness.
These results demonstrate that YOLOv8-Seg and YOLOv11-Seg outperform the other models in terms of overall detection accuracy and real-time applicability, making them the most suitable candidates for EV fire detection. The lower performance values of Mask R-CNN and Cascade Mask R-CNN, particularly in terms of recall and FPS, highlight the need for further optimization before these models can be applied to time-critical fire detection scenarios.
In general, Mask R-CNN and Cascade Mask R-CNN were suboptimal for real-time fire detection because of their low FPS values and relatively poor performance across the precision, recall, and F1-score metrics. Conversely, YOLOv5-Seg, YOLOv8-Seg, and YOLOv11-Seg exhibited high precision, recall, F1-score, and mAP50 values, making them suitable for real-time fire detection tasks. However, the performance differences among these YOLO-based models are relatively minor, necessitating additional evaluation using external datasets to facilitate a more comprehensive comparison and analysis.
Hence, the performance of the models was further evaluated by collecting and analyzing additional images of electric vehicle (EV) fires that were not included in the original training, validation, and test datasets. This supplementary dataset includes images captured under various conditions, such as the EV fire response drill conducted by the Incheon Ganghwa Fire Department, the EV fire experiment conducted by the Gyeongbuk Fire Department, and the EV fire simulation experiment conducted in Seoul. These images were analyzed to assess the inference performance of the YOLOv5-Seg, YOLOv8-Seg, and YOLOv11-Seg models. Figure 11 shows the results of the fire detection and segmentation evaluations on the additional images.
The inference results indicate that the YOLOv5-Seg model achieves high fire detection accuracy but produces less precise segmentation boundaries than the other models. For instance, in the Seoul electric vehicle fire reproduction experiment, the YOLOv5-Seg model successfully detected smoke but failed to produce distinct segmentation boundaries, resulting in three instances of non-detection in both the fire and smoke classes.
The YOLOv8-Seg model demonstrated improved detection precision compared with the YOLOv5-Seg model. The segmentation boundaries for fire and smoke were more distinct, resulting in improved detection results. However, the YOLOv8-Seg model was not without limitations, as it exhibited three instances of non-detection in the fire class and two in the smoke class.
Among the evaluated models, YOLOv11-Seg demonstrated the most effective fire detection performance, reliably distinguishing between fire and smoke across diverse scenarios. Notably, it was the only model to successfully detect smoke during the Sejong Fire Department’s electric vehicle fire-suppression training exercise. Furthermore, YOLOv11-Seg exhibited the lowest false-positive rate, with only a single instance of misclassification in both the fire and smoke classes. This highlights its superior accuracy and reliability in real-time fire and smoke detection.

5. Conclusions

This study reviewed several deep learning-based fire detection algorithms optimized for electric vehicle (EV) fire detection and evaluated the performance of various object detection models employing segmentation techniques. In particular, comparisons were performed among the YOLOv5-Seg, YOLOv8-Seg, and YOLOv11-Seg models and the multistage models Mask R-CNN and Cascade Mask R-CNN.
The experimental results revealed that the YOLO-based models consistently outperformed Mask R-CNN and Cascade Mask R-CNN across all evaluation metrics, including precision, recall, F1-score, mAP50, and FPS. Among the models, YOLOv11-Seg demonstrated the best performance, with precise delineation of fire and smoke boundaries and reliable detection capabilities across diverse fire scenarios. YOLOv11-Seg achieved the highest real-time detection performance with an FPS of 136.99, highlighting its suitability when rapid responses in complex fire environments are required. Furthermore, supplementary experiments conducted with additional datasets confirmed the reliability of YOLOv11-Seg, which exhibited minimal false positives and exceptional detection performance. For instance, during the Electric Vehicle Fire Suppression Training at Jochiwon Fire Station, only YOLOv11-Seg accurately detected smoke, further underscoring its superiority as a segmentation model.
YOLOv8-Seg also demonstrated a performance profile comparable to that of YOLOv11-Seg and achieved a strong balance between detection accuracy and reliability. It effectively identified the boundary between fire and smoke while maintaining an excellent computational speed. In contrast, YOLOv5-Seg exhibited a slightly lower performance; however, it demonstrated practical applicability for basic EV fire detection tasks.
Mask R-CNN and Cascade Mask R-CNN showed comparable performances to YOLO-based models in certain metrics, such as mAP50. However, these models were markedly inferior in terms of precision and recall and exhibited suboptimal real-time detection performance, with FPS values of 29.1 and 20.1, respectively. These limitations render these models unsuitable for real-time fire detection, particularly in complex and dynamic fire environments.
The findings of this study indicate that YOLO-based segmentation models, particularly YOLOv11-Seg, are the most suitable for real-time fire response that requires precise boundary detection and rapid processing. This study provides a robust technical foundation for the development of real-time fire detection and suppression systems, thereby partially addressing the growing safety demands of an expanding EV market.
Future research will focus on improving data quality by implementing advanced data augmentation techniques and preprocessing methodologies. Additionally, further testing on various hardware configurations would be beneficial to ensure consistent performance across a broader range of practical scenarios. The enhancement of the YOLOv11-Seg model’s structure to optimize detection accuracy and reliability remains a priority, with the ultimate objective being to advance the practicality and effectiveness of EV fire detection and response.

Author Contributions

Conceptualization, H.K. and S.C.; methodology, H.K.; software, S.C.; validation, W.W. and H.J.; formal analysis, H.K.; investigation, S.C.; resources, H.J.; data curation, H.K.; writing—original draft preparation, S.C.; writing—review and editing, S.C. and H.J.; visualization, H.K.; supervision, W.W.; project administration, H.J.; funding acquisition, W.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Technology Development Program (RS-2024-00467922), funded by the Ministry of SMEs and Startups (MSS, Korea).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Challa, R.; Kamath, D.; Anctil, A. Well-to-wheel greenhouse gas emissions of electric versus combustion vehicles from 2018 to 2030 in the US. J. Environ. Manag. 2022, 308, 114592. [Google Scholar] [CrossRef]
  2. Qiao, Q.; Zhao, F.; Liu, Z.; He, X.; Hao, H. Life cycle greenhouse gas emissions of electric vehicles in China: Combining the vehicle cycle and fuel cycle. Energy 2019, 177, 222–233. [Google Scholar] [CrossRef]
  3. Sun, P.; Bisschop, R.; Niu, H.; Huang, X. A review of battery fires in electric vehicles. Fire Technol. 2020, 56, 1361–1410. [Google Scholar] [CrossRef]
  4. Dorsz, A.; Lewandowski, M. Analysis of fire hazards associated with the operation of electric vehicles in enclosed structures. Energies 2021, 15, 11. [Google Scholar] [CrossRef]
  5. Cui, Y.; Liu, J.; Cong, B.; Han, X.; Yin, S. Characterization and assessment of fire evolution process of electric vehicles placed in parallel. Process Saf. Environ. Prot. 2022, 166, 524–534. [Google Scholar] [CrossRef]
  6. La Scala, A.; Loprieno, P.; Foti, D.; La Scala, M. The mechanical response of structural elements in enclosed structures during electric vehicle fires: A computational study. Energies 2023, 16, 7233. [Google Scholar] [CrossRef]
  7. Kiasari, M.M.; Aly, H.H. Enhancing Fire Protection in Electric Vehicle Batteries Based on Thermal Energy Storage Systems Using Machine Learning and Feature Engineering. Fire 2024, 7, 296. [Google Scholar] [CrossRef]
  8. EV Universe. EV Fires. Available online: https://www.evuniverse.io/p/ev-fires (accessed on 5 December 2024).
  9. Zhang, S.; Yang, Q.; Gao, Y.; Gao, D. Real-time fire detection method for electric vehicle charging stations based on machine vision. World Electr. Veh. J. 2022, 13, 23. [Google Scholar] [CrossRef]
  10. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar] [CrossRef]
  11. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar] [CrossRef]
  12. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  13. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788. [Google Scholar] [CrossRef]
  14. Nath, N.D.; Behzadan, A.H.; Paal, S.G. Deep learning for site safety: Real-time detection of personal protective equipment. Autom. Constr. 2020, 112, 103085. [Google Scholar] [CrossRef]
  15. Mody, S.; Mehta, H.; Mantri, P.; Ali, B.; Khivasara, A. Safety Gear Equipment Detection for Warehouse and Construction Sites Using YOLOv5. Int. Res. J. Eng. Technol. 2021, 9, 3885–3895. [Google Scholar]
  16. Wei, Y.; Liu, X. Dangerous goods detection based on transfer learning in X-ray images. Neural Comput. Appl. 2020, 32, 8711–8724. [Google Scholar] [CrossRef]
  17. Mohapatra, A.; Trinh, T. Early wildfire detection technologies in practice—A review. Sustainability 2022, 14, 12270. [Google Scholar] [CrossRef]
  18. Lee, J.; Jeong, K.; Jung, H. Development of a Forest Fire Detection System Using a Drone-based Convolutional Neural Network Model. Int. J. Fire Sci. Eng. 2023, 37, 30–40. [Google Scholar]
  19. Bouguettaya, A.; Zarzour, H.; Taberkit, A.M.; Kechida, A. A review on early wildfire detection from unmanned aerial vehicles using deep learning-based computer vision algorithms. Signal Process. 2022, 190, 108309. [Google Scholar] [CrossRef]
  20. Gonçalves, L.A.O.; Ghali, R.; Akhloufi, M.A. YOLO-Based Models for Smoke and Wildfire Detection in Ground and Aerial Images. Fire 2024, 7, 140. [Google Scholar] [CrossRef]
  21. Maillard, S.; Khan, M.S.; Cramer, A.; Sancar, E.K. Wildfire and Smoke Detection Using YOLO-NAS. In Proceedings of the 2024 IEEE 3rd International Conference on Computing and Machine Intelligence (ICMI), Mt Pleasant, MI, USA, 13–14 April 2024; pp. 1–5. [Google Scholar] [CrossRef]
  22. Kwon, H.J.; Lee, B.H.; Jeong, H.Y. A Study on Improving YOLO-Based Object Detection Model Performance for Smoke and Flame Occurring from Various Materials. J. Korean Inst. Electr. Electron. Mater. Eng. 2024, 37, 261–273. [Google Scholar]
  23. Cao, X.; Su, Y.; Geng, X.; Wang, Y. YOLO-SF: YOLO for fire segmentation detection. IEEE Access 2023, 11, 111079–111092. [Google Scholar] [CrossRef]
  24. Zhao, C.; Hu, W.; Meng, D.; Mi, W.; Wang, X.; Wang, J. Full-scale experimental study of the characteristics of electric vehicle fires process and response measures. Case Stud. Therm. Eng. 2024, 53, 103889. [Google Scholar] [CrossRef]
  25. Khan, F.; Xu, Z.; Sun, J.; Khan, F.M.; Ahmed, A.; Zhao, Y. Recent advances in sensors for fire detection. Sensors 2022, 22, 3310. [Google Scholar] [CrossRef]
  26. Brzezinska, D.; Bryant, P. Performance-based analysis in evaluation of safety in car parks under electric vehicle fire conditions. Energies 2022, 15, 649. [Google Scholar] [CrossRef]
  27. Guede-Fernández, F.; Martins, L.; de Almeida, R.V.; Gamboa, H.; Vieira, P. A deep learning based object identification system for forest fire detection. Fire 2021, 4, 75. [Google Scholar] [CrossRef]
  28. Zhang, L.; Li, J.; Zhang, F. An efficient forest fire target detection model based on improved YOLOv5. Fire 2023, 6, 291. [Google Scholar] [CrossRef]
  29. He, Y.; Hu, J.; Zeng, M.; Qian, Y.; Zhang, R. DCGC-YOLO: The Efficient Dual-Channel Bottleneck Structure YOLO Detection Algorithm for Fire Detection. IEEE Access 2024, 12, 65254–65265. [Google Scholar] [CrossRef]
  30. Catargiu, C.; Cleju, N.; Ciocoiu, I.B. A Comparative Performance Evaluation of YOLO-Type Detectors on a New Open Fire and Smoke Dataset. Sensors 2024, 24, 5597. [Google Scholar] [CrossRef]
  31. Wang, D.; Qian, Y.; Lu, J.; Wang, P.; Hu, Z.; Chai, Y. FS-YOLO: Fire-Smoke Detection Based on Improved YOLOv7. Multimed. Syst. 2024, 30, 215–227. [Google Scholar] [CrossRef]
  32. Wang, D.; Qian, Y.; Lu, J.; Wang, P.; Yang, D.; Yan, T. EA-YOLO: Efficient Extraction and Aggregation Mechanism of YOLO for Fire Detection. Multimed. Syst. 2024, 30, 287–299. [Google Scholar] [CrossRef]
  33. Ultralytics. Ultralytics/YOLOv5: V7.0—YOLOv5 SOTA Real-Time Instance Segmentation. Available online: https://github.com/ultralytics/yolov5 (accessed on 28 November 2023).
  34. Ultralytics. YOLOv5. Available online: https://github.com/ultralytics/yolov5 (accessed on 27 November 2023).
  35. Ultralytics. YOLOv8. Available online: https://github.com/ultralytics/ultralytics (accessed on 28 November 2023).
  36. Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725. [Google Scholar]
  37. Alkhammash, E.H. Multi-Classification Using YOLOv11 and Hybrid YOLO11n-MobileNet Models: A Fire Classes Case Study. Fire 2025, 8, 17. [Google Scholar] [CrossRef]
  38. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar] [CrossRef]
  39. Cai, Z.; Vasconcelos, N. Cascade R-CNN: High quality object detection and instance segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 1483–1498. [Google Scholar] [CrossRef]
  40. Wang, G.; Chen, Y.; An, P.; Hong, H.; Hu, J.; Huang, T. UAV-YOLOv8: A Small-Object-Detection Model Based on Improved YOLOv8 for UAV Aerial Photography Scenarios. Sensors 2023, 23, 7190. [Google Scholar] [CrossRef] [PubMed]
  41. Roboflow. Smart Polygon Annotation with Roboflow Annotate. Available online: https://docs.roboflow.com/annotate/use-roboflow-annotate/smart-polygon (accessed on 27 November 2023).
  42. Dumitriu, A.; Tatui, F.; Miron, F.; Ionescu, R.T.; Timofte, R. Rip Current Segmentation: A Novel Benchmark and YOLOv8 Baseline Results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 1261–1271. [Google Scholar]
  43. Padilla, R.; Netto, S.L.; da Silva, E.A. A survey on performance metrics for object-detection algorithms. In Proceedings of the 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), Niterói, Brazil, 1–3 July 2020; pp. 237–242. [Google Scholar] [CrossRef]
Figure 1. YOLOv5-seg model architecture.
Figure 2. YOLOv8-seg architecture.
Figure 3. YOLOv11-seg architecture.
Figure 4. Mask R-CNN architecture.
Figure 5. Cascade Mask R-CNN architecture.
Figure 6. Examples of data labeling performance. (a) Bounding-box labeling; (b) segmentation labeling.
Figure 7. Performance of electric vehicle fire detection models: comparison analysis process.
Figure 8. Electric vehicle fire image datasets. (a) Electric vehicle fire while driving, (b) electric vehicle fire while charging, (c) electric vehicle fire suppression test I, (d) electric vehicle fire suppression test II.
Figure 9. Segmentation evaluation: IoU concept diagram. (a) IoU (object detection and segmentation accuracy metric); (b) visualization of segmentation performance based on IoU.
Figure 10. Model training and validation results (precision, recall, mAP50, F1-score).
Figure 11. Fire detection model inference results.
Table 1. Dataset characteristics.

Object type: Fire, Smoke
Dataset format: YOLO, COCO-MMDetection
Division: Train (80%) : Val (10%) : Test (10%) = 8 : 1 : 1
Quantities: Train (2400) : Val (300) : Test (300); Total 3000
Scenes: On the road (19), above-ground parking (26), above-ground parking with charging station (4), underground parking (3), underground parking with charging station (1), open lots and others (6)
Vehicle type: Passenger cars (55), buses (4), trucks (1)
Public dataset: None
Table 2. Hyperparameters and hardware settings of network training.

YOLOv5-Seg, YOLOv8-Seg, and YOLOv11-Seg
  Hyperparameters: Epochs 300; Batch size 16; Image size 640 × 640 pixels; Optimizer SGD; Learning rate 0.01
  Hardware/Software: PyTorch 2.2; CUDA 12.3; CPU 12-core; RAM 96 GB; GPU NVIDIA V100

Mask R-CNN and Cascade R-CNN
  Hyperparameters: Epochs 300; Batch size 16; Image size 640 × 640 pixels; Optimizer AdamW; Learning rate 0.01
  Hardware/Software: PyTorch 1.11; CUDA 11.6; CPU 12-core; RAM 96 GB; GPU NVIDIA V100
Table 3. Fire detection model evaluation results.

Model | Class | Precision | Recall | F1-Score | mAP50 | FPS
Mask R-CNN | Fire | 0.418 | 0.504 | 0.457 | 0.771 | 29.10
Mask R-CNN | Smoke | 0.390 | 0.483 | 0.432 | 0.688 |
Mask R-CNN | Total | 0.404 | 0.494 | 0.445 | 0.730 |
Cascade Mask R-CNN | Fire | 0.430 | 0.506 | 0.465 | 0.790 | 20.10
Cascade Mask R-CNN | Smoke | 0.397 | 0.472 | 0.431 | 0.700 |
Cascade Mask R-CNN | Total | 0.414 | 0.489 | 0.448 | 0.745 |
YOLOv5-Seg | Fire | 0.755 | 0.727 | 0.741 | 0.758 | 67.11
YOLOv5-Seg | Smoke | 0.795 | 0.678 | 0.732 | 0.713 |
YOLOv5-Seg | Total | 0.775 | 0.703 | 0.737 | 0.736 |
YOLOv8-Seg | Fire | 0.836 | 0.677 | 0.741 | 0.757 | 111.11
YOLOv8-Seg | Smoke | 0.801 | 0.689 | 0.748 | 0.731 |
YOLOv8-Seg | Total | 0.818 | 0.683 | 0.744 | 0.744 |
YOLOv11-Seg | Fire | 0.781 | 0.702 | 0.739 | 0.766 | 136.99
YOLOv11-Seg | Smoke | 0.793 | 0.676 | 0.730 | 0.722 |
YOLOv11-Seg | Total | 0.787 | 0.689 | 0.735 | 0.744 |