1. Introduction
Since the “12th Five-Year Plan” period, the development and utilization of urban underground space in China have grown steadily in scale, making China a major country in the development and utilization of urban underground space [1]. Within this development, underground commercial streets have expanded rapidly, relying on the heavy pedestrian flow brought by rail transit; they effectively alleviate urban land pressure and promote economic and social development. However, because of the complex internal structure, high occupant density, and flammable materials in underground commercial streets, fires can quickly grow to a large scale, causing severe economic losses or casualties [2].
Accurate fire source information helps firefighting and rescue personnel understand how a fire is developing and make sound firefighting decisions. When a fire occurs in an above-ground building, the fire source location or fire development can be identified by observing the firelight and smoke outside the building. However, after a fire breaks out in an underground commercial street, the fire scene cannot be observed directly; the resulting lack of information during firefighting decision making can lead to incorrect judgments [3].
In the field of quantitative risk analysis (QRA), the determination of fire source information is crucial [4]. The identification of fire source information directly impacts the design of evacuation routes, the establishment of smoke propagation models, the optimization of emergency response resource allocation, and aspects of risk prediction and assessment. In the study and practice of fire risk analysis, it is imperative to develop efficient fire source identification technologies.
Many scholars have recently applied machine learning methods to predict fire parameters after a fire occurs. Machine learning involves feeding a dataset obtained from numerical simulation or IoT devices into a model for training and testing, thereby learning the relationships among specified parameters in the dataset [5]. Machine learning algorithms can predict parameters by collecting data on fire parameters, such as temperature, smoke, and gas. Deng et al. [6] used three parameters to establish a gated recurrent unit (GRU) neural network model to predict the highest temperature of the tunnel ceiling. The results showed that the model's predictions were consistent with the verification experiment. Saeed et al. [7] established a fire detection convolutional neural network model based on smoke and heat, which can effectively predict fires with an accuracy of 91%. Liu et al. [8] established a fire detection model based on six machine learning algorithms, such as logistic regression, among which the K-nearest neighbor algorithm demonstrated the best classification performance. Hodges et al. [9] predicted the temperature distribution in a room based on transposed convolutional neural networks; the prediction accuracy reached 95%. These studies selected appropriate feature parameters based on their predictive objectives and achieved relatively good prediction results.
Regarding fire source determination technologies, Yan et al. [10] proposed the use of the least squares method based on the Gaussian plume model for fire source localization and the application of the K-means clustering method to reduce localization errors. However, this technique requires the deployment of a large number of gas concentration sensors. Sun [11] introduced a method for fire source localization using distributed fiber optic temperature sensors, which effectively measured temperature and determined the fire source’s location, yet was unable to ascertain other key parameters such as the heat release rate. Chu et al. [12] developed a fire source localization model based on computer vision, although it is limited to the detection and localization of fire sources. Shen [13] used thermal flux parameters to infer fire source diameter and heat release rate; however, thermal flux sensors are expensive and prone to failure. Zhang et al. [14,15,16] established a large tunnel fire database and created a machine learning model that takes temperature as input to predict tunnel fire source location, time of danger, and temperature field parameters. This method is cost-effective and relatively precise, yet it does not cover other key parameters of the fire source. Moreover, previous studies have not applied machine learning algorithms with good predictive performance to fire source determination in underground commercial streets, an area that requires further research.
In the study of fire source information identification, existing AI fire determination models, such as OpenCV systems [12], Bayesian machine learning [13], and neural networks [16], show unique advantages and limitations compared to traditional fire source determination techniques like wireless sensor networks and distributed fiber optic temperature sensing systems. Traditional fire source determination methods, such as direct physical measurement and real-time monitoring, offer the advantages of accurate measurements and instant data on temperature and fire source location. However, these methods are limited by the spatial coverage of sensors and are costly in terms of maintenance and initial investment. On the other hand, existing AI models for fire source determination excel in handling complex data, automatically selecting influential features, enhancing predictive performance, and adapting to new data, making them suitable for dynamic fire scenarios. Nevertheless, these AI models face challenges in interpretability, especially complex ones like neural networks, and their performance is heavily dependent on the quality and representativeness of the data.
To address the issues of poor interpretability and data dependency in existing AI fire source determination models, this study proposed the establishment of machine learning models with strong interpretability, such as decision tree, random forest, and LightGBM models. To tackle the challenge of data dependency, it suggested creating a more realistic fire database by using sampling to determine the fire scenarios. This study therefore aimed to determine fire source information, such as the specific location and heat release rate, by analyzing temperature time series. The study began by selecting fire scenarios using sampling and simulating them with CFAST 7.7.4 software to build a database of underground commercial street fire scenarios. Subsequently, the obtained temperature data were used for feature extraction and processing. Finally, the study developed and applied various machine learning models to determine the fire source in underground commercial streets.
2. The Principle of Machine Learning Models
In machine learning, the objective of fire source classification is to allocate data to predefined fire source categories. The training involves learning the mapping relationship between data features and fire source categories. This paper established three machine learning models: decision tree (DT), random forest (RF), and LightGBM.
2.1. Decision Tree [17]
A decision tree builds a tree by recursively splitting the dataset. Each split is based on features that maximize the purity of fire source determination. The decision tree model in this study utilized the Gini index as the splitting criterion, selecting features and split points that significantly reduced uncertainty after the split.
The Gini index is defined as [18]:
$$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} p_k^{2}$$
Here, $D$ represents the established training set, $K$ is the number of fire source categories, and $p_k$ is the proportion of samples of the $k$-th fire source category in the training set $D$.
For each split in the tree, the algorithm chooses the feature and split point that minimize the weighted Gini index of the child nodes. The reduction in the Gini index for a given node due to a split is defined as:
$$\Delta\mathrm{Gini}(D, a) = \mathrm{Gini}(D) - \frac{N_1}{N}\,\mathrm{Gini}(D_1) - \frac{N_2}{N}\,\mathrm{Gini}(D_2)$$
Here, $a$ is the feature considered for splitting, $D_1$ and $D_2$ are the two subsets of the dataset after the split, and $N$, $N_1$, and $N_2$ are the number of samples in the parent node and the two child nodes, respectively.
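The two quantities defined above can be illustrated with a minimal Python sketch. The function names and the toy labels "A" and "B" are hypothetical and are not taken from the study's implementation; the code only mirrors the standard Gini index and its reduction for a binary split.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_reduction(parent, left, right):
    """Reduction in Gini index when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


# Hypothetical labels for two fire source categories.
parent = ["A", "A", "B", "B"]
print(gini(parent))                                    # 0.5
print(gini_reduction(parent, ["A", "A"], ["B", "B"]))  # 0.5
```

A split that separates the two categories perfectly removes all impurity, so the reduction equals the parent's Gini index.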
2.2. Random Forest [19]
Random forest is an ensemble learning method composed of multiple decision trees. Each tree is built independently, and randomness is introduced in the construction process through bootstrap sampling of the training data and by selecting the best split from a random subset of features at each node. The random forest model can be represented as follows:
$$H(x) = \operatorname{mode}\{h_1(x), h_2(x), \ldots, h_T(x)\}$$
Here, $h_t(x)$ refers to the output of the $t$-th decision tree, and $H(x)$ refers to the output of the random forest, which is determined by aggregating the predictions of all $T$ trees through a voting mechanism for the classification of the fire source.
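The voting step can be sketched in a few lines of Python. The three "trees" below are hypothetical stand-in functions with made-up thresholds, used only to show how majority voting combines individual predictions; they are not trained models from this study.

```python
from collections import Counter


def forest_vote(tree_outputs):
    """Return the class predicted by most trees (ties go to the first seen)."""
    return Counter(tree_outputs).most_common(1)[0][0]


def random_forest_predict(trees, x):
    """Aggregate the predictions of all trees for one feature vector x."""
    return forest_vote([tree(x) for tree in trees])


# Hypothetical stand-ins for trained trees; thresholds are illustrative only.
trees = [
    lambda x: "A" if x[0] <= 100.0 else "B",
    lambda x: "A" if x[1] <= 50.0 else "B",
    lambda x: "B",
]
print(random_forest_predict(trees, [80.0, 40.0]))  # A
```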
2.3. LightGBM
LightGBM is a gradient-boosting algorithm that iteratively adds decision trees to minimize the loss function [20]. Each new tree is constructed to correct the residual errors made by the previous trees in the sequence:
$$F_m(x) = F_{m-1}(x) + \eta\, h_m(x)$$
Here, $F_m(x)$ represents the prediction of the model at the $m$-th step, $h_m(x)$ is the prediction of the new tree added in that step, and $\eta$ is the learning rate.
Distinct from traditional gradient boosted decision trees (GBDT), LightGBM incorporates two primary technical improvements: histogram optimization and a leaf-wise growth strategy [21].
Histogram optimization: LightGBM constructs histograms by discretizing the values of continuous features into bins, thereby reducing computational requirements.
Leaf-wise growth strategy: at each step, LightGBM splits the leaf that yields the largest loss reduction, focusing more directly on minimizing the model’s error.
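As a rough illustration of two of the ideas in this subsection, the sketch below shows the additive boosting update with a learning rate and the discretization of a continuous feature into bins. It is a simplification under stated assumptions: equal-width bins are used here for clarity, whereas LightGBM's own binning procedure is more elaborate, and the function names are hypothetical.

```python
def boost_update(prev_pred, tree_pred, learning_rate):
    """One boosting step: previous prediction plus the scaled new-tree output."""
    return [p + learning_rate * h for p, h in zip(prev_pred, tree_pred)]


def to_bins(values, n_bins):
    """Map continuous values to equal-width bin indices in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical temperatures (degrees Celsius) mapped to 4 bins.
print(to_bins([20.0, 25.0, 30.0, 40.0], 4))  # [0, 1, 2, 3]
```

Once features are binned, candidate split points only need to be evaluated at bin boundaries rather than at every distinct feature value, which is where the computational saving comes from.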
2.4. Application Examples
For instance, if the input features of the model are denoted as $X = (x_1, x_2, \ldots, x_n)$, and the output fire source classification results are A, B, and C, then a simplified decision tree may employ rules of the following form:
If $x_1 \le t_1$, then:
    If $x_2 \le t_2$, then:
        Classify as A;
    Else
        Classify as B;
Else
    Classify as C.
In this example, $x_1$ and $x_2$ are features, and the thresholds $t_1$ and $t_2$ determine how nodes are split. Random forest aggregates the results of multiple decision trees and decides the final classification through voting, while LightGBM iteratively optimizes each decision tree towards an optimal solution.
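The nested rules of such a simplified tree translate directly into code. In the sketch below, the comparison direction (less than or equal) and the threshold values are assumptions chosen for illustration, not values from the trained models.

```python
def classify(x1, x2, t1, t2):
    """Two-level decision rule returning one of the classes A, B, or C."""
    if x1 <= t1:
        if x2 <= t2:
            return "A"
        return "B"
    return "C"


# Hypothetical thresholds and feature values.
print(classify(x1=35.0, x2=12.0, t1=50.0, t2=20.0))  # A
print(classify(x1=35.0, x2=28.0, t1=50.0, t2=20.0))  # B
print(classify(x1=75.0, x2=12.0, t1=50.0, t2=20.0))  # C
```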
The model output $y$ is a function of the input vector $X$, which can be mathematically represented as follows:
$$y = f(X; \theta)$$
where $f$ represents the model function and $\theta$ denotes the model parameters.
For the three established machine learning models, the input was a feature vector processed from data, and the output was a fire source prediction classification based on the data distribution and structure learned by the model. In practical applications, the implementation and optimization of these models involve more details, including feature selection, model parameter adjustment, and overfitting prevention. Each model provides a fire source classification label $y$ for the input feature vector $X$.
4. Machine Learning Model
4.1. Data Preprocessing
Data preprocessing is a pivotal step in artificial intelligence, directly impacting the model’s performance and accuracy. This study primarily employed preprocessing measures such as categorization, segmentation, normalization, and removal of irrelevant data.
4.1.1. Label Categorization
The processed sample data needs to be labeled to train machine learning models more effectively. The current dataset labels were set based on the different fire source positions and HRR in the CFAST simulation. The database established for this study involved categorizing and labeling different types of fire sources.
4.1.2. Segmentation Processing
Selecting a time window as the input allows the model to capture and learn the dynamic changes and trends of the data over time. This approach helps identify the complex nonlinear relationships between temperature and fire source information. A machine learning model that examines the temperature curves over the entire fire process can obtain more accurate information about the fire source, but it loses the ability to operate in real time. In this paper, the dataset was therefore processed in segments with a time interval of 30 s, giving the windows 30–60 s, 60–90 s, …, 1170–1200 s. After segmentation, 39 samples were obtained for each fire scenario. This study simulated 4800 fire scenarios, resulting in 187,200 samples.
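The sample counts follow directly from the window boundaries, as the short sketch below shows. The function name and default arguments are illustrative; only the 30 s interval, the 30–1200 s range, and the 4800 scenarios come from the text.

```python
def segment_windows(start=30, end=1200, width=30):
    """Return consecutive (start, stop) windows in seconds covering [start, end]."""
    return [(t, t + width) for t in range(start, end, width)]


windows = segment_windows()
print(windows[0], windows[-1])  # (30, 60) (1170, 1200)
print(len(windows))             # 39 samples per fire scenario
print(4800 * len(windows))      # 187200 samples in total
```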
4.1.3. Data Standardization
The acquired sample data were normalized by converting dimensional quantities into dimensionless ones, so that data measured on different scales become comparable.
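The text does not state which scaling was applied, so the following is only a sketch of one common choice, min-max scaling to the range [0, 1]. Both the choice of method and the function name are assumptions.

```python
def min_max_normalize(values):
    """Scale values linearly to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature values in degrees Celsius.
print(min_max_normalize([20.0, 60.0, 100.0]))  # [0.0, 0.5, 1.0]
```

Z-score standardization (subtracting the mean and dividing by the standard deviation) would be an equally plausible reading of this step.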
4.1.4. Deletion of Useless Data
Each CFAST simulation produced a data curve of 1200 s. As the sensors had a specific activation time, the data obtained before activation did not contribute to the model training. To improve the accuracy and efficiency of the model, the data from the first 30 s were removed, and only the data from 30 s to 1200 s were used.
4.2. Feature Extraction
Feature extraction is a crucial process for obtaining feature vectors from the data. This paper extracted nine handcrafted statistical features from each temperature time series to better describe the information on different fire sources and to achieve optimal classification performance. Each sample had nine temperature curves, resulting in 81 features for each sample.
- ① Maximum ($T_{\max}$): the highest value in the selected temperature time series.
- ② Mean ($\bar{T}$): the arithmetic average of a selected temperature time series, which reflects the average level of a temperature segment.
- ③ Minimum ($T_{\min}$): the lowest value in the selected temperature time series.
- ④ Standard deviation ($\sigma$): the arithmetic square root of the arithmetic mean of the squared deviations from the mean of a selected temperature time series, reflecting the degree of temperature dispersion in a period. With $T_i$ denoting the $i$-th of the $n$ values in the series, the standard deviation is calculated as follows:
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(T_i - \bar{T}\right)^{2}}$$
- ⑤ Mean absolute deviation (MAD): the average of the absolute deviations of all individual observed values in the selected temperature time series from their arithmetic mean, which avoids the situation where errors in a temperature segment cancel each other out. The calculation formula is as follows:
$$\mathrm{MAD} = \frac{1}{n}\sum_{i=1}^{n}\left|T_i - \bar{T}\right|$$
- ⑥ Interquartile range (IQR): the difference between the upper quartile ($Q_3$, located at 75%) and the lower quartile ($Q_1$, located at 25%) of the selected temperature time series, reflecting the dispersion of the middle half of the temperature values. The formula for calculating IQR is as follows:
$$\mathrm{IQR} = Q_3 - Q_1$$
- ⑦ Coefficient of variation ($c$): the ratio of the standard deviation to the corresponding mean in the selected temperature time series, a normalized measure of the temperature dispersion. The calculation formula is as follows:
$$c = \frac{\sigma}{\bar{T}}$$
- ⑧ Skewness (SK): the ratio of the difference between the mean ($\bar{T}$) and median ($T_{\mathrm{med}}$) of a selected temperature dataset to its standard deviation, reflecting the degree of skewness of the temperature. The calculation formula is as follows:
$$\mathrm{SK} = \frac{\bar{T} - T_{\mathrm{med}}}{\sigma}$$
- ⑨ Kurtosis ($K$): the number that reflects the sharpness of the peak of the selected temperature time series at the mean value. The calculation formula is as follows, where $m_4 = \frac{1}{n}\sum_{i=1}^{n}\left(T_i - \bar{T}\right)^{4}$ represents the fourth central moment:
$$K = \frac{m_4}{\sigma^{4}}$$
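The nine features can be computed with the Python standard library, as in the sketch below. It follows the definitions listed above; the quartile convention (`method="inclusive"`), the dictionary keys, and the sample values are assumptions, since the text does not specify them.

```python
import math
import statistics


def extract_features(temps):
    """Compute the nine statistical features of one temperature series."""
    n = len(temps)
    mean = sum(temps) / n
    std = math.sqrt(sum((t - mean) ** 2 for t in temps) / n)  # population form
    mad = sum(abs(t - mean) for t in temps) / n
    q1, _, q3 = statistics.quantiles(temps, n=4, method="inclusive")
    median = statistics.median(temps)
    m4 = sum((t - mean) ** 4 for t in temps) / n  # fourth central moment
    return {
        "max": max(temps),
        "mean": mean,
        "min": min(temps),
        "std": std,
        "mad": mad,
        "iqr": q3 - q1,
        "cv": std / mean if mean else float("nan"),
        "skew": (mean - median) / std if std else 0.0,
        "kurt": m4 / std ** 4 if std else 0.0,
    }


def sample_feature_vector(curves):
    """Concatenate the features of all curves in a sample (9 curves -> 81 values)."""
    return [v for curve in curves for v in extract_features(curve).values()]


# Hypothetical 30 s temperature segment (degrees Celsius).
segment = [20.0, 22.0, 24.0, 26.0, 28.0]
print(extract_features(segment)["iqr"])           # 4.0
print(len(sample_feature_vector([segment] * 9)))  # 81
```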
4.3. Construction of Fire Source Determination Model
This study used 81 (9 × 9) extracted features from a 30 s temperature time series as the input for the fire source determination model, which output the fire source classification results. The obtained samples were randomly shuffled and split into 70% for training and 30% for testing. Furthermore, five-fold cross-validation was employed during training. Decision tree, random forest, and LightGBM were selected in this study and were individually tuned using random search and Bayesian optimization [32]. Random search selects hyperparameter values at random from a given range, while Bayesian optimization is a tuning method based on Bayesian probability principles. The tuning results are shown in Table 3, Table 4 and Table 5.
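The data-splitting and random-search steps can be sketched with the standard library alone. This is an illustration under assumptions, not the study's pipeline: the function names, the seed, and the hyperparameter names and ranges in `search_space` are hypothetical, and the actual tuned ranges are those reported in the tables.

```python
import random


def train_test_split(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into training and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_samples * train_fraction))
    return indices[:cut], indices[cut:]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def sample_params(space, rng):
    """Random search: draw one value per hyperparameter from its candidates."""
    return {name: rng.choice(options) for name, options in space.items()}


train_idx, test_idx = train_test_split(100)
print(len(train_idx), len(test_idx))  # 70 30

# Hypothetical search space; names and ranges are illustrative only.
search_space = {"max_depth": [4, 8, 16], "n_estimators": [100, 200, 400]}
candidate = sample_params(search_space, random.Random(0))
```

In a full pipeline, each sampled candidate would be scored by its mean validation performance over the five folds, and the best candidate retrained on the whole training set.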
4.4. Evaluation Metrics
In this paper, precision ($P$), recall ($R$), and F1-score ($F1$) were used as evaluation metrics for the classification model. $P_i$ represents the proportion of samples predicted as class $i$ that were actually class $i$. In contrast, $R_i$ represents the ratio of correctly predicted class $i$ samples to actual class $i$ samples. The F1 score is the harmonic mean of precision and recall. Specifically, the formulas for calculating the three metrics are as follows:
$$P_i = \frac{TP_i}{TP_i + FP_i}$$
$$R_i = \frac{TP_i}{TP_i + FN_i}$$
$$F1_i = \frac{2\,P_i\,R_i}{P_i + R_i}$$
Here, $TP_i$ (true positive) represents the samples of class $i$ that were correctly predicted as class $i$; $FP_i$ (false positive) represents the samples of other classes that were predicted as class $i$; $FN_i$ (false negative) represents the samples of class $i$ that were predicted as other classes.
Since this task involves the classification of multiple categories, macro-average metrics were needed to evaluate the classification model’s performance from an overall perspective. The macro-average is the arithmetic average of the precision, recall, and F1 score of each category, calculated as shown below, where $k = 10$ is the number of categories:
$$P_{\mathrm{macro}} = \frac{1}{k}\sum_{i=1}^{k} P_i, \qquad R_{\mathrm{macro}} = \frac{1}{k}\sum_{i=1}^{k} R_i, \qquad F1_{\mathrm{macro}} = \frac{1}{k}\sum_{i=1}^{k} F1_i$$
Macro-averaging is commonly used to evaluate a classification model’s overall performance across multiple categories.
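A minimal sketch of the per-class and macro-averaged metrics is given below. The function names and the toy labels are hypothetical; the zero-division convention (returning 0.0 when a denominator is zero) is also an assumption.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall, and F1 for one class from true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_average(y_true, y_pred):
    """Unweighted mean of per-class precision, recall, and F1."""
    labels = sorted(set(y_true))
    scores = [per_class_metrics(y_true, y_pred, label) for label in labels]
    k = len(labels)
    return tuple(sum(s[i] for s in scores) / k for i in range(3))


# Hypothetical labels for two fire source categories.
y_true = ["A", "A", "B", "B"]
y_pred = ["A", "B", "B", "B"]
macro_p, macro_r, macro_f1 = macro_average(y_true, y_pred)
print(macro_r)  # 0.75
```

Because every class contributes equally to the mean, a poorly classified rare category lowers the macro-average as much as a poorly classified common one.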
4.5. Performance Evaluation of the Model
Based on the evaluation metrics, to verify the effectiveness of the three machine learning models established in the task of underground commercial street fire source determination, the experiment used the extracted features of the test set as model inputs and compared the classification performance of decision tree, random forest, and LightGBM models. The comparative experimental results are shown in Figure 4.
As can be seen from Figure 4, the LightGBM model achieved the best evaluation metrics, with macro averages of precision, recall, and F1 score of 99.01%, 98.45%, and 99.04%, respectively. These metrics indicate that the LightGBM model identified and classified fire sources accurately. A precision of 99.01% suggests that the model rarely made false positive predictions, while a recall of 98.45% indicates that nearly all actual fire sources were correctly identified, with few missed detections. The F1 score of 99.04% reflects a good balance between precision and recall. These results demonstrate LightGBM’s strong capability in handling challenging multi-class tasks, here one in which the relationship between temperature data and fire source information in the training set is complex and nonlinear. Compared with the RF and DT models, LightGBM benefits from its histogram algorithm and its depth-limited leaf-wise growth strategy.
Furthermore, the RF model’s evaluation metrics were all higher than the DT model’s, with increases in macro averages of precision, recall, and F1 score by 2.38%, 1.93%, and 2.13%, respectively. This improvement was attributed to the random forest’s ensemble method and its ability to handle high-dimensional data, resulting in a higher prediction accuracy than a single decision tree in complex multi-classification tasks like fire source classification.
In summary, the LightGBM, RF, and DT models each exhibited distinct strengths. LightGBM excelled in this task, owing to its outstanding class differentiation ability and high-dimensional data processing capability, which enabled it to identify and classify complex data patterns effectively. As an ensemble of decision trees, the random forest also demonstrated excellent performance, particularly in reducing overfitting and handling high-dimensional data. In contrast, a single decision tree may be less effective in complex classification problems. Therefore, considering the characteristics of fire source classification, the LightGBM and RF models are more suitable for further research and improvement.
4.6. Kappa Coefficient
The kappa coefficient is a statistical measure of agreement and is commonly used to assess the accuracy of multi-class models. The coefficient ranges over [−1, 1], but in practical applications it typically falls within [0, 1]. The higher the coefficient value, the higher the accuracy of the classification achieved by the model. The kappa coefficient is calculated using the following formula:
$$\kappa = \frac{p_o - p_e}{1 - p_e}$$
Here, $p_o$ represents the ratio of the sum of the correctly classified samples in each fire source category to the total number of samples, and $p_e$ refers to the probability of the classifier agreeing with the actual labels by chance in a completely random scenario.
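The kappa computation can be sketched as follows, assuming the usual Cohen's kappa in which chance agreement is estimated from the marginal label frequencies. The function name and toy labels are hypothetical, and returning 1.0 when chance agreement equals 1 is a convention chosen here to avoid division by zero.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two label sequences."""
    n = len(y_true)
    # Observed agreement: fraction of samples classified correctly.
    p_o = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    # Chance agreement from the marginal frequencies of each class.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: a single class in both sequences
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for two fire source categories.
print(cohen_kappa(["A", "A", "B", "B"], ["A", "B", "B", "B"]))  # 0.5
```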
The kappa coefficients of the three models are illustrated in Figure 5. The figure shows that the LightGBM model exhibited the best performance with a kappa value of 98.81%, signifying near-perfect classification performance and demonstrating remarkable consistency. Meanwhile, although the kappa value of the RF model was slightly lower than that of LightGBM, it still surpassed the DT model. This advantage was attributed to its random feature selection and multi-tree voting mechanism, which maintained good accuracy.
4.7. Application of Fire Source Determination Technology in Real Fire Situations
Fire source identification is crucial to fire risk assessment and emergency response. In an underground commercial street, the application of artificial intelligence fire source determination technology for fire risk assessment and emergency response in real fire situations can proceed as follows.
- (1) Real-time fire source identification
① The artificial intelligence model analyses temperature sensor data from the corridors of the underground commercial street to locate fire source information accurately.
② The system automatically triggers a fire alarm and communicates the fire source information to the emergency response center and the building management system.
- (2) Fire emergency response
Based on fire source information, the emergency response center rapidly deploys firefighting, medical, and rescue teams, ensuring effective response tailored to the specific details of the fire source.
- (3) Evacuation plan optimization
① The building management system automatically adjusts evacuation instructions based on the specific location of the fire source, guiding personnel through electronic displays or broadcast systems within the commercial street to the safest evacuation routes.
② The monitoring center continuously tracks the evacuation of personnel, ensuring the safe withdrawal of all individuals.
- (4) Risk assessment and safety strategy
① After the event, using data provided by the artificial intelligence model and records of the fire situation, assess the fire risk of the underground commercial street.
② Based on the risk assessment results, adjust and optimize the underground commercial street’s fire prevention measures, safety system design, and emergency response plans.
- (5) Continuous monitoring and improvement
① In day-to-day operations, continuously monitor and analyze temperature sensor data to promptly identify potential risks and implement preventive measures.
② Regularly review and update the artificial intelligence model to ensure accuracy and adaptability, thereby better addressing potential fire incidents.
5. Conclusions
This paper established a fire source determination method for underground commercial streets based on temperature and machine learning. It constructed fire source determination models for underground commercial streets using three machine learning algorithms: RF, DT, and LightGBM. The paper calculated the macro averages of precision, recall, and F1 scores for the three models and performed a comparative analysis of their kappa values, leading to the following conclusions:
- (1) The LightGBM model performed best in determination with its exceptional class differentiation ability and high-dimensional data processing capability. Its macro averages for precision, recall, and F1 score were 99.01%, 98.45%, and 99.04%, and its kappa value was 98.81%.
- (2) The high determination performance of the three machine learning models indicated that the fire database established through CFAST simulation, based on random sampling for determining fire conditions, was more aligned with the objective laws of the real world.
- (3) This study’s three machine learning models demonstrated strong classification capabilities and interpretability.
The fire source determination method proposed in this study offers technical support for the management of fire situations in underground commercial streets. In subsequent research, consideration should be given to how artificial intelligence technology can be better applied in fire risk assessment and emergency response. Furthermore, the variety of fire sources and the development of fires in real scenarios are more complex. To enhance the precision and practical value of fire source determination in underground commercial streets, future research should focus on two aspects: firstly, increasing sample data to enable the model to understand new categories better and to capture fire source characteristics, thereby improving determination accuracy; secondly, improving training models, such as adopting more advanced machine learning algorithms, to enhance the model’s generalizability and practical application value.