Article

Application of Remote Sensing and Explainable Artificial Intelligence (XAI) for Wildfire Occurrence Mapping in the Mountainous Region of Southwest China

1 Institute of Mountain Hazards and Environment, Chinese Academy of Sciences and Ministry of Water Resources, Chengdu 610041, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 Plateau Atmosphere and Environment Key Laboratory of Sichuan Province, College of Atmospheric Science, Chengdu University of Information Technology, Chengdu 610225, China
4 Sichuan Provincial Climate Centre, Sichuan Provincial Meteorological Service, Chengdu 610072, China
5 Wenjiang National Climatic Observatory, Wenjiang District Meteorological Service, Chengdu 611100, China
6 Sichuan Province Meteorological Disaster Defense Technology Center, Sichuan Provincial Meteorological Service, Chengdu 610072, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(19), 3602; https://doi.org/10.3390/rs16193602
Submission received: 6 August 2024 / Revised: 15 September 2024 / Accepted: 24 September 2024 / Published: 27 September 2024

Abstract
The ecosystems in the mountainous region of Southwest China are exceptionally fragile and constitute one of the global hotspots for wildfire occurrences. Understanding the complex interactions between wildfires and their environmental and anthropogenic factors is crucial for effective wildfire modeling and management. Despite significant advancements in wildfire modeling using machine learning (ML) methods, their limited explainability remains a barrier to using them for in-depth wildfire analysis. This paper employs Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost) models along with the MODIS global fire atlas dataset (2004–2020) to study the influence of meteorological, topographic, vegetation, and human factors on wildfire occurrences in the mountainous region of Southwest China. It also uses Shapley Additive exPlanations (SHAP) values, a method within explainable artificial intelligence (XAI), to demonstrate the influence of key controlling factors on the frequency of fire occurrences. The results indicate that wildfires in this region are primarily influenced by meteorological conditions, particularly sunshine duration, relative humidity (seasonal and daily), seasonal precipitation, and daily land surface temperature. Among local variables, altitude, proximity to roads, railways, and residential areas, and population density are significant factors. All models demonstrate strong predictive capabilities, with AUC values over 0.8 and prediction accuracies ranging from 76.0% to 95.0%. XGBoost outperforms LR and RF in predictive accuracy across all factor groups (climatic, local, and combinations thereof). The inclusion of topographic factors and human activities improves model performance to some extent. 
SHAP results reveal the critical features that influence wildfire occurrence and the thresholds at which their effects change between positive and negative, highlighting that relative humidity, rain-free days, and land use and land cover (LULC) changes are the primary contributors to frequent wildfires in this region. Based on regional differences in wildfire drivers, a wildfire-risk zoning map was created for the mountainous region of Southwest China. High-risk areas are located predominantly in the northwestern and southern parts of the study area, particularly in Yanyuan and Miyi, whereas low-risk areas are mainly distributed in the northeast.

1. Introduction

The mountainous region of Southwest China is a vital source of water, electricity, forests, and other natural resources [1,2], and it is also recognized as a global hotspot and high-risk area for wildfires, with a notable increase in the frequency and intensity of such events in recent years [3,4]. These wildfires pose significant risks to life, property, biodiversity, carbon sequestration, water and air purification, and other socio-ecological benefits [5,6,7,8,9,10]. The risks are exacerbated by climate change-induced warming [11,12,13], extreme fire weather conditions [14,15], increased forest density resulting from fire-control strategies [16], population growth [17], and the spread of human habitation [18]. Wildfires in this region, which includes the Minshan, Longmen, and Hengduan Mountain ranges, are strongly influenced by monsoons originating from the Western Pacific and Indian Oceans [19]. The wildfire season coincides with the monsoon retreat period (January to May), during which dry southwesterly winds prevail [20]. The region's wide altitude range, complex terrain, variable climate, and diverse vegetation types produce a fire regime that varies with altitude [21,22,23]. Fire frequency is inversely related to elevation: lower elevations experience fires that are more frequent but less severe, whereas higher elevations experience fires that are less frequent but more intense. This uneven distribution of fire risk highlights the need to understand the spatial patterns and drivers of wildfire occurrence [15]. Wildfire management in the mountainous region of Southwest China has focused on strict fire-prevention measures but is constrained by limited forestry personnel and financial resources, underscoring the need for consistent and comprehensive management practices [24].
Global wildfire research indicates that the spatiotemporal distribution of fires at the regional scale is governed by four major driving factors: climate, topography, vegetation, and human activity [25,26]. Numerous studies have assessed the relationships between fire indicators and explanatory variables at the global [26], national [27], regional [28], and local [29] scales, and some have analyzed multiple spatial scales simultaneously [30]. Many of these studies have used statistical methods to explore the relationships between wildfires and their environmental determinants [31,32,33] or human activities [34,35,36,37], particularly in boreal forests, Mediterranean ecosystems, the Alps, and similar areas. The relationships of climate and weather with fire occurrence and burned area have been studied extensively across many regions of the world [38,39]. A substantial body of research shows that climate, vegetation type, and topography jointly influence wildfire occurrence [40,41], and Parisien et al. subsequently found that human activity variables are equally important for predicting fires [42]. Although numerous studies have described spatiotemporal fire dynamics and their drivers in temperate ecosystems [43], few have focused on subtropical monsoon mountain regions, which are among the most vulnerable and fire-prone areas on Earth. Over the past decade, research in China has concentrated mainly on forest fires in the north, while studies of forests in Southwest China have been fewer and less detailed [8,15,17]. Fire-risk classification and fire estimates in this region often rely on a small number of variables and limited analytical methods. 
Results from studies across China may not be applicable to this local area [44], as it integrates multiple microclimates and ecosystems within short distances and hosts a large population with diverse socioeconomic activities. Understanding the influence of biophysical factors and human-related variables on wildfire occurrence at a finer spatial scale is critical for developing effective fire management strategies, but this aspect has not yet been comprehensively studied. Furthermore, the biophysical and human-related predictive factors that may lead to spatial variations in fire occurrence probability have not been systematically and thoroughly explored [45].
Remote Sensing (RS) and Geographic Information Systems (GIS) have proven effective for modeling wildfires in space and time and for mapping fire-prone areas [46]. Modeling approaches fall into physics-based, statistical, and Machine Learning (ML) methods, chosen according to the characteristics of the study area and the required model complexity. Physics-based methods use mathematical equations of heat transfer, biomass combustion, and fluid dynamics to predict fire behavior [47,48]. Statistical methodologies, such as Multi-Criteria Decision Analysis (MCDA) techniques including the Analytical Hierarchy Process (AHP) [49], point pattern analysis [50], and fuzzy logic, are used to investigate the relationship between fire incidents and influencing factors, with the aim of evaluating the probability of fire occurrence [51,52]. These methods rely on expert opinions organized into comparison matrices [53], so expert knowledge is essential for constructing informative matrices [46]. However, wildfires are typically nonlinear and complex phenomena, and conventional methods may not consistently meet the requirements of wildfire management. In recent decades, advances in computing power have made it possible to use machine learning methods, such as random forest (RF) [54], support vector machines (SVMs) [44], multilayer perceptron neural networks (MLP) [55], and artificial neural networks (ANN) [56], to study the relationships between wildfires and their drivers and to predict wildfire risk. Logistic regression (LR) and random forest models perform well in wildfire-risk prediction, and the more recently proposed extreme gradient boosting (XGBoost) model has also shown good classification performance [57]. Currently, there is no consensus on which model to select for wildfire-risk prediction. 
No study has established a universal method applicable across all regions and environments, and models differ in their ability to predict the probability of wildfire occurrence and to map wildfire-risk zones [58,59]. In practical wildfire management, the opacity of ML models makes their decision-making processes difficult to understand [60]. Consequently, Explainable Artificial Intelligence (XAI) is becoming increasingly important because it helps elucidate how complex machine learning models make decisions concerning wildfire prediction and management [61,62]. XAI approaches fall into two types: intrinsic explainability, which is built into the model by design, and post-hoc explainability, which is applied after the model has been developed [63]. Both aim to make the decision-making processes of AI systems more transparent and comprehensible. In wildfire science, the adoption of XAI is still at an early stage. Existing research has mainly focused on individual countries [64,65], leaving a gap in applying these methods to larger-scale wildfire studies. Although SHAP has been widely used to interpret the decisions of machine learning models, this study extends its application to wildfire prediction in the diverse and complex terrain and climatic conditions of the mountainous region of Southwest China. By visualizing SHAP values, the study quantifies how interactions among multiple factors affect wildfire occurrence and reveals spatial variations in these interactions, providing a basis for more accurate and targeted wildfire management strategies.
Motivated by the identified research gap, we employed Explainable Artificial Intelligence (XAI) to interpret a machine learning-based wildfire susceptibility model and to understand fire mechanisms in the mountainous region of Southwest China. To better comprehend the causes of wildfires in this region, we integrated multi-source geospatial data, providing detailed insights at a 1-km grid scale. We utilized three different machine learning algorithms: Logistic Regression (LR), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost, XGB), to assess potential driving factors. Our study aimed to determine the relative impact of meteorological elements, vegetation, topography, and human activity on fire occurrences within Southwest China’s mountainous forest landscapes. To address this objective, we employed machine learning algorithms and focused on the following questions: (1) How do climate factors compare to local factors in their relative influence on wildfire occurrence, and what mechanisms drive their respective impacts? (2) Do the impacts of climatic factors (regional scale) and local factors (vegetation and topography at the landscape scale) on fire occurrence differ in Southwest China’s mountains? A comprehensive investigation into wildfire drivers in the mountainous region of Southwest China could enhance the existing understanding of wildfire distribution within the country. This research could also aid local forest agencies in implementing more effective measures for wildfire prevention and suppression.

2. Data and Methodology

2.1. Study Area

The study region is located in the mountainous region of Southwest China and comprises complex mountain terrain, including the Da Xiangling, Xiao Xiangling, and the Anning River Valley. These mountains shape the region's topography, forming a transition between the Qinghai-Tibet Plateau and the Sichuan Basin. They also block moist air flows from both the north and the south, making the study region one of China's rare arid or semi-arid zones (Figure 1).
The research area has a highland subtropical semi-arid climate typical of South Asia, along with a subtropical monsoon climate. It is influenced by warm, moist tropical monsoon air masses in summer and dry, warm continental air in winter, so the rainy season coincides with the warm season. During the dry season, the daily minimum relative humidity can drop to around 20%, especially in high-altitude areas and under strong winds. Low humidity causes rapid evaporation of moisture from vegetation and soil, increasing fire risk. Such extremely dry conditions are particularly common during clear weather and under the influence of dry, hot winds. The pronounced variations in altitude and the intricate topography produce a heterogeneous distribution of local climatic conditions, and this climatic and ecological diversity makes the area exceptionally rich in biodiversity. The combination of rich biodiversity, distinctive climatic conditions, and prevalent local slash-and-burn farming makes the study area one of the regions most susceptible to wildfires [17]. The uneven seasonal distribution of moisture, with distinct dry and wet seasons, further accentuates the seasonality of wildfires, which occur primarily from January to May. The area is therefore an exemplary site for studying wildfires in the mountainous region of Southwest China.

2.2. Dataset

2.2.1. FIRMS-MODIS and VIIRS Active Fire Products and Pre-Treatment

Wildfire data were sourced from the MODIS and VIIRS active fire location vector products (Shapefile, WGS1984) provided by NASA FIRMS (Fire Information for Resource Management System) (https://firms.modaps.eosdis.nasa.gov/download/ (accessed on 1 January 2024)). The two datasets are highly consistent in monitoring active fire frequency within the study region, showing a strong linear correlation and good compatibility for joint use [6]. For this study, fire-season data (1 January to 31 May) for 2004 to 2020 were extracted from the MOD14A1 satellite fire point data for the study region in Sichuan Province. Each fire pixel represents an individual fire point with geographic coordinates and occurrence time; in total, 15,300 fire points were recorded in Panzhihua and Liangshan Prefecture. Only fire points with a confidence level greater than 80% were retained. Because a single fire may produce multiple fire points, points located less than 1 km apart and detected within 24 h of each other were treated as duplicates and removed using ArcGIS 10.6. The remaining fire points were then overlaid with the land use map of Sichuan Province, and points in non-forest areas were eliminated. This process yielded 3837 reliable fire points (Figure 1). Cross-validation of these points against data from Sichuan Province's Forest Fire Management Agency gave an accuracy rate of 68.3%. The LR model requires a binary (0/1) response variable, so a corresponding number of random (non-fire) points had to be generated. Using ArcGIS 10.6, random points were generated at a 1:1 ratio to the fire points. In general, the number of random points may slightly exceed the number of fire points, and their spatial and temporal distribution is entirely random [66,67]. In total, 3837 non-fire points were selected for this study.
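The point screening described above (confidence filter, 1 km / 24 h de-duplication, and 1:1 absence sampling) can be sketched in Python as follows. This is a minimal illustration rather than the study's ArcGIS 10.6 workflow: the point dictionaries with `lat`, `lon`, `time`, and `confidence` keys, the greedy "keep the earliest detection" rule for duplicates, and uniform bounding-box sampling of non-fire points are all assumptions, and the forest-mask overlay is omitted.

```python
import math
import random
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def deduplicate_fires(points, min_conf=80, max_km=1.0, max_hours=24):
    """Keep detections with confidence > min_conf, then drop any detection lying
    within max_km AND max_hours of an already kept (earlier) detection."""
    window = timedelta(hours=max_hours)
    confident = sorted((p for p in points if p["confidence"] > min_conf), key=lambda p: p["time"])
    kept = []
    for p in confident:
        is_duplicate = any(
            abs(p["time"] - q["time"]) < window
            and haversine_km(p["lat"], p["lon"], q["lat"], q["lon"]) < max_km
            for q in kept
        )
        if not is_duplicate:
            kept.append(p)
    return kept


def random_absence_points(n, lat_range, lon_range, seed=0):
    """Draw n non-fire points uniformly inside a bounding box (1:1 with fire points).
    A real workflow would restrict these to the forested study area."""
    rng = random.Random(seed)
    return [{"lat": rng.uniform(*lat_range), "lon": rng.uniform(*lon_range)} for _ in range(n)]


if __name__ == "__main__":
    detections = [  # hypothetical detections
        {"lat": 27.000, "lon": 102.0, "time": datetime(2010, 3, 1, 10), "confidence": 90},
        {"lat": 27.005, "lon": 102.0, "time": datetime(2010, 3, 1, 14), "confidence": 85},  # ~0.56 km, 4 h later
        {"lat": 27.200, "lon": 102.0, "time": datetime(2010, 3, 1, 12), "confidence": 95},
        {"lat": 27.300, "lon": 102.3, "time": datetime(2010, 3, 5, 10), "confidence": 50},  # low confidence
    ]
    fires = deduplicate_fires(detections)
    absences = random_absence_points(len(fires), (26.0, 29.0), (100.0, 104.0))
    print(len(fires), len(absences))  # 2 2
```

The pairwise check against all kept points is quadratic; for thousands of detections a spatial index would be the natural refinement.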

2.2.2. Data Sources and Extraction of Factors Affecting Wildfires

Based on the literature review and previous research findings, this study selected a total of 29 factors, categorized into three groups: climatic factors, human factors, and vegetation and topographical factors, as potential influencing factors for wildfires in the study region (Table 1).
(1) Meteorological factors. The meteorological data consist of daily weather observations from 21 national-level and 35 regional-level meteorological stations in the study region. A Python script was used to identify, in batch, the meteorological station closest to each point (fire and random points). The meteorological variables extracted for each point included the average number of rain-free days during the fire-prevention period, precipitation, average relative humidity, average minimum temperature, average maximum temperature, daily mean temperature, daily maximum temperature, daily minimum temperature, daily accumulated precipitation, daily maximum wind speed, daily average relative humidity, daily minimum relative humidity, daily accumulated evaporation, and daily average, maximum, and minimum land surface temperatures. The daily meteorological data were obtained primarily from the China Meteorological Administration's Comprehensive Meteorological Information Service System (CIMISS) through the China Meteorological Data Service Platform (CMDSP).
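The batch nearest-station lookup can be sketched in a few lines of Python. The station identifiers and coordinates below are hypothetical, and great-circle (haversine) distance is an assumed choice of metric; the study's own script is not given in the text.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(point, stations):
    """Return (station_id, distance_km) of the station closest to point=(lat, lon).
    `stations` maps a station id to its (lat, lon)."""
    lat, lon = point
    return min(
        ((sid, haversine_km(lat, lon, s_lat, s_lon)) for sid, (s_lat, s_lon) in stations.items()),
        key=lambda item: item[1],
    )


def assign_stations(points, stations):
    """Batch version: one (station_id, distance_km) pair per fire or random point."""
    return [nearest_station(p, stations) for p in points]


# Hypothetical station coordinates, for illustration only.
stations = {"STN_A": (27.90, 102.27), "STN_B": (26.58, 101.72)}
print(assign_stations([(26.50, 101.80), (27.80, 102.20)], stations))
```

Once each point has a station id, the daily variables for the detection (or sampled) date can be joined from that station's records.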
(2) Human factors. Human factors include distance to highways, distance to railways, distance to residential areas, population density, and per capita GDP. Highway, railway, and residential area data were derived from the 1:250,000 infrastructure vector map provided by the China National Administration of Surveying, Mapping, and Geoinformation. The shortest distance from each point to each of these features was calculated using the nearest neighbor analysis function in ArcGIS 10.6. Population density and per capita GDP were derived from the 1 km resolution population and GDP grid data for 2000, 2005, 2010, and 2015, obtained from the Resource and Environmental Science Data Center of the Chinese Academy of Sciences (http://www.resdc.cn (accessed on 1 January 2024)). Annual population and GDP growth rates were calculated from the National Statistical Yearbook for 2004 to 2020, and population and GDP rasters for each year were then computed with the raster calculator tool in ArcGIS 10.6. Finally, the multi-value extraction tool (Extract Multi Values to Points) was used to obtain the population density and per capita GDP values at each point.
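The text does not give the exact raster-calculator expression used to derive annual grids from the 2000, 2005, 2010, and 2015 base grids. One plausible reading, assumed here, is to scale the nearest base-year grid by compounded annual growth rates. The sketch uses nested lists in place of rasters, and the growth rates and cell values are invented for illustration.

```python
def growth_factor(base_year, target_year, growth_rates):
    """Cumulative multiplier from base_year to target_year.
    growth_rates[y] is the (hypothetical) fractional change from year y-1 to year y."""
    factor = 1.0
    if target_year >= base_year:
        for year in range(base_year + 1, target_year + 1):
            factor *= 1.0 + growth_rates[year]
    else:  # project backwards by undoing the growth of the intervening years
        for year in range(target_year + 1, base_year + 1):
            factor /= 1.0 + growth_rates[year]
    return factor


def project_grid(base_grid, base_year, target_year, growth_rates):
    """Apply the cumulative growth factor cell by cell (a raster-calculator analogue)."""
    f = growth_factor(base_year, target_year, growth_rates)
    return [[cell * f for cell in row] for row in base_grid]


rates = {2005: 0.02, 2006: 0.01, 2007: 0.03}   # hypothetical annual growth rates
pop_2005 = [[120.0, 85.5], [40.0, 0.0]]         # toy 1 km population-density grid
pop_2007 = project_grid(pop_2005, 2005, 2007, rates)
pop_2004 = project_grid(pop_2005, 2005, 2004, rates)
print(pop_2007[0][0], pop_2004[0][0])
```

A uniform factor keeps the spatial pattern of the base grid fixed and only rescales its magnitude, which is the main limitation of this kind of interpolation.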
(3) Topographical and vegetation factors. The terrain-related factors include three variables: altitude, slope, and aspect. Altitude data at 90 m resolution were obtained from the NASA Shuttle Radar Topography Mission (SRTM) through the SRTM website (https://SRTM.csi.cgiar.org). Slope and aspect were derived from this Digital Elevation Model (DEM) using the Surface toolbox in ArcGIS 10.6. Vegetation-type information comes from the digital map published by the Chinese Academy of Sciences, which serves as a substitute for ground fuel maps. Fractional Vegetation Cover (FVC) was derived from Normalized Difference Vegetation Index (NDVI) data with a spatial resolution of 1 km, made available through the International Scientific & Technological Cooperation Program in conjunction with the Chinese Academy of Sciences, using the pixel binary model [68]:
$$\mathrm{FVC} = \frac{\mathrm{NDVI} - \mathrm{NDVI}_{soil}}{\mathrm{NDVI}_{veg} - \mathrm{NDVI}_{soil}}$$
where $\mathrm{NDVI}_{soil}$ is the NDVI of bare soil or areas without vegetation cover, and $\mathrm{NDVI}_{veg}$ is the NDVI of pixels fully covered by vegetation. In this study, the NDVI values at the 95% and 5% cumulative frequency levels were taken as the pure vegetation and pure soil values, respectively.
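A minimal Python sketch of the pixel binary model, assuming that the 95% and 5% thresholds are read as the 95th and 5th percentiles of the NDVI distribution over the study area, and that FVC values falling outside the two end-members are clipped to [0, 1] (a common convention that the text does not state).

```python
def percentile(values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a list of numbers."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def fractional_vegetation_cover(ndvi, soil_q=5, veg_q=95):
    """FVC = (NDVI - NDVI_soil) / (NDVI_veg - NDVI_soil), with the end-members
    taken as percentiles of the NDVI values and the result clipped to [0, 1]."""
    ndvi_soil = percentile(ndvi, soil_q)
    ndvi_veg = percentile(ndvi, veg_q)
    if ndvi_veg == ndvi_soil:
        raise ValueError("NDVI end-members coincide; FVC is undefined")
    fvc = []
    for v in ndvi:
        f = (v - ndvi_soil) / (ndvi_veg - ndvi_soil)
        fvc.append(min(1.0, max(0.0, f)))  # clip pixels beyond the pure soil/vegetation values
    return fvc


ndvi_pixels = [i / 100 for i in range(101)]  # toy NDVI values 0.00 ... 1.00
fvc = fractional_vegetation_cover(ndvi_pixels)
print(fvc[0], fvc[50], fvc[100])
```

Using percentiles rather than the absolute minimum and maximum makes the end-members less sensitive to noisy or cloud-contaminated pixels.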
The multi-value extraction tool (Extract Multi Values to Points) in ArcGIS 10.6 was then used to extract the elevation, slope, aspect, vegetation type, and vegetation coverage at each point.

2.3. Methodology

2.3.1. Machine Learning Modeling

(1) Logistic Regression Model (LR)
Logistic Regression (LR) is one of the most widely used models for predicting wildfire occurrence. It requires a binary response variable (1 = a wildfire occurred, 0 = no wildfire occurred). The fire and random points were combined for model fitting, and the model is expressed as follows:
$$\ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m$$
Inverting the logit transformation gives the probability of wildfire occurrence:
$$P = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m}}$$
where $P$ is the probability of wildfire occurrence, $m$ is the number of wildfire driving factors, $\beta_0$ is the intercept, $\beta_1, \ldots, \beta_m$ are the regression coefficients of the driving factors, and $x_1, \ldots, x_m$ are the driving factors.
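To make the logit and inverse-logit expressions concrete, the sketch below evaluates $P$ for given coefficients and fits the coefficients by plain batch gradient ascent on the log-likelihood. The optimizer, learning rate, epoch count, and the toy one-factor data (a standardized, humidity-like predictor where low values accompany fires) are assumptions for illustration; statistical packages usually fit LR by iteratively reweighted least squares and also report the p-values used for stepwise selection.

```python
import math


def logistic_probability(beta0, betas, x):
    """P = e^z / (1 + e^z) = 1 / (1 + e^-z), with z = beta0 + sum_j beta_j * x_j."""
    z = beta0 + sum(b * xj for b, xj in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Maximum-likelihood fit by batch gradient ascent (illustrative only)."""
    n, m = len(X), len(X[0])
    beta0, betas = 0.0, [0.0] * m
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * m
        for xi, yi in zip(X, y):
            err = yi - logistic_probability(beta0, betas, xi)  # gradient of the log-likelihood
            g0 += err
            for j in range(m):
                g[j] += err * xi[j]
        beta0 += lr * g0 / n
        betas = [b + lr * gj / n for b, gj in zip(betas, g)]
    return beta0, betas


# Toy data: 1 = fire point, 0 = random point; x is a standardized humidity anomaly.
X = [[-0.8], [-0.6], [-0.4], [0.4], [0.6], [0.8]]
y = [1, 1, 1, 0, 0, 0]
beta0, betas = fit_logistic(X, y)
print(beta0, betas, logistic_probability(beta0, betas, [-0.8]))
```

A negative fitted coefficient here mirrors the negative association between relative humidity and wildfire occurrence reported in the results; because the toy data are perfectly separable, the coefficient keeps growing with more epochs, which is why real analyses check for separation.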
(2) Random Forest Model (RF)
Random Forest (RF) is an ensemble learning method based on the Classification and Regression Tree (CART) methodology. It combines numerous classification trees, each built from a bootstrap sample, to improve prediction accuracy and robustness [69,70]. RF produces its final output by a vote over all decision trees, and the class receiving the majority of votes is selected as the prediction:
$$\hat{y} = \mathrm{mode}\left(T_1(x), T_2(x), \ldots, T_B(x)\right)$$
where $\hat{y}$ is the final predicted class, $T_i(x)$ is the prediction of the $i$-th decision tree for input $x$, $B$ is the total number of decision trees, and $\mathrm{mode}$ returns the majority vote across all trees. The RF model is built through a series of iterations, each of which grows a decision tree from a randomly selected subset of the input data. In each iteration, the samples are split into training and validation groups, and this cross-validation is used to curb overfitting and estimate the model error [71].
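The bootstrap-and-vote logic behind the mode expression can be illustrated with a deliberately small pure-Python forest. To stay short, each "tree" here is a depth-1 CART split chosen by Gini impurity (a stump), and the toy data are invented; real RF trees are grown much deeper. Note also that some libraries, such as scikit-learn's `RandomForestClassifier`, average the trees' class probabilities instead of taking a hard majority vote.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels, default=0):
    """Most common label (the mode); `default` is used for an empty leaf."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X, y, features):
    """Depth-1 CART tree: the (feature, threshold) split with the lowest
    size-weighted Gini impurity among the candidate features."""
    best = None
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, j, t, majority(left), majority(right))
    _, j, t, left_label, right_label = best
    return lambda row: left_label if row[j] <= t else right_label


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]          # bootstrap: n rows with replacement
        feats = sorted(rng.sample(range(m), max_features))  # random subset of predictors
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees


def predict_forest(trees, row):
    """y_hat = mode(T_1(x), ..., T_B(x)): the class with the most tree votes."""
    return majority([tree(row) for tree in trees])


# Toy data: feature 0 is humidity-like (informative), feature 1 is noise.
X = [[0.1, 0.5], [0.2, 0.1], [0.3, 0.9], [0.7, 0.4], [0.8, 0.8], [0.9, 0.2]]
y = [1, 1, 1, 0, 0, 0]
forest = fit_forest(X, y, n_trees=25, max_features=2, seed=0)
print(predict_forest(forest, [0.05, 0.5]), predict_forest(forest, [0.95, 0.5]))
```

An odd number of trees avoids ties in a two-class vote; rows left out of a tree's bootstrap sample (out-of-bag rows) are what RF uses to estimate its OOB error.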
(3) eXtreme Gradient Boosting Model (XGBoost)
XGBoost (eXtreme Gradient Boosting) is an ensemble learning approach based on boosting, which combines numerous decision trees into a single final prediction [72]. Compared with conventional gradient-boosting machines, XGBoost is designed to limit overfitting while maintaining computational efficiency. It builds trees sequentially, each aimed at reducing the errors of its predecessors: during training, new trees are added incrementally to predict the errors of the previous trees until a stopping criterion is reached, and the final prediction aggregates the outputs of all trees. The prediction at step $t$ for grid location $i$ can be expressed as follows:
$$\hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i)$$
where $f_t(x_i)$ is the tree model added at step $t$, $\hat{y}_i^{(t)}$ and $\hat{y}_i^{(t-1)}$ are the predictions at steps $t$ and $t-1$, and $x_i$ are the predictor variables. The tree $f_t$ is determined by optimizing the objective function, which measures how well the model fits the training data:
$$\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} L\left(y_i, \hat{y}_i^{(t)}\right) + \Omega\left(f_t\right)$$
The first term is the loss function, which measures the error between the predicted and actual values. The second term is the regularization term, which controls model complexity by penalizing the tree structure and thereby helps prevent overfitting [73].
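In the standard XGBoost formulation, the loss in the objective above is approximated by a second-order Taylor expansion, with gradients $g_i$ and Hessians $h_i$ of the loss at the previous prediction. Assuming the common regularizer $\Omega(f) = \gamma T + \tfrac{1}{2}\lambda\lVert w\rVert^2$ ($T$ leaves with weights $w$) and the logistic loss, each leaf's optimal weight is $w^* = -G/(H+\lambda)$, where $G$ and $H$ sum $g_i$ and $h_i$ over the leaf. The sketch below computes these quantities and performs one additive update for a fixed, hypothetical tree structure; the shrinkage rate `eta` is an assumption, and with `eta=1` the update reduces exactly to $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_hess_logloss(y, margin):
    """First and second derivatives of the logistic loss w.r.t. the raw margin."""
    p = sigmoid(margin)
    return p - y, p * (1.0 - p)


def leaf_weight(G, H, lam=1.0):
    """Optimal leaf value w* = -G / (H + lambda) of the second-order objective."""
    return -G / (H + lam)


def split_gain(GL, HL, GR, HR, lam=1.0, gamma=0.0):
    """Reduction of the regularized objective when a leaf is split into L and R."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


def boosting_step(y, margins, leaf_of, lam=1.0, eta=0.3):
    """One update y_hat^(t) = y_hat^(t-1) + eta * f_t(x) for a fixed tree structure;
    leaf_of[i] is the leaf index of sample i (hypothetical input)."""
    G, H = {}, {}
    for yi, mi, leaf in zip(y, margins, leaf_of):
        g, h = grad_hess_logloss(yi, mi)
        G[leaf] = G.get(leaf, 0.0) + g
        H[leaf] = H.get(leaf, 0.0) + h
    weights = {leaf: leaf_weight(G[leaf], H[leaf], lam) for leaf in G}
    return [mi + eta * weights[leaf] for mi, leaf in zip(margins, leaf_of)]


# Two fire points and two non-fire points, split into two leaves by a hypothetical tree.
new_margins = boosting_step([1, 1, 0, 0], [0.0] * 4, [0, 0, 1, 1])
print([round(m, 3) for m in new_margins], round(split_gain(-1.0, 0.5, 1.0, 0.5), 3))
```

The same $G^2/(H+\lambda)$ score that sets the leaf weights also ranks candidate splits, which is how the regularization term shapes the tree structure itself.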
Table 2 presents the advantages, disadvantages, and usage conditions of the three main wildfire occurrence probability prediction models.

2.3.2. Selection of Wildfire Driver Factors

(1) Multicollinearity Test
Multicollinearity among variables can distort significance testing and reduce the predictive capability of the model. Therefore, before multivariable model fitting, a multicollinearity test was performed to eliminate strongly collinear variables. In this study, the Variance Inflation Factor (VIF) was used to assess multicollinearity among the predictor variables; a VIF greater than 10 is generally taken to indicate collinearity. The diagnostic results show that the VIF values for average precipitation during the fire-prevention period, daily mean, maximum, and minimum temperatures, and daily average, maximum, and minimum land surface temperatures are 15.8, 64.18, 18.98, 27.7, 37.15, 12.95, and 16.13, respectively. Based on these results, stepwise regression was used to select the wildfire influencing factors: factors with weaker associations were progressively eliminated, according to the significance of their relationship with the probability of wildfire occurrence, until all remaining variables were significant.
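A minimal pure-Python VIF computation: each predictor is regressed on all the others by ordinary least squares (with an intercept), and $\mathrm{VIF}_j = 1/(1 - R_j^2)$. The toy columns are invented: two nearly identical series (analogous to the temperature variables flagged above) and one unrelated series. In practice a statistics library would be used instead of the hand-written solver, for example the `variance_inflation_factor` function in statsmodels.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def r_squared(y, X):
    """R^2 of an OLS fit of y on the columns of X plus an intercept (normal equations)."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on all other columns."""
    out = []
    for j, target in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        out.append(1.0 / (1.0 - r_squared(target, list(zip(*others)))))
    return out


t_mean = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # toy "daily mean temperature"
t_max = [1.1, 1.9, 2.9, 4.1, 5.1, 5.9, 6.9, 8.1]    # toy, nearly collinear with t_mean
wind = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]     # toy, unrelated series
print([round(v, 1) for v in vif([t_mean, t_max, wind])])
```

Applying the threshold of 10 to these values would flag the two temperature-like columns and keep the third, mirroring the screening step described above.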
(2) Determination of Wildfire Driving Factors
LR, RF, and XGB were used to evaluate the influence of meteorological and local factors on fire occurrence. The significant factors identified in each analysis were then combined, and their relative impacts on fire occurrence were examined with all three models. Factors that emerged as significant in either the meteorological or the local analysis were treated as comprehensive factors, and their impact on fire occurrence was determined using the three methods. To eliminate the effect of random sample selection on the choice of wildfire driver factors and to ensure an accurate selection, the fire-prevention period data were randomly divided into a modeling sample (80%) and an independent test sample (20%).
(3) Variable Importance Measures of Factors
To reduce the influence of the distribution of modeling samples on the selection of wildfire influencing factors, the modeling sample was randomly divided into a training sample (70%) and a validation sample (30%), and this random division was repeated five times. This produced five random sub-samples, each with its own training and validation datasets. LR was fitted to the training data of each sub-sample, generating five intermediate models, and each intermediate model was tested on its corresponding validation sample. The final LR model included the variables that were significant in at least three of the five intermediate models and was then applied to the complete dataset. Because RF and XGB are built through bootstrapping, partitioning the entire dataset into training and validation subsets is not strictly necessary for them. Nevertheless, to align with the LR procedure, RF and XGB were also trained on the training dataset of each sub-sample, and variable importance was assessed using Out-of-Bag (OOB) error rates. As with LR, the final RF and XGB models included the variables identified as important in at least three of the five intermediate models. These final models were then fitted to the complete dataset, and the five training models were used to identify stable factors influencing wildfires.
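The splitting-and-voting procedure (80/20 modeling/test split, five random 70/30 training/validation splits, and retention of factors selected in at least three of five runs) can be sketched as follows. The `select_fn` hook is hypothetical: it stands in for whatever criterion each intermediate model applies (LR significance, or RF/XGB importance) and simply returns the names of the selected factors.

```python
import random
from collections import Counter


def random_split(items, frac, rng):
    """Shuffle items and split them into (first frac, remainder)."""
    items = list(items)
    rng.shuffle(items)
    cut = int(round(frac * len(items)))
    return items[:cut], items[cut:]


def stable_factors(n_samples, select_fn, n_repeats=5, min_hits=3, seed=42):
    """Return (stable factor names, selection counts, held-out test indices).
    select_fn(train_idx, valid_idx) -> iterable of factor names (hypothetical hook)."""
    rng = random.Random(seed)
    modelling, test = random_split(range(n_samples), 0.8, rng)  # 80% modeling / 20% independent test
    hits = Counter()
    for _ in range(n_repeats):
        train, valid = random_split(modelling, 0.7, rng)         # 70% training / 30% validation
        hits.update(set(select_fn(train, valid)))                 # count each factor once per run
    stable = sorted(name for name, count in hits.items() if count >= min_hits)
    return stable, hits, test


# Usage with a dummy selector; 7674 = 3837 fire + 3837 non-fire points.
stable, hits, test_idx = stable_factors(7674, lambda tr, va: ["RH_min", "SSH"])
print(stable, len(test_idx))
```

Keeping the 20% test indices outside the repeated splits means the final model evaluation uses points that never influenced factor selection.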
(4) Assessment of Relative Importance of Driver Factors
Determining how strongly each wildfire driver factor influences fire occurrence is crucial for fire prevention and monitoring. To identify the relative importance of the factors, we used SHapley Additive exPlanations (SHAP), a method for explaining the predictions of machine learning models. SHAP is based on Shapley values from cooperative game theory and determines the contribution of each feature to the model's predictions [74]. SHAP values provide an intuitive explanation of individual predictions by showing the influence of each feature on that prediction. Analyzing SHAP values therefore clarifies how the model arrives at specific predictions, increasing trust in the model and its interpretability. For predictor $i$, the SHAP value is the weighted average of the difference in the model prediction $f_x$ when predictor $i$ is included versus excluded, taken over all combinations of the other predictors:
$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,\left(|F| - |S| - 1\right)!}{|F|!}\left[f_x\left(S \cup \{i\}\right) - f_x(S)\right]$$
where $\phi_i$ is the weighted average of all marginal contributions of predictor $i$, $|F|$ is the total number of features, $S$ is a subset of the predictors that excludes predictor $i$, and $\frac{|S|!\,(|F| - |S| - 1)!}{|F|!}$ is the weighting factor, which counts the number of permutations of subset $S$. $f_x(S)$ is the expected model output given the predictor subset $S$, and $\left[f_x(S \cup \{i\}) - f_x(S)\right]$ is the difference made by predictor $i$.
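The Shapley formula can be checked by brute force on a toy model. In this sketch, $f_x(S)$ is approximated by setting the features outside $S$ to a fixed baseline (reference) value, which is a simplification: SHAP implementations estimate the expectation over background data, and for tree ensembles such as RF and XGBoost the `shap` package's `TreeExplainer` computes the values efficiently. Exact enumeration needs on the order of $2^{|F|}$ model evaluations, so it is only practical for a handful of features. The toy model and feature values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every subset S of the other features.
    Features outside S are replaced by their baseline value (a simplifying assumption)."""
    n = len(x)

    def value(subset):
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)  # |S|!(|F|-|S|-1)!/|F|!
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))  # marginal contribution of feature i
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical model: two main effects plus an interaction between features 0 and 2."""
    return 2 * z[0] - z[1] + z[0] * z[2]


phi = shapley_values(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
print(phi, sum(phi))
```

The values satisfy the efficiency property: they sum to the difference between the prediction for $x$ and the prediction for the baseline, and the interaction term is split equally between the two features involved.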
(5) Validation of Wildfire Prediction Models
Model performance was evaluated using accuracy, recall rate, and Receiver Operating Characteristic (ROC) curves, which were used to compare the LR, RF, and XGB models based on meteorological, local, and comprehensive factors. Accuracy is the proportion of correctly predicted instances, while the recall rate is the proportion of actual positive instances (fire points) that are correctly classified. Because the ROC curve summarizes performance across all classification thresholds, it is not affected by the choice of any single threshold and indicates model generalization; the area under the curve (AUC) quantifies predictive ability [67,75]. An AUC of 0.5 corresponds to random prediction and 1 to perfect discrimination, and AUC ≥ 0.8 indicates good predictive ability. Cross-validation was also used to verify model accuracy and stability and to prevent overfitting.
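The three validation measures can be computed directly from labels and predicted probabilities. This sketch uses the rank (Mann–Whitney) form of the AUC, which equals the area under the ROC curve; its pairwise loop is quadratic, so it suits illustration rather than large datasets. The 0.5 cut-off used to turn probabilities into classes and the example labels and scores are assumptions.

```python
def accuracy(y_true, y_pred):
    """Proportion of correctly predicted instances."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Proportion of actual fire points (label 1) that are predicted as fires."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def roc_auc(y_true, scores):
    """Probability that a random fire point scores higher than a random non-fire point
    (ties count one half); this equals the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y_obs = [1, 1, 0, 0, 1, 0]                  # hypothetical fire (1) / non-fire (0) labels
prob = [0.9, 0.7, 0.3, 0.6, 0.4, 0.1]       # hypothetical predicted probabilities
y_hat = [1 if p >= 0.5 else 0 for p in prob]
print(accuracy(y_obs, y_hat), recall(y_obs, y_hat), roc_auc(y_obs, prob))
```

Accuracy and recall change with the chosen cut-off, whereas the AUC depends only on how the scores rank fire points relative to non-fire points.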

3. Results and Discussion

3.1. Selection of Wildfire Driving Factors

3.1.1. Selection of Meteorological Drivers

In this study, VIF analysis was used to evaluate multicollinearity among the predictor variables, and variables exhibiting notable collinearity were eliminated. After this screening, 26 factors, including vegetation coverage and daily cumulative precipitation, met the requirements (daily mean temperature and average land surface temperature were excluded) and entered the model fitting stage. Among the 15 meteorological factors, nine were significant in three or more intermediate models and therefore qualified for the full-sample fit (Table 3): minimum relative humidity, sunshine hours, daily average relative humidity, the number of rain-free days in the fire-prevention period, daily maximum and minimum land surface temperatures, average precipitation in the fire-prevention period, relative humidity, and daily maximum temperature. The parameters of the model fitted to the complete dataset with these nine factors show that daily maximum temperature and sunshine hours are positively correlated with wildfire occurrence, whereas the remaining factors are negatively correlated with it.
Following the principle of minimum OOB error, variable importance graphs were obtained for the five subsets. The RF and XGB models were then applied to the full data set using the variables found significant in at least three of the five intermediate models. Figure 2 ranks the 15 meteorological variables of the final model in descending order of importance. As Figure 2 shows, daily meteorological factors have a greater impact on fire occurrence than the average climate factors of the fire season. Based on the feature-selection results, only the nine variables with the highest average importance (each exceeding 5%) were retained for RF modeling: daily minimum relative humidity, sunshine hours, daily average relative humidity, the number of rain-free days during the fire-prevention period, daily average maximum and minimum land surface temperatures, and the average maximum temperature, precipitation, and relative humidity during the fire-prevention period. These variables are largely consistent with the important meteorological factors identified in the LR experiments. Da_minRH, SSH, and Da_RH rank highest; these variables govern the drying of vegetation and the ground surface, thereby increasing fire risk. GST_max and Tmax_avg further promote fire occurrence by accelerating moisture evaporation. Meteorological factors matter because they directly control the moisture content of vegetation and the ground [76]. Although variables such as the number of rain-free days and maximum temperature rank lower, they are crucial in prolonging dry conditions and increasing fire persistence [15].
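The two selection rules described above, averaging importance over the five subsets with a 5% cut-off and keeping variables selected in at least three of five intermediate models, can be sketched in Python as follows. The importance values are hypothetical stand-ins for whatever the RF or XGB implementation reports; only the variable names come from the text.

```python
from collections import Counter
from typing import Dict, List, Sequence


def mean_importance(subsets: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Average each variable's importance share (importance / subset total) over subsets."""
    sums: Dict[str, float] = {}
    for importances in subsets:
        total = sum(importances.values())
        for name, value in importances.items():
            sums[name] = sums.get(name, 0.0) + value / total
    return {name: s / len(subsets) for name, s in sums.items()}


def select_important(subsets: Sequence[Dict[str, float]], threshold: float = 0.05) -> List[str]:
    """Variables whose average importance share exceeds the threshold, most important first."""
    avg = mean_importance(subsets)
    kept = [name for name, share in avg.items() if share > threshold]
    return sorted(kept, key=lambda name: avg[name], reverse=True)


def selected_in_majority(selections: Sequence[Sequence[str]], min_votes: int = 3) -> List[str]:
    """Variables selected (e.g. significant) in at least `min_votes` intermediate models."""
    votes = Counter(name for chosen in selections for name in set(chosen))
    return sorted(name for name, count in votes.items() if count >= min_votes)


if __name__ == "__main__":
    # Hypothetical importances for one subset, repeated for all five subsets.
    subset = {"Da_minRH": 0.6, "SSH": 0.37, "Da_maxwind": 0.03}
    print(select_important([subset] * 5))  # Da_maxwind falls below the 5% cut-off
```

Normalizing each subset's importances to shares before averaging keeps the 5% threshold meaningful even if raw importance scales differ between subsets.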

3.1.2. Selection of Local Influence Factors

Table 4 reports the parameter estimates of the final LR model fitted with the identified local variables. Of the 11 variables in the final model, seven were significant: population density, vegetation coverage, elevation, slope, distance to roads, land use type, and distance to residential areas.
Following the principle of minimum OOB error, variable importance graphs were derived for the five subsets and then applied to the complete data set through the RF and XGB models. In the final model, the most important variables, in descending order of feature importance (Figure 3), are land use type, distance to railways, elevation, distance to residential areas, slope, distance to roads, aspect, GDP, and population density. The significant local factors differ somewhat between the models: distance to railways, aspect, and GDP appear among the most important RF and XGB variables but not among the significant LR variables, whereas vegetation coverage is significant in the LR model but does not appear in this RF and XGB list.

3.2. Wildfire Prediction Model and Risk Zoning

3.2.1. Estimation and Identification of Model Parameters

As described above, each model selects key factors from the candidate meteorological and local variables according to factor correlations or feature importance: for the LR model, factors are screened using variance inflation factors (VIF), while for RF and XGB, variables with feature importance greater than 5% are selected. The comprehensive factors (variables found significant when meteorological and local factors are analyzed together) were used to fit the LR, RF, and XGB models. With wildfire occurrence as the dependent variable and 16 fire-risk factors as independent variables, LR was fitted using the enter method with a 95% confidence interval. The Wald test showed that every factor was significant at the 0.05 level (Sig. < 0.05). The coefficients of the factors are listed in Table 5. The coefficients of daily maximum temperature, sunshine hours, distance to residential areas, distance to roads, slope, and vegetation coverage are positive, so an increase in any of these variables raises the fire-risk probability. Population density and minimum relative humidity have the greatest impact on fire occurrence, and their negative coefficients imply that higher values lower the fire-risk probability. The resulting model is as follows.
$$P = \frac{e^{z}}{1 + e^{z}}, \qquad z = 0.51 - 0.8x_{1} - 0.81x_{2} - 0.4x_{3} + 0.38x_{4} - 0.53x_{5} - 1.07x_{6} - 0.66x_{7} - 0.42x_{8} + 0.60x_{9} + 0.15x_{10} + 0.04x_{11} + 0.27x_{12} - 0.19x_{13} - 1.23x_{14} + 0.34x_{15} - 0.17x_{16}$$

where x_1, …, x_16 are the 16 fire-risk factors whose coefficients are listed in Table 5.
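Evaluating the fitted LR equation for a location is a matter of forming the linear predictor z and applying the logistic function. The Python sketch below takes the intercept and coefficients as inputs; the study's values are those of the equation above and Table 5, but the scaling of the x_i is not stated here, so the usage example relies on made-up coefficients and inputs.

```python
import math
from typing import Sequence


def fire_probability(intercept: float, coefs: Sequence[float], x: Sequence[float]) -> float:
    """Logistic model P = e^z / (1 + e^z), with z = intercept + sum_i coef_i * x_i."""
    if len(coefs) != len(x):
        raise ValueError("need one coefficient per factor")
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    # e^z / (1 + e^z) equals 1 / (1 + e^-z); branch on the sign of z so exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


if __name__ == "__main__":
    # Hypothetical two-factor model: one negative and one positive coefficient.
    coefs = [-1.2, 0.6]
    print(fire_probability(0.5, coefs, [0.0, 1.0]))
    print(fire_probability(0.5, coefs, [1.0, 1.0]))  # larger value of the negative-coefficient factor lowers P
```

A negative coefficient, such as those of population density and minimum relative humidity, lowers P as its factor increases, while a positive coefficient raises it.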
Figure 4a ranks the 20 variables most important to the model by their mean absolute SHAP value; a higher mean absolute SHAP value means that the variable shifts the model output more strongly on average. Variables such as LULC coordinates, daily minimum and mean relative humidity, the number of rain-free days, and the minimum surface temperature rank highest, underscoring the influence of land use and local climate on wildfire probability. Figure 4b summarizes how each predictor influences wildfire occurrence: the x-axis shows each sample's SHAP value for the variable, with positive values indicating increased risk and negative values decreased risk, and the color gradient shows the variable's value from low (blue) to high (red). Weather conditions such as longer sunshine hours and warmer temperatures increase wildfire risk, whereas higher humidity and precipitation lower it. For topography, higher elevations and steeper slopes generally reduce wildfire risk, with the effect leveling off beyond certain values. Vegetation coverage strongly influences wildfire occurrence, with denser vegetation indicating higher risk. Human factors such as closer proximity to roads and settlements increase wildfire likelihood, whereas higher population density is associated with lower wildfire risk, consistent with its negative coefficient in Table 5.
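To show what a mean absolute SHAP ranking measures, the Python sketch below computes exact Shapley values by enumerating every feature coalition, replacing features outside a coalition with a single baseline value, and then averages their absolute values over samples. This brute-force scheme is a stand-in chosen for transparency, not the shap library's tree-based explainer presumably applied to the study's XGB model; the model and data in the tests are hypothetical, and the cost grows as 2^n, so it only suits a handful of features.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values for one sample; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition: set) -> float:
        point = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(point)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_shap(model: Model, samples: Sequence[Sequence[float]],
                  baseline: Sequence[float], names: Sequence[str]) -> Dict[str, float]:
    """Global importance: mean absolute SHAP value of each feature, largest first."""
    totals = [0.0] * len(names)
    for x in samples:
        for j, contribution in enumerate(shapley_values(model, x, baseline)):
            totals[j] += abs(contribution)
    ranking = sorted(zip(names, totals), key=lambda pair: pair[1], reverse=True)
    return {name: total / len(samples) for name, total in ranking}
```

The per-sample values always sum to the difference between the model output for the sample and for the baseline, which is what lets a SHAP summary plot such as Figure 4b split a single prediction into signed contributions.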
Besides land-use variables, local meteorological conditions emerge as the primary determinants of wildfire incidence. Variables ranked below the daily maximum wind speed (Da_maxwind) contribute minimally (mean absolute SHAP values below about 0.05), yet the many small contributions highlight the complex nature of wildfires, which are shaped by a multitude of factors rather than by one or two key elements. Previous studies have likewise found that these interrelated meteorological variables are the main drivers of wildfires in the study region [15,40]. The interplay among the key variables and their influence on wildfire frequency is examined further using Figure 5.
Studies have shown that high temperatures and low humidity are the main causes of frequent wildfires in southern Sichuan [8,15] and Yunnan [6] during fire-prevention periods. SHAP dependence plots were therefore used to analyze the relationship between two key climatic factors (Da_minRH and Norainday_avg) and wildfire frequency. Figure 5a shows that the SHAP value decreases as Da_minRH increases, indicating that low humidity (<20%) favors wildfire occurrence. Figure 5b shows that when the average number of rainless days during the fire season (Norainday_avg) exceeds 20, the SHAP value turns positive and levels off at about 23 days, suggesting that longer periods without rain promote wildfires. Figure 5c,d display the SHAP dependence plots of Da_minRH and Norainday_avg, respectively, colored by the average maximum temperature (Tmax_avg). When Da_minRH is below 20%, the associated SHAP values tend to be positive, and Tmax_avg typically ranges from 22 °C to 28 °C; a low Da_minRH therefore strongly indicates wildfire-favorable conditions in combination with high temperature. In Figure 5d, the positive contribution of Norainday_avg levels off beyond 23 days, and higher Norainday_avg values and their contributions to wildfire frequency coincide with higher Tmax_avg. This suggests that high-temperature periods can lengthen rainless spells, drying the fuel and increasing flammability. The combined effect of low humidity, high temperature, and extended rainless periods thus increases the likelihood of wildfire occurrence.
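Thresholds such as "Da_minRH below 20%" or "more than 20 rainless days" are read off dependence plots where the SHAP values change sign. One way to locate such a point numerically is to average the SHAP values within fixed-width bins of the feature and find the first bin where the mean changes sign, as in the Python sketch below. The bin width and the data in the tests are hypothetical; the paper reads its thresholds from the plots.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def binned_dependence(feature: Sequence[float], shap_values: Sequence[float],
                      width: float) -> List[Tuple[float, float]]:
    """Mean SHAP value of a feature within fixed-width bins: (bin start, mean SHAP)."""
    bins: Dict[float, Tuple[float, int]] = {}
    for f, s in zip(feature, shap_values):
        start = (f // width) * width
        total, count = bins.get(start, (0.0, 0))
        bins[start] = (total + s, count + 1)
    return [(start, total / count) for start, (total, count) in sorted(bins.items())]


def first_sign_change(curve: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Start of the first bin whose mean SHAP value has the opposite sign to the bin before."""
    for (_, prev), (start, cur) in zip(curve, curve[1:]):
        if (prev > 0) != (cur > 0):
            return start
    return None
```

Binning smooths the scatter of a dependence plot, so the detected crossing approximates the visual threshold only to within one bin width.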
Figure 6a presents a SHAP heatmap that visualizes the importance and direction of each feature's impact across individual samples. Comparing Figure 4a and Figure 6a shows that, although variable importance is represented by the mean absolute SHAP value over all samples, the significance of these variables can vary by region and time [58]. Figure 6b indicates that Norainday_avg, SSH, Da_maxwind, Da_Tmax, Tmin_avg, and Aspect have a positive impact on fire-occurrence frequency, whereas Tmax_avg, Da_EVP, Dis_railway, GST_avg, Da_Tave, and RH0_avg show a more nonlinear association with wildfire frequency. Such nonlinear relationships between wildfire occurrence and its predictors are shaped by interactions with other variables [40]. The SHAP interaction plot further shows how interactions between several key features contribute to the model output (Figure 6b): the vertical axis lists the primary features, the horizontal axis shows the SHAP interaction values, and each point's color, from blue to red, indicates the feature value from low to high. For instance, in the interaction between Da_minRH and GST_min, the SHAP interaction values shift markedly as GST_min increases from low (blue) to high (red); higher GST_min values correspond to higher SHAP interaction values, suggesting that the combination of these two features contributes substantially to the wildfire prediction model. In summary, the impact of feature interactions on forest wildfire occurrence differs considerably between features; in particular, under low humidity or prolonged rain-free periods, the influence of other factors on wildfires becomes more pronounced.

3.2.2. Likelihood of Fire Occurrence

The fire-probability maps produced by LR, RF, and XGB under identical settings for each factor group (meteorological, local, and comprehensive) are shown in Figure 7, Figure 8 and Figure 9. In general, RF estimates a lower fire-occurrence probability than LR and XGB under the same conditions. Under meteorological factors (Figure 7), RF predicts high-risk fire occurrence (probability greater than 0.60) for 58% of the area, below LR at 66% and XGB at 63%. For local factors, XGB projects high-risk areas at 65% (Figure 8). For comprehensive factors, the high-risk areas predicted by RF, LR, and XGB are 59%, 64%, and 65%, respectively (Figure 9). The probability maps based on meteorological and comprehensive factors place high-risk areas predominantly in the northwest and south of the region, whereas the map based on local factors shows a slightly smaller fire hazard area, concentrated mainly in the southwest.

3.2.3. Evaluation of Predictive Performance and Activation Ability

Table 6 shows that, among the machine learning models assessed, the XGB model outperforms LR and RF in accuracy and recall. This advantage is attributed primarily to how RF and XGB handle the lagged effects of temperature, whereas LR omits the average maximum temperature during the fire-prevention period from its training because of collinearity. XGB therefore leads in predictive effectiveness, followed by RF. Because accuracy and recall depend on the classification threshold, the models were further assessed with ROC curve analysis, using fire occurrence (0/1) as the state variable and fire-risk probability as the test variable. The resulting AUC values of 0.860 for LR, 0.944 for RF, and 0.952 for XGB indicate well-fitting models. As illustrated in Figure 10, all three models perform well, with AUC values above 0.8, and XGB ranks highest in predictive precision. The weaker performance of the LR model is linked to the weak linear relationship between the predictors and wildfire occurrence, which limits LR relative to RF and XGB. The XGB model also generalizes better than LR and RF, although a high predicted wildfire probability from the RF model does not always correspond to an actual fire, because the training and validation datasets include non-fire instances rated as high risk. Nevertheless, the high accuracy of the XGB model in predicting real fire events highlights its utility in forecasting wildfire probabilities and demonstrates robust predictive capability [6].
Table 7 reports the 10-fold cross-validation used to evaluate the robustness of the three machine learning models, with accuracy as the evaluation criterion. The coefficient of variation of accuracy was below 0.05 for all three models, indicating consistent performance across data splits and hence high stability and reliability. The Random Forest and XGBoost models achieved markedly higher average accuracy than the Logistic Regression model, and XGBoost had the lowest coefficient of variation, suggesting that its performance was the most stable across folds.
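The cross-validation procedure and its stability measure can be sketched in Python as follows. `fit_score` is a hypothetical callback that trains a model on the training indices and returns its accuracy on the test indices. The paper does not say whether its folds were stratified or whether the coefficient of variation uses the sample or population standard deviation; this sketch assumes plain shuffled folds and the sample standard deviation.

```python
import random
import statistics
from typing import Callable, List, Sequence

FitScore = Callable[[List[int], List[int]], float]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[f::k] for f in range(k)]


def cross_validate(fit_score: FitScore, n: int, k: int = 10, seed: int = 0) -> List[float]:
    """Each fold serves once as the test set; the other k-1 folds form the training set."""
    folds = k_fold_indices(n, k, seed)
    scores = []
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        scores.append(fit_score(train, test))
    return scores


def coefficient_of_variation(scores: Sequence[float]) -> float:
    """Sample standard deviation of the fold scores divided by their mean."""
    return statistics.stdev(scores) / statistics.mean(scores)
```

Under the criterion used in the text, a model whose `coefficient_of_variation(cross_validate(...))` stays below 0.05 would be read as stable across folds.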
A comparison of the error and residual distributions of the three models (Figure 11) shows that the Logistic Regression model had higher mean absolute error, mean absolute percentage error, and root mean square error, indicating weaker predictive capability. The Random Forest and XGBoost models performed better on these metrics, and XGBoost handled larger errors best and produced residuals closest to a normal distribution, making it suitable for predictions on complex datasets.
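The three error metrics can be written out as a short Python sketch. One caveat is assumed rather than taken from the paper: MAPE divides by the observed value, which is 0 for every non-fire sample, so this sketch averages MAPE over non-zero observations only. The paper does not state how it handled this case.

```python
import math
from typing import List, Sequence


def residuals(y_true: Sequence[float], y_pred: Sequence[float]) -> List[float]:
    """Observed minus predicted value for each sample."""
    return [t - p for t, p in zip(y_true, y_pred)]


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(r) for r in residuals(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error; squaring weights large errors more heavily than MAE does."""
    return math.sqrt(sum(r * r for r in residuals(y_true, y_pred)) / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error in %, over non-zero observations only (assumption)."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    if not pairs:
        raise ValueError("MAPE needs at least one non-zero observation")
    return 100.0 * sum(abs((t - p) / t) for t, p in pairs) / len(pairs)
```

Because RMSE squares each residual, a gap between RMSE and MAE signals a few large errors, which is the behavior the comparison above attributes to LR.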

3.2.4. Using the Optimal Model for Risk Zoning Prediction

Based on the XGB model outputs, kriging interpolation was applied to the spatial probability distribution of wildfire occurrence in the study region. Because the study area requires large-scale macro-prediction, the models were further validated by computing the risk probability of each fire point in the validation dataset with the established models. Following domestic and international five-level classification standards for wildfire-risk probability and previous research, the probability range was divided into five equal intervals: Extremely High Risk (p > 0.8), High Risk (0.6 < p ≤ 0.8), Moderate Risk (0.4 < p ≤ 0.6), Low Risk (0.2 < p ≤ 0.4), and Extremely Low Risk (p ≤ 0.2). The percentage of validation fire points falling into each risk level was then calculated. Approximately 75% of fire points fell into the Extremely High, High, and Moderate Risk levels, with the highest share in the Extremely High-Risk level (29%), whereas the Low and Extremely Low-Risk levels together accounted for only 25%. This indicates the strong fitting capability of the XGB model. Overall, wildfire-occurrence probability in the study region tends to be lower in the Northeast and higher in the Southwest (Figure 12). By risk level, about 29% of the study region falls into the (Extremely) High-Risk categories, concentrated mainly in the Northwest and South. The Yanyuan and Miyi areas contain relatively concentrated (Extremely) High-Risk zones, and regions such as Huili, Huidong, and Yanbian also require attention for wildfire prevention.
In these high-risk areas, wildfire-prevention efforts should emphasize stronger forest management and allocation of fire-prevention resources to reduce wildfire occurrence. (Extremely) Low-Risk areas account for 25% and are mainly distributed across the Yuxi, Meigu, Ganluo, Leibo, and Jinyang forest regions in the northeast of the study region, where the probability of wildfire occurrence is relatively low.
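The five-level classification described above can be expressed as a small Python sketch that assigns each predicted probability to a level and reports the share of fire points (or grid cells) per level. The interval bounds follow the text; the probabilities would come from the XGB model, and the kriging interpolation step is not reproduced here.

```python
from collections import Counter
from typing import Dict, Sequence

LEVELS = (
    "Extremely Low Risk",   # p <= 0.2
    "Low Risk",             # 0.2 < p <= 0.4
    "Moderate Risk",        # 0.4 < p <= 0.6
    "High Risk",            # 0.6 < p <= 0.8
    "Extremely High Risk",  # p > 0.8
)
UPPER_BOUNDS = (0.2, 0.4, 0.6, 0.8)


def risk_level(p: float) -> str:
    """Assign a predicted fire probability to one of five equal-width levels."""
    for bound, level in zip(UPPER_BOUNDS, LEVELS):
        if p <= bound:
            return level
    return LEVELS[-1]


def level_shares(probabilities: Sequence[float]) -> Dict[str, float]:
    """Percentage of fire points (or grid cells) falling into each risk level."""
    counts = Counter(risk_level(p) for p in probabilities)
    return {level: 100.0 * counts[level] / len(probabilities) for level in LEVELS}
```

Applied to the validation fire points, the shares of the top three levels would be summed to give the roughly 75% figure reported above.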

4. Discussion

4.1. Factors Influencing the Occurrence of Fires

Wildfires are influenced by a combination of factors that shape their expression in the landscape; these factors vary across temporal and spatial scales and encompass physical and biological characteristics. Climate is typically regarded as a regional-scale factor, whereas vegetation and topography are viewed as local factors [77]. Understanding wildfires is nevertheless complicated by the unpredictable nature of fires and the intricate web of causes behind them, largely because such incidents often stem from human actions and identical factors may have different degrees of impact in different environmental contexts [78].
Building on this foundation, the meteorological factors selected in this study include daily weather variables and climate mean-state variables representing seasonal climate characteristics during fire seasons. The feature-importance rankings of the different models show that climate mean-state variables (e.g., consecutive dry days and average precipitation during fire seasons) have a greater impact on wildfire-occurrence probability in the LR model, whereas weather variables (e.g., daily minimum relative humidity, sunshine hours, surface temperature, and humidity) have a more pronounced influence in the RF and XGB models. Existing research indicates that climate factors representing long-term averages influence the potential wildfire environment by affecting vegetation moisture content and distribution [79], whereas weather conditions typically affect ignition likelihood and fire-spread speed and thus influence wildfire occurrence directly. Climate therefore has a greater impact on potential wildfire conditions and long-term trends, whereas weather has a more direct impact on actual fire spread, and the combined influence of these factors shapes the complexity and variability of wildfires. Taken together, the LR, RF, and XGB results indicate that wildfire frequency and occurrence rates in the mountainous region of Southwest China depend less on these climate variables, suggesting that wildfire activity may be more strongly influenced by short-term weather patterns. As the climate warms, extreme weather events are expected to become more frequent, amplifying the influence of short-term weather variables on the future potential wildfire environment. Consequently, the spatial heterogeneity of wildfires in the mountainous region of Southwest China is expected to become more uncertain.
The LR, RF, and XGB models yielded comparable results when analyzing local factors. Key drivers identified across all models for local wildfire occurrences included altitude, population density, proximity to railways, vegetation coverage, and distance from the nearest residential area. The strong explanatory capabilities of these models highlight the critical role of including both anthropogenic and biophysical factors in assessing and mapping wildfire risks.
Historically, wildfire behavior has been viewed primarily as a physically driven phenomenon, a perspective supported by the classical Fire Environment Triangle theory, which conceptualizes wildfires as functions of weather, fuel, and topography [80]. However, modern wildfire management must also account for the impact of human actions [81]. Local conditions, particularly in landscapes significantly influenced by human activities, can modify the overarching effects of climate [82,83,84]. Human actions can affect fire directly by initiating or suppressing it [85] and indirectly by altering the distribution and composition of vegetation across different terrains, which changes fire frequency [86,87]. In most study areas in Europe and the United States, population density is typically positively correlated with wildfire occurrence [88,89,90]. Human ignition is also the primary cause of wildfires in the mountainous region of Southwest China [6,8]. In this research, however, population density appears to be negatively correlated with wildfire occurrence. This may be because, in the mountainous region of Southwest China, densely populated areas are typically urban or developed, hosting advanced industries but possessing minimal forest cover and fewer people engaged in forestry-related activities. Transport routes have also been recognized as a significant factor contributing to wildfires [91]. In this study, distance to roads and distance to the nearest residential area were identified as positive driving factors. Although roads are important in localized-scale ignition modeling, their impact on wildfire ignition can be difficult to detect at an aggregated county level because they are narrow linear features [45].
In summary, although climate factors play a key role in wildfire occurrence, climate factors alone are not sufficient for the entire analysis. The study area contains wildfires of various scales across multiple land-ownership categories, and the historical fire patterns in lower-altitude areas are inconsistent with those in mountainous forests [92]. Environmental and societal conditions vary across regions, and the wildfire process is controlled by a range of factors, with different variables significant at different scales [67]. The high correlation of both local and climate factors with wildfires in diverse regions highlights that human influence is increasingly superimposed on the biophysical template. Managers must also consider the interactions between ecological zones and LULC types at the landscape scale when making management decisions. In addition, identifying non-linear human-induced relationships, such as thresholds, will be crucial to understanding how wildfire risk is distributed across the entire landscape.

4.2. Model Comparison and Impact

The XGB model consistently outperformed the LR and RF models in terms of prediction accuracy across all factor groups (climatic, local, and composite). Thus, XGB is potentially more effective for simulating local fire occurrences. However, local factors alone do not provide adequate explanatory power for fire occurrence in any of these models, although certain local factors significantly impact fire likelihood. Our model includes both climatic and human-related variables, which are likely to evolve due to climate change and human development. Such changes will inevitably influence future wildfire patterns in the study area. If future projections of these variables are available, they could enhance the models’ ability to discern spatial wildfire patterns.

4.3. Limitations

Limitations of model predictions mainly include data limitations, model constraints, and regional specificity. Firstly, the revisit cycles of MODIS and VIIRS satellites are 6 h and 12 h, respectively, which may result in some fires being under-detected due to temporal factors, cloud cover, smoke, or vegetation [93]. Additionally, the fire-detection algorithm of MODIS is based on thermal anomalies in the infrared spectrum, making it sensitive to fires, particularly under dry conditions [94]. However, it may misclassify high-temperature areas such as bare ground or industrial heat sources as fires [95]. Although MODIS fire point data are highly accurate, the difficulty in distinguishing wildfires from other types of burning (e.g., agricultural residue burning) can lead to misclassifications that affect model predictions [57]. Atmospheric conditions further affect its detection capability. Cold and dry weather in winter can enhance detection sensitivity, while increased cloud cover and humidity in spring and summer may hinder observations [95]. Moreover, the temporal resolution of data may not capture all short-term weather changes, which are crucial for understanding wildfire dynamics. Future research should integrate ground-based observations or higher-resolution satellite data to reduce the risk of missing fire detections. Secondly, models like LR, RF, and XGB simplify the complex processes of wildfire occurrence, providing valuable insights but possibly failing to fully capture all interactions and feedback mechanisms among different factors [8]. Models are based on historical data and assumptions that may not hold in future scenarios; changes in climate patterns and human activities could alter factors influencing wildfires, leading to potential discrepancies between model predictions and actual outcomes. 
Finally, the study focuses on the mountainous region of Southwest China, where specific local factors such as topography and land use play significant roles, so the findings may not generalize to regions with different environmental and social conditions.
The study also faces several data-related limitations. The vegetation variables, which focus on live fuel load, do not adequately capture the impact of dead fuel on wildfire risk [96]. In addition, we did not account for seasonal human activities and customs, such as farming practices and festival-related fireworks, which introduce significant variability that is difficult to quantify and model [97]. Another regional specificity is straw burning, which is common in the study area [59] and which satellite monitoring struggles to distinguish from wildfires. To mitigate misclassification, we set a high confidence threshold (>80%) for wildfire identification and excluded areas known for straw burning. Despite these measures, some misclassified incidents likely remained in our sample and affected the results.

5. Conclusions

In this study, we employed the LR, RF, and XGB models to determine the relative impacts of climate and local factors on fire occurrence. The results consistently highlight that climate factors, especially sunshine hours, relative humidity (seasonal and daily), precipitation (seasonal), and surface temperature (daily), are the dominant drivers of wildfire risk in the study region. These findings suggest that variations in these factors can significantly increase the likelihood of fire, particularly in areas with prolonged dry spells and high temperatures. Additionally, local factors such as elevation, proximity to roads, and population density also contribute to fire occurrence, underscoring the importance of both natural and anthropogenic influences.
Among the models tested, XGB demonstrated superior predictive capabilities, indicating its potential as a reliable tool for forecasting wildfires. This enhanced performance could be attributed to XGB’s ability to capture complex nonlinear interactions among variables, making it particularly useful in regions with diverse topography and environmental conditions, like the study area.
The spatial distribution of fire risk identified through the XGB model further reveals critical hotspots, with approximately 29% of the region classified as high or extremely high risk. These areas, notably in Yanyuan, Miyi, Huili, and Huidong, should be prioritized in wildfire prevention and management strategies. Conversely, the Northeastern parts of the region exhibit relatively lower fire risks, which may suggest different management approaches could be employed in these areas.
These results offer a comprehensive understanding of how both climate and local factors interact to influence wildfire risks, enabling more targeted and region-specific prevention efforts. Future research could focus on incorporating real-time monitoring and early warning systems, using the identified key variables to mitigate fire risks more effectively.

Author Contributions

Conceptualization: J.L. and Y.W., methodology: J.L., S.W. and Y.L. (Yafeng Lu), formal analysis: J.L. and Y.W., investigation: J.L., S.W. and Y.L. (Yafeng Lu), resources: J.L. and Y.L. (Yu Luo), data curation: J.L. and Y.S., writing—original draft preparation: J.L. and P.Z., writing—review and editing: J.L. and Y.L. (Yu Luo), visualization: J.L. and P.Z., supervision: Y.W. and Y.L. (Yafeng Lu), project administration: J.L. and Y.W., funding acquisition: J.L. and Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (42205195), the Development Project of Plateau Atmosphere and Environment Key Laboratory of Sichuan Province (PAEKL-2022-618 K07), and the Sichuan Province Key Laboratory Science and Technology Development Fund Project (SCQXKJQN202111).

Data Availability Statement

The MODIS and VIIRS active fire location vector products are provided by NASA FIRMS (https://firms.modaps.eosdis.nasa.gov/download/ (accessed on 1 January 2024)). Requests for other datasets can be directed to the corresponding author.

Acknowledgments

The authors acknowledge the support from Sichuan Provincial Climate Centre.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yang, Y.; Tian, K.; Hao, J.; Pei, S.; Yang, Y.B. Biodiversity and biodiversity conservation in Yunnan, China. Biodivers. Conserv. 2004, 13, 813–826.
  2. Zhao, P.; Zhang, F.; Lin, H.; Xu, S. GIS-Based Forest Fire Risk Model: A Case Study in Laoshan National Forest Park, Nanjing. Remote Sens. 2021, 13, 3704.
  3. Bowman, D.M.J.S.; Balch, J.K.; Artaxo, P.; Bond, W.J.; Carlson, J.M.; Cochrane, M.A.; D’Antonio, C.M.; DeFries, R.S.; Doyle, J.C.; Harrison, S.P.; et al. Fire in the earth system. Science 2009, 324, 481–484.
  4. Cui, L.; Luo, C.; Yao, C.; Zou, Z.; Wu, G.; Li, Q.; Wang, X. The influence of climate change on forest fires in Yunnan Province, Southwest China detected by GRACE satellites. Remote Sens. 2022, 14, 712.
  5. Zhang, Z.; Yin, J. Research on forest fire risk assessment in high mountain forest areas in southwest China. China Emerg. Rescue 2021, 5, 179–186.
  6. Wang, W.; Zhao, F.; Wang, Y.; Huang, X.; Ye, J. Seasonal differences in the spatial patterns of wildfire drivers and susceptibility in the southwest mountains of China. Sci. Total Environ. 2023, 869, 161782.
  7. Flannigan, M.D.; Krawchuk, M.A.; de Groot, W.J.; Wotton, B.M.; Gowman, L.M. Implications of changing climate for global wildland fire. Int. J. Wildland Fire 2009, 18, 483–507.
  8. Xie, L.; Zhang, R.; Zhan, J.; Li, S.; Shama, A.; Zhan, R.; Wang, T.; Lv, J.; Bao, X.; Wu, R. Wildfire Risk Assessment in Liangshan Prefecture, China Based on an Integration Machine Learning Algorithm. Remote Sens. 2022, 14, 4592.
  9. Hu, H.; Luo, B.; Luo, S.; Wei, S.; Wang, Z.; Li, X.; Liu, F. Research progress on effects of forest fire disturbance on carbon pool of the forest ecosystem. Sci. Silvae Sin. 2020, 56, 160–169.
  10. Zhuang, Y.; Li, R.; Yang, H.; Chen, D.; Chen, Z.; Gao, B.; He, B. Understanding temporal and spatial distribution of crop residue burning in China from 2003 to 2017 Using MODIS Data. Remote Sens. 2018, 10, 390.
  11. Luo, X.; He, H.; Liang, Y.; Wu, Z.; Huang, C.; Zhang, Q. Simulating the effects of fire disturbance for predicting aboveground biomass of major forest types in the great Xing’an mountains. Acta Ecol. Sin. 2016, 36, 1104–1114.
  12. Shi, K.; Touge, Y. Identifying the shift in global wildfire weather conditions over the past four decades: An analysis based on change-points and long-term trends. Geosci. Lett. 2023, 10, 3.
  13. Westerling, A.L.; Hidalgo, H.G.; Cayan, D.R.; Swetnam, T.W. Warming and earlier spring increase western US forest wildfire activity. Science 2006, 313, 940–943.
  14. Kane, V.R.; Cansler, C.A.; Povak, N.A.; Kane, J.T.; McGaughey, R.J.; Lutz, J.A.; Churchill, D.J.; North, M.P. Mixed severity fire effects within the Rim fire: Relative importance of local climate, fire weather, topography, and forest structure. For. Ecol. Manag. 2015, 358, 62–79.
  15. Liu, J.; Guo, H.Y.; Gan, W.W.; Xu, Y.X.; Sun, R.; Li, Z.Y.; Wang, C.X.; Luo, Y. Study on spatio-temporal distribution and heterogeneity of climate forces of wildfires in Panxi Region. J. Southwest For. Univ. 2023, 43, 106–117.
  16. Collins, B.M.; Stephens, S.L. Managing natural wildfires in Sierra Nevada wilderness areas. Front. Ecol. Environ. 2007, 5, 523–527.
  17. Ying, L.; Cheng, H.; Shen, Z.; Guan, P.; Luo, C.; Peng, X. Relative humidity and agricultural activities dominate wildfire ignitions in Yunnan, Southwest China: Patterns, thresholds, and implications. Agric. For. Meteorol. 2021, 307, 108540.
  18. Syphard, A.D.; Radeloff, V.C.; Keeley, J.E.; Hawbaker, T.J.; Clayton, M.K.; Stewart, S.I.; Hammer, R.B. Human influence on California fire regimes. Ecol. Appl. 2007, 17, 1388–1402.
  19. Ying, L.; Shen, Z.; Guan, P.; Cao, J.; Luo, C.; Peng, X.; Cheng, H. Impacts of the Western Pacific and Indian Ocean warm pools on wildfires in Yunnan, Southwest China: Spatial patterns with interannual and intra-annual variations. Geophys. Res. Lett. 2022, 49, e2022GL098797.
  20. Wang, Z.H.; Dong, H.; Zhao, Y.Y.; He, S.C.; Yuan, Y.B.; Zhang, L.W. Predicting Forest Fire Risk in the Yunnan-Guizhou-Sichuan Region of China Using Machine Learning Models. J. Northeast. For. Univ. 2023, 51, 113–119.
  21. Flatley, W.T.; Lafon, C.W.; Grissino-Mayer, H.D. Climatic and topographic controls on patterns of fire in the southern and central Appalachian Mountains, USA. Landsc. Ecol. 2011, 26, 195–209.
  22. Xu, C.; You, C. Climate-linked increasing vegetation fires in global high mountains. Ecography 2022, 12, e06527.
  23. Sharma, N.; Behera, M.D.; Das, A.P.; Panda, R.M. Plant richness pattern in an elevation gradient in the Eastern Himalaya. Biodivers. Conserv. 2019, 28, 2085–2104.
  24. Xiong, Q.L.; Luo, X.J.; Xiao, Y.; Liang, P.H.; Sun, H.; Pan, K.W.; Wang, L.X.; Li, L.J.; Pang, X.Y. Fire from policy, human interventions, or biophysical factors? Temporal–spatial patterns of forest fire in southwestern China. For. Ecol. Manag. 2020, 474, 118381.
  25. Maingi, J.K.; Henry, M.C. Factors influencing wildfire occurrence and distribution in eastern Kentucky, USA. Int. J. Wildland Fire 2007, 16, 23–33.
  26. Mansuy, N.; Miller, C.; Parisien, M.A.; Parks, S.A.; Batllori, E.; Moritz, M.A. Contrasting human influences and macro-environmental factors on fire activity inside and outside protected areas of North America. Environ. Res. Lett. 2019, 14, 064007.
  27. Thompson, M.P.; Calkin, D.E.; Finney, M.A.; Ager, A.A.; Gilbertson-Day, J.W. Integrated National-Scale Assessment of Wildfire Risk to Human and Ecological Values. Stoch. Environ. Res. Risk Assess. 2011, 25, 761–780.
  28. Miranda, B.R.; Sturtevant, B.R.; Stewart, S.I.; Hammer, R.B. Spatial and Temporal Drivers of Wildfire Occurrence in the Context of Rural Development in Northern Wisconsin, USA. Int. J. Wildland Fire 2012, 21, 141–154.
  29. Ganteaume, A.; Long-Fournel, M. Driving Factors of Fire Density Can Spatially Vary at the Local Scale in South-Eastern France. Int. J. Wildland Fire 2015, 24, 650–664.
  30. Lozano, F.J.; Suárez-Seoane, S.; Kelly, M.; Luis, E. A Multiscale Approach for Modeling Fire Occurrence Probability Using Satellite Data and Classification Trees: A Case Study in a Mountainous Mediterranean Region. Remote Sens. Environ. 2008, 112, 708–719.
  31. Yue, X.; Mickley, L.J.; Logan, J.A.; Kaplan, J.O. Ensemble projections of wildfire activity and carbonaceous aerosol concentrations over the western United States in the mid-21st century. Atmos. Environ. 2013, 77, 767–780.
  32. Abatzoglou, J.T.; Kolden, C.A. Relationships between climate and macroscale area burned in the western United States. Int. J. Wildland Fire 2013, 22, 1003–1020.
  33. Liu, Z.; Wimberly, M.C. Climatic and landscape influences on fire regimes from 1984 to 2010 in the western United States. PLoS ONE 2015, 10, e0140839.
  34. Abatzoglou, J.T.; Williams, A.P. Impact of anthropogenic climate change on wildfire across western US forests. Proc. Natl. Acad. Sci. USA 2016, 113, 11770–11775.
  35. Nagy, R.C.; Fusco, E.; Bradley, B.; Abatzoglou, J.T.; Balch, J. Human-related ignitions increase the number of large wildfires across U.S. ecoregions. Fire 2018, 1, 4.
  36. Bajocco, S.; Ferrara, C.; Guglietta, D.; Ricotta, C. Fifteen years of changes in fire ignition frequency in Sardinia (Italy): A rich-get-richer process. Ecol. Indic. 2019, 104, 543–548.
  37. Zubkova, M.N.; Boschetti, L.; Abatzoglou, J.T.; Giglio, L. Changes in fire activity in Africa from 2002 to 2016 and their potential drivers. Geophys. Res. Lett. 2019, 46, 7643–7653.
  38. Littell, J.S.; Peterson, D.L.; Riley, K.L.; Liu, Y.; Luce, C.H. A review of the relationships between drought and forest fire in the United States. Glob. Chang. Biol. 2016, 22, 2353–2369.
  39. Shawki, D.; Field, R.D.; Tippett, M.K.; Saharjo, B.H.; Albar, I.; Atmoko, D.; Voulgarakis, A. Long-lead prediction of the 2015 fire and haze episode in Indonesia. Geophys. Res. Lett. 2017, 44, 9996–10005.
  40. Liu, J.; Wang, Y.K.; Guo, H.Y.; Lu, Y.F.; Xu, Y.X.; Sun, Y.; Gan, W.W.; Sun, R.; Li, Z.Y. Spatial and temporal patterns and driving factors of forest fires based on an optimal parameter-based geographic detector in the Panxi region, Southwest China. Fire Ecol. 2024, 20, 27.
  41. Parisien, M.-A.; Moritz, M.A. Environmental Controls on the Distribution of Wildfire at Multiple Spatial Scales. Ecol. Monogr. 2009, 79, 127–154.
  42. Parisien, M.-A.; Miller, C.; Parks, S.A.; DeLancey, E.R.; Robinne, F.N.; Flannigan, M.D. The Spatially Varying Influence of Humans on Fire Probability in North America. Environ. Res. Lett. 2016, 11, 075005.
  43. McWethy, D.B.; Pauchard, A.; García, R.A.; Holz, A.; González, M.E.; Veblen, T.T.; Stahl, J.; Currey, B. Correction: Landscape Drivers of Recent Fire Activity (2001–2017) in South-Central Chile. PLoS ONE 2018, 13, e0205287.
  44. Su, J.; Liu, Z.; Wang, W.; Jiao, K.; Yu, Y.; Li, K.; Lü, Q.; Fletcher, T.L. Evaluation of the Spatial Distribution of Predictors of Fire Regimes in China from 2003 to 2016. Remote Sens. 2023, 15, 4946.
  45. Zacharakis, I.; Tsihrintzis, V.A. Integrated wildfire danger models and factors: A review. Sci. Total Environ. 2023, 899, 165704.
  46. Iban, M.C.; Sekertekin, A. Machine learning based wildfire susceptibility mapping using remotely sensed fire data and GIS: A case study of Adana and Mersin provinces, Turkey. Ecol. Inform. 2022, 69, 101647.
  47. Sachdeva, S.; Bhatia, T.; Verma, A.K. GIS-based evolutionary optimized gradient boosted decision trees for forest fire susceptibility mapping. Nat. Hazards 2018, 92, 1399–1418.
  48. Sánchez, M.B.; Tonini, M.; Mapelli, A.; Fiorucci, P. Spatial assessment of wildfires susceptibility in Santa Cruz (Bolivia) using random Forest. Geosciences 2021, 11, 224.
  49. Hong, H.; Jaafari, A.; Zenner, E.K. Predicting spatial patterns of wildfire susceptibility in the Huichang County, China: An integrated model to analysis of landscape indicators. Ecol. Indic. 2019, 101, 878–891.
  50. Liu, Z.H.; Yang, J.; He, H.S.; Chang, Y. Spatial point analysis of fire occurrence and its influence factor in Huzhong forest area of the Great Xing’an Mountains in Heilongjiang Province, China. Acta Ecol. Sin. 2011, 31, 1669–1677.
  51. Hernandez, C.; Keribin, C.; Drobinski, P.; Turquety, S. Statistical modelling of wildfire size and intensity: A step toward meteorological forecasting of summer extreme fire risk. Ann. Geophys. 2015, 33, 1495–1506.
  52. Jaafari, A.; Zenner, E.K.; Panahi, M.; Shahabi, H. Hybrid artificial intelligence models based on a neuro-fuzzy system and metaheuristic optimization algorithms for spatial prediction of wildfire probability. Agric. For. Meteorol. 2019, 266–267, 198–207.
  53. Dillon, G.K.; Holden, Z.A.; Morgan, P.; Crimmins, M.A.; Heyerdahl, E.K.; Luce, C.H. Both topography and climate affected forest and woodland burn severity in two regions of the western US, 1984 to 2006. Ecosphere 2011, 2, art130.
  54. Fu, J.J.; Wu, Z.W.; Yan, S.J.; Zhang, Y.J.; Gu, X.L.; Du, L.H. Effects of climate, vegetation, and topography on spatial patterns of burn severity in the Great Xing’an Mountains. Acta Ecol. Sin. 2020, 40, 1672–1682.
  55. Kumar, N.; Kumar, A. Australian Bushfire Detection Using Machine Learning and Neural Networks. In Proceedings of the 2020 7th International Conference on Smart Structures and Systems (ICSSS), Chennai, India, 23–24 July 2020; pp. 1–7.
  56. Ghorbanzadeh, O.; Kamran, K.V.; Blaschke, T.; Aryal, J.; Naboureh, A.; Einali, J.; Bian, J. Spatial Prediction of Wildfire Susceptibility Using Field Survey GPS Data and Machine Learning Approaches. Fire 2019, 2, 43.
  57. Wang, S.S.; Qian, Y.; Leung, L.R.; Zhang, Y. Identifying key drivers of wildfires in the contiguous US using machine learning and game theory interpretation. Earth’s Future 2021, 9, e2020EF001910.
  58. Gholamnia, K.; Nachappa, T.G.; Ghorbanzadeh, O.; Blaschke, T. Comparisons of diverse machine learning approaches for wildfire susceptibility mapping. Symmetry 2020, 12, 604.
  59. Zhang, G.; Wang, M.; Liu, K. Deep neural networks for global wildfire susceptibility modelling. Ecol. Indic. 2021, 127, 107735.
  60. Gunning, D.; Aha, D. DARPA’s explainable artificial intelligence (XAI) program. AI Mag. 2019, 40, 44–58.
  61. Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115.
  62. Guyette, R.P.; Muzika, R.M.; Dey, D.C. Dynamics of an anthropogenic fire regime. Ecosystems 2002, 5, 472–486.
  63. Cilli, R.; Elia, M.; D’Este, M.; Giannico, V.; Amoroso, N.; Lombardi, A.; Pantaleo, E.; Monaco, A.; Sanesi, G.; Tangaro, S.; et al. Explainable artificial intelligence (XAI) detects wildfire occurrence in the Mediterranean countries of southern Europe. Sci. Rep. 2022, 12, 16349.
  64. Tuia, D.; Roscher, R.; Wegner, J.D.; Jacobs, N.; Zhu, X.; Camps-Valls, G. Toward a collective agenda on AI for earth science data analysis. IEEE Geosci. Remote Sens. Mag. 2021, 9, 88–104.
  65. Grünig, M.; Seidl, R.; Senf, C. Increasing aridity causes larger and more severe forest fires across Europe. Glob. Chang. Biol. 2023, 29, 1648–1659.
  66. Chang, Y.; Zhu, Z.L.; Bu, R.C.; Chen, H.G.; Feng, Y.T.; Li, Y.H.; Hu, Y.M.; Wang, Z.C. Predicting fire occurrence patterns with logistic regression in Heilongjiang Province, China. Landsc. Ecol. 2013, 28, 1989–2004.
  67. Guo, F.; Selvalakshmi, S.; Lin, F.; Wang, G.; Wang, W.; Su, Z.; Liu, A. Geospatial information on geographical and human factors improved anthropogenic fire occurrence modeling in the Chinese boreal forest. Can. J. For. Res. 2016, 46, 582–594.
  68. Purevdorj, T.; Tateishi, R.; Ishiyama, T.; Honda, Y. Relationships between Percent Vegetation Cover and Vegetation Indices. Int. J. Remote Sens. 1998, 19, 3519–3535.
  69. Genuer, R.; Poggi, J.M.; Tuleau-Malot, C. Variable selection using random forests. Pattern Recognit. Lett. 2010, 31, 2225–2236.
  70. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  71. Kuhn, M.; Johnson, K. Applied Predictive Modeling; Springer: New York, NY, USA, 2013.
  72. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794.
  73. Sivrikaya, F.; Küçük, O. Modeling Forest fire risk based on GIS-based analytical hierarchy process and statistical analysis in Mediterranean region. Ecol. Inform. 2022, 68, 101537.
  74. Lundberg, S.; Lee, S.I. A unified approach to interpreting model predictions. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 4765–4774. Available online: https://arxiv.org/abs/1705.07874 (accessed on 1 January 2024).
  75. Jiménez-Valverde, A. Insights into the area under the receiver operating characteristic curve (AUC) as a discrimination measure in species distribution modeling. Glob. Ecol. Biogeogr. 2012, 21, 498–507.
  76. Stoyanova, J.; Georgiev, C.; Neytchev, P.; Kulishev, A. Spatial-Temporal Variability of Land Surface Dry Anomalies in Climatic Aspect: Biogeophysical Insight by Meteosat Observations and SVAT Modeling. Atmosphere 2019, 10, 636.
  77. Ali, A.A.; Carcaillet, C.; Bergeron, Y. Long-term fire frequency variability in the eastern Canadian boreal forest: The influences of climate vs. local factors. Glob. Chang. Biol. 2009, 15, 1230–1241.
  78. Chen, B.; Wu, S.; Jin, Y.; Song, Y.; Wu, C.; Venevsky, S.; Xu, B.; Webster, C.; Gong, P. Wildfire risk for global wildland–urban interface areas. Nat. Sustain. 2024, 7, 474–484.
  79. Zumbrunnen, T.; Pezzatti, G.B.; Menéndez, P.; Bugmann, H.; Bürgi, M.; Conedera, M. Weather and human impacts on forest fires: 100 years of fire history in two climatic regions of Switzerland. For. Ecol. Manag. 2011, 261, 2188–2199.
  80. Countryman, C.M. The Fire Environment Concept; USDA Forest Service, Pacific Southwest Forest and Range Experiment Station: Berkeley, CA, USA, 1972.
  81. DellaSala, D.A.; Williams, J.E.; Williams, C.D.; Franklin, J.F. Beyond smoke and mirrors: A synthesis of fire policy and science. Conserv. Biol. 2004, 18, 976–986. Available online: http://www.jstor.org/stable/3589171 (accessed on 1 January 2024).
  82. Ganteaume, A.; Jappiot, M. What causes large fires in Southern France. For. Ecol. Manag. 2013, 294, 76–85.
  83. Birch, D.S.; Morgan, P.; Kolden, C.A.; Abatzoglou, J.T.; Dillon, G.K.; Hudak, A.T.; Smith, A.M.S. Vegetation, topography, and daily weather influenced burn severity in central Idaho and western Montana forests. Ecosphere 2015, 6, 17.
  84. Parks, S.A.; Parisien, M.A.; Miller, C.; Dobrowski, S.Z. Fire activity and severity in the western US vary along proxy gradients representing fuel amount and fuel moisture. PLoS ONE 2014, 9, e99699.
  85. Andela, N.; Morton, D.C.; Giglio, L.; Chen, Y.; van der Werf, G.R.; Kasibhatla, P.S.; DeFries, R.S.; Collatz, G.J.; Hantson, S.; Kloster, S.; et al. A human-driven decline in global burned area. Science 2017, 356, 1356–1362.
  86. Yebra, M.; Dennison, P.E.; Chuvieco, E.; Riaño, D.; Zylstra, P.M.; Hunt, E.R.; Danson, F.M.; Qi, Y.; Jurdao, S. A global review of remote sensing of live fuel moisture content for fire danger assessment: Moving towards operational products. Remote Sens. Environ. 2013, 136, 455–468.
  87. Rao, K.; Williams, A.P.; Diffenbaugh, N.S.; Yebra, M.; Konings, A.G. Plant-water sensitivity regulates wildfire vulnerability. Nat. Ecol. Evol. 2022, 6, 332–339.
  88. Cardille, J.A.; Ventura, S.J.; Turner, M.G. Environmental and social factors influencing wildfires in the upper Midwest, United States. Ecol. Appl. 2001, 11, 111–127.
  89. Syphard, A.D.; Keeley, J.E.; Abatzoglou, J.T. Trends and drivers of fire activity vary across California aridland ecosystems. J. Arid. Environ. 2017, 144, 110–122.
  90. Keeley, J.E.; Fotheringham, C.J.; Morais, M. Reexamining fire suppression impacts on shrubland fire regimes. Science 1999, 284, 1829–1832.
  91. Robinne, F.N.; Parisien, M.A.; Flannigan, M. Anthropogenic influence on wildfire activity in Alberta, Canada. Int. J. Wildland Fire 2016, 25, 1131–1143.
  92. Krawchuk, M.A.; Moritz, M.A. Constraints on global fire activity vary across a resource gradient. Ecology 2011, 92, 121–132.
  93. Giglio, L.; Descloitres, J.; Justice, C.O.; Kaufman, Y.J. An enhanced contextual fire detection algorithm for MODIS. Remote Sens. Environ. 2003, 87, 273–282.
  94. Schroeder, W.; Oliva, P.; Giglio, L.; Csiszar, I.A. The new VIIRS 375 m active fire detection data product: Algorithm description and initial assessment. Remote Sens. Environ. 2014, 143, 85–96.
  95. Li, P.; Li, W.; Feng, Z.; Xiao, C.; Liu, Y. Spatiotemporal dynamics of active fire frequency in Southeast Asia with the FIRMS Moderate Resolution Imaging Spectroradiometer (MODIS) and Visible Infrared Imaging Radiometer (VIIRS) data. Resour. Sci. 2019, 41, 1526–1540.
  96. Santos, F.L.M.; Couto, F.T.; Dias, S.S.; Ribeiro, N.A.; Salgado, R. Vegetation fuel characterization using a machine learning approach over southern Portugal. Remote Sens. Appl. Soc. Environ. 2023, 32, 101017.
  97. del Hoyo, L.V.; Isabel, M.P.M.; Vega, F.J.M. Logistic Regression Models for Human-Caused Wildfire Risk Estimation: Analysing the Effect of the Spatial Accuracy in Fire Occurrence Data. Eur. J. For. Res. 2011, 130, 983–996.
Figure 1. Location of the research region and the distribution of MODIS active fire incidents from 2004 to 2020. Maps at a national scale represent the kernel density of local wildfires for the same time frame.
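The national-scale maps in Figure 1 show a kernel density of fire locations. The kernel, bandwidth, and grid used for those maps are not given here, so the following is only a minimal sketch: an isotropic Gaussian kernel evaluated at single points, using hypothetical projected fire coordinates in kilometres and an assumed 1 km bandwidth.

```python
import math

def gaussian_kde_2d(points, x, y, bandwidth):
    """Gaussian kernel density estimate at (x, y) from 2D points (integrates to 1 over the plane)."""
    if not points or bandwidth <= 0:
        raise ValueError("need at least one point and a positive bandwidth")
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for px, py in points:
        d2 = (x - px) ** 2 + (y - py) ** 2  # squared distance to this fire detection
        total += norm * math.exp(-d2 / (2.0 * bandwidth ** 2))
    return total / len(points)

# Hypothetical fire detections in projected coordinates (km); not study data.
fires = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (10.0, 10.0)]
near_cluster = gaussian_kde_2d(fires, 0.5, 0.5, bandwidth=1.0)
far_away = gaussian_kde_2d(fires, 20.0, 20.0, bandwidth=1.0)
print(f"density near cluster: {near_cluster:.4f}, far away: {far_away:.6f}")
```

Clusters of detections yield high values and isolated areas low ones; in a GIS workflow the same function would be evaluated at every grid cell, and the bandwidth largely controls how smooth the resulting map looks.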
Figure 2. Hierarchical importance of climatic variables.
Figure 3. Hierarchical importance of local factors.
Figure 4. The SHAP summary plot ranks the top 20 variables affecting model predictions by their mean absolute SHAP values, shown on the y-axis. Subfigure (a) showcases the importance of these features, while subfigure (b) illustrates their positive or negative effects on wildfire predictions through scatter points.
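Figure 4(a) orders variables by their mean absolute SHAP value. Given a matrix of SHAP values (one row per sample, one column per feature), that ranking can be computed as in the sketch below. The SHAP values shown are made-up placeholders, not study results; in practice they would come from an explainer such as `shap.TreeExplainer` applied to the fitted model.

```python
def rank_by_mean_abs_shap(shap_values, feature_names, top_k=20):
    """Return (feature, mean |SHAP|) pairs sorted from most to least important."""
    n_samples = len(shap_values)
    scores = []
    for j, name in enumerate(feature_names):
        # Average magnitude of this feature's contribution across all samples.
        mean_abs = sum(abs(row[j]) for row in shap_values) / n_samples
        scores.append((name, mean_abs))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]

# Placeholder SHAP values for three samples and three features.
features = ["Da_minRH", "SSH", "Elev"]
shap_matrix = [
    [-0.40, 0.10, 0.05],
    [0.30, -0.20, 0.00],
    [-0.50, 0.15, -0.05],
]
for name, score in rank_by_mean_abs_shap(shap_matrix, features):
    print(f"{name}: {score:.3f}")
```

Taking absolute values before averaging is what makes this an importance measure: positive and negative contributions (the sign information displayed in Figure 4(b)) would otherwise cancel out.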
Figure 5. SHAP dependence plots: (a) SHAP values versus Da_minRH, with a fitted trend line (red line); (b) SHAP values versus Norainday_avg, with a fitted trend line (red line); (c) SHAP values versus Da_minRH, showing the interaction with Tmax_avg (color scale); (d) SHAP values versus Norainday_avg, showing the interaction with Tmax_avg (color scale). Da_minRH, daily minimum relative humidity; Norainday_avg, average number of rainless days of the fire season; Tmax_avg, average maximum temperature of the fire season.
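A dependence plot such as Figure 5(a) pairs each sample's value of one variable with that variable's SHAP value. The caption does not say which kind of trend line was fitted; assuming an ordinary least-squares line purely for illustration, its slope and intercept can be computed as follows (the humidity and SHAP values are hypothetical).

```python
def least_squares_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical daily minimum RH (%) and the matching SHAP values for Da_minRH.
min_rh = [10.0, 20.0, 30.0, 40.0, 50.0]
shap_min_rh = [0.6, 0.3, 0.0, -0.3, -0.6]
slope, intercept = least_squares_line(min_rh, shap_min_rh)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

A negative slope in such a plot means higher values of the variable push predictions toward "no fire"; a straight line is the simplest summary, and a smoother (e.g., LOWESS) would be needed to capture thresholds or curvature.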
Figure 6. SHAP interaction plot (a) and heatmap analysis (b).
Figure 7. Fire-occurrence probability: analysis using LR, RF, and XGB based on meteorological factors.
Figure 8. Fire-occurrence probability: analysis using LR, RF, and XGB based on local factors.
Figure 9. Fire-occurrence probability: combined meteorological and local factors analysis with LR, RF, and XGB.
Figure 10. ROC curves (success rate) of the three models.
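The AUC summarised by ROC curves such as those in Figure 10 equals the probability that a randomly chosen fire sample receives a higher predicted score than a randomly chosen non-fire sample, with ties counted as one half. A minimal pairwise implementation, using made-up scores, is:

```python
def auc_pairwise(labels, scores):
    """AUC via the Mann-Whitney formulation: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))

labels = [1, 0, 1, 0]          # 1 = fire, 0 = non-fire (toy values)
scores = [0.9, 0.8, 0.4, 0.3]  # predicted fire probabilities (toy values)
print(auc_pairwise(labels, scores))
```

The double loop costs O(n_pos × n_neg); for large samples a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` gives the same value more efficiently.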
Figure 11. Comparison of error metrics for different models.
Figure 12. Risk-assessment mapping results of the XGB model.
Table 1. Selection of forest fire driver factors.
| Variable type | Variable name | Code | Resolution/units | Source |
|---|---|---|---|---|
| Meteorological element | Average number of rainless days of fire season (the year of fire) | Norainday_avg | 1 km/day | Comprehensive Meteorological Information Service System (CIMISS): http://www.ncdc.ac.cn/portal/ (accessed on 1 January 2024) |
| | Average precipitation of fire season (the year of fire) | Pre0_avg | 1 km/mm | Ibid. |
| | Average relative humidity of fire season (the year of fire) | RH0_avg | 1 km/% | Ibid. |
| | Average temperature of fire season (the year of fire) | T0_avg | 1 km/°C | Ibid. |
| | Average minimum temperature of fire season (the year of fire) | Tmin_avg | 1 km/°C | Ibid. |
| | Average maximum temperature of fire season (the year of fire) | Tmax_avg | 1 km/°C | Ibid. |
| | Daily mean temperature | Da_Tave | Daily/°C | Ibid. |
| | Daily maximum temperature | Da_Tmax | Daily/°C | Ibid. |
| | Daily minimum temperature | Da_Tmin | Daily/°C | Ibid. |
| | Daily precipitation | Da_pre | Daily/mm | Ibid. |
| | Daily maximum wind speed | Da_maxwind | Daily/m·s⁻¹ | Ibid. |
| | Daily mean relative humidity | Da_RH | Daily/% | Ibid. |
| | Daily minimum relative humidity | Da_minRH | Daily/% | Ibid. |
| | Daily evaporation | Da_EVP | Daily/mm | Ibid. |
| | Daily mean ground surface temperature | GST_avg | Daily/°C | Ibid. |
| | Daily maximum ground surface temperature | GST_max | Daily/°C | Ibid. |
| | Daily minimum ground surface temperature | GST_min | Daily/°C | Ibid. |
| | Sunshine hours | SSH | Daily/h | Ibid. |
| Topographic | Elevation | Elev | 25 m/m | Shuttle Radar Topography Mission (SRTM): https://SRTM.csi.cgiar.org (accessed on 1 January 2024) |
| | Slope | Slope | 25 m/degree | Ibid. |
| | Aspect | Aspect | 25 m/degree | Ibid. |
| Vegetation | Forest type | Forest_type | 1 km/ha | 1:1 million vegetation data set in China: http://www.ncdc.ac.cn (accessed on 1 January 2024) |
| | Fractional vegetation cover | FVC | 1 km/% | National Earth System Science Data Center: http://www.geodata.cn (accessed on 1 January 2024) |
| Infrastructure | Distance to the nearest settlement | Dis_sett | 1:250,000/km | National Administration of Surveying, Mapping and Geoinformation of China: https://www.webmap.cn (accessed on 1 January 2024) |
| | Distance to the nearest road | Dis_road | 1:250,000/km | Ibid. |
| | Distance to the nearest railway | Dis_railway | 1:250,000/km | Ibid. |
| Socioeconomic | Population density | Pop | 1 km/persons per km² | Chinese Academy of Sciences Resource and Environmental Science Data Center: https://www.resdc.cn (accessed on 1 January 2024) |
| | Per capita GDP | GDP | 1 km/RMB | Ibid. |
| | Land use/land cover | LULC | 1 km | Geographic remote sensing ecological network platform: https://www.gisrs.cn (accessed on 1 January 2024) |
Table 2. Comparison of three wildfire prediction models.
| Model | Selection basis | Model construction | Data complexity | Model robustness | Computational resource requirements | Advantages | Disadvantages |
|---|---|---|---|---|---|---|---|
| Logistic Regression model (LR) | Generally applicable; suited to simple binary classification problems in which relationships between variables are relatively simple and linear and a clear interpretation is needed. | The dependent variable follows a binomial distribution; the probability of wildfire occurrence is predicted through the logit transformation. | Suited to data with simple, near-linear relationships. | Sensitive to outliers and noise; not robust. | Low | Simple and easy to apply; yields an explicit equation. | Does not consider the spatial correlation and heterogeneity of wildfire influencing factors; does not account for the asymmetric structure of wildfire data; requires separate collinearity diagnostics for the variables. |
| Random Forest model (RF) | Suited to data with complex interactions and nonlinear relationships. | Builds multiple decision trees, each trained on a bootstrap sample of the original dataset; the predicted wildfire occurrence probability is determined by the votes of all trees. | Suited to data with complex interactions and nonlinear relationships. | Robust to outliers and noise; relatively insensitive to individual data points. | Medium | Helps avoid overfitting and underfitting; automatically identifies important variables and achieves high prediction accuracy. | No explicit equation; the model is opaque and difficult to interpret. |
| Extreme Gradient Boosting model (XGBoost) | Suited to tasks that require high prediction accuracy and involve complex nonlinear relationships. | Builds decision trees sequentially, each new tree fitted to reduce the prediction error of the trees before it; this boosting ensemble combines many trees into a final prediction. | Suited to complex nonlinear relationships and large datasets. | Highly robust to outliers and noise. | High | Handles nonlinear relationships and interactions effectively; includes regularization to limit overfitting while maintaining high computational efficiency. | Requires tuning of multiple hyperparameters; computationally intensive. |
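The contrast drawn in Table 2 between RF (trees trained independently on bootstrap samples and then combined) and XGBoost (trees added sequentially to correct the current ensemble) can be illustrated with one-split regression stumps on toy one-dimensional data. This is a simplified sketch, not either library's algorithm: the boosting part is plain squared-error gradient boosting without XGBoost's second-order gradients or regularization, and the bagging part averages stump outputs instead of counting class votes and skips the feature subsampling that random forests also use.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression stump by minimising squared error; returns a predict function."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue  # a split must leave samples on both sides
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lmean) ** 2 for y in left) + sum((y - rmean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    if best is None:  # all x values identical: fall back to the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def bagging_predict(xs, ys, x_new, n_trees=25, seed=0):
    """RF-style bagging: average stumps fitted independently to bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(stump(x_new))
    return sum(preds) / n_trees

def boosting_fit(xs, ys, n_rounds=20, learning_rate=0.3):
    """Plain gradient boosting on squared error: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

# Toy 1-D data: x stands for a single predictor, y = 1 for fire and 0 for no fire.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print("bagged estimate at x=8:", round(bagging_predict(xs, ys, 8), 3))
model = boosting_fit(xs, ys)
print("boosted estimate at x=8:", round(model(8), 3))
```

The structural difference is visible in the loops: bagging's trees never see each other, which is why they can be trained in parallel, whereas each boosting round depends on the residuals left by all earlier rounds.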
Table 3. Identification of meteorological variables by intermediate models employing LR, and parameter estimation for the selected variables.
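Variables identified by intermediate models | Parameter estimation

| Variable | p-Value min | p-Value max | Significant samples | Direction | Coefficient | Standard error | Wald test | p-Value |
|---|---|---|---|---|---|---|---|---|
| Norainday_avg | <0.0001 | <0.0001 | 5 | − | 1.33 | 0.12 | 118.68 | <0.0001 |
| Pre0_avg | <0.0001 | <0.0001 | 5 | − | 1.03 | 0.19 | 29.22 | <0.0001 |
| RH0_avg | <0.0001 | <0.0001 | 5 | − | 0.50 | 0.14 | 12.52 | <0.0001 |
| Tmin_avg | 0.37 | 0.98 | 0 | + | 0.01 | 0.10 | 0.01 | 0.93 |
| Tmax_avg | 0.007 | 0.17 | 0 | + | 0.22 | 0.10 | 5.06 | 0.03 |
| Da_Tmax | <0.0001 | <0.0001 | 5 | + | 0.48 | 0.11 | 18.17 | <0.0001 |
| Da_Tmin | 0.22 | 0.94 | 0 | − | 0.05 | 0.11 | 0.24 | 0.63 |
| Da_pre | 0.18 | 0.82 | 0 | − | 0.04 | 0.05 | 0.81 | 0.37 |
| Da_maxwind | 0.09 | 0.76 | 0 | + | 0.02 | 0.05 | 0.10 | 0.75 |
| Da_RH | <0.0001 | <0.0001 | 5 | − | 0.50 | 0.10 | 28.34 | <0.0001 |
| Da_minRH | <0.0001 | <0.0001 | 5 | − | 0.98 | 0.12 | 71.48 | <0.0001 |
| Da_EVP | 0.04 | 0.18 | 0 | + | 0.08 | 0.05 | 2.47 | 0.12 |
| GST_max | <0.0001 | <0.0001 | 5 | − | 0.63 | 0.09 | 44.66 | <0.0001 |
| GST_min | <0.0001 | <0.0001 | 5 | − | 0.43 | 0.11 | 15.83 | <0.0001 |
| SSH | <0.0001 | <0.0001 | 5 | + | 0.58 | 0.08 | 59.89 | <0.0001 |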
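For a single logistic-regression coefficient, the Wald statistic reported in Tables 3 and 4 is (coefficient / standard error)², compared against a chi-square distribution with one degree of freedom. Because the tabulated coefficients and standard errors are rounded to two decimals, recomputing the statistic from them only approximately reproduces the reported values. The sketch below, using only the standard library, derives the p-value from a reported Wald statistic.

```python
import math

def wald_statistic(coef, std_err):
    """Wald chi-square statistic for H0: coefficient = 0."""
    return (coef / std_err) ** 2

def wald_p_value(wald):
    """Upper-tail p-value of a chi-square(1 df) statistic: P(Z^2 > W) = erfc(sqrt(W / 2))."""
    return math.erfc(math.sqrt(wald / 2.0))

# Da_pre row of Table 3: reported Wald = 0.81, reported p = 0.37.
print(round(wald_p_value(0.81), 2))
# Tmax_avg row recomputed from the rounded inputs 0.22 / 0.10 (reported Wald: 5.06).
print(round(wald_statistic(0.22, 0.10), 2))
```

The erfc identity holds because a chi-square variable with one degree of freedom is the square of a standard normal, so no statistics package is needed for this check.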
Table 4. Identification of local variables by intermediate models employing LR, and parameter estimation for the selected variables.
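Variables identified by intermediate models | Parameter estimation

| Variable | p-Value min | p-Value max | Significant samples | Direction | Coefficient | Standard error | Wald test | p-Value |
|---|---|---|---|---|---|---|---|---|
| Dis_sett | <0.0001 | <0.0001 | 5 | − | 0.16 | 0.04 | 14.88 | <0.0001 |
| Dis_road | <0.0001 | <0.0001 | 5 | + | 0.18 | 0.05 | 14.54 | <0.0001 |
| Dis_railway | 0.114 | 0.765 | 0 | − | 0.05 | 0.04 | 1.67 | 0.20 |
| Forest_type | 0.386 | 0.798 | 0 | + | 0.03 | 0.04 | 0.75 | 0.39 |
| Aspect | 0.318 | 0.934 | 0 | − | 0.01 | 0.04 | 0.04 | 0.84 |
| Slope | <0.0001 | 0.006 | 5 | + | 0.17 | 0.04 | 21.72 | <0.0001 |
| Elev | <0.0001 | <0.0001 | 5 | − | 0.19 | 0.05 | 16.19 | <0.0001 |
| Pop | <0.0001 | <0.0001 | 5 | − | 2.87 | 0.37 | 61.93 | <0.0001 |
| GDP | 0.515 | 0.860 | 0 | + | 0.03 | 0.17 | 0.03 | 0.85 |
| FVC | <0.0001 | <0.0001 | 5 | + | 0.25 | 0.05 | 26.83 | <0.0001 |
| LULC | <0.0001 | 0.001 | 5 | − | 0.13 | 0.04 | 12.05 | <0.0001 |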
Table 5. Variables in the equation.
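| Variable | Model variable | β | Standard error (S.E.) | Wald test | Significance (Sig.) |
|---|---|---|---|---|---|
| Constant | Constant | 0.51 | 0.05 | 94.84 | 0.000 |
| Norainday_avg | x1 | −0.80 | 0.14 | 34.48 | 0.000 |
| Pre0_avg | x2 | −0.81 | 0.19 | 18.10 | 0.000 |
| RH0_avg | x3 | −0.40 | 0.15 | 7.63 | 0.006 |
| Da_Tmax | x4 | 0.38 | 0.10 | 15.16 | 0.000 |
| Da_RH | x5 | −0.53 | 0.09 | 35.30 | 0.000 |
| Da_minRH | x6 | −1.07 | 0.11 | 89.09 | 0.000 |
| GST_max | x7 | −0.66 | 0.09 | 49.68 | 0.000 |
| GST_min | x8 | −0.42 | 0.09 | 20.85 | 0.000 |
| SSH | x9 | 0.60 | 0.08 | 60.75 | 0.000 |
| Dis_sett | x10 | 0.15 | 0.06 | 6.77 | 0.009 |
| Dis_road | x11 | 0.04 | 0.05 | 0.58 | 0.04 |
| Slope | x12 | 0.27 | 0.05 | 31.53 | 0.000 |
| Elev | x13 | −0.19 | 0.07 | 8.71 | 0.003 |
| Pop | x14 | −1.23 | 0.32 | 14.71 | 0.000 |
| FVC | x15 | 0.34 | 0.06 | 35.79 | 0.000 |
| LULC | x16 | −0.17 | 0.05 | 12.55 | 0.000 |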
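The coefficients in Table 5 define the fitted logistic model logit(p) = β0 + β1·x1 + … + β16·x16, so the fire-occurrence probability is p = 1 / (1 + exp(−(β0 + Σ βi·xi))). The sketch below plugs in the tabulated coefficients. It assumes the predictors are supplied on the same (presumably standardized) scale used when the model was fitted; that preprocessing is not described in this section, so the all-zero baseline input is purely illustrative.

```python
import math

# Coefficients (β) from Table 5, keyed by variable code.
INTERCEPT = 0.51
COEFFICIENTS = {
    "Norainday_avg": -0.80, "Pre0_avg": -0.81, "RH0_avg": -0.40, "Da_Tmax": 0.38,
    "Da_RH": -0.53, "Da_minRH": -1.07, "GST_max": -0.66, "GST_min": -0.42,
    "SSH": 0.60, "Dis_sett": 0.15, "Dis_road": 0.04, "Slope": 0.27,
    "Elev": -0.19, "Pop": -1.23, "FVC": 0.34, "LULC": -0.17,
}

def fire_probability(x):
    """Logistic-regression probability of fire occurrence for a dict of predictor values."""
    missing = set(COEFFICIENTS) - set(x)
    if missing:
        raise KeyError(f"missing predictors: {sorted(missing)}")
    eta = INTERCEPT + sum(COEFFICIENTS[name] * x[name] for name in COEFFICIENTS)  # log-odds
    return 1.0 / (1.0 + math.exp(-eta))

# All predictors at zero (their means, if the inputs are standardized): p = sigmoid(0.51).
baseline = {name: 0.0 for name in COEFFICIENTS}
print(round(fire_probability(baseline), 4))

# Lowering Da_minRH by one unit raises the log-odds by 1.07.
drier = dict(baseline, Da_minRH=-1.0)
print(round(fire_probability(drier), 4))
```

Because the model is linear in the log-odds, exp(β) gives the odds ratio for a one-unit change in a predictor; for Da_minRH, exp(−1.07) means each unit increase multiplies the odds of fire by roughly 0.34.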
Table 6. Accuracy and recall rates of different models.
| Model | Training set accuracy | Test set accuracy | Test set recall |
|---|---|---|---|
| LR | 0.79 | 0.80 | 0.88 |
| RF | 0.89 | 0.92 | 0.92 |
| XGB | 0.90 | 0.91 | 0.92 |
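Accuracy is the share of all samples classified correctly, while recall is the share of actual fire samples that the model flags as fire, TP / (TP + FN). The probability threshold used to turn model outputs into fire/non-fire labels is not stated in this section, so the sketch below starts from binary labels (toy values, 1 = fire).

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall) for binary labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))  # fires detected
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))  # fires missed
    accuracy = correct / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, recall

y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
print(accuracy_and_recall(y_true, y_pred))
```

Recall matters for wildfire screening because a missed fire (false negative) is usually costlier than a false alarm, which accuracy alone does not reveal.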
Table 7. Mean accuracy, standard deviation, and coefficient of variation for cross-validation of each model.
| Statistic | LR | RF | XGBoost |
|---|---|---|---|
| Mean | 0.7931 | 0.8941 | 0.8841 |
| Standard deviation | 0.0194 | 0.0248 | 0.0181 |
| Coefficient of variation | 0.0245 | 0.0278 | 0.0205 |
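The coefficient of variation in Table 7 is the standard deviation divided by the mean of the cross-validation accuracies; for LR, 0.0194 / 0.7931 ≈ 0.0245. The sketch below computes the three statistics from per-fold accuracies. Whether the sample or population standard deviation was used is not stated, so the sample version (`statistics.stdev`) is an assumption, and the fold accuracies are hypothetical.

```python
import statistics

def cv_summary(fold_accuracies):
    """Mean, standard deviation, and coefficient of variation of fold accuracies."""
    mean = statistics.mean(fold_accuracies)
    sd = statistics.stdev(fold_accuracies)  # sample SD (n - 1 denominator); an assumption
    return mean, sd, sd / mean

# Hypothetical accuracies from a 5-fold cross-validation.
folds = [0.86, 0.88, 0.90, 0.89, 0.87]
mean, sd, cv = cv_summary(folds)
print(f"mean={mean:.4f} sd={sd:.4f} cv={cv:.4f}")

# Check against the tabulated LR and XGBoost values: CV = SD / mean.
for name, m, s, reported in [("LR", 0.7931, 0.0194, 0.0245), ("XGBoost", 0.8841, 0.0181, 0.0205)]:
    print(name, round(s / m, 4), reported)
```

A lower coefficient of variation indicates accuracy that is more stable across folds, independent of the accuracy level itself.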
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, J.; Wang, Y.; Lu, Y.; Zhao, P.; Wang, S.; Sun, Y.; Luo, Y. Application of Remote Sensing and Explainable Artificial Intelligence (XAI) for Wildfire Occurrence Mapping in the Mountainous Region of Southwest China. Remote Sens. 2024, 16, 3602. https://doi.org/10.3390/rs16193602

AMA Style

Liu J, Wang Y, Lu Y, Zhao P, Wang S, Sun Y, Luo Y. Application of Remote Sensing and Explainable Artificial Intelligence (XAI) for Wildfire Occurrence Mapping in the Mountainous Region of Southwest China. Remote Sensing. 2024; 16(19):3602. https://doi.org/10.3390/rs16193602

Chicago/Turabian Style

Liu, Jia, Yukuan Wang, Yafeng Lu, Pengguo Zhao, Shunjiu Wang, Yu Sun, and Yu Luo. 2024. "Application of Remote Sensing and Explainable Artificial Intelligence (XAI) for Wildfire Occurrence Mapping in the Mountainous Region of Southwest China" Remote Sensing 16, no. 19: 3602. https://doi.org/10.3390/rs16193602

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
