Article

A Generalized Spatiotemporally Weighted Boosted Regression to Predict the Occurrence of Grassland Fires in the Mongolian Plateau

1
College of Science, Inner Mongolia University of Technology, Hohhot 010051, China
2
Institute of Grassland Research, Chinese Academy of Agricultural Sciences, Hohhot 010022, China
3
College of Geographic Science, Inner Mongolia Normal University, Hohhot 010022, China
4
Department of Geography, School of Arts and Sciences, National University of Mongolia, Ulaanbaatar 14200, Mongolia
*
Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(9), 1485; https://doi.org/10.3390/rs17091485
Submission received: 2 January 2025 / Revised: 3 March 2025 / Accepted: 11 March 2025 / Published: 22 April 2025
(This article belongs to the Special Issue Machine Learning for Spatiotemporal Remote Sensing Data (2nd Edition))

Abstract

Grassland fires are one of the main disasters in the temperate grasslands of the Mongolian Plateau and pose a serious threat to the lives and property of residents. Their occurrence is affected by a variety of factors, including the biomass and humidity of fuels, air temperature and humidity, precipitation and evaporation, snow cover, wind, elevation and topographic relief, and human activities. In this paper, the MCD12Q1, MCD64A1, ERA5, and ETOPO 2022 remote sensing data products, together with other products, were used to obtain data on these factors and to predict the occurrence of grassland fires. To achieve better predictions, this paper proposes a generalized geographically weighted boosted regression (GGWBR) method that accounts for both spatial heterogeneity and complex nonlinear relationships, and further explores a generalized spatiotemporally weighted boosted regression (GSTWBR) method that reflects spatiotemporal heterogeneity. The models were trained with grassland fire data from 2019 to 2022 in the Mongolian Plateau and used to predict the occurrence of grassland fires in 2023. The accuracy of GGWBR was 0.8320, higher than the 0.7690 of the generalized boosted regression model (GBM). Its sensitivity was 0.7754, higher than the 0.5662 of random forest (RF) and the 0.6927 of GBM. The accuracy of GSTWBR was 0.8854, higher than that of RF, GBM, and GGWBR. Its sensitivity was 0.7459, higher than that of RF and GBM. This study provides a new technical approach and theoretical support for the prevention and mitigation of grassland fire disasters in the Mongolian Plateau.

1. Introduction

Grassland fires are a typical disaster in the main temperate steppe areas of the Mongolian Plateau, burning an average of 1.3 × 10⁴ km² each year from 2001 to 2021 [1]. Frequent grassland fires pose a serious threat to the lives and property of residents. The occurrence of grassland fires is affected by fuel, meteorology, topography, human activities and other factors [2]. By using these factors to establish a prediction model for grassland fires, high-risk areas can be identified in advance, reducing the threat of wildfires to the ecosystem and to local residents [3].
Traditional statistical methods can be used to estimate the probability of grassland fire occurrence. Methods such as the Frequency Ratio method [4,5] and the Weights of Evidence [6] utilize the frequency or number of wildfire occurrences under different conditions to calculate the weights of various influencing factors, and then comprehensively evaluate the likelihood of wildfire occurrence. These methods require prior knowledge and rely on the empirical judgment of experts. In addition, some methods construct generalized regression models to find the linear relationship between influencing factors and whether wildfires occur or the number of occurrences, including logistic regression dealing with binary classification problems [7] and Poisson regression dealing with counting problems [8]. On the basis of logistic regression, geographically weighted logistic regression (GWLR) emerged, a generalized linear model that takes into account spatial heterogeneity, considering that the coefficients expressing the relationship between the explanatory and response variables are geographically variable. Peng, X. et al. used GWLR to analyze the temporal variation characteristics of wildfires in Zhejiang Province [9]. Liang S. et al. used LR and GWLR to analyze the linear relationship between tropical wildfires and driving factors and the spatial distribution of wildfires in Leizhou Peninsula [10]. These methods mainly focus on the descriptive and inferential analysis of data, revealing the patterns and trends behind the data.
With the continuous development of computer science, geographic information system, and remote sensing technology, machine learning has been widely used in the prediction of grassland fires [11]. Machine learning focuses on training with large amounts of data to achieve more accurate predictions. Studies have demonstrated a complex nonlinear relationship between the occurrence of wildfires and their drivers [12]. Many machine learning methods consider nonlinear relationships between variables and have great advantages in processing large data and predictive capabilities, such as Artificial Neural Networks [13], Support Vector Machines [14], random forests (RF) [15], etc. Wang et al. used a recurrent neural network model to construct a forest fire early warning model in Chongli District, Zhangjiakou City [16]. Murali Mohan, K.V. et al. compared the prediction performance of multiple neural networks for wildfire susceptibility and selected the most locally adapted prediction model [12]. Jaafari A et al. applied five decision tree-based classifier models to analyze the spatial pattern of wildfires in the Zagros Mountains [17]. These studies provide an important reference for the prevention and management of wildfire disasters.
However, most machine learning-based predictive models that capture nonlinear relationships are globally calibrated, i.e., they do not reflect the spatial heterogeneity of the relationships between variables. To bring spatial heterogeneity into nonlinear models, Georganos et al. proposed a locally calibrated Geographically Weighted Random Forest (GWRF) model based on RF and the idea of geographically weighted regression. Numerous studies have shown that GWRF outperforms RF on some spatial data [18,19]. However, in scenarios that RF itself handles poorly, the advantages of GWRF cannot be realized, so other machine learning methods need to be tried. The generalized boosted regression model (GBM) is a globally calibrated machine learning prediction model. Whether it performs better than RF in predicting grassland fires in the Mongolian Plateau remains to be tested. Furthermore, how to incorporate spatial or even spatiotemporal heterogeneity into the GBM to predict the occurrence of grassland fires more effectively is a problem that needs to be solved.
In this paper, we explore a generalized geographically weighted boosted regression (GGWBR) that considers spatial heterogeneity and can handle complex nonlinear relationships. It is used to construct a prediction model of grassland fire occurrence in the Mongolian Plateau, which uses fuel, meteorological, topographic and human activity factors to estimate the probability of grassland fire occurrence in the following month. The occurrence of grassland fires is then predicted from this probability. On this basis, a generalized spatiotemporally weighted boosted regression (GSTWBR) reflecting spatiotemporal heterogeneity is attempted, in order to provide a new technical approach and theoretical support for the prevention and mitigation of wildfires in the Mongolian Plateau.

2. Materials and Methods

2.1. Study Area

In this paper, the main parts of the Mongolian Plateau (87°00′–126°04′E, 37°22′–53°20′N), namely the Inner Mongolia Autonomous Region of China and Mongolia, were selected as the study area, with a total area of about 2.74 × 10⁶ km² [20]. The area has a temperate continental climate with long, cold winters, in which the average monthly temperature is −10 °C in the south and −38 °C in the north. Summers are warm and short, with average monthly temperatures ranging from 16 °C to 27 °C [21]. The altitude gradually increases from east to west, with an average altitude of about 1580 m [22].
They belong to the arid and semi-arid climate regions [23] with an average annual precipitation of between 50 mm and 400 mm. The average annual evaporation of grassland ranges from 242 to 374 mm, and more than 95% of the water is lost through evapotranspiration [24,25]. Precipitation from May to September [1,25] accounts for more than 90% of the annual precipitation. As a result, from October to April, the climate is drier and the water content of the vegetation is low. Every year from April to early May, before the grassland vegetation turns green, the dead grass with a low moisture content loses its snow cover and becomes flammable. Around October, the vegetation begins to wither, resulting in a significant increase in the amounts of fuels. This, combined with windy weather in autumn, increases the risk of grassland fires occurring and spreading. As a result, most grassland fires occur from March to June and September to October [26]. From the MCD64A1 product, it can be calculated that the numbers of the cumulative 500 m × 500 m burned cells of each month from 2019 to 2023 are 78 in January, 1912 in February, 3493 in March, 104,558 in April, 23,128 in May, 3983 in June, 1075 in July, 744 in August, 7868 in September, 7466 in October, 283 in November and 0 in December.
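The seasonal concentration implied by these counts can be checked with simple arithmetic. The following is a minimal sketch that uses only the monthly burned-cell counts quoted above; the variable names are illustrative.

```python
# Cumulative 500 m x 500 m burned cells per month, 2019-2023,
# as quoted in the text from the MCD64A1 product.
burned_cells = {
    "Jan": 78, "Feb": 1912, "Mar": 3493, "Apr": 104558,
    "May": 23128, "Jun": 3983, "Jul": 1075, "Aug": 744,
    "Sep": 7868, "Oct": 7466, "Nov": 283, "Dec": 0,
}
# Months named in the text as the main fire seasons.
fire_season = ["Mar", "Apr", "May", "Jun", "Sep", "Oct"]

total = sum(burned_cells.values())
season_share = sum(burned_cells[m] for m in fire_season) / total
april_share = burned_cells["Apr"] / total

print(f"fire-season share: {season_share:.3f}, April alone: {april_share:.3f}")
```

On these counts, March to June together with September and October account for roughly 97% of all burned cells, and April alone for about two-thirds.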

2.2. Data Sources and Data Preprocessing

The occurrence of grassland fires is not random but follows certain regularities, and is affected by the combined effect of a variety of factors, such as fuel, meteorological, topographic and human factors. In order to understand the occurrence and influencing factors of grassland fires in the Mongolian Plateau over the past five years (2019–2023), we selected a series of datasets, listed in Table 1. ArcGIS 10.6 was used to process the acquired data and extract the samples.

2.2.1. Land Cover Type

MCD12Q1 [27] is the land cover type product of the Moderate Resolution Imaging Spectroradiometer (MODIS), which provides raster data of land cover types on a global scale with a resolution of 500 m. In this paper, its plant functional type (PFT) classification scheme was used, which divides land cover into 11 categories, as shown on the left side of Figure 1. Since the land cover types were basically stable from 2019 to 2023, this paper uses the 2019 data to extract the grassland.

2.2.2. Fuel and Meteorology Data

ERA5-Land monthly averaged data from 1950 to the present [28]: ERA5-land consists of raster data generated by replaying the land component of the European Centre for Medium-range Weather Forecasts (ECMWF) ERA5 climate reanalysis. The spatial resolution is 0.1°. The product provides but is not limited to the monthly average values of the following data:
The leaf area index of low vegetation is half of the total area of green leaves of low vegetation per unit of horizontal surface area, which is used to characterize the biomass of fuels; the skin reservoir content is the amount of water in the vegetation canopy or thin layer of soil, which represents the moisture in the rain and dew intercepted by the leaves of the trees, as a reference index for the humidity of fuels. They were taken as fuel factors.
The 2 m temperature is the air temperature at 2 m above the ground; the 2 m dewpoint temperature is the temperature to which the air at 2 m above the ground would have to be cooled for saturation to occur, which is a measure of air humidity; the total precipitation is the liquid and frozen water, including rain and snow, that falls on the Earth's surface; the total evaporation is the accumulated water evaporated from the Earth's surface, with negative values indicating evaporation and positive values indicating condensation; the snow cover is the proportion of the cell area covered by snow; the 10 m u wind component is the horizontal velocity at which air moves eastward at a height of 10 m above the Earth's surface; the 10 m v wind component is the horizontal velocity at which air moves northward at a height of 10 m above the Earth's surface. These data were used as meteorological factors.
The objective of this paper was to establish the relationship between the monthly average levels of fuel, meteorology, topography and human activities and the occurrence of grassland fires in the following month, so that these influencing factors can be used to predict the occurrence of grassland fires. Hence, we chose fuel and meteorology data from December 2018 to November 2023. A grid was obtained by creating a fishnet aligned with the raster of the ERA5 reanalysis data. The grid cells that contain grassland were selected and are called the sampling grid; the centers of these cells are our sampling points, at which the other data were extracted.
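The one-month lag between predictors and response explains why the predictor series runs from December 2018 to November 2023 while the fire series runs from January 2019 to December 2023. The following is a minimal sketch of this pairing; the dictionary layout and the function name `lagged_pairs` are assumptions made for illustration, not part of the original workflow.

```python
def lagged_pairs(predictors, fires):
    """Pair the predictors of month m with the fire flag of month m + 1.

    predictors: dict mapping (year, month) -> feature vector (assumed layout)
    fires:      dict mapping (year, month) -> 0/1 fire occurrence flag
    Returns a list of (features, fire_flag, response_month) tuples.
    """
    pairs = []
    for (year, month), features in sorted(predictors.items()):
        # December of one year predicts January of the next year.
        response = (year + 1, 1) if month == 12 else (year, month + 1)
        if response in fires:
            pairs.append((features, fires[response], response))
    return pairs


# Toy values, purely illustrative.
predictors = {(2018, 12): [0.1], (2019, 1): [0.2]}
fires = {(2019, 1): 1, (2019, 2): 0}
pairs = lagged_pairs(predictors, fires)
print(pairs)
```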

2.2.3. Topographic Data

ETOPO 2022 [29] is a new version of the ETOPO global relief model, a global dataset of topographic and bathymetric elevation released by the NOAA National Centers for Environmental Information in the United States, with an enhanced resolution of 15 arc-seconds. Topographic relief is the largest relative elevation difference in a specified analysis area; it is a quantitative index of landform morphology [30]. In this paper, the topographic relief of a cell is calculated from the elevation data as the difference between the maximum and minimum values in the surrounding 3 × 3 cells.
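The relief calculation can be sketched as a moving-window range. The example below is a minimal pure-Python illustration on a toy elevation grid; the treatment of edge cells (using only the part of the window inside the grid) is an assumption, since the text does not say how edges were handled.

```python
def topographic_relief(elev):
    """Relief of each cell: max minus min elevation in its 3 x 3 neighbourhood.

    Edge cells use only the part of the window that lies inside the grid
    (an assumption; the source does not describe edge handling).
    """
    rows, cols = len(elev), len(elev[0])
    relief = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            window = [
                elev[a][b]
                for a in range(max(0, i - 1), min(rows, i + 2))
                for b in range(max(0, j - 1), min(cols, j + 2))
            ]
            relief[i][j] = max(window) - min(window)
    return relief


# Toy elevations in metres, purely illustrative.
dem = [
    [1000, 1010, 1020],
    [1005, 1100, 1030],
    [990, 1015, 1040],
]
relief = topographic_relief(dem)
print(relief[1][1])  # centre cell: 1100 - 990
```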

2.2.4. Human Activity Data

Population density was taken from WorldPop, a grid dataset on the spatial distribution of the global population launched by the University of Southampton in October 2013. This paper uses a population density dataset adjusted to match United Nations national population estimates. Grassland fires are more susceptible to human factors in areas with high population density.
The utilization intensity index [31] is based on remote sensing data products of vegetation productivity and the existing aboveground biomass of the main grazing grasslands in Eurasia. The raster data of the grassland use intensity index were obtained by calculating the gap between the total aboveground grass production and the existing biomass of these grasslands, and reflect the overall use of grassland vegetation. The index is included to account for herders' practice of burning to improve the growth of grassland vegetation.
Since the population density and utilization intensity index were basically stable from 2019 to 2023, the 2019 datasets were selected in this paper. On the sampling grid, the mean values of elevation, topographic relief, population density, and grassland use intensity index were calculated using zonal statistics, generating rasters with a resolution of 0.1°. Some values are missing in the human activity data; these were filled by inverse distance weighted interpolation. The data were then extracted at the sampling points.
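Filling a missing cell by inverse distance weighting can be sketched as follows. This is a minimal illustration, not the ArcGIS implementation used in the study; the exponent `power=2` and the function name `idw_fill` are assumptions, since the text only states that inverse distance weights were used.

```python
import math


def idw_fill(known, target, power=2.0):
    """Inverse-distance-weighted estimate at `target`.

    known:  list of (x, y, value) points with valid values
    target: (x, y) location of the missing value
    power:  distance exponent (2 is an assumed, commonly used default)
    """
    num = 0.0
    den = 0.0
    for x, y, value in known:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with a known point
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Toy values: the nearer point (distance 1) outweighs the farther one (distance 2).
estimate = idw_fill([(0.0, 0.0, 10.0), (3.0, 0.0, 40.0)], (1.0, 0.0))
print(estimate)
```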

2.2.5. Historical Fire Data

MCD64A1 [32] is a monthly, global, gridded 500 m MODIS burned area data product, containing per-cell burn date and quality information. In this paper, the data from January 2019 to December 2023 were used.
The cells belonging to grassland in the Mongolian Plateau were extracted and aggregated on the sampling grid with the zonal statistics tool, yielding historical grassland fire data at a resolution of 0.1° that record whether a grassland fire occurred in each grid cell. These values were then extracted at the sampling points.
The historical fire data, i.e., whether a grassland fire occurred (1 for occurrence and 0 for non-occurrence), were used as the response variable, and the fuel, meteorology, topography and human factors were used as explanatory variables. According to the response variable, the sampling points were divided into fire points and non-fire points, and their coordinates were generated. The resulting raw dataset is unbalanced, with a large number of non-fire samples and a small number of fire samples.

2.3. Methodology

2.3.1. Random Forest

Decision trees are a very common nonlinear machine learning method that can be used for classification and regression, but a single decision tree is not enough for complex scenarios. Bagging and boosting are two of the most common ensemble learning methods that combine multiple decision trees to improve prediction performance. Random forest is a typical application of bagging, which improves the accuracy and stability of the model by constructing multiple decision trees and combining their predictions. Bootstrap sampling is used to draw multiple random training subsets from the original dataset, each of which is used to train one decision tree. The trees then produce the final result by voting or averaging. In this paper, we use the ranger package (v0.16.0) in R v4.4.1 for random forests, which provides a way to calculate class probabilities [33]. We use 500 decision trees, with each tree randomly given one-third of the explanatory variables for training, and calculate the probability that a sample is a fire point.

2.3.2. Generalized Boosted Regression Models

A typical application of boosting is the generalized boosted regression model (GBM), which forms a powerful prediction model by iteratively adding weak learners (decision trees) and focusing on improving the previous prediction errors at each step. In this paper, we use the gbm package (v2.2.2) in R v4.4.1 for generalized boosted regression models, and set the response variable to the Bernoulli distribution, and then calculate the probability of the fire point. The model building process can be summarized as follows [34]:
Initialize the model: a simple model (a 1-depth decision tree) is typically used as the initial weak classifier. Calculate the negative gradient: for each sample in the training set, calculate the negative gradient of the loss function under the current model. Train a new classifier: use these negative gradients as target variables to train a new weak classifier. Update the model: add the newly trained weak classifier to the current model, scaled by a multiplier (a learning rate of 0.1) that adjusts the weight of the base learner so that the loss function is reduced. Iterate 100 times to combine the 100 weak learners into a strong learner.
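The loop described above can be sketched for a single explanatory variable and a Bernoulli response. This is a simplified illustration, not the gbm package's implementation: it assumes a constant log-odds as the starting model, fits each depth-1 tree to the negative gradient by weighted least squares, and omits refinements such as subsampling. All function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals, weights):
    """Weighted least-squares depth-1 tree on one feature: (threshold, left, right)."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        sums = {True: [0.0, 0.0], False: [0.0, 0.0]}  # side -> [sum w*r, sum w]
        for x, r, w in zip(xs, residuals, weights):
            side = sums[x <= thr]
            side[0] += w * r
            side[1] += w
        left = sums[True][0] / sums[True][1]
        right = sums[False][0] / sums[False][1]
        err = sum(w * (r - (left if x <= thr else right)) ** 2
                  for x, r, w in zip(xs, residuals, weights))
        if best is None or err < best[0]:
            best = (err, thr, left, right)
    return best[1:]


def boost(xs, ys, weights=None, n_trees=100, rate=0.1):
    """Gradient boosting with stumps; `weights` can carry per-sample weights."""
    weights = weights or [1.0] * len(xs)
    pos = sum(w * y for w, y in zip(weights, ys)) / sum(weights)
    f0 = math.log(pos / (1.0 - pos))  # assumed start: constant log-odds
    scores = [f0] * len(xs)
    stumps = []
    for _ in range(n_trees):
        # Negative gradient of the Bernoulli log-loss with respect to the score.
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        thr, left, right = fit_stump(xs, residuals, weights)
        stumps.append((thr, left, right))
        scores = [s + rate * (left if x <= thr else right)
                  for s, x in zip(scores, xs)]
    return f0, stumps, rate


def predict_proba(model, x):
    f0, stumps, rate = model
    return sigmoid(f0 + rate * sum(l if x <= t else r for t, l, r in stumps))


# Toy data: larger x makes a fire (1) more likely.
model = boost([1, 2, 3, 4, 5, 6, 7, 8], [0, 0, 0, 1, 0, 1, 1, 1])
print(predict_proba(model, 1.0), predict_proba(model, 8.0))
```

The `weights` argument is where per-sample weights, such as class weights multiplied by spatial weights, would enter a weighted variant of the model.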

2.3.3. Generalized Geographically Weighted Boosted Regression

Geographically weighted regression (GWR) calibrates a separate regression model at each location, using the observations around each regression point with a weight that declines with distance [35]. This conforms to Tobler's first law of geography: "everything is related to everything else, but near things are more related than distant things" [36]. Here, the study area is divided into a series of sub-regions, the center of each sub-region is set as an observation point, the distance from each sample to these observation points is calculated, and the sample is given different spatial weights accordingly. A region-specific generalized boosted regression model is then trained, which we call generalized geographically weighted boosted regression (GGWBR). The closer a sample is to an observation point, the greater its spatial weight and the greater its contribution to the model specific to that region. Therefore, each sub-model reflects the relationship between the explanatory and response variables in a specific region, reflecting spatial heterogeneity. These sub-models can be used to predict samples in the corresponding sub-region.
When a model is built for the sub-region whose observation point is located at $(u_i, v_i)$, the spatial weight of the sample located at $(u_j, v_j)$ is:
$$W_{ij} = \begin{cases} 1, & (u_j, v_j) \in R_i \\ \exp\left(-\dfrac{1}{2}\left(\dfrac{d_{ij} - r}{h}\right)^2\right), & (u_j, v_j) \notin R_i \end{cases}$$
where $d_{ij}$ is the Euclidean distance from $(u_j, v_j)$ to $(u_i, v_i)$, $R_i$ is the set of points in the sub-region, $r$ is the radius of the tangent circle in the sub-region, and $h$ is the bandwidth of the method, a parameter that needs to be optimized.
The spatial weight of the samples in the region is 1; outside the region, the spatial weight of a sample decreases as its distance to the observation point increases. The larger the bandwidth, the more slowly the weight decays with distance. When $h$ is $r$, $3r$, $5r$, $7r$, or $9r$, the relationship between the spatial weight and the distance is shown in Figure 2:
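The spatial weight can be sketched as a small function. The example assumes the kernel has the form 1 inside the sub-region and exp(−½((d − r)/h)²) outside it, which is continuous at d = r; the function and argument names are illustrative.

```python
import math


def spatial_weight(d, r, h, inside):
    """Spatial weight of a sample at distance d from the sub-region centre.

    d:      distance to the observation point (same unit as r and h, e.g. km)
    r:      radius of the tangent circle of the sub-region
    h:      bandwidth
    inside: True if the sample lies in the sub-region
    Assumes a Gaussian kernel on (d - r) outside the sub-region.
    """
    if inside:
        return 1.0
    return math.exp(-0.5 * ((d - r) / h) ** 2)


r = 107.0  # radius used in this study, km
for h in (80.0, 160.0):
    weights = [round(spatial_weight(d, r, h, inside=False), 3) for d in (150, 300, 500)]
    print(h, weights)
```

Under this kernel, a sample at a fixed distance outside the sub-region receives a larger weight when the bandwidth is larger.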
When determining the spatial weight, we set a series of candidate $h$ values and then select the optimal value, named $h_{opt}$, through cross-validation. The selection method is as follows: for a given bandwidth, the spatial weights of the samples relative to the observation point of a sub-region are calculated. The weighted generalized boosted regression model is trained with the samples from outside the sub-region and used to classify the samples within the sub-region. These operations are performed in each sub-region to obtain the probability that each sample is a positive sample, and the balanced cross-entropy loss [37] function is then calculated:
$$\mathrm{Loss} = -\frac{1}{n}\sum_{i=1}^{n}\left[\frac{n_0}{n_1}\, y_i \ln p_i + \left(1 - y_i\right)\ln\left(1 - p_i\right)\right]$$
where $n$ is the total sample size, $n_0$ is the negative sample size, $n_1$ is the positive sample size, the response variable $y_i$ is 0 for a negative sample and 1 for a positive sample, and $p_i$ is the probability of the sample being a positive sample. The $h$ value with the smallest $\mathrm{Loss}$ is taken as $h_{opt}$.
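The balanced cross-entropy loss can be sketched directly from its definition: the positive-class term is scaled by the ratio of negative to positive samples. The clipping constant `eps` is an assumption added to avoid taking the logarithm of zero and is not part of the definition.

```python
import math


def balanced_cross_entropy(y, p, eps=1e-12):
    """Balanced cross-entropy loss for 0/1 labels y and predicted probabilities p."""
    n = len(y)
    n1 = sum(y)      # number of positive (fire) samples
    n0 = n - n1      # number of negative (non-fire) samples
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)  # guard against log(0); eps is assumed
        total += (n0 / n1) * yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return -total / n


# Toy labels with one fire among four samples, purely illustrative.
y = [1, 0, 0, 0]
uninformative = balanced_cross_entropy(y, [0.5, 0.5, 0.5, 0.5])
informative = balanced_cross_entropy(y, [0.9, 0.1, 0.1, 0.1])
print(uninformative, informative)
```

In the bandwidth search, this loss would be evaluated once per candidate bandwidth on the cross-validated probabilities, and the candidate with the smallest value retained.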

2.3.4. Generalized Spatiotemporally Weighted Boosted Regression

For GGWBR, the sub-models of different regions reflect the relationship between the explanatory and response variables of a specific region, reflecting spatial heterogeneity. However, the relationship between the explanatory and response variables at different times is also different, i.e., there is heterogeneity in time. In order to capture temporal heterogeneity, samples are given different spatiotemporal weights in different time and regions, and different generalized boosted regression sub-models are established, which we call generalized spatiotemporally weighted boosted regression (GSTWBR).
In this study, a different sub-model was established for each region in each month. When a sub-model is built for month $t_i$ in the region centered at $(u_i, v_i)$, the spatiotemporal weight of a sample located at $(u_j, v_j)$ in month $t_j$ is:
$$W_{st,ij} = W_{ij}\exp\left(-\frac{1}{2}\left(\frac{d_{t,ij}}{h_t}\right)^2\right)$$
$$d_{t,ij} = \min\left(\left|t_i - t_j\right|,\; 12 - \left|t_i - t_j\right|\right)$$
where $W_{ij}$ is the spatial weight calculated with the GGWBR method at $h_{opt}$, $d_{t,ij}$ is the time difference between month $t_i$ and month $t_j$, and $h_t$ is the time bandwidth of the method, a parameter that needs to be optimized.
The greater the distance to the observation point and the farther apart the months of the samples are, the smaller the spatiotemporal weight is. The larger the $h_t$ value is, the more slowly the weight decays with the time difference. The optimal $h_t$ value, named $h_{t,opt}$, was selected by cross-validation in a manner similar to GGWBR. First, for a given $h_t$ value, a weighted generalized boosted regression model is built for each region and month and trained with the samples outside of this region and month. This model is used to predict the samples of the region and month, and the classification effect of the model is evaluated. The $h_t$ value with the smallest $\mathrm{Loss}$ is taken as $h_{t,opt}$.
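The temporal part of the weight can be sketched as follows. The example assumes that months are numbered 1 to 12, that the time difference is cyclic (so December and January are one month apart), and that the temporal kernel is exp(−½(d_t/h_t)²) multiplied onto the spatial weight; the function names are illustrative.

```python
import math


def month_distance(t_i, t_j):
    """Cyclic difference between two calendar months numbered 1-12."""
    diff = abs(t_i - t_j)
    return min(diff, 12 - diff)


def spatiotemporal_weight(w_spatial, t_i, t_j, h_t):
    """Spatial weight multiplied by a Gaussian kernel on the month difference."""
    return w_spatial * math.exp(-0.5 * (month_distance(t_i, t_j) / h_t) ** 2)


# With h_t = 1, a sample one month away keeps about 61% of its spatial weight.
print(month_distance(1, 12), spatiotemporal_weight(1.0, 4, 5, 1.0))
```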

2.4. Experimental Design

2.4.1. Model Building

The data from 2019 to 2022 were used as the training set, and the data from 2023 were used as the test set to evaluate the prediction effect. To ensure that the algorithm learns both classes in a balanced manner, the non-fire points in the training set were randomly under-sampled so that the ratio of fire to non-fire points in the total sample was approximately 1:1 [38]. A random forest and a generalized boosted regression model were then trained.
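Random under-sampling to a ratio of about 1:1 can be sketched as follows. The sample layout (a list of feature and label pairs) and the fixed seed are assumptions for illustration.

```python
import random


def undersample(samples, seed=0):
    """Keep all fire samples and an equally sized random subset of non-fire samples.

    samples: list of (features, label) pairs, label 1 = fire, 0 = non-fire
    seed:    seed of the random generator, fixed here for reproducibility
    """
    rng = random.Random(seed)
    fires = [s for s in samples if s[1] == 1]
    non_fires = [s for s in samples if s[1] == 0]
    kept = rng.sample(non_fires, min(len(fires), len(non_fires)))
    return fires + kept


# Toy data: 5 fire points and 95 non-fire points.
raw = [([i], 1) for i in range(5)] + [([i], 0) for i in range(95)]
balanced = undersample(raw)
print(len(balanced))
```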
To apply the GGWBR method, the grid obtained from the ERA5 reanalysis data was used to divide the study area into several study sub-regions of 20 × 20 grid cells each, as shown in Figure 3. The radius of the tangent circle in the sub-regions was 107 km. Random under-sampling of non-fire points was then performed separately in each sub-region.
Within each study sub-region, the non-fire samples were under-sampled to balance fire and non-fire samples. To keep the amount of information the model obtains from different sub-regions comparable, the same sampling ratio was applied to every sub-region. To obtain this ratio, the proportion of fire points in the total sample of each sub-region was calculated and the sub-region with the maximum proportion was identified. In that sub-region, the non-fire samples were randomly under-sampled so that the ratio of fire points to non-fire points was approximately 1:1. The proportion of the retained samples in the total sample of that sub-region was taken as the sampling ratio, and the samples of all sub-regions were then sampled at this ratio (all fire points were retained and non-fire points were under-sampled). After this step, the ratio of fire samples to non-fire samples in any sub-region is at most 1:1.
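One possible reading of this procedure is sketched below: the common sampling ratio is twice the largest fire proportion among the sub-regions, and each sub-region keeps all of its fire points plus enough non-fire points to reach that ratio. This interpretation, the cap at 1, and the function name `non_fire_quota` are assumptions made for illustration.

```python
def non_fire_quota(regions):
    """Number of non-fire samples to keep per sub-region under a common ratio.

    regions: dict mapping region name -> (n_fire, n_total)
    Returns (sampling_ratio, {region: non-fire samples to keep}).
    """
    # Largest proportion of fire points among all sub-regions.
    f_max = max(n_fire / n_total for n_fire, n_total in regions.values())
    # A 1:1 sample in that sub-region keeps 2 * n_fire of its n_total samples.
    ratio = min(1.0, 2.0 * f_max)
    quota = {}
    for name, (n_fire, n_total) in regions.items():
        keep_total = round(ratio * n_total)
        # All fire points are kept; the remainder is filled with non-fire points.
        quota[name] = min(n_total - n_fire, max(0, keep_total - n_fire))
    return ratio, quota


# Toy counts, purely illustrative.
ratio, quota = non_fire_quota({"A": (50, 400), "B": (10, 400)})
print(ratio, quota)
```

In this toy case, sub-region A ends up balanced at 1:1, while sub-region B keeps more non-fire than fire points, so no sub-region exceeds a fire to non-fire ratio of 1:1.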
The overall sample is still unbalanced, so the initial weight of each fire sample is set to the ratio of the number of non-fire samples to the number of fire samples, and the initial weight of each non-fire sample is set to 1; the initial weight is then multiplied by the spatial weight to give the weight of the sample. To find the appropriate bandwidth for calculating the spatial weight, $h$ was set to 80 km, 107 km, 133 km, 160 km, 187 km and 214 km. After $h_{opt}$ is obtained using the cross-validation method, the model can be trained.
When using the GSTWBR method, non-fire points need to be randomly under-sampled in each study sub-region in each month. The generalized geographically weighted boosted regression method was then used to determine the spatial weight, and $h_t$ was set to 1, 2, 3, 4, 5, and 6. After $h_{opt}$ and $h_{t,opt}$ are obtained using the cross-validation method, the model can be trained.
After the models are built using the four methods, the test set is used to evaluate the predicted results. The specific workflow is presented in Figure 4. Variables of fuel, meteorology, and human activity are abbreviated, as shown in the Short Form column in Table 1.

2.4.2. Evaluation Index

With 0.5 as the threshold, each sample is classified as fire or non-fire according to whether its predicted probability of being a fire is greater than or equal to 0.5. The results were then evaluated using a confusion matrix. The confusion matrix comprises four basic elements: True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN); the relationship between them is shown in Table 2:
Then, we can calculate that the sensitivity is equal to TP/(TP + FN), which reflects the model’s ability to capture a fire, and the larger the value, the more the model can capture the fire. Specificity, which is equal to TN/(FP + TN), reflects the model’s ability to capture non-fires. Accuracy, which is equal to (TP + TN)/(TP + FN + FP + TN), reflects the model’s ability to predict the data as a whole.
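These three indices follow directly from the confusion matrix counts. The sketch below uses the 0.5 threshold stated above on toy labels and probabilities; the function name is illustrative.

```python
def confusion_metrics(y_true, prob, threshold=0.5):
    """Sensitivity, specificity and accuracy at a given probability threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Toy labels and probabilities, purely illustrative.
metrics = confusion_metrics([1, 1, 0, 0, 0], [0.9, 0.4, 0.6, 0.2, 0.1])
print(metrics)
```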
The ROC, or Receiver Operating Characteristic Curve, shows the performance of the classifier in distinguishing between positive and negative samples by plotting the sensitivity and specificity relationship at different classification thresholds. In general, the closer the ROC is to the top left corner, the better the performance of the model. The AUC, or Area Under Curve, ranges from 0 to 1, with higher values indicating the better performance of the classifier. The significance of the AUC value is that it reflects the classifier’s ability to sort the samples. The AUC value is not affected by classification thresholds, which makes it excellent when dealing with datasets with unbalanced categories.
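The ranking interpretation of the AUC can be made concrete: it equals the fraction of fire and non-fire pairs in which the fire sample receives the higher probability, with ties counted as one half. The sketch below computes it this way rather than by integrating the ROC; it is quadratic in the sample size and intended only as an illustration.

```python
def auc(y_true, prob):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [p for y, p in zip(y_true, prob) if y == 1]
    neg = [p for y, p in zip(y_true, prob) if y == 0]
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in pos
        for b in neg
    )
    return wins / (len(pos) * len(neg))


# Toy labels and probabilities, purely illustrative: 5 of 6 pairs are ordered correctly.
score = auc([1, 1, 0, 0, 0], [0.9, 0.4, 0.6, 0.2, 0.1])
print(score)
```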

3. Results

When using generalized geographically weighted boosted regression, $h_{opt}$ is 107 km, which corresponds to the smallest $\mathrm{Loss}$. When using generalized spatiotemporally weighted boosted regression, $h_{opt}$ is determined to be 160 km and $h_{t,opt}$ is determined to be 1. We can then obtain the probability distributions of fire and non-fire events in 2023, as shown in Figure 5. The closer the probability of a fire point is to 1, the better the classification of the positive samples; the closer the probability of a non-fire point is to 0, the better the classification of the negative samples.

3.1. Evaluation of Classification Effect

The probability of the test set sample being a fire is predicted by the model, and the test set sample is divided into fire points and non-fire points with 0.5 as the threshold. The confusion matrix, sensitivity and accuracy of RF, GBM, GGWBR, and GSTWBR can be obtained, and the results are shown in Table 3:
The ROC was drawn by varying the classification threshold and the AUC value was calculated; the classification effects of RF, GBM, GGWBR, and GSTWBR are shown in Figure 6. Ranked from best to worst, the methods order as GGWBR > GSTWBR > GBM > RF for sensitivity, GSTWBR > RF > GGWBR > GBM for accuracy, and GSTWBR > GGWBR > RF > GBM for AUC.

3.2. Spatiotemporal Distribution of Fire Points and Fire Point Probabilities

Using the trained RF, GBM, GGWBR, and GSTWBR, the probability of grassland fires at all sampling points in 2023 was predicted month by month. A probability distribution map of grassland fires in the grassland area was drawn, as shown in Figure 7. Gaps were filled by inverse distance weighted interpolation. Most of the fires were located in or near areas with a probability greater than 0.5.

4. Discussion

Since the number of non-fires is much larger than the number of fires, the specificity and accuracy are almost equal. If the predicted fire probabilities are generally too low, accuracy will be high but sensitivity low. Low sensitivity means that the model's ability to capture fires is insufficient and the false negative rate is high. For areas with high fire risk, where fire-prevention awareness and timely, accurate early warning are the priority, we need to focus first on the sensitivity of the prediction and then on the accuracy.
RF and the GBM are both good at dealing with nonlinear relationships between variables, and this paper compares their applicability to the prediction of wildfires. The GBM has higher sensitivity than RF and a stronger ability to capture fire points. This means greater vigilance for fires, which is important in areas with a higher risk of grassland fires. However, a globally calibrated method has difficulty adapting to spatial changes in the relationships between variables. In this paper, the GBM is combined with spatial weights, and the resulting GGWBR is used to predict the probability of fire points, which makes up for this shortcoming and provides a new technical path in the field of fire prediction.
Unlike the globally calibrated GBM, GGWBR models how the relationships between variables change with spatial position by assigning different spatial weights for each sub-region, which makes it better suited to grassland fire prediction over areas with large regional differences. In this study, the AUC, accuracy, and sensitivity of GGWBR were all better than those of the GBM. The monthly numbers of fire samples differ markedly, so the probability of wildfire may also be temporally heterogeneous. Accordingly, GSTWBR was used to estimate the probability of fire points, which further improved the AUC and accuracy, at a slightly lower sensitivity than GGWBR. Moreover, the spatiotemporal distribution of the predicted probability reflects the distribution of the actual fire points to a certain extent. Compared with RF, which gains accuracy at the expense of sensitivity, GSTWBR markedly improves the sensitivity and the AUC while achieving an accuracy at least as high as that of RF (Table 3), which makes up for the ability to predict non-fire points in some low-fire-risk areas.
Although GSTWBR shows clear predictive advantages, its applicability has limitations. The sub-region division assumes that, within a given month, each sub-region can be predicted by a single model. The sub-regions could be made smaller and more numerous to better track regional changes in the relationships between variables, but this would increase the computational cost. In addition, in areas and months where fires are rare, positive samples are scarce, so the model may rely on positive samples that are nearby in space or time and may therefore be trained on locations and times that are not representative. Future research should explore higher-resolution datasets and improve computational efficiency so that the model can better capture features across multiple time scales and regions. The generalizability of the model also needs further verification, for example whether it is suitable for other disaster types (such as floods and sandstorms) and for disaster risk assessment in other regions.
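To make the idea of weighting training samples by proximity in space and time concrete, the following sketch computes a spatiotemporal weight for a sample relative to a sub-region and month. The Gaussian product kernel, the cyclic treatment of months, the bandwidth values, and all names are assumptions chosen for illustration; they are not the kernel or parameters defined in the methods of this paper.

```python
import math


def cyclic_month_lag(month_a, month_b):
    """Lag in months between two calendar months, treating the year as cyclic (assumption)."""
    diff = abs(month_a - month_b) % 12
    return min(diff, 12 - diff)


def spatiotemporal_weight(distance_km, month_lag,
                          spatial_bandwidth_km, temporal_bandwidth_months):
    """Hypothetical Gaussian product kernel: weight decays with distance and time lag."""
    spatial = math.exp(-0.5 * (distance_km / spatial_bandwidth_km) ** 2)
    temporal = math.exp(-0.5 * (month_lag / temporal_bandwidth_months) ** 2)
    return spatial * temporal


# Weight of a sample 150 km from the sub-region centre, observed in December,
# when fitting the local model for January (bandwidths are illustrative).
lag = cyclic_month_lag(1, 12)
weight = spatiotemporal_weight(150.0, lag, spatial_bandwidth_km=300.0,
                               temporal_bandwidth_months=2.0)
```

Weights of this kind would typically be passed as per-sample weights when fitting each local boosted model. Wider bandwidths borrow more distant samples, which helps where positive samples are scarce but weakens the local adaptation discussed above.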

5. Conclusions

This study seeks a suitable method for predicting the occurrence of fires in the Mongolian Plateau; the higher the sensitivity and accuracy of fire point prediction, the more practical the model. Sensitivity is especially important for high-risk areas, which require a greater ability to capture fire points. RF has higher accuracy than the GBM, but it sacrifices too much sensitivity to be well suited to predicting grassland fires in the Mongolian Plateau. The GBM may therefore have greater potential than RF for grassland fire prediction in the Mongolian Plateau.
Because grassland fires are seasonal and regional, the relationships between variables differ across months and regions, and spatiotemporal heterogeneity needs to be considered. To explore how spatial heterogeneity can be incorporated into nonlinear models, and whether doing so improves predictive power, models were constructed from the grassland fire data of the Mongolian Plateau from 2019 to 2022 and used to predict the occurrence of grassland fires in 2023. The results show that GSTWBR, which accounts for spatiotemporal heterogeneity, achieves higher accuracy than both the globally calibrated GBM and GGWBR, which accounts for spatial heterogeneity only; its sensitivity is higher than that of the GBM and slightly lower than that of GGWBR (Table 3). Compared with RF, GSTWBR significantly improves sensitivity while also achieving higher accuracy. It provides new possibilities and a theoretical reference for predicting disasters with significant spatiotemporal heterogeneity.

Author Contributions

Conceptualization, R.W. (Ritu Wu); Methodology, R.W. (Ritu Wu) and Z.H.; Software, R.W. (Ritu Wu), Z.H., W.D. and Y.S.; Investigation, R.W. (Ritu Wu); Data curation, R.W. (Ritu Wu); Writing—original draft, R.W. (Ritu Wu); Writing—review & editing, R.W. (Ritu Wu), Z.H., W.D., Y.S., H.Y., R.W. (Rihan Wu) and B.G.; Supervision, Z.H., W.D., Y.S., H.Y. and R.W. (Rihan Wu); Funding acquisition, Z.H., W.D., Y.S., H.Y., R.W. (Rihan Wu). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Program of Inner Mongolia Autonomous Region, grant numbers 2024KJHZ0002, 2022YFSH0027, and 2024KJHZ0007; the Key Special Project of Inner Mongolia’s “Science and Technology for the Development of Mongolia” Action Plan, grant number 2020ZD0028; the National Natural Science Foundation of China, grant number 81860605; the Inner Mongolia Natural Science Foundation, grant number 2023MS01001; and the Basic Scientific Research Business Expense Project of Colleges and Universities Directly under Inner Mongolia, grant number JY 20220087. The APC was funded by grant number 2024KJHZ0002.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bao, Y.; Shinoda, M.; Yi, K.; Fu, X.; Sun, L.; Nasanbat, E.; Li, N.; Xiang, H.; Yang, Y.; DavdaiJavzmaa, B.; et al. Satellite-Based Analysis of Spatiotemporal Wildfire Pattern in the Mongolian Plateau. Remote Sens. 2023, 15, 190. [Google Scholar] [CrossRef]
  2. Bousquet, E.; Mialon, A.; Rodriguez-Fernandez, N.; Mermoz, S.; Kerr, Y. Monitoring post-fire recovery of various vegetation biomes using multi-wavelength satellite remote sensing. Biogeosciences 2022, 19, 3317–3336. [Google Scholar] [CrossRef]
  3. Liu, X.; Zhang, G.; Lu, J.; Zhang, J. Risk assessment using transfer learning for grassland fires. Agric. For. Meteorol. 2019, 269, 102–111. [Google Scholar] [CrossRef]
  4. Mofokeng, O.D.; Adelabu, S.A.; Jackson, C.M. An Integrated Grassland Fire-Danger-Assessment System for a Mountainous National Park Using Geospatial Modelling Techniques. Fire 2024, 7, 61. [Google Scholar] [CrossRef]
  5. Nur, A.S.; Kim, Y.J.; Lee, C. Creation of Wildfire Susceptibility Maps in Plumas National Forest Using InSAR Coherence, Deep Learning, and Metaheuristic Optimization Approaches. Remote Sens. 2022, 14, 4416. [Google Scholar] [CrossRef]
  6. Salavati, G.; Saniei, E.; Ghaderpour, E.; Hassan, Q.K. Wildfire Risk Forecasting Using weights of Evidence and Statistical IndexModels. Sustainability 2022, 14, 3881. [Google Scholar] [CrossRef]
  7. Jin, T.; Hu, X.; Liu, B.; Xi, C.; He, K.; Cao, X.; Luo, G.; Han, M.; Ma, G.; Yang, Y.; et al. Susceptibility Prediction of Post-Fire Debris Flows in Xichang, China, Using a Logistic Regression Model from a Spatiotemporal Perspective. Remote Sens. 2022, 14, 1306. [Google Scholar] [CrossRef]
  8. Graff, C.A.; Coffield, S.R.; Chen, Y.; Foufoula-Georgiou, E.; Randerson, J.T.; Smyth, P. Forecasting Daily Wildfire Activity Using Poisson Regression. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4837–4851. [Google Scholar] [CrossRef]
  9. Peng, X.; Jin, Q.; Zhan, Q.; Guo, F. Analysis of Factors Related to Regional Wildfires in Zhejiang Using a Geographically Weighted Logistic Regression Model. J. Northeast For. Univ. 2021, 49, 57–66. [Google Scholar]
  10. Liang, S.; Su, Z. Analysis of Driving Factors of Wildfires on Leizhou Peninsula Based on Logistic and GWR Logistic Models. J. Southwest For. Univ. (Nat. Sci. Ed.) 2022, 42, 161–169. [Google Scholar]
  11. Dong, H.; Wu, H.; Sun, P.; Ding, Y. Wildfire Prediction Model Based on Spatial and Temporal Characteristics: A Case Study of a Wildfire in Portugal’s Montesinho Natural Park. Sustainability 2022, 14, 10107. [Google Scholar] [CrossRef]
  12. Murali Mohan, K.V.; Satish, A.R.; Mallikharjuna Rao, K.; Yarava, R.K.; Babu, G.C. Leveraging Machine Learning to Predict Wild Fires. In Proceedings of the 2021 2nd International Conference on Smart Electronics and Communication (ICOSEC), Trichy, India, 7–9 October 2021; pp. 1393–1400. [Google Scholar]
  13. Bui, D.T.; Hoang, N.D.; Samui, P. Spatial pattern analysis and prediction of forest fire using new machine learning approach of Multivariate Adaptive Regression Splines and Differential Flower Pollination optimization: A case study at Lao Cai province (Viet Nam). J. Environ. Manag. 2019, 237, 476–487. [Google Scholar]
  14. Banerjee, P. Maximum entropy-based forest fire likelihood mapping: Analysing the trends, distribution, and drivers of forest fires in Sikkim Himalaya. Scand. J. For. Res. 2021, 36, 275–288. [Google Scholar] [CrossRef]
  15. Cao, Y.X.; Wang, M.; Liu, K. Wildfire Susceptibility Assessment in Southern China: A Comparison of Multiple Methods. Int. J. Disaster Risk Sci. 2017, 8, 164–181. [Google Scholar] [CrossRef]
  16. Wang, J.; Guo, Z.; Ye, Q.; Gao, D. LSTM Forest Fire Prediction Model Based on Buffer Zone Resampling. J. Saf. Sci. Technol. 2023, 19, 195–202. [Google Scholar]
  17. Jaafari, A.; Zenner, E.K.; Pham, B.T. Wildfire spatial pattern analysis in the Zagros Mountains, Iran: A comparative study of decision tree based classifiers. Ecol. Inform. 2018, 43, 200–211. [Google Scholar] [CrossRef]
  18. Georganos, S.; Grippa, T.; Gadiaga, A.N.; Linard, C.; Lennert, M.; Vanhuysse, S.; Mboga, N.; Wolff, E.; Kalogirou, S. Geographical random forests: A spatial extension of the random forest algorithm to address spatial heterogeneity in remote sensing and population modelling. Geocarto Int. 2021, 36, 121–136. [Google Scholar] [CrossRef]
  19. Khan, S.N.; Li, D.; Maimaitijiang, M. A Geographically Weighted Random Forest Approach to Predict Corn Yield in the US Corn Belt. Remote Sens. 2022, 14, 2843. [Google Scholar] [CrossRef]
  20. Dong, Y.; Yan, H.; Wang, N.; Huang, M.; Hu, Y. Automatic Identification of Shrub-Encroached Grassland in the Mongolian Plateau based on UAS Remote Sensing. Remote Sens. 2019, 11, 1623. [Google Scholar] [CrossRef]
  21. Wu, R.; Zhao, J.; Zhang, H.; Guo, X.; Ying, H.; Deng, G.; Li, H. Wildfires on the Mongolian Plateau: Identifying Drivers and Spatial Distributions to Predict Wildfire Probability. Remote Sens. 2019, 11, 2361. [Google Scholar] [CrossRef]
  22. Chen, D.; Mi, J.; Chu, P.; Cheng, J.; Zhang, L.; Pan, Q.; Xie, Y.; Bai, Y. Patterns and drivers of soil microbial communities along a precipitation gradient on the Mongolian Plateau. Landsc. Ecol. 2015, 30, 1669–1682. [Google Scholar] [CrossRef]
  23. Chao, L.; Bao, Y.; Zhang, J.; Bao, Y.; Mei, L.; Cha, E. Effects of Vegetation Belt Movement on Wildfire in the Mongolian Plateau over the Past 40 Years. Remote Sens. 2023, 15, 2341. [Google Scholar] [CrossRef]
  24. Liu, Y.; Zhuang, Q.; Chen, M.; Pan, Z.; Tchebakova, N.; Sokolov, A.; Kicklighter, D.; Melillo, J.; Sirin, A.; Zhou, G.; et al. Response of evapotranspiration and water availability to changing climate and land cover on the Mongolian Plateau during the 21st century. Glob. Planet. Change 2013, 108, 85–99. [Google Scholar] [CrossRef]
  25. Bao, G.; Qin, Z.; Bao, Y.; Zhou, Y.; Li, W.; Sanjjav, A. NDVI-Based Long-Term Vegetation Dynamics and Its Response to Climatic Change in the Mongolian Plateau. Remote Sens. 2014, 6, 8337–8358. [Google Scholar] [CrossRef]
  26. Zhao, H.; Zhang, Z.; Ying, H.; Chen, J.; Zhen, S.; Wang, X.; Shan, Y. The spatial patterns of climate-fire relationships on the Mongolian Plateau. Agric. For. Meteorol. 2021, 308, 108549. [Google Scholar] [CrossRef]
  27. Friedl, M.; Sulla-Menashe, D. MODIS/Terra+Aqua Land Cover Type Yearly L3 Global 500m SIN Grid V061; NASA EOSDIS Land Processes DAAC; NASA: Washington, DC, USA, 2022. [Google Scholar]
  28. Muñoz Sabater, J. ERA5-Land Monthly Averaged Data from 1950 to Present; Copernicus Climate Change Service (C3S) Climate Data Store (CDS): Berks, UK, 2019. [Google Scholar]
  29. NCEI; NOAA. ETOPO 2022 15 Arc-Second Global Relief Model; NOAA National Centers for Environmental Information: Washington, DC, USA, 2022. [Google Scholar]
  30. Zhang, M.; Li, J.; Li, N.; Sun, W.; Li, P.; Zhao, Y. Spatial Inhomogeneity of Synoptic-Induced Precipitation in a Region of Steep Topographic Relief: A Case Study. J. Geophys. Res. Atmos. 2024, 129, e2023JD039129. [Google Scholar] [CrossRef]
  31. Xin, X.; Xu, D.; Li, Z. The Data Set of the Use Intensity Index of the Main Grazing Grasslands in Eurasia (2000–2020); The National Earth Observation Science Data Center: Beijing, China, 2021. [Google Scholar]
  32. Giglio, L.; Justice, C.; Boschetti, L.; Roy, D. MODIS/Terra+Aqua Burned Area Monthly L3 Global 500m SIN Grid V061; NASA EOSDIS Land Processes DAAC; NASA: Washington, DC, USA, 2021. [Google Scholar]
  33. Malley, J.D.; Kruppa, J.; Dasgupta, A.; Malley, K.G.; Ziegler, A. Probability machines: Consistent probability estimation using nonparametric learning machines. Methods Inf. Med. 2012, 51, 74–81. [Google Scholar] [CrossRef]
  34. Ridgeway, G. Generalized Boosted Models: A Guide to the gbm Package. 2024. Available online: https://cran.r-project.org/web/packages/gbm/vignettes/gbm.pdf (accessed on 26 June 2024).
  35. Fraser, L.K.; Clarke, G.P.; Cade, J.E.; Edwards, K.L. Fast Food and Obesity a Spatial Analysis in a Large United Kingdom Population of Children Aged 13–15. Am. J. Prev. Med. 2012, 42, E77–E85. [Google Scholar] [CrossRef]
  36. Tobler, W.R. A Computer Movie Simulating Urban Growth in the Detroit Region. Econ. Geogr. 1970, 46 (Suppl. S1), 234–240. [Google Scholar] [CrossRef]
  37. Wu, Y.X.; Du, K.; Wang, X.J.; Min, F. Misclassification-guided loss under the weighted cross-entropy loss framework. Knowl. Inf. Syst. 2024, 66, 4685–4720. [Google Scholar] [CrossRef]
  38. Pérez-Porras, F.J.; Triviño-Tarradas, P.; Cima-Rodríguez, C.; Meroño-de-Larriva, J.E.; García-Ferrer, A.; Mesas-Carrascosa, F.J. Machine Learning Methods and Synthetic Data Generation to Predict Large Wildfires. Sensors 2021, 21, 3694. [Google Scholar] [CrossRef]
Figure 1. Study area.
Figure 2. Spatial-weight-and-distance relationship.
Figure 3. Diagram of the division of the research sub-areas.
Figure 4. Flowchart of predicting the probability of grassland fire occurrence.
Figure 5. The frequency distribution histogram of grassland fire probability in fire and non-fire samples.
Figure 6. ROC and AUC for different methods.
Figure 7. (a) The spatiotemporal distribution of grassland fire probability in 2023 from January to June. (b) The spatiotemporal distribution of grassland fire probability in 2023 from July to December.
Table 1. Summary of data sources.

| Data | Variables | Short Form | Unit | Source | Resolution |
|---|---|---|---|---|---|
| Land cover type | | - | - | MCD12Q1 | 500 m |
| Historical fire data | | - | - | MCD64A1 | 500 m |
| Fuel and meteorology data | Leaf area index, low vegetation | lai_lv | m²·m⁻² | ERA5-Land monthly averaged data from 1950 to present | 10 km |
| | Skin reservoir content | src | m of water equivalent | | |
| | 2 m temperature | 2t | K | | |
| | 2 m dewpoint temperature | 2d | K | | |
| | Total precipitation | tp | m | | |
| | Total evaporation | e | m of water equivalent | | |
| | Snow cover | snowc | % | | |
| | 10 m u wind component | u10 | m·s⁻¹ | | |
| | 10 m v wind component | v10 | m·s⁻¹ | | |
| Topographic data | Elevation | el | m | ETOPO 2022 | 500 m |
| Human activity data | Population density | pd | km⁻² | WorldPop | 1 km |
| | Utilization intensity index | ui | - | Dataset of utilization intensity index of main grazing grasslands in Eurasia (2000–2020) | 1 km |
Table 2. The elements of the confusion matrix.

| | Predicted as a Fire | Predicted as a Non-Fire |
|---|---|---|
| Actually a fire | TP | FN |
| Actually a non-fire | FP | TN |
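Table 3. Results of prediction.

| Method | TP (fire) | FN (fire) | TN (non-fire) | FP (non-fire) | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|---|---|---|
| RF | 479 | 367 | 199,507 | 31,888 | 0.5662 | 0.8622 | 0.8611 |
| GBM | 586 | 260 | 178,004 | 53,391 | 0.6927 | 0.7693 | 0.7690 |
| GGWBR | 656 | 190 | 192,557 | 38,838 | 0.7754 | 0.8322 | 0.8320 |
| GSTWBR | 631 | 215 | 204,987 | 26,408 | 0.7459 | 0.8859 | 0.8854 |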

Share and Cite

MDPI and ACS Style

Wu, R.; Hong, Z.; Du, W.; Shan, Y.; Ying, H.; Wu, R.; Gantumur, B. A Generalized Spatiotemporally Weighted Boosted Regression to Predict the Occurrence of Grassland Fires in the Mongolian Plateau. Remote Sens. 2025, 17, 1485. https://doi.org/10.3390/rs17091485
