1. Introduction
Fire is a natural phenomenon that occurs worldwide in complex environments. It poses a serious threat to people and the natural environment, and scientists recognize the need to manage fire risk. An example is the great fire that occurred in Chicago in 1871, when flying debris obscured the sky and the city was overcome by fire. To date, explaining the occurrence and spread of natural fires remains a challenge for scientists, and predicting fires is even more difficult. Fires have been studied in many regions around the world [1,2,3,4,5], such as in Spain [6,7] and many other countries [8]. However, few studies have examined fire risk at the city scale, and its influence on the environment requires further study. It is widely recognized that human activities play an important role in fire risk [9,10,11,12]. Dense populations in urban areas and related hidden hazards, such as electrical installations and power lines, may lead to fires, as described in previous studies [13,14]. Places with a high population density, such as markets and residential areas, are especially vulnerable to fire. In addition, other factors such as road density, distance to water bodies, average temperature, precipitation, relative humidity, wind speed, slope, and aspect, which can be grouped into socioeconomic, climatic, and topographic predictors, have been found to be associated with fire risk [7,13,15].
Several studies have adopted global linear regression models (LM) to study fire risk [15,16]. However, the relationship between fire risk and its influencing factors may vary over space, a property referred to as spatial heterogeneity. As a result, LM, including ordinary least squares (OLS) models, may not be adequate for examining the spatially varying relationship between multiple predictors and fire occurrence. This limitation is mostly due to the constant coefficients in LM. Geographically weighted regression (GWR) has been widely used to account for spatial heterogeneity. GWR allows the regression coefficients to vary between individual locations, capturing the effects of non-stationarity and revealing variations in the importance of the variables across the study area. GWR is used primarily for data analysis and interpretation rather than prediction [8,16,17]. Beyond this popular spatially varying coefficient model, GWR has been extended to geographically and temporally weighted regression (GTWR), which deals with both spatial and temporal non-stationarity [17]. GWR-based models are not designed only to improve model fit; they also facilitate the spatiotemporal exploration of natural phenomena.
GTWR integrates both temporal and spatial information in the weight matrices to capture spatial and temporal heterogeneity simultaneously [17,18]. The approach has been used in models of house prices and land use change [17,18,19]. In those applications, the statistical performance of GTWR was better than that of GWR and OLS in terms of goodness-of-fit. Fire risk is known to change over space and time [11,20,21,22]. The frequency of fire and its ignition locations show spatiotemporal dependence such as clustering, lagging, and seasonal trends. Existing fire models could therefore be improved by incorporating temporal effects, that is, by integrating both spatial and temporal information in the weighting matrices [17,18,23]. Accordingly, we aim to use GTWR to discover the underlying patterns of fire risk and to compare the approach with GWR and LM.
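For reference, the spatiotemporal weighting that distinguishes GTWR from GWR is commonly written as follows. The notation is an assumption based on the usual presentation of the original model [18]: $(u_i, v_i, t_i)$ are the coordinates and time stamp of observation $i$, $\lambda$ and $\mu$ are scale factors that balance spatial and temporal distance, and $h_{ST}$ is the spatiotemporal bandwidth. The kernel shown is one common Gaussian form; the exact kernel and parameter values used in this study are not restated here.

```latex
\begin{align}
  y_i &= \beta_0(u_i, v_i, t_i) + \sum_{k} \beta_k(u_i, v_i, t_i)\, x_{ik} + \varepsilon_i, \\
  \left(d^{ST}_{ij}\right)^2 &= \lambda\left[(u_i - u_j)^2 + (v_i - v_j)^2\right] + \mu\,(t_i - t_j)^2, \\
  w_{ij} &= \exp\!\left(-\frac{\left(d^{ST}_{ij}\right)^2}{h_{ST}^2}\right), \\
  \hat{\beta}(u_i, v_i, t_i) &= \left(X^{\top} W_i X\right)^{-1} X^{\top} W_i\, y,
  \qquad W_i = \operatorname{diag}(w_{i1}, \dots, w_{in}).
\end{align}
```

Setting $\mu = 0$ removes the temporal term, so the weights depend on spatial distance alone and the model reduces to GWR.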
Before developing a GWR or GTWR model, the first task is to select variables and to fit an LM by OLS for the purpose of comparison. Variable selection is complex, and different criteria or approaches may be used for different models, such as the Akaike information criterion (AIC), Bayesian information criterion (BIC), cross validation (CV), stepwise regression, and reduction in mean squared error (MSE) [15,24,25]. Further, an accuracy assessment of predictive spatial models needs to account for spatial autocorrelation. However, little attention has been paid to the influence of spatial autocorrelation in geospatial data and residuals, which may result in overfitting or in underestimation of the prediction error [26]. By using spatial cross validation and bootstrap strategies, the estimation of spatial prediction errors in resampling-based accuracy assessment can be improved and the bias caused by residual spatial autocorrelation (RSA) can be corrected [26,27]. The R statistical software and the packages “spgwr”, “sperrorest”, and “caret” have been used to carry out spatial cross validation (SCV), which yields the resampling-based variable importance and prediction error across data folds [26,28]. Therefore, variables should first be selected by using SCV before they are used in a further regression model.
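The difference between ordinary and spatial cross validation lies in how observations are assigned to folds. The Python sketch below illustrates one simple way to build spatially separated folds by grouping points into square grid blocks, so that nearby (and likely autocorrelated) observations are held out together. It is an illustration only: the grid-block partitioning, the function names, and the toy coordinates are assumptions, and the R packages named above may use a different partitioning scheme.

```python
import random

def block_of(point, block_size):
    """Index of the square grid block that contains a point."""
    return (int(point[0] // block_size), int(point[1] // block_size))

def spatial_block_folds(coords, block_size, k, seed=0):
    """Assign each point to one of k folds via its grid block.

    Whole blocks are assigned to folds, so neighbouring points stay together.
    """
    blocks = sorted({block_of(p, block_size) for p in coords})
    random.Random(seed).shuffle(blocks)
    block_to_fold = {b: i % k for i, b in enumerate(blocks)}
    return [block_to_fold[block_of(p, block_size)] for p in coords]

def train_test_splits(fold_ids, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    for fold in range(k):
        test = [i for i, f in enumerate(fold_ids) if f == fold]
        train = [i for i, f in enumerate(fold_ids) if f != fold]
        yield train, test

# Toy example (hypothetical): a 10 x 10 lattice split into four 5 x 5 blocks.
coords = [(float(i), float(j)) for i in range(10) for j in range(10)]
fold_ids = spatial_block_folds(coords, block_size=5.0, k=4)
splits = list(train_test_splits(fold_ids, k=4))
```

In ordinary CV the fold labels would instead be drawn at random for each point, so test points usually have close neighbours in the training set, which is the situation that can make error estimates look better than they are.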
This study compares geographically weighted regression-based models (GWR and GTWR, the latter integrating spatial and temporal effects) with LM for modeling fire risk at the city scale. We use historical fire records and related datasets for Hefei city in China to undertake the comparative analysis. The study is divided into three tasks. First, SCV and CV are applied separately to obtain the importance of the variables, the specific differences between the two procedures are identified, and the relatively important predictors are selected after a multicollinearity test. Second, we take the selected variables from the previous step and fit the OLS model using the “caret” package in R. The significance level and the relative importance of the variables in the OLS model are quantified and the non-significant variables are removed. Third, we use the significant variables to fit a GWR model and visualize the local coefficients, the local significance of the coefficients, and the residual distribution. GTWR is then employed and the fit of the three models is summarized, along with a semivariogram analysis of residuals in different time periods. In this study, we adopted the original GTWR model created by Huang [18]; other improved GTWR models are therefore not examined.
The study shows that by using advanced GIS and spatial statistical methods along with detailed historical datasets of fire ignition, it is possible to build valid and meaningful models to explain fire risk. Such models can be used to improve the management of fire risk and safety in urban areas as well as in the natural environment.
4. Conclusions
In this study, we first performed a spatial cross validation for a linear regression model and compared its results with a stochastic cross validation. The contribution of the variables to fire risk varied across the training subsets; we infer that this nonstationarity also existed across space and that SCV could reduce the prediction error. The results also showed that LINE and ENTERPRISE were the most important variables for modeling fire risk, although their effects on fire risk were opposite to each other. Further, the results indicated that road density and the population distribution had the strongest positive influence on fire risk, which implies that more attention should be paid to locations where roads and people are densely clustered. Areas with a large number of enterprises had fewer fire ignition records, probably because of strict fire management and prevention measures. Infrastructure fire risk was commonly clustered in areas with dense population and increased human activities, which is in line with common sense.
The study compared LM, GWR, and GTWR for modeling fire risk at the city scale, using the variables with a high mean importance value. LINE, ENTERPRISE, DEM, SLOPE, LAND2030, and TEMAVE, all of which were significant, were first employed in the LM. The results showed that constant coefficient models like LM did not predict fire risk accurately and could not reveal the spatiotemporal heterogeneity. The low R-squared value highlighted the weakness of the LM.
With regard to GWR-based methods, the statistical performance improved compared with OLS, and GTWR was the best model. The R-squared values were 0.2385, 0.2837, and 0.8705 for OLS, GWR, and GTWR, respectively, as shown in Table 5. The distribution and varying significance of the GTWR coefficients across space-time were illustrated in more detail; the distribution of the coefficients, together with the corresponding t-test values (at the significance levels of 0.01 and 0.05), changed to some extent between periods.
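The goodness-of-fit comparison rests on two simple quantities, the residual sum of squares (RSS) and R-squared. The Python sketch below shows how they relate, using the standard definition R-squared = 1 - RSS/TSS; the function names and toy numbers are hypothetical and are not the study's data.

```python
def rss(observed, fitted):
    """Residual sum of squares: total squared gap between data and model."""
    return sum((o - f) ** 2 for o, f in zip(observed, fitted))

def r_squared(observed, fitted):
    """Share of the variance in `observed` that the fitted values explain."""
    mean = sum(observed) / len(observed)
    tss = sum((o - mean) ** 2 for o in observed)  # total sum of squares
    return 1.0 - rss(observed, fitted) / tss

# Toy numbers (hypothetical): a perfect fit and a mean-only fit.
observed = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(observed, observed)
mean_only = r_squared(observed, [2.5, 2.5, 2.5, 2.5])
```

For a fixed set of observations, a lower RSS always corresponds to a higher R-squared, so the two criteria rank the models in the same order.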
With regard to the spatial distribution of the residuals, the semivariograms indicated spatial autocorrelation for GWR and LM up to a 12-km lag, with a relatively steady semivariance up to 100 km, beyond which it increased again to some extent. This suggests that the spatial clustering patterns changed at different spatial scales. The GTWR model showed lower values of semivariance than the GWR and the LM, as well as a flat semivariance line over the entire distance range. These results show the strong ability of GTWR to explain the spatial structure.
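For readers unfamiliar with the diagnostic, the following Python sketch computes an empirical semivariogram of model residuals using the classical estimator, in which the semivariance for a lag bin is half the mean squared difference between all pairs of residuals whose separation falls in that bin. The function name, the equal-width binning, and the toy data are assumptions for illustration and do not reproduce the analysis in this study.

```python
import math

def empirical_semivariogram(coords, residuals, lag_width, n_lags):
    """Semivariance per distance bin; None where a bin holds no pairs.

    Bin b covers separations in [b * lag_width, (b + 1) * lag_width).
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.hypot(coords[i][0] - coords[j][0],
                           coords[i][1] - coords[j][1])
            b = int(d // lag_width)
            if b < n_lags:
                sums[b] += (residuals[i] - residuals[j]) ** 2
                counts[b] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]

# Toy example (hypothetical): residuals with a spatial trend along a line.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
trend = empirical_semivariogram(coords, [0.0, 1.0, 2.0], lag_width=1.0, n_lags=3)
flat = empirical_semivariogram(coords, [1.0, 1.0, 1.0], lag_width=1.0, n_lags=3)
```

A semivariance that rises with distance, as in `trend`, indicates that nearby residuals are more alike than distant ones, whereas a flat curve, as in `flat`, indicates no remaining spatial structure in the residuals.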
In the validation process, GTWR proved more robust because it had the lowest RSS value, which indicates that, for our dataset, GTWR is the best of the fitted regression models. A deeper exploration of the GWR also revealed the heterogeneity to some extent, although the gap between GWR and GTWR was large. The results indicate that all seven selected variables are significant in the central areas, which have the densest population, while the number of significant variables changes hierarchically in some northern and eastern regions. The heterogeneity mainly exists in suburban and rural areas, where human-related facilities or road construction are clustered only in some sub-centers of the city. GTWR can capture small changes while GWR cannot. This finding further illustrates the advantage of GTWR over GWR and LM.
In addition, an in-depth analysis of the relationship between the change in predictors and the change in fire density was conducted. The results show that the variables influencing the change in fire density vary over time. This can probably be explained by the expansion of the urban region and the changes in the shape of the city. An increasing number of buildings and infrastructure facilities have been constructed in the newly built area, and improved access for the population will indirectly contribute to the growth of fire occurrence. On the other hand, because technology has been implemented to control fire risk in enterprises, fires there are less frequent than before. Moreover, more sub-centers of fire risk will develop.
The findings in this paper reveal the advantages of using GTWR for explaining fire risk spatiotemporally. This approach, which integrates space and time, enables us to understand the dynamic change in fire risk. Further, we can also make accurate predictions by using the variables that have a high correlation with fire risk in city areas. We can therefore determine the areas with different significance levels, allowing us to make dynamic decisions to prevent fire occurrence. An additional finding of this study is that the calculation of the bandwidth used in GTWR also influences the results; this aspect should be studied further.