Article

Forest Fire Prediction Based on Time Series Networks and Remote Sensing Images

1 College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China
2 Sports Training Academy, Nanjing Sport Institute, Nanjing 210014, China
* Author to whom correspondence should be addressed.
Forests 2024, 15(7), 1221; https://doi.org/10.3390/f15071221
Submission received: 28 May 2024 / Revised: 8 July 2024 / Accepted: 10 July 2024 / Published: 14 July 2024
(This article belongs to the Section Natural Hazards and Risk Management)

Abstract
Protecting forest resources and preventing forest fires are vital for social development and public well-being. However, current research studies on forest fire warning systems often focus on extensive geographic areas like states, counties, and provinces. This approach lacks the precision and detail needed for predicting fires in smaller regions. To address this gap, we propose a Transformer-based time series forecasting model aimed at improving the accuracy of forest fire predictions in smaller areas. Our study focuses on Quanzhou County, Guilin City, Guangxi Province, China. We utilized time series data from 2021 to 2022, along with remote sensing images and ArcGIS technology, to identify various factors influencing forest fires in this region. We established a time series dataset containing twelve influencing factors, each labeled with forest fire occurrences. By integrating these data with the Transformer model, we generated forest fire danger level prediction maps for Quanzhou County. Our model’s performance is compared with other deep learning methods using metrics such as RMSE, and the results reveal that the proposed Transformer model achieves higher accuracy (ACC = 0.903, MAPE = 0.259, MAE = 0.053, RMSE = 0.389). This study demonstrates that the Transformer model effectively takes advantage of spatial background information and the periodicity of forest fire factors, significantly enhancing predictive accuracy.

1. Introduction

The destructive power of forest fires is immense. Once a forest fire occurs, it is difficult to restore the forest ecosystem in a short period [1]. Strengthening forest fire prevention efforts can prevent many fires and greatly reduce the losses they cause [2]. Forests not only provide the timber and forest products needed for national construction and people's livelihoods but also fulfill various missions such as releasing oxygen, regulating climate, nurturing water sources, conserving soil and water, preventing wind erosion, beautifying the environment, purifying air, reducing noise, and promoting health [3]. At the same time, forests are essential for stable, high-yield agriculture and animal husbandry. Forests are vital resources for human survival, possessing significant ecological, economic, and social value [4]. Therefore, developing high-precision forest fire prediction technology plays a significant role in forest fire early warning [5]. Traditional methods for forest fire detection mainly include satellite remote sensing, aerial patrols, and near-ground monitoring, the last of which includes manual lookout tower observation, wireless sensor monitoring, and remote video detection. Satellite-based forest fire identification primarily classifies visible-light, near-infrared, and other multispectral data, or analyzes the differences between multi-temporal data, to identify forest fires [6].
Near-ground monitoring is the most commonly used method for forest fire monitoring, which mainly includes lookout tower monitoring, wireless sensor monitoring, and video surveillance [7]. Lookout tower monitoring is the most traditional method of manual monitoring, mainly relying on human vision and equipment such as telescopes to observe forest areas. However, lookout tower monitoring also has several disadvantages. Firstly, lookout tower observation can only be conducted at designated locations with limited observation fields, making it unable to comprehensively monitor large forest areas. Secondly, monitoring effectiveness is greatly affected by terrain and weather conditions. Additionally, due to reliance on human vision for data collection, there is a possibility of missed detections and false alarms [8].
Wireless sensors are a modern method of forest fire monitoring, involving the deployment of numerous sensors in forest areas to monitor environmental parameters in real-time and transmit data wirelessly to data processing centers for analysis and processing [9]. Wireless sensors can monitor various environmental parameters including temperature, humidity, air pressure, wind speed, and carbon dioxide concentration, and can also detect fire indicators such as smoke and flames [10]. However, wireless sensors also have some limitations. Due to factors such as transmission distance and signal interference, wireless sensors may experience high data transmission delays or losses in certain environments, affecting the accuracy and real-time nature of monitoring data [11].
The research on forest fire prediction has a long history. Early scholars used Poisson models and single weather parameters to predict forest fire occurrences [12]. With the increasing capacity of machine learning to handle large datasets, researchers have been employing this technology for data mining and analysis [13]. Compared to early fire prediction methods, machine learning methods can extract the essential features of more data and better reflect the relationships between data, thus achieving many achievements in forest fire prediction. In 2021, Meriame Mohajane et al. [14] utilized a mixed model of frequency ratio (FR) and five machine learning algorithms to predict forest fire occurrences in the Mediterranean region based on terrain, climate, vegetation, and anthropogenic factors, and compared the performance of various models.
With the widespread adoption of wireless sensor technology, Udaya Dampage et al. [15] deployed a wireless sensor network in forest areas to collect forest environmental information. They developed a forest fire detection system using machine learning techniques to process data, capable of obtaining the most accurate detection results with minimal latency. However, the selection of sensor nodes and base station locations poses a challenge, and the cost of deploying sensor networks is high. Aqil Tariq et al. [16] analyzed the relationship between forest fire events in the Margalla Mountains and socio-economic and environmental variables using machine learning models, revealing that human activities have the greatest impact on forest fire occurrences. Meanwhile, scholars such as Yongqi Pang and Yudong Li [17] also constructed forest fire prediction models using machine learning, incorporating factors such as meteorology, topography, vegetation, and socioeconomic status.
Currently, most forest fire detection algorithms based on deep learning focus on classifying fire targets. A method for optimizing watchtower deployment for forest fire monitoring based on comprehensive factors was proposed in 2020 [18]. Some scholars have proposed new ideas for forest fire detection models [19,20]. One study used the particle swarm optimization algorithm to select optimal hyperparameters for a random forest model and chose fire-driving factors based on collinearity testing and previous research results [21]. Yiqing Xu et al. [22] analyzed the key factors affecting the success rate of initial firefighting in the Liangshan area of China. Yanyan Sun et al. [23] used LightGBM to generate accurate fire susceptibility maps, improving prediction accuracy. A forest fire risk model was constructed from the factors with the greatest influence on the probability of fire in Laoshan [24]. Fu Tianju et al. [25] proposed a deep convolutional neural network structure that prevents overfitting by randomly replacing neuron parameters. Based on the characteristics of forest fires, a deep network using multiscale dilated convolutions was proposed in the literature [26]; its novelty lies in enhancing the learning capability for fire features by enlarging the receptive field through dilated convolutions. Batch normalization was employed in the convolutional layers to improve the traditional CNN, effectively addressing network overfitting and improving detection accuracy [27]. Wei Xin et al. [28] utilized a pre-trained VGG-16 network for feature extraction from forest fire images, followed by a deep convolutional long short-term memory network to sequentially fuse static and dynamic features; forest fires were then classified and detected through fully connected layers. Another study introduced a forest fire detection method based on convolutional neural networks.
This method incorporates spatial Transformer networks in the convolutional layers and uses entropy function thresholds in the softmax layer, leading to improved detection accuracy [29].
In recent years, researchers have applied object detection and semantic segmentation from the field of computer vision to forest fire detection. Compared to image classification, these two methods are more accurate in detecting fire areas. Researchers inserted real fire images or simulated fire images into forest backgrounds to artificially synthesize samples and used Faster R-CNN to detect forest fires [30]. To address the efficiency and accuracy issues of drone-based forest fire detection, a forest fire detection algorithm based on the YOLOv3 object detection network was designed, capable of pinpointing the exact location of the fire area and the confidence probability of the fire [31]. Considering the small size of early forest fire areas, a forest fire segmentation algorithm based on SqueezeNet as the main feature extraction network was proposed, which improves the segmentation accuracy of forest fire areas by fusing multiscale contextual information, effectively segmenting early forest fire areas as demonstrated by experimental results [32]. Foreign scholars such as Shamsoshoara et al. achieved pixel-level segmentation of forest fire images using an end-to-end semantic segmentation network U-Net, although this approach requires high accuracy in dataset annotation [33]. Demin Gao et al. [34,35] effectively combined Internet of Things technology with deep learning, providing new ideas for forest fire detection. Yun T. et al. [36] listed the future directions and challenges of deep learning technology in forest development.
In addition to common forest fire prediction methods, researchers around the world have in recent years proposed other constructive fire prediction methods from different perspectives. Carmine Maffei and Massimo Menenti [37] proposed a new method for predicting forest fire burned area and spread rate based on a vertical humidity index. Abdul Qadir et al. [38] found that tropical lowland and broad-leaved evergreen forests have the highest fire risk, while areas with high tree cover density, low altitude, and low slope have the second highest fire intensity. Multispectral and thermal remote sensing data combined with probability distribution models were used by Carmine Maffei et al. [39] to predict forest fire characteristics. Nimisha Ghosh et al. [40] proposed a framework for predicting forest fires based on the Internet of Things and fog computing. FLCTC implements federated deep learning based on long short-term memory (LSTM) networks and is used for forest fire prediction [41]. Linear regression and random forest methods were combined to predict forest fires [42]. Rohan Palz [43] proposed a low-cost system called Mazelink aimed at detecting and predicting forest fires. Chan Jin Lim et al. [44] predicted the intensity of forest fires and analyzed the likelihood of fire occurrence in different regions by analyzing the fuel characteristics of forests in the Yeongdong and Yeongseo regions of Gangwon Province, South Korea, from November 2020 to December 2021. Nikolay Viktorovich Baranovskiy [45] innovatively combined the probability of fire occurrence with the registered and predicted number of fires.
Scholars around the world also offer different analyses and solutions regarding the probability of forest fires. Slobodan Milanović et al. [46] identified the main explanatory variables affecting forest fires in Eastern Serbia by comparing logistic regression and random forest methods, and created a forest fire probability map based on them. Michelle Farfán et al. [47] developed a climate probability model for each ENSO stage based on ERA5 reanalysis estimates of VPD. Mekala et al. [48] proposed a method for predicting the probability of forest fires by analyzing the relationship between three key environmental factors, namely humidity, oxygen concentration, and temperature, and fire probability. Qing Zhou et al. [49] explored the influence of key historical events on the drivers of forest fires in northern China to establish a probability model of forest fire occurrence. Thomas Kitzberger et al. [50] trained a fire probability model with random forests using 23 years of fire records and biophysical, vegetation, human activity, and seasonal weather predictor variables. Yannis Maniatis et al. [51] produced a probability map of fire risk for the Dadia-Lefkimi-Soufli National Forest Park in Greece.

2. Research Area and Research Materials

2.1. Overview of the Research Area

Quanzhou County is under the jurisdiction of Guilin City, Guangxi Zhuang Autonomous Region, located in the northeast of Guangxi Zhuang Autonomous Region, with geographical coordinates between 110°37′ and 111°29′ east longitude and 25°29′ and 26°23′ north latitude, as shown in Figure 1.
The county is surrounded by mountains to the southeast, northwest, and west, with hills and small plains in the middle. The hilly terrain extends southwest and northeast along the direction of the Xiangjiang River and its tributaries, forming a deer-head-shaped terrain in the county, covering an area of approximately 900 square kilometers, which accounts for one-fourth of the total area of Quanzhou County. Quanzhou County has a subtropical monsoon climate with a frost-free period of 298 days per year and an average annual temperature of 17.7 °C. Its main characteristics include strong solar radiation, ample sunshine for most months, and abundant but seasonally uneven rainfall. The climate conditions throughout the year are as follows: a prolonged cold spring with frequent overcast and rainy days and a late temperature rise; a summer with frequent heavy rains and prevailing southwest winds; an autumn with few rainy days and obvious drought; and a dry winter with frequent northeast winds and frequent cold air invasions.
The vegetation in the county belongs to the subtropical evergreen broad-leaved forest belt, predominantly transitioning between evergreen broad-leaved forests and deciduous broad-leaved forests, with secondary types being artificial evergreen coniferous forests and natural evergreen coniferous forests. Only a small amount of primary forest is preserved in remote mountain valleys. Forests and grasslands are mainly distributed in the middle and low mountains and hills within the territory, with dense vegetation due to the warm and humid climate, resulting in a forest coverage rate of approximately 30.6%.

2.2. Dataset Overview

Forest fires are caused by various factors and are always influenced by both natural and anthropogenic factors [52]. Therefore, it is crucial to rapidly and accurately obtain information about their geographical environment and identify the spatiotemporal patterns of fire occurrences.
Additionally, it is essential to properly determine the influencing factors for forest fire prediction to improve the predictive accuracy of the model. This study selected four types of data sources, including terrain data, meteorological data, vegetation data, and data related to human activities, for research in Quanzhou County. The selection of these sources drew on satellite imagery and records of fire occurrences from previous years. Geographic and vegetation factors were obtained from remote sensing data. Weather information includes data on temperature, wind, rainfall, and other variables [53]. Data on human activities include population data and distances to roads, railways, and rivers [54].
For this study, data from April to September 2021–2022 were selected as the historical fire dataset. During this period, there were a total of 30 fires in Quanzhou County, with 24 fires occurring in the summer, accounting for 80% of the total number of fires. Building upon previous research and considering various influencing factors such as meteorological and geographical factors, fourteen forest impact factors were ultimately determined, as shown in Table 1.

2.2.1. Meteorological Influencing Factors

Daily values of five factors including temperature, atmospheric pressure, specific humidity, wind speed, and precipitation were selected as meteorological influencing factors [55]. Based on data provided by the China Integrated Meteorological Information Sharing System (CIMISS), the relationship between temperature and specific humidity and relative humidity (RH) is as follows.
RH = [SHU × PRS / (0.378 × SHU + 0.622)] / [6.112 × e^(17.67 × (TMP − 273.15) / ((TMP − 273.15) + 243.5))]
In the equation, TMP represents temperature in Kelvin (K); SHU represents specific humidity in kilograms per kilogram (kg/kg); PRS represents atmospheric pressure in Pascals (Pa). Because temperature and specific humidity map directly onto relative humidity, specific humidity is selected to represent the moisture content of the air in order to prevent feature collinearity.
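As a rough illustration, the relation between specific humidity and relative humidity can be coded as below. This is a minimal sketch, not the paper's implementation: the function name is illustrative, pressure is assumed to be supplied in hPa so that the actual and saturation vapour pressure terms share units, and the saturation term follows the Magnus approximation whose constants (6.112, 17.67, 243.5) appear in the equation above.

```python
import math

def relative_humidity(tmp_k, shu, prs_hpa):
    """Relative humidity from temperature (K), specific humidity (kg/kg),
    and pressure (assumed hPa). Sketch of the relation described above:
    e = SHU*PRS/(0.378*SHU + 0.622) and, via the Magnus approximation,
    e_s = 6.112*exp(17.67*Tc/(Tc + 243.5)); RH = e/e_s."""
    t_c = tmp_k - 273.15                                  # Kelvin -> Celsius
    e = shu * prs_hpa / (0.378 * shu + 0.622)             # actual vapour pressure
    e_s = 6.112 * math.exp(17.67 * t_c / (t_c + 243.5))   # saturation vapour pressure
    return e / e_s                                        # fraction; *100 for percent

rh = relative_humidity(tmp_k=298.15, shu=0.012, prs_hpa=1013.25)
```

For 25 °C, 12 g/kg specific humidity, and standard pressure this yields a physically plausible relative humidity of roughly 60%.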

2.2.2. Geographical Impact Factors

Slope and aspect are selected as two factors for terrain influencing factors [56]. Slope and aspect obtained from the digital elevation model (DEM) retrieved from Landsat satellites are shown in Figure 2a,b.

2.2.3. Human Activity Influencing Factors

Distances to roads, railways, and rivers are selected as three factors for human activity influencing factors [57]. The Euclidean distance from each grid to the road, railway, and river is calculated using the buffer tool in the ArcGIS 10.4 toolbox, as shown in Figure 2c–e.

2.2.4. Vegetation Influencing Factors

Vegetation moisture content, NDVI, ignitability, and vegetation coverage are selected as the four vegetation influencing factors [58,59]. Among them, the normalized difference vegetation index (NDVI) reflects the health of vegetation and vegetation coverage. The moisture content of canopy vegetation is obtained by weighing the dry mass and fresh mass of the plant canopy, as shown in Figure 2f [60]. The use of Landsat satellite sensors to evaluate the moisture content of ground forest fuels is based on previous research results: according to Li Jia et al. [61], a deep-learning-based forest fuel moisture content inversion technique can establish the relationship between remote sensing spectral reflectance and fuel moisture content using an MLP model, indirectly obtaining the moisture content. NDVI quantifies vegetation by normalizing the difference between near-infrared reflectance (NIR) and red band reflectance (Red); the normalization reduces the influence of radiation changes related to atmospheric conditions such as solar elevation angle, satellite observation angle, terrain, and cloud shadows. It is computed as shown in the equation.
NDVI = (NIR − Red) / (NIR + Red)
Vegetation coverage is derived from NDVI, as shown in Figure 2g. According to the "Technical Regulations for Forest and Grassland Fire Hazard Assessment" in the First National Comprehensive Risk Survey of Natural Disasters, plant species are classified by ignitability, as shown in Figure 2h. Fractional vegetation coverage is computed from the NDVI remote sensing images, as shown in Figure 2i, with the specific calculation formula as follows:
FVC = (NDVI − NDVI_min) / (NDVI_max − NDVI_min)
where FVC represents vegetation coverage; NDVI_min is the NDVI value for bare soil or areas without vegetation cover, taken as the NDVI value at a cumulative percentage of 5%; NDVI_max is the NDVI value for fully vegetated areas, taken as the NDVI value at a cumulative percentage of 95%.
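The NDVI and FVC computations can be sketched in numpy as follows. Band values are made up for illustration, function names are mine, and the 5%/95% cumulative-percentage convention follows the description above.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)

def fvc(ndvi_grid, low_pct=5, high_pct=95):
    """Fractional vegetation cover: rescale NDVI between the values at the
    5% and 95% cumulative percentages, then clip to [0, 1]."""
    ndvi_min = np.percentile(ndvi_grid, low_pct)
    ndvi_max = np.percentile(ndvi_grid, high_pct)
    return np.clip((ndvi_grid - ndvi_min) / (ndvi_max - ndvi_min), 0.0, 1.0)

nir = np.array([0.5, 0.6, 0.4])   # illustrative near-infrared reflectances
red = np.array([0.1, 0.2, 0.3])   # illustrative red-band reflectances
grid = ndvi(nir, red)
cover = fvc(grid)
```

Clipping keeps pixels below the 5% threshold at 0 (bare soil) and above the 95% threshold at 1 (full cover), mirroring how the cumulative-percentage cutoffs are used in the formula.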

2.3. Assessment of the Impact Factors of Forest Fires

Multicollinearity refers to the existence of complete or near-complete linear relationships among variables in an equation, which can lead to loss of significance tests and failure of model predictive capabilities. To further determine whether there are linear relationships among the forest fire influencing factors and prevent inaccuracies in model predictions due to relationships between factors, multicollinearity tests were conducted on the forest fire influencing factors, as shown in Table 2.

2.3.1. Variance Inflation Factor

Introduced by Marquardt in 1970, the variance inflation factor (VIF) is the reciprocal of tolerance. It represents how much the variance of a regression coefficient estimated by least squares increases when collinearity exists among the independent variables, compared with the variance when no collinearity exists. A larger VIF indicates a stronger degree of multicollinearity among variables. As with the correlation coefficient between independent variables, there is no sharply defined critical value for diagnosing multicollinearity with VIF; some scholars suggest that VIF ≥ 5 or VIF ≥ 10 indicates that severe collinearity among the independent variables may be present. In the equation below, R_i^2 is the coefficient of determination obtained by regressing the i-th feature on the remaining features, reflecting the degree of collinearity. The formula for the variance inflation factor is as follows:
VIF = 1 / (1 − R_i^2)

2.3.2. Tolerance

Tolerance is the proportion of variance left unexplained when each independent variable is regressed on the other independent variables. A smaller tolerance value indicates a greater likelihood of multicollinearity between that independent variable and the others. TOL is the reciprocal of the VIF described above; if TOL is less than 0.1, collinearity among the factors is considered to be present.
After testing, among the fourteen forest fire factors, the normalized difference vegetation index and the vegetation coverage factor exceed the critical values and are considered collinear. They were screened out to improve the predictive performance of the model.
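The VIF and tolerance screening described above can be sketched by regressing each factor on the remaining ones with least squares. This is an illustration with synthetic data, not the study's actual code; function and variable names are mine.

```python
import numpy as np

def vif_and_tolerance(X):
    """For each column of X, regress it on the remaining columns,
    take R^2, and return VIF = 1/(1 - R^2) and TOL = 1/VIF.
    An intercept column is added for each auxiliary regression."""
    n, p = X.shape
    vifs = []
    for i in range(p):
        y = X[:, i]
        A = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    vifs = np.array(vifs)
    return vifs, 1.0 / vifs

# Synthetic demo: x3 is nearly a copy of x1, so its VIF should be large.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.05 * rng.normal(size=200)
vifs, tols = vif_and_tolerance(np.column_stack([x1, x2, x3]))
```

In the demo, the near-duplicate column produces a VIF far above the 5-10 thresholds mentioned above (and a TOL well below 0.1), while the independent column stays close to VIF = 1.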

2.3.3. Pearson Correlation Coefficient

The Pearson correlation coefficient, used to reflect the degree of linear correlation between two random variables and also known as the product-moment correlation, is a method of calculating linear correlation. First, consider covariance, an indicator reflecting the degree of correlation between two random variables. If one variable increases or decreases together with another variable, the covariance of the two variables is positive; otherwise, it is negative. The formula for covariance is as follows, where x_i and y_i are the observations of variables x and y, and x_μ and y_μ are the means of variables x and y:
cov(x, y) = Σ_{i=1}^{n} (x_i − x_μ)(y_i − y_μ) / (n − 1)
Assuming there are two variables, X and Y, the Pearson correlation coefficient between the two variables can be calculated using the following formula:
ρ_XY = cov(X, Y) / (σ_X σ_Y)
where cov(X, Y) is the covariance, σ_X is the standard deviation of X, and σ_Y is the standard deviation of Y. The Pearson correlation coefficient is obtained by dividing the covariance by the product of the standard deviations of the two variables. After calculating the correlation coefficient, the strength of the correlation between variables can be judged based on the following ranges:
When the absolute value of the Pearson correlation coefficient is greater than 0.8, there is a high correlation between the two variables. The experimental results indicate a high correlation between NDVI and vegetation coverage, and these factors were screened out to improve the predictive performance of the model.
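The covariance-based Pearson coefficient can be computed directly from the formulas above; a minimal sketch with illustrative data (names are mine):

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the
    sample standard deviations, per the formulas above."""
    cov = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
    return cov / (x.std(ddof=1) * y.std(ddof=1))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0        # perfectly linear in x
r = pearson(x, y)

# Screening rule from the text: drop one of any factor pair with |r| > 0.8.
highly_correlated = abs(r) > 0.8
```

A perfectly linear relationship gives r = 1, so this pair would be flagged by the |r| > 0.8 screening criterion.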

2.4. Establishment of Datasets

After excluding the collinear factors, twelve influencing factors were retained. Calculation showed that their average information gain rates were all greater than 0, meaning each contributes to the predictive ability of the model, so all were kept for subsequent model input. Ignitability has the greatest impact on forest fires; other factors with a large impact include temperature, wind speed, rainfall, and relative humidity.
Twelve influencing factors covering meteorology, terrain, vegetation, and human activities were normalized to eliminate scale effects. Below is the standardization equation, where x_min and x_max are the minimum and maximum values of the original variable x, and y is the normalized value:
y = 2 × (x − x_min) / (x_max − x_min) − 1
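The scaling above maps each factor's minimum to −1 and its maximum to 1. A minimal sketch (function name illustrative):

```python
import numpy as np

def normalize_signed(x):
    """Min-max scale a factor to [-1, 1]: y = 2*(x - x_min)/(x_max - x_min) - 1."""
    x = np.asarray(x, dtype=float)
    return 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0

scaled = normalize_signed([10.0, 15.0, 20.0])  # endpoints map to -1 and 1
```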
Using ArcGIS 10.4, each feature was converted into a raster with the same pixel size (1000 m × 1000 m), data type (8-bit unsigned integer), and coordinate system (GCS WGS 1984), resulting in 3910 grids. The rasterized features were then transformed into point elements to obtain the pixel value of each grid. Latitude and longitude were calculated in the GCS WGS 1984 coordinate system, and the combination of date, latitude, and longitude was used as the index of the dataset, which covers meteorology, terrain, vegetation, and human activities. It consists of twelve influencing factors and carries labels indicating whether a fire occurred.
In forest fire problems, datasets often exhibit imbalanced samples, with significantly more non-fire samples than fire samples. According to the first law of geography proposed by Waldo Tobler (1970), correlation between objects is related to distance: generally, the closer the distance, the stronger the correlation between fire points. Considering the spatial clustering characteristics of fires, oversampling has been shown to be a robust solution to this imbalance [62]. Therefore, this study adopts a combination of buffer analysis and oversampling to increase the number of fire events and mitigate the impact of sample imbalance. Initially, 20% of the 1,431,060 samples (286,212 samples) were partitioned as the test set. The remaining samples were then oversampled by forming a 5 square kilometer buffer zone around fire points, where pixels inside the buffer were relabeled 1 (fire) and pixels outside were labeled 0 (non-fire). Through this processing of the forest fire data, the forest fire dataset used in this study was obtained.
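The buffer relabeling step might be sketched as below. This is an illustration only, not the ArcGIS workflow actually used: coordinates are assumed to be in projected metres, and `radius_m` is a hypothetical stand-in for a radius derived from the paper's 5 km² buffer.

```python
import numpy as np

def buffer_labels(grid_xy, fire_xy, radius_m):
    """Label each grid cell 1 (fire) if it falls inside the buffer of any
    historical fire point, else 0 (non-fire). Euclidean distance in metres."""
    labels = np.zeros(len(grid_xy), dtype=np.uint8)
    for fx, fy in fire_xy:
        d = np.hypot(grid_xy[:, 0] - fx, grid_xy[:, 1] - fy)
        labels[d <= radius_m] = 1   # inside any buffer -> fire class
    return labels

grid_xy = np.array([[0.0, 0.0], [2000.0, 0.0]])  # two illustrative grid cells
fire_xy = np.array([[0.0, 100.0]])               # one historical fire point
labels = buffer_labels(grid_xy, fire_xy, radius_m=1000.0)
```

The cell 100 m from the fire point is relabeled 1, while the cell 2 km away stays 0, which is the clustering assumption the buffer encodes.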

3. Research Method

Firstly, fourteen influencing factors related to forest fires were selected. These were then linked with historical fire data based on timestamps, latitude, and longitude to establish a raw dataset. Simultaneously, the dataset was preprocessed, and multicollinearity tests along with Pearson coefficients were used to screen the features of the factors. Afterward, the Transformer model structure was designed, which was then combined with learning the spatiotemporal properties of various forest fire-driving factors. Subsequently, their parameters were adjusted to train the model classifier. Finally, a sliding window was used to validate and evaluate the model’s performance. The model architecture is depicted in Figure 3.

3.1. Input Embedding

In the Transformer, the input of words is represented by x, which is obtained by adding word embedding and position embedding. The structure of the used Transformer is shown in Figure 4.

3.1.1. Word Embedding Layer

The word embedding layer is responsible for transforming natural language into unique word vector representations. The task of deep learning is to map high-dimensional raw data to low-dimensional manifolds, making the high-dimensional raw data separable after being mapped to low-dimensional manifolds, and this mapping is called embedding. Word embedding maps the word, w, to the vector, x. If two words have similar meanings, then the Euclidean distance between the two word vectors obtained after mapping is small.
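In practice, a word embedding is a lookup into a trainable matrix: each word id selects one row. A minimal sketch with illustrative sizes (the vocabulary, dimension, and ids below are made up, and the matrix would normally be learned rather than random):

```python
import numpy as np

vocab_size, d_model = 1000, 512
rng = np.random.default_rng(0)
# Stand-in for a learned embedding table: one d_model-dim row per word.
embedding = rng.normal(scale=0.02, size=(vocab_size, d_model))

word_ids = np.array([3, 17, 42])   # a three-word "sentence" as ids
x = embedding[word_ids]            # word vectors, shape (3, d_model)
```

Words with similar meanings end up with nearby rows after training, which is the small-Euclidean-distance property described above.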

3.1.2. Position Embedding Layer

In addition to word embedding, the Transformer also requires position embedding to represent the position of each word in a sentence. In an RNN model, sequences are processed sequentially, so the hidden state H_t at time t depends on the state H_{t−1} at the previous time step, and positional information is inherently contained within the sentence. The Transformer model, however, utilizes global information, processing the entire sentence at once by encoding all its words simultaneously, so it cannot by itself capture the sequential order within the sequence. Hence, in the Transformer, position embedding is needed to retain the relative or absolute position of words in the sequence. Compared to an RNN, where characters are input one by one and the order relationship is naturally preserved, the Transformer uses a self-attention mechanism to extract information.
In the Transformer, each word in a sentence is processed in parallel, considering all words’ influences on it. However, it does not consider the positional information among words, i.e., the context. Hence, positional information needs to be introduced. All the formula theories in model construction are derived from the article by A Vaswani et al. [63]. Positional encoding in the Transformer represents the positional information of each word. It is defined as follows:
PE(pos, i) = sin(w_i × pos) if i is even; cos(w_i × pos) if i is odd
where w_i = 1 / 10000^(2i / d_model), with i ∈ {0, 1, 2, …, d_model/2 − 1}; pos represents the actual position of a word in the sentence; and d_model represents the dimension of the word vector, 512 in this context. PE(pos, i) is the positional encoding value at dimension i for the word at position pos, and it provides positional information for each word in the sentence. In other words, by injecting positional information for each word, we enhance the model input. Word embedding and positional embedding are added together to obtain the word representation vector X, which serves as the input vector for the encoder. This enables the self-attention mechanism to take word order into account even though all words are input simultaneously.
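The sinusoidal positional encoding above can be sketched in numpy (function name illustrative):

```python
import numpy as np

def positional_encoding(max_len, d_model=512):
    """Sinusoidal positional encoding: even dimensions use sin(w_i * pos),
    odd dimensions cos(w_i * pos), with w_i = 1 / 10000^(2i / d_model)."""
    pos = np.arange(max_len)[:, None]        # (max_len, 1) word positions
    i = np.arange(d_model // 2)[None, :]     # (1, d_model/2) frequency index
    w = 1.0 / (10000.0 ** (2 * i / d_model))
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(w * pos)            # even dimensions
    pe[:, 1::2] = np.cos(w * pos)            # odd dimensions
    return pe

pe = positional_encoding(50)  # one encoding row per position, added to embeddings
```

Each position gets a unique pattern of sines and cosines, and relative offsets correspond to linear transforms of these vectors, which is what lets self-attention recover order information.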

3.2. Encoder Embedding

In the encoding phase, the input first passes through the word vector layer (input embedding) and the positional encoding layer. It then flows through the self-attention layer, which includes multi-head attention, followed by a residual connection and layer normalization (add & norm); this is succeeded by a feed-forward neural network layer, again followed by a residual connection and layer normalization (add & norm). The result is the output of the encoder, which later interacts with the decoder.
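The data flow just described can be sketched with stand-in sublayers. This is a structural illustration only, assuming post-norm residual blocks; the attention and feed-forward callables here are dummies, not the real sublayers.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit scale (no learned gain/bias)."""
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def encoder_layer(x, self_attn, ffn):
    """One encoder layer as described above: self-attention, add & norm,
    feed-forward network, add & norm."""
    x = layer_norm(x + self_attn(x))   # residual connection + layer norm
    x = layer_norm(x + ffn(x))         # residual connection + layer norm
    return x

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))            # 4 tokens, 8-dim vectors (illustrative)
out = encoder_layer(x, self_attn=lambda h: 0.1 * h, ffn=lambda h: 0.1 * h)
```

Stacking several such layers gives the full encoder; the final output is what the decoder attends to.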

3.2.1. Self Attention

The key innovation of the Transformer lies in its self-attention mechanism, the cornerstone of its architecture [64]. It allows the model, when generating the representation of the current word, to consider not only the information at the current position but also the contextual information of all words in the input sequence, weighting and fusing them. This lets the model better capture dependencies between words, including long-range dependencies. The computational efficiency of the attention mechanism is improved through matrix operations and parallel computing. For the model, the attention mechanism assigns weights (values between 0 and 1), with more important or more relevant areas receiving higher weights. We first describe how self-attention is computed with vector calculations, and then look at its matrix implementation. The calculation formula is as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
where Q is the query matrix, K is the key matrix (the content to be attended to), and V is the value matrix. The dot product $QK^{T}$ computes the attention weights that Q places on V. Scaling by $\sqrt{d_k}$ prevents the dot products from becoming too large; when they are too large, the gradients passing through softmax become very small. Softmax also facilitates gradient computation for backpropagation while smoothing the results into the 0–1 range. Initially, Q, K, and V are all derived from the same input, obtained by adding the word embedding and the positional embedding. It is worth noting that, in the second attention layer of the decoder, K and V come from the encoder while Q comes from the output of the decoder's first layer; this is also why the decoding stage cannot be parallelized. As described above, this embodies the concept of the attention mechanism.
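The formula translates almost directly into matrix code. The following NumPy sketch is illustrative only (function names are ours, and the max-subtraction in softmax is a standard numerical-stability trick not stated in the text):

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # (seq_q, seq_k)
    weights = softmax(scores, axis=-1)               # rows sum to 1
    return weights @ V, weights
```

In self-attention, Q, K, and V are all the same matrix (embedding plus positional encoding); in the decoder's cross-attention, K and V come from the encoder output.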

3.2.2. Multi-Head Attention

In the Transformer, the attention module repeats its computation multiple times in parallel; each parallel computation is referred to as an attention head. The attention module splits its query, key, and value parameter matrices into N segments, and each segment passes independently through a separate attention head. It is important to note that this splitting is purely logical: the query, key, and value parameter matrices are not physically divided into independent matrices for each attention head. Logically, each attention head corresponds to an independent portion of the query, key, and value. Nor does each attention head have its own linear layer; all heads share one linear layer and operate on their separate logical portions. Multi-head attention is defined as follows:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^{O}$$
where $\mathrm{head}_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V)$, with $W_i^Q \in \mathbb{R}^{d_{model} \times d_k}$, $W_i^K \in \mathbb{R}^{d_{model} \times d_k}$, $W_i^V \in \mathbb{R}^{d_{model} \times d_v}$, and $W^{O} \in \mathbb{R}^{hd_v \times d_{model}}$. If h = 8, then $d_k = d_v = d_{model}/h = 64$. The paper assumes $d_{model} = 512$, the hidden dimension of the output, which also equals the word vector dimension. Each head computes its own attention so that different aspects of attention are learned in parallel; the results are then concatenated and multiplied by $W^{O}$ to obtain the final feature representation. With Q, K, and V matrices allocated to each head, attention scores are first computed per head and then merged. The merging operation is essentially the inverse of the splitting operation: the result matrix is reshaped to eliminate the heads dimension, yielding the final attention scores. This enables a more detailed capture and expression of the various relationships and subtle differences between words.
Under the mechanism of multi-head attention, as observed, the embedding vectors of the input and target sequences are logically split into multiple heads. This means that different parts of the embedding vectors can represent different aspects of meaning for each word, and each word’s different meanings are related to other words in the sequence. This allows the Transformer to capture richer representations of sequences.
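The "purely logical" split and merge described above amount to simple array reshapes, as the following sketch shows (illustrative only; function names are ours):

```python
import numpy as np

def split_heads(x: np.ndarray, n_heads: int) -> np.ndarray:
    """(seq, d_model) -> (n_heads, seq, d_head): a purely logical reshape."""
    seq, d_model = x.shape
    d_head = d_model // n_heads
    return x.reshape(seq, n_heads, d_head).transpose(1, 0, 2)

def merge_heads(x: np.ndarray) -> np.ndarray:
    """Inverse of split_heads: (n_heads, seq, d_head) -> (seq, d_model)."""
    n_heads, seq, d_head = x.shape
    return x.transpose(1, 0, 2).reshape(seq, n_heads * d_head)
```

No parameters are copied or divided; each head simply views its own slice of the shared projection output, and merging restores the original layout before the final multiplication by $W^{O}$.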

3.2.3. Add & Norm Layer

In the Transformer, after each sub-layer (self-attention and the feed-forward neural network), a residual connection is applied, together with layer normalization. The 'Add' operation follows the residual network concept: the input matrix a of multi-head attention is added directly to its output matrix b. This enables deeper training of the network, passing information from the previous layer to the next without loss, a method commonly used in image-processing models such as ResNet.
'Norm' refers to layer normalization. When optimizing with gradient descent, the feature distribution of the input data keeps changing as network depth increases. To keep the feature distribution stable, normalization is applied. This allows the use of larger learning rates, speeding up the convergence of the model. Normalization also has a certain regularization effect, making the training process smoother. Specifically, normalization standardizes each layer's features before they enter the activation function, transforming them into data with mean 0 and variance 1. This helps prevent the data from falling into the saturation region of the activation function, reducing the vanishing-gradient problem.
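A minimal NumPy sketch of the add & norm step (illustrative only; the small `eps` term is a standard numerical-stability assumption, and the learnable gain/bias parameters of layer normalization are omitted for brevity):

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Normalize each row (feature vector) to mean 0 and variance 1."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def add_and_norm(x: np.ndarray, sublayer_out: np.ndarray) -> np.ndarray:
    """Residual connection followed by layer normalization."""
    return layer_norm(x + sublayer_out)
```

The residual addition preserves the sub-layer input, while the normalization keeps the feature distribution stable from layer to layer.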

3.2.4. Feed-Forward Layer

The feed-forward layer is relatively simple: it consists of two fully connected layers, the first with a ReLU activation and the second without an activation function. Notably, the internal operations of multi-head attention (scaled dot-product attention via matrix multiplication) are linear transformations, whose learning capacity is weaker than that of non-linear transformations. Hence, a layer normalization layer after the attention layer standardizes the vectors, moving the data into the activation function's effective region so that the ReLU activation can perform better. $W_1, b_1$ are the weights and biases of the first layer; $W_2, b_2$ are those of the second layer. The layer is defined as follows:
$$FFN(x) = \max(0, xW_1 + b_1)W_2 + b_2$$
Additionally, in the fully connected layer, the process of mapping data to a higher-dimensional space before mapping it back to a lower-dimensional space allows for the learning of more abstract features. This strengthens the expressive power of word features and enhances the relationships between words in the text.
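The expand-then-project structure described above can be sketched directly from the formula (illustrative NumPy code; the dimensions chosen here are small toy values, not those of the paper):

```python
import numpy as np

def ffn(x, W1, b1, W2, b2):
    """Position-wise feed-forward: FFN(x) = max(0, x W1 + b1) W2 + b2."""
    hidden = np.maximum(0.0, x @ W1 + b1)  # first layer with ReLU, expands to d_ff
    return hidden @ W2 + b2                # second layer projects back to d_model

# Toy dimensions: d_model = 8 expanded to d_ff = 32, then projected back.
rng = np.random.default_rng(2)
x = rng.standard_normal((5, 8))
W1, b1 = rng.standard_normal((8, 32)), np.zeros(32)
W2, b2 = rng.standard_normal((32, 8)), np.zeros(8)
y = ffn(x, W1, b1, W2, b2)
```

The intermediate dimension (2048 for $d_{model}=512$ in the original Transformer) is what gives the layer its capacity to learn more abstract features before mapping back.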

3.3. Decoder Layer

After passing through the word embedding layer and positional encoding layer, the input flows through the masked multi-head attention layer (which masks out all words after the current word), followed by residual connection and layer normalization (add & norm). Next comes the cross-attention layer, a multi-head attention that lets the decoder interact with the encoder output: the Q matrix comes from the decoder, while the K and V matrices come from the encoder output, again followed by residual connection and layer normalization (add & norm). Finally, a feed-forward neural network layer, once more with residual connection and layer normalization, produces the decoder output.

3.3.1. Attention Mask

The mask represents masking, where certain values are masked to have no effect during parameter updates. There are two types of masks in the Transformer model: a padding mask and sequence mask. A padding mask is used in all scaled dot-product attention mechanisms, while a sequence mask is only used in the decoder’s self-attention.
Since the lengths of input sequences vary within each batch, alignment of input sequences is necessary. Specifically, zeros are padded to the end of shorter sequences and longer sequences are truncated, discarding excess. Since these padded positions are meaningless, the attention mechanism should not focus on them, thus requiring some processing. The specific approach involves adding a very large negative number to the values at these positions. This causes the probabilities at these positions to approach 0 after applying softmax.
The sequence mask is used to prevent the decoder from seeing future information. That is, at time t, the decoder’s output should only depend on the outputs before time t, not on those after t. This requires a method to hide information from the future. The specific approach uses an upper triangular matrix to mask the decoder’s input.
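Both masks work the same way mechanically: a large negative number is added to the forbidden positions before the softmax, driving their probabilities toward zero. A minimal sketch of the sequence (causal) mask follows (illustrative; the constant −1e9 stands in for "a very large negative number"):

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_mask(seq_len: int) -> np.ndarray:
    """Strictly upper-triangular positions (future tokens) get a large negative value."""
    return np.triu(np.full((seq_len, seq_len), -1e9), k=1)

# Uniform scores plus the mask: after softmax, row t puts zero probability
# on every position after t.
scores = np.zeros((4, 4)) + causal_mask(4)
weights = softmax(scores)
```

A padding mask is built the same way, except the masked positions are those corresponding to padded tokens rather than future tokens.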

3.3.2. Linear and Softmax Layer

In this layer, the last decoder in the decoder stack passes its output to the output component, which converts it into the final target sentence. The linear layer is a simple fully connected neural network that maps the output vector of the decoder stack to a longer vector, known as the logits vector, which represents word scores. Each word in the vocabulary has a score at each position in the sentence. The softmax layer converts these scores into probabilities summing to 1. At each position, the word index with the highest probability is selected, and this index is mapped to the corresponding word in the vocabulary. These words constitute the output sequence of the Transformer.

4. Experiment

4.1. Parameter Selection

Time series prediction problems are transformed into supervised learning problems, and the dataset is organized into common data types in supervised learning. In the Transformer model, six encoder and six decoder layers are set, with the number of neurons in each layer set to the number of grid points (3910) in the region. The AdaGrad optimization algorithm is used to find the optimal solution for the overall loss function, with a batch size of 256 and an initial learning rate of 0.001. A sliding window validation is performed on the validation set. The training model utilizes 141,730 samples of 3-day data to predict the next day’s 3910 samples. The length of each training set window for each fold is set to 141,730, with a prediction range of the subsequent 3910 samples, which serve as the validation set. The step size for each fold is 3910, continuously adjusting the learning rate to minimize model error. The lowest loss value is achieved when the learning rate is set to 0.00075, indicating that the model has reached the desired accuracy. The cross-entropy loss function is used to compare the losses between the training set and the validation set. Gradient computation of the loss function is performed, and the model is trained through backpropagation. The loss function comparison between the training set and validation set is shown in Figure 5.
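The walk-forward (sliding window) validation described above can be sketched generically as follows. This is an illustrative sketch with toy index values; in our actual setup the training window is 141,730 samples and both the prediction horizon and the step are 3910:

```python
def sliding_windows(n_samples: int, train_len: int, horizon: int, step: int):
    """Yield (train, validation) index slices for walk-forward validation.

    Each fold trains on `train_len` consecutive samples and validates on the
    `horizon` samples immediately following them; the window advances by `step`.
    """
    start = 0
    while start + train_len + horizon <= n_samples:
        yield (slice(start, start + train_len),
               slice(start + train_len, start + train_len + horizon))
        start += step

# Toy example: 20 samples, train on 9, predict the next 3, step by 3.
folds = list(sliding_windows(20, 9, 3, 3))
```

Each validation block always lies strictly after its training block, so no future information leaks into training.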
Transformer is a deep neural network model with a large number of parameters. It is through the combination of these parameter values that the corresponding predictions are generated. The Transformer updates its parameter weights based on the prediction results through a process called backpropagation, which is an important step in pre-training and incurs significant computational cost as it involves multiple iterations to gradually approach convergence. The figure below illustrates that convergence is essentially achieved after 100 iterations, so setting the number of iterations to 100 enhances model training efficiency.
Defining an appropriate loss function is essential to evaluate the model’s performance during training. For machine translation tasks, the commonly used loss function is the cross-entropy loss function, which compares the differences between the English sentences generated by the model and the correct English sentences. During training, the model parameters, such as those in the fully connected neural network and the attention mechanism, are updated using the backpropagation algorithm and an optimizer, such as the AdaGrad optimizer, to minimize the loss function. Training is typically conducted using mini-batch training, where the training data are divided into multiple batches, each containing a certain number of samples.

4.2. Forest Fire Susceptibility Mapping

After training, our study converts the forest fire prediction results generated by the network model into discrete danger levels for easier interpretation. Since the model outputs probabilities between 0 and 1, the corresponding forest fire danger level is not directly apparent. To address this, a classification scheme maps these probabilities to five fire danger levels. Following the technical specifications for China's first national comprehensive risk survey of natural disasters, the Technical Regulations for Forest and Grassland Fire Hazard Assessment, the forest fire danger is predicted daily, and the annual average is drawn into a danger map with five levels: predicted fire probabilities of 0 to 0.2 correspond to a very low danger level, 0.2 to 0.4 to low, 0.4 to 0.6 to moderate, 0.6 to 0.8 to high, and 0.8 to 1 to very high. Once the danger levels for each grid cell in the dataset are obtained, this information can be visualized on a map. To provide a continuous and comprehensive view of danger levels across the entire study area, interpolation methods are used in the ArcGIS environment. Interpolation allows us to estimate the danger level at locations without direct observations based on known values at nearby locations. The interpolation procedure generates a smooth surface in which each point represents an estimated danger level, resulting in a comprehensive regional forest fire danger map that offers an easily understandable and intuitive representation of areas with different danger levels. These maps can help forest fire management authorities allocate resources more effectively and develop targeted prevention and control strategies.
In order to better verify the predictive effect of twelve factors on forest fires, we also made prediction charts for the forest fire danger levels of the entire county for four single factors, which are temperature, rainfall, vegetation moisture content, and aspect, as shown in Figure 6.
Temperature, rainfall, and vegetation moisture content change significantly over time, while aspect changes relatively little. The reasons for choosing these four single factors are as follows. Temperature: high temperatures cause vegetation and debris to lose moisture rapidly, becoming drier and more flammable; rising temperatures increase the speed at which flames spread; and in high-temperature environments, heat is more easily conducted and convected, facilitating the expansion of flames in all directions. Rainfall: rainfall increases the moisture content of vegetation and debris in forests, making these fuels less likely to burn; moist fuel requires more energy to ignite, thereby reducing the risk of fire. Vegetation moisture content: vegetation with high moisture content requires more energy to ignite and burns more slowly, so flames are less likely to spread quickly; high moisture content also lowers the temperature and intensity of the flames. Aspect: in the northern hemisphere, south-facing slopes receive more solar radiation, have higher temperatures, faster evaporation of water, and lower air and fuel humidity, so vegetation and ground fuels dry out more readily, making fires more likely to occur and spread. Finally, we generated a forest danger level map with all twelve influencing factors working together, as shown in Figure 7.
From the forest fire danger forecast map, it can be observed that forest fire danger points are mainly distributed in the central and southern regions of the county. The forest fire susceptibility maps generated by the Transformer model and the long short-term memory model show relatively uniform danger levels, which are more consistent with the actual situation [65]; in fact, the probability of fire occurrence across most of the county is relatively small. In contrast, the earlier recurrent neural network deep learning model predicts a higher danger of forest fires across the entire study area, indicating overfitting, and its differentiation of fire levels in many locations is not detailed enough, with levels often merging into one. This reveals the shortcomings of traditional deep learning models in long-term forest fire danger prediction.

4.3. Model Evaluation Metrics

Evaluation metrics are crucial factors in building classifier models and validating their performance. In this study, a regional forest fire classification prediction model is established, and the performance of the Transformer model is evaluated using the following four evaluation metrics, as shown in Figure 8.

4.3.1. Mean Absolute Percentage Error

MAPE is commonly used to measure the accuracy of predictions against actual values. It represents the average percentage error between predicted and actual values. Generally, a MAPE of less than 10% indicates a good prediction model, while a MAPE between 10% and 20% suggests acceptable prediction accuracy; if MAPE is greater than 20%, the predictions are not ideal, and the accuracy of the prediction model needs further improvement. In the following formulas, n is the number of samples, $y_i$ is the true value, and $\hat{y}_i$ is the predicted value. The calculation formula is as follows:
$$MAPE = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right|$$

4.3.2. Mean Absolute Error

MAE, also known as mean absolute deviation, is another commonly used evaluation metric. It calculates the average absolute value of prediction errors for each sample. The advantage of MAE is that it provides an intuitive measure of the magnitude of the difference between model-predicted values and actual values. It does not consider the sign of prediction errors and focuses more on the magnitude of absolute errors, making it better at reflecting the deviation of prediction values. MAE ranges from [ 0 , + ) , with a value of 0 indicating a perfect model when the predicted values match the actual values exactly; the larger the error, the larger the value, where n is the number of samples, y i is the true value, y i ^ is a predicted value. The calculation formula is as follows:
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$$

4.3.3. Root Mean Square Error

RMSE is one of the commonly used metrics for evaluating the accuracy of prediction models. It measures the degree of deviation between predicted values and actual values by calculating the square root of the sum of squares of differences between predicted values and actual values, where n is the number of samples, y i is the true value, and y i ^ is a predicted value. The calculation formula is as follows:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
RMSE ranges from 0 to positive infinity, with smaller values indicating smaller prediction errors and stronger prediction capabilities of the model. In practical applications, RMSE is commonly used to evaluate the prediction accuracy of regression models.

4.3.4. Accuracy

Accuracy represents the proportion of correctly predicted samples to the total number of samples. TP: samples that are correctly predicted as positive and whose true class is positive. FP: samples that are incorrectly predicted as positive despite their true class being negative. FN: samples that are incorrectly predicted as negative though their true class is positive. TN: samples that are correctly predicted as negative and whose true class is negative. The calculation formula is as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
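The four metrics above can be implemented directly from their formulas (an illustrative NumPy sketch; note that MAPE as defined requires all true values $y_i$ to be nonzero):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error; y_true must contain no zeros."""
    y, yh = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs((yh - y) / y))

def mae(y_true, y_pred):
    """Mean absolute error."""
    y, yh = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(yh - y))

def rmse(y_true, y_pred):
    """Root mean square error."""
    y, yh = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y - yh) ** 2))

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Proportion of correctly predicted samples."""
    return (tp + tn) / (tp + tn + fp + fn)
```

For example, `mae([1, 2, 4], [1, 2, 2])` is 2/3 and `accuracy(8, 1, 1, 0)` is 0.9.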

5. Discussion

5.1. Model Comparison

Due to the presence of long-term temporal dependencies in forest fire time series data, we chose the Transformer model, which excels at handling time series data. The Transformer uses self-attention mechanisms to capture dependencies between different positions in the input sequence, enabling it to effectively handle long-range dependencies in time series data, including trends, seasonality, and periodicity. Additionally, the Transformer exhibits better scalability than traditional recurrent neural networks (RNNs) and long short-term memory networks (LSTMs), making it easier to handle longer time series [66,67]. This makes the Transformer particularly advantageous for prediction tasks that require modeling over longer time spans. Our experiments clearly show that the models' prediction accuracies, from highest to lowest, are Transformer (90.3%), LSTM (89.5%), and RNN (87.7%). The Transformer also shows superior performance on the other indicators, MAE (0.053), MAPE (0.259), and RMSE (0.389). It is evident that the Transformer can effectively complete forest fire prediction tasks and, compared to the other traditional models, has stronger generalization ability, noise resistance, and resistance to overfitting.
RNN has its unique advantage in processing sequence data, where neurons remember previous inputs by transmitting state information. This enables RNNs to perform well in handling forest fire prediction problems, capturing temporal dependencies in forest fires. RNN can handle input and output sequences of different lengths. It can generate outputs of different lengths as needed and has flexibility in predicting results. RNNs can also share neural network parameters when processing sequences, thereby reducing the computational complexity of the model. RNN remembers previous information through its internal state, which gives it an advantage in processing forest fire data with long-term dependencies [68].
However, RNN also has some issues that cannot be ignored. When dealing with long-term forest fires, RNN is prone to problems such as vanishing or exploding gradients. During the training process, the weight matrix is continuously multiplied, and the product of gradients in long-term forest fire factor sequences may become very small or very large, leading to training difficulties. Traditional RNNs need to sequentially calculate the state of each time step when processing long sequences, resulting in low computational efficiency. This may make RNNs impractical when dealing with large-scale sequence data. Importantly, due to the sequential nature of RNN, the calculation of each time step must be carried out in order. This makes it difficult for RNN to parallelize effectively during training and prediction, limiting its application on large-scale forest fire datasets.
A long short-term memory (LSTM) network is a variant of a recurrent neural network (RNN) specifically designed to solve time series prediction problems. The LSTM network introduces a memory cell that can store and update information. The memory unit controls the reading, writing, and forgetting of information through forget gates, input gates, and output gates, thereby maintaining and updating relevant information in the sequence. By introducing gating mechanisms, LSTM networks can effectively solve the problems of vanishing gradients and long-term dependencies in traditional RNNs, enabling the model to better handle sequence data that requires long-term span memory. LSTM networks are suitable for handling various time series prediction problems, including but not limited to forest fire prediction, text generation, speech recognition, machine translation, etc. It can handle input and output of variable length sequences, adapting to data of different lengths and time intervals. In addition, the parameters of the LSTM network are shared in time steps, which effectively reduces the number of model parameters, reduces the risk of overfitting, and improves the computational efficiency of the model.
The drawbacks of LSTM cannot be ignored. Compared to traditional RNNs, LSTM networks have higher computational complexity, mainly due to the introduction of multiple gating units and memory cells, which increases computational and storage costs. LSTM networks may also face difficulties in parameter tuning: they have many adjustable parameters, including the weights and biases of the forget, input, and output gates, and this complexity requires more experience and practice to optimize model performance. Moreover, LSTM networks typically require a large amount of training data to fully exploit their advantages; for problems with limited data, the model is prone to overfitting, leading to performance degradation.
The Transformer model established in this article combines the advantages of LSTM and improves on the disadvantages of LSTM. The Transformer model can calculate the entire sequence in parallel without the need to calculate in chronological order, which improves the computational efficiency of the model. In contrast, LSTM models require step-by-step calculations in chronological order and cannot be effectively parallelized. The Transformer model captures long-term dependencies in sequences through self-attention mechanisms. The self-attention mechanism allows the model to consider information from other positions in the sequence while calculating the representation of each position, thus better capturing the global context. In contrast, traditional RNNs or LSTMs pose significant challenges for long-term dependency modeling.
Due to the existence of the self-attention mechanism, the Transformer model is able to provide global context awareness for each position in the sequence [69]. This enables the model to better understand the importance and relationships of different positions in the sequence, thereby improving the accuracy of prediction. The attention mechanism in the Transformer model can be seen as a way of parameter sharing, which reduces the number of model parameters and the risk of overfitting during the training process. The Transformer model has strong adaptability and can handle sequence data of various lengths and time intervals. It is not limited by a fixed window length and can adaptively model sequences to adapt to different input and output lengths.

5.2. Further Development

Although Transformer has the aforementioned advantages in forest fire prediction, there are also some limitations. For example, the Transformer model may require more training data to fully leverage its advantages, and modeling certain sequence data may require deeper models and larger computing resources [70]. In addition, the interpretability of the Transformer model is relatively weak, not as intuitive as LSTM and RNN, and may be more difficult to explain the decision-making process of the model. Moreover, there are some limitations, such as the need for further research on the impact of other fire factors on fire prediction. Transformer models typically have large parameter sizes, requiring substantial computational resources for training and inference. This necessitates more computational resources and time when dealing with large-scale time series data. Additionally, due to the nature of self-attention mechanisms, Transformer models face challenges in computational and storage costs when handling longer sequences. Long sequences increase the computational complexity and memory consumption of the model, which limits its efficiency and performance in handling long sequence data.
Based on the findings and limitations identified in this study, several potential directions have been proposed for future research:
  • Small sample learning: For time series prediction problems with a small number of labeled samples, how to perform small sample learning in Transformer models is an important research direction. This may involve the application of techniques such as semi-supervised learning, transfer learning, and meta-learning to improve the model’s generalization ability and robustness.
  • Time correlation modeling: The current Transformer model has relatively weak handling of time correlation when modeling sequences. Future improvements can explore how to introduce time-related components into the Transformer, enabling the model to better understand and utilize temporal information.
  • Greater data size: A larger dataset provides richer samples and a wider data distribution, helping the Transformer model better learn the patterns and regularities in time series data. By training on large-scale datasets, the model can generalize better to new, unseen data, improving the accuracy and stability of predictions. More data also helps the model capture the true distribution of the data, reducing excessive reliance on noise and specific samples in the training set. A larger dataset can provide more fine-grained time series patterns and correlation information, from which the Transformer model can learn more features and structures, helping to discover more complex time dependencies and prediction patterns.

6. Conclusions

At the beginning of the paper, we introduced the hazards of forest fires and their related research work. We provided a brief introduction to traditional forest fire prediction methods, as well as more advanced machine learning and deep learning methods. Then we also included other methods for predicting forest fires, looking at the possibility of predicting fires from other perspectives. Finally, we introduced the latest research progress from around the world on the probability prediction of fires.
During the research process, initially, we obtained data from sources such as high-resolution imaging spectrometers and the Chinese Comprehensive Meteorological Information Service System for fourteen influencing factors. Subsequently, we conducted preprocessing operations such as band calculation, data integration, and clipping stitching using ArcGIS. We then normalized and rasterized the influencing factors to obtain pixel values for each grid. Next, we used oversampling and buffer analysis methods to address the issue of class imbalance in fire label categories. Finally, we conducted Pearson analysis and multicollinearity tests to select and validate fire factors, resulting in the final dataset with fire labels for all counties in the state.
Considering the event dependency, cyclical variation, and non-linear relationships in forest fire data, we defined the architecture of the Transformer model. We utilized the AdaGrad optimizer and backpropagation algorithm to adjust model parameters to minimize the cross-entropy loss function, thus improving prediction accuracy.
We evaluated the model performance using a validation set and assessed the model’s generalization ability on location data. Evaluation metrics such as root mean square error and mean absolute error were used to evaluate the accuracy of the model predictions. Finally, we created a forest fire probability prediction map for all counties in the state. This study provides valuable insights for forest fire prevention planning and fire avoidance, aiding forest fire management authorities in more effectively allocating resources and devising targeted prevention and control strategies.
Overall, the research significance and main achievements of this article are as follows:
  • Forest hazard level prediction for small areas: We noticed that in current research on forest fire warning, most study areas are set in very large geographic units such as states, counties, and provinces, and fire prediction for small areas is still not accurate and detailed enough. Therefore, our study area comprises only a single county-level unit in Guilin City, Guangxi Province, China, and adopts an advanced Transformer architecture for fire warning, achieving good predictive performance.
  • Multi-angle influencing factors: We comprehensively considered which factors promote forest fires from four aspects: meteorological, geographical, cultural (human activity), and vegetation factors. These influencing factors are also broadly applicable to forest fires around the world.
  • Effective early warning of forest fire hazard levels: Applying a Transformer to this forest fire prediction task yields good results (ACC = 0.903). The model accurately predicts the timing and potential severity of fires, demonstrating its practical application potential in natural disaster prevention and response.
  • Robust hyperparameters: The model is robust to different hyperparameter choices, such as batch size, time steps, and number of epochs. This characteristic makes it suitable for a variety of application contexts and data characteristics, and these hyperparameter settings may also transfer to other time series-based forest fire prediction tasks.
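As a small illustration of what the "time steps" hyperparameter controls, time series models are typically fed overlapping fixed-length windows of the factor history. The helper and values below are ours, for illustration only:

```python
def make_windows(series, time_steps):
    """Slice a daily factor series into overlapping (window, next_value)
    pairs, as is commonly done when feeding tabular time series to a
    sequence model."""
    X, y = [], []
    for i in range(len(series) - time_steps):
        X.append(series[i:i + time_steps])
        y.append(series[i + time_steps])
    return X, y

# Toy daily temperature series (illustrative values only)
temps = [21.0, 22.5, 23.1, 24.0, 23.7, 22.9, 21.8]
X, y = make_windows(temps, time_steps=3)
print(len(X), X[0], y[0])  # 4 windows; first is [21.0, 22.5, 23.1] -> 24.0
```

Changing `time_steps` trades a longer visible history against fewer training windows, which is one reason robustness to this choice matters in practice.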
Forests are vital resources for human survival, possessing significant ecological, economic, and social value. The development of high-precision forest fire prediction technology therefore plays a significant role in forest fire early warning.

Author Contributions

Y.C. devised the programs, drafted the initial manuscript, and contributed to the writing and the experiments. X.Z. and S.R. helped with the data collection and data analysis. Y.W. and Y.Y. helped to improve the manuscript and modified the model in the later stage. C.L. revised the initial manuscript. Z.Z. and Y.C. designed the project and revised the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by a project funded by the Provincial General Project of College Student Practical Innovation Training Program (202310298113Y) and College Student Practical Innovation Training Program Project (2022NFUSPITP0237).

Data Availability Statement

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Figure 1. Quanzhou County, Guilin City, Guangxi Province, China.
Figure 2. Parameters influencing forest fire danger level.
Figure 3. Model architecture.
Figure 4. Structure of the used Transformer.
Figure 5. Loss function comparison.
Figure 6. Forest fire danger prediction results depending on a single forest fire influencing parameter.
Figure 7. Forest fire danger prediction results.
Figure 8. Parameters influencing forest fire danger.
Table 1. Data description of forest fire influencing factors.

No. | Influence Factor     | Scale/Resolution | Unit    | Source
1   | Temperature          | –                | °C      | CIMISS
2   | Atmospheric pressure | –                | Pa      | CIMISS
3   | Relative humidity    | –                | kg·kg⁻¹ | CIMISS
4   | Wind speed           | –                | m·s⁻¹   | CIMISS
5   | Rainfall             | –                | mm·h⁻¹  | CIMISS
6   | Vegetation coverage  | –                | –       | MODIS
7   | NDVI                 | 500 m            | –       | MODIS
8   | Moisture content     | 30 m             | –       | Landsat
9   | Ignitability         | 30 m             | –       | Landsat
10  | Slope                | 30 m             | –       | Landsat
11  | Aspect               | 30 m             | –       | Landsat
12  | Distance to railway  | 1:250,000        | km      | DSNLV
13  | Distance to road     | 1:250,000        | km      | DSNLV
14  | Distance to river    | 1:250,000        | km      | DSNLV
Table 2. Multicollinearity analysis of forest fire influencing factors.

No. | Influence Factor     | VIF    | TOL
1   | Temperature          | 3.702  | 0.270
2   | Atmospheric pressure | 2.599  | 0.385
3   | Relative humidity    | 1.676  | 0.597
4   | Wind speed           | 1.108  | 0.902
5   | Rainfall             | 1.499  | 0.667
6   | Vegetation coverage  | 95.606 | 0.091
7   | NDVI                 | 95.019 | 0.097
8   | Moisture content     | 1.485  | 0.673
9   | Ignitability         | 1.385  | 0.722
10  | Slope                | 1.950  | 0.513
11  | Aspect               | 1.121  | 0.892
12  | Distance to railway  | 2.963  | 0.338
13  | Distance to road     | 1.814  | 0.551
14  | Distance to river    | 2.609  | 0.383

Cao, Y.; Zhou, X.; Yu, Y.; Rao, S.; Wu, Y.; Li, C.; Zhu, Z. Forest Fire Prediction Based on Time Series Networks and Remote Sensing Images. Forests 2024, 15, 1221. https://doi.org/10.3390/f15071221

