Wildfire Spread Prediction Using Attention Mechanisms in U2-NET

Xiao, Hongtao; Zhu, Yingfang; Sun, Yurong; Zhang, Gui; Gong, Zhiwei

doi:10.3390/f15101711

by Hongtao Xiao ¹,², Yingfang Zhu ¹,*, Yurong Sun ¹, Gui Zhang ¹ and Zhiwei Gong ¹

¹ College of Computer and Mathematics, Central South University of Forestry and Technology, Changsha 410004, China

² School of Electronic Information, Central South University, Changsha 410075, China

* Author to whom correspondence should be addressed.

Forests 2024, 15(10), 1711; https://doi.org/10.3390/f15101711

Submission received: 21 August 2024 / Revised: 9 September 2024 / Accepted: 25 September 2024 / Published: 27 September 2024

(This article belongs to the Section Natural Hazards and Risk Management)

Abstract

Destructive wildfires pose a serious threat to ecosystems, economic development, and human life and property. If wildfires can be extinguished shortly after they start, the losses they cause can be greatly reduced. Although deep learning methods have been shown to have powerful feature extraction capabilities, many current models still generalize poorly when faced with complex tasks. To this end, in this study, we introduced attention modules both inside and outside the nested U-shaped structure and trained a neural network model based on the U2-Net architecture so that the model can suppress the activation of irrelevant areas. Compared with baseline models such as U-Net, our model improves the F1 score on the test set by at least 2.8%. The experimental results indicate that the proposed model is practical and can provide a scientific basis for forest fire management and emergency decision-making.

Keywords: wildfire spread; U2-Net; attention module; deep learning

1. Introduction

Due to the rising frequency of wildfires caused by climate change and human activities, natural ecosystems such as forests and wetlands are facing serious wildfire threats. Wildfires are natural disasters that cause large amounts of soot emissions, thereby affecting the global carbon cycle and slowing the achievement of carbon neutrality [1,2,3]. For example, over the past decade, wildfires in the United States have burned more than 68 million acres of land, causing enormous economic losses. The fires release large amounts of greenhouse gases and air pollutants, and the accumulation of greenhouse gases leads to global warming, which in turn triggers weather extremes such as unusual rainfall, drying of vegetation, strong winds, high temperatures, and high-frequency lightning strikes. Therefore, there is an urgent need for effective predictive models to curb this destructive phenomenon.

Traditional wildfire spread models fall into two main categories. The first comprises physical models based on physical and chemical processes, or on physical processes only, such as the Wildland–Urban Fire Dynamics Simulator (WFDS) [4], which uses Computational Fluid Dynamics (CFD) principles to simulate phenomena such as air flow, heat transfer, and combustion in wildfires. Models of this type physically model the fire front based on the principle of the conservation of energy and analyze the impact of the wildfire on unburned combustibles through radiation and heat transfer. The second category comprises empirical or semi-empirical models based on statistical analysis of experimental data, such as Rothermel [5] in the U.S. and the Canadian Forest Fire Behavior Prediction (CFBP) system [6] in Canada. Models of this type do not take into account the physical and chemical processes in the spread of wildfires. They are built mainly from large amounts of historical fire data or experimental data, or they establish an expression for the wildfire spread rate using some form of physical framework [7]. From the earliest Huygens–Fresnel principle models (such as FARSITE [8] and graph theory models) to Cellular Automata (CA) [9,10], most of these prediction models require prior knowledge and empirical formulas that hold only under limited environmental conditions, making it difficult to guarantee their accuracy in the prediction of large-scale fires.

Conventional wildfire spread models struggle to remain effective for large-scale fires. With the emergence of large amounts of data and higher computing power, deep learning (DL)-based methods have been used for wildfire detection, segmentation, and classification tasks and have shown good generalization capabilities. The most widely used and successful model for wildfire classification tasks is the convolutional neural network (CNN). Typically, a CNN accepts an input image and predicts the presence or absence of wildfire as the output of the model [11,12]. A wildfire classification model predicts only whether a wildfire is present in the input image and cannot determine its location, whereas a wildfire detection algorithm determines both the class of each detected object in the image and its boundaries [13,14]. The wildfire segmentation task aims to classify the parts of the input image that belong to the “FIRE” class, and the output is a binary mask that can highlight the shape and location of the wildfire through visualization techniques [11].

Widely used convolutional neural network models for wildfire segmentation tasks mainly include Fully Convolutional Network (FCN) and encoder–decoder models. The FCN extracts the features of the input image through convolution and pooling operations while the output is a binary mask. FCN models for semantic segmentation significantly improve accuracy through pre-trained classifier weights, fusion of different layer representations, and end-to-end learning over the entire image [15]. Raw FCNs may result in the loss of detailed information when capturing contextual information across long distances [16]. The encoder–decoder model comprises an encoder and a decoder, where the encoder gradually reduces the dimensionality of the data and extracts useful features, while the decoder is designed to upsample the compressed features and transform them into a meaningful binary firemask [17,18].

Despite the strong feature extraction capabilities of deep learning methods, many current models still struggle to learn information-rich contextual features effectively, resulting in poor generalization performance on complex tasks. Therefore, in this paper, we present a U2-Net model that introduces an attention mechanism both inside and outside the nested U-shaped structure, aiming to address two problems of U2-Net: retaining invalid features and losing edge information in the connections.

The rest of this paper is arranged as follows: Section 2 presents previous related work. In Section 3, we introduce our deep learning model. Section 4 introduces the data source, data processing, and model results. The last section presents our conclusions.

2. Related Work

U-Net is a typical encoder–decoder architecture that performs well in semantic segmentation tasks [19,20]. Encoder and decoder features are combined through skip connections to better capture finer details and generate more accurate results. Akhloufi [3] used the U-Net model to segment the wildfire region: an encoder extracts wildfire features from the input fire image, and a decoder decodes the compressed feature map and outputs it as a binary mask. Many researchers have improved some of the modules of U-Net. Zhang [21] and Huot [22] proposed models that combine the advantages of deep residual learning and the U-Net architecture by using residual units instead of ordinary neural units, simplifying the training process of the model. Li et al. [23] proposed an improved U-Net segmentation structure in which the encoder is a densely connected CNN, such that each feature extractor layer accepts as input the features obtained from all the previous layers. This achieves feature reuse, so that as many features as possible can be extracted without adding more parameters. Khryashchev et al. [24] used U-Net with ResNet-34 as the backbone for detecting and segmenting wildfire areas, increased the amount of training data by using multiple data augmentation techniques, and optimized the training process with an adaptive moment estimation algorithm. Wang et al. [25] proposed Smoke-Unet, a network model for smoke segmentation based on an improved U-Net. The model combines the attention mechanism and the residual module by adding to the network structure the ResBlock, a residual block with a skip connection, and the SEBlock, a module based on the attention mechanism.

In addition to improving the modules of U-Net, many research projects aim to improve the U-Net architecture itself. Zhou et al. [26] proposed an improved U-Net architecture, U-Net++, which makes the optimization problem easier for the optimizer by reducing the semantic gap between the encoder and the decoder. Bochkov and Kataeva [27] proposed the wide UU-Net concatenative (wUU-Net) model based on the U-Net architecture. UU-Net consists of two U-Net architectures with additional skip connections between the decoder of the first U-Net and the encoder of the second U-Net, and wUU-Net is a modernization of this design, with improved accuracy relative to the U-Net model. Qin et al. [28] proposed a simple yet powerful two-level nested U-structure, U2-Net, consisting of residual U-blocks (RSUs) and an external U-structure that connects the residual U-blocks, increasing the depth of the overall network architecture while keeping the computational cost low [29].

Vaswani et al. [30] demonstrated the effectiveness of the attention mechanism in improving model performance, a landmark for a large number of subsequent studies and applications. Zhang et al. [31] proposed ATT Squeeze U-Net (attention U-Net and SqueezeNet), a U-shaped method with attention modules. The method removes the conventional encoder and uses eight Fire modules with SqueezeNet as the backbone to extract wildfire features, while the decoder employs 3 × 3 and 1 × 1 convolutional layers, a ReLU activation function, and three corresponding DeFire modules, a framework that significantly reduces the parameter size of the model. Shirvani et al. [32] added attention gate units to the U-Net architecture (AU-Net), and both residual blocks and attention gate units (RAU-Net). The latter improves feature extraction by integrating residual blocks and incorporates an attention mechanism at the skip connections to remove irrelevant and invalid information.

3. Methodology

We categorize this wildfire spread prediction task as a semantic segmentation problem in image segmentation, where each region is predicted to contain a fire or not based on the previous day’s fire location as well as various influencing factors. We first introduce the basic principles of the U-Net model and then introduce the U2-Net model based on U-Net. At the same time, considering the characteristics of multi-channel input features of remote sensing images, we propose a U2-Net model that incorporates an attention mechanism to handle this task.

3.1. Attention U2-Net

The entire U-Net framework consists of a contraction path and an expansion path. The former follows the classical architecture of convolutional networks and consists mainly of repeated applications of two 3 × 3 convolutions (unpadded convolutions), each followed by a rectified linear unit (ReLU), and then a 2 × 2 max-pooling operation (with stride 2) for downsampling. Furthermore, at each downsampling step, U-Net’s contraction path doubles the number of feature channels. The expansion path first upsamples the feature map and then performs a 2 × 2 convolution (“up-convolution”) to halve the number of feature channels. This is followed by concatenation with the correspondingly cropped feature map from the contraction path and by two 3 × 3 convolutions, each followed by a ReLU. The cropping is needed because every unpadded convolution loses border pixels. At the end of the decoder, U-Net uses a 1 × 1 convolutional layer to transform the feature map into the final segmentation map [19,33].
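The size and channel bookkeeping described above can be traced with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the function name `unet_shape_trace` is hypothetical, and the defaults (a 572 × 572 input, 64 base channels, four pooling levels) are assumptions taken from the original U-Net paper rather than from the wildfire model in this article.

```python
def unet_shape_trace(input_size=572, base_channels=64, depth=4):
    """Trace (stage, spatial size, channels) through an unpadded U-Net.

    Assumes a square input. The defaults follow the original U-Net paper,
    not the wildfire model in this article.
    """
    trace = []
    skips = []
    size, channels = input_size, base_channels

    # Contraction path: two unpadded 3x3 convolutions trim 2 pixels each,
    # then a 2x2 max pool (stride 2) halves the size; channels double per level.
    for level in range(depth):
        size -= 4
        trace.append((f"down{level + 1}", size, channels))
        skips.append(size)
        size //= 2
        channels *= 2

    # Bottom of the U: two more unpadded 3x3 convolutions.
    size -= 4
    trace.append(("bottom", size, channels))

    # Expansion path: upsampling doubles the size, the 2x2 up-convolution
    # halves the channels, the skip map is cropped to match, and two
    # unpadded 3x3 convolutions follow.
    for level, skip_size in enumerate(reversed(skips), start=1):
        size *= 2
        channels //= 2
        crop = skip_size - size  # pixels cropped from the skip feature map
        size -= 4
        trace.append((f"up{level}", size, channels, crop))

    return trace


for row in unet_shape_trace():
    print(row)
```

With these assumed defaults the trace ends at a 388 × 388 map with 64 channels, which the final 1 × 1 convolution would turn into the segmentation map. Implementations that use padded convolutions keep the spatial size fixed and need no cropping.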

The design of U-Net is the basis of the design of U2-Net. U2-Net comprises a two-level U-structure. It inherits the encoding and decoding ideas of the U-Net network model but no longer uses a simple convolutional or deconvolutional layer at each sampling stage. Instead, it embeds a whole U-shaped residual block (RSU) at each stage. Specifically, the outer structure of U2-Net consists of a six-level encoder, a five-level decoder, and a saliency map fusion module. Each level of the encoder and the decoder is built from the newly proposed RSU.

The U2-Net architecture takes each sample from the Wildfire Dataset as input to the contraction path and generates a probability map as output based on the number of classes. When the input data are processed by each RSU, each downsampling/upsampling path in the corresponding RSU consists of a 3 × 3 convolution layer, a batch normalization layer, a ReLU activation function, and a 2 × 2 maximum pooling/upsampling layer. The six saliency probability maps obtained from the six different RSU modules De_5, De_4, De_3, De_2, De_1, and En_6 are fused to obtain the final fused feature map. Because the RSU modules sit at different depths, the final fused feature map contains rich content. Finally, the model prediction results and loss function values are obtained, and the value of the loss function is reduced through continuous iteration to improve the generalization ability of the model [34].

The U2-Net network structure is sophisticated and deep, and its deeper layers can be used to acquire high-resolution feature maps without significantly increasing the computing cost. However, the model is prone to retaining invalid features and losing edge information in the connections [35]. The attention mechanism was first proposed by Bahdanau et al. [36] to address the difficulty of encoding long sentences into a fixed-length representation [37]. Adding an attention mechanism to the model can actively suppress the activation of irrelevant areas at skip connections, thereby reducing redundant information [38,39]. Therefore, we propose a U2-Net model that incorporates an attention mechanism. The attention mechanism is introduced both inside and outside the nested U-shaped structure. The model structure is shown in Figure 1. Figure 2 shows how the attention mechanism is integrated into the U2-Net model.
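To illustrate how an attention module can suppress activations on a skip connection, the sketch below implements an additive attention gate for a handful of pixels in plain Python. This is an assumption, not the authors' module: the text gives no gate equations, so the form used here is the additive gate popularized by Attention U-Net, with biases and resampling of the gating signal omitted. The names `attention_gate`, `w_x`, `w_g`, and `psi` are hypothetical, and each 1 × 1 convolution is written as a per-pixel linear map over channels.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def attention_gate(skip, gate, w_x, w_g, psi):
    """Additive attention gate applied pixel by pixel.

    skip, gate: lists of per-pixel channel vectors (same number of pixels).
    w_x, w_g:   hypothetical 1x1-convolution weights, as matrices.
    psi:        hypothetical weights that reduce the hidden vector to a score.
    Returns the gated skip features and the attention coefficients.
    """
    gated, alphas = [], []
    for x, g in zip(skip, gate):
        # ReLU(W_x x + W_g g): combine skip features with the gating signal.
        hidden = [max(0.0, a + b) for a, b in zip(matvec(w_x, x), matvec(w_g, g))]
        # Sigmoid of the scalar score gives a coefficient in (0, 1).
        score = sum(p * h for p, h in zip(psi, hidden))
        alpha = 1.0 / (1.0 + math.exp(-score))
        alphas.append(alpha)
        # Scale every channel of the skip feature by the coefficient.
        gated.append([alpha * c for c in x])
    return gated, alphas


identity = [[1.0, 0.0], [0.0, 1.0]]
features, coeffs = attention_gate(
    skip=[[1.0, 2.0], [0.5, 0.5]],
    gate=[[0.0, 0.0], [3.0, 3.0]],
    w_x=identity, w_g=identity, psi=[1.0, 1.0],
)
print(coeffs)
```

Pixels whose coefficient is close to 0 contribute little to the decoder after concatenation, which is the sense in which a gate of this kind reduces redundant information.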

3.2. Loss Function

In the training process, we use deep supervision similar to Holistically-Nested Edge Detection (HED), whose effectiveness was verified in HED and Deeply Supervised Salient Object Detection (DSS). The loss function includes the losses of the side output feature maps and the loss of the final fused output feature map, and it is defined as follows [28,35]:

$$\mathrm{Loss} = \sum_{s=1}^{S} \omega_{\mathrm{side}}^{(s)} \, \mathrm{loss}_{\mathrm{side}}^{(s)} + \omega_{\mathrm{fuse}} \, \mathrm{loss}_{\mathrm{fuse}} \tag{1}$$

where $S = 6$ denotes the number of side output feature maps, $\mathrm{loss}_{\mathrm{side}}^{(s)}$ indicates the loss of the side output feature map, $\mathrm{loss}_{\mathrm{fuse}}$ denotes the loss of the final output feature map, $\omega_{\mathrm{side}}^{(s)}$ and $\omega_{\mathrm{fuse}}$ denote the weight of each loss term, and $\mathrm{loss}_{\mathrm{side}}^{(s)}$ and $\mathrm{loss}_{\mathrm{fuse}}$ are binary cross-entropy loss functions. The formula for BCELoss is as follows:

$$\mathrm{loss} = -\sum_{(h,w)}^{(H,W)} \left[ P_{GT(h,w)} \log P_{pred(h,w)} + \left(1 - P_{GT(h,w)}\right) \log \left(1 - P_{pred(h,w)}\right) \right] \tag{2}$$

where $P_{GT(h,w)}$ and $P_{pred(h,w)}$ denote the pixel values at position $(h, w)$ of the ground truth and of the predicted map, respectively, and $(H, W)$ is the image size. This loss term is computed over all the pixels in the image. During training, we minimize the loss between the predicted value and the ground truth, and we take the final fused output feature map as our prediction.
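Equations (1) and (2) can be checked numerically with a small plain-Python sketch. It is illustrative only: the function names are hypothetical, the clamp `eps` is an added numerical safeguard that is not part of Equation (2), and setting every weight to 1 is an assumption, since the text does not state the weight values.

```python
import math


def bce_loss(gt, pred, eps=1e-7):
    """Equation (2): binary cross-entropy summed over all pixels.

    gt and pred are 2D lists of the same shape; pred holds probabilities.
    eps clamps predictions away from 0 and 1 (an added safeguard).
    """
    total = 0.0
    for gt_row, pred_row in zip(gt, pred):
        for p_gt, p_pred in zip(gt_row, pred_row):
            p = min(max(p_pred, eps), 1.0 - eps)
            total -= p_gt * math.log(p) + (1.0 - p_gt) * math.log(1.0 - p)
    return total


def deep_supervision_loss(gt, side_preds, fused_pred, w_side=None, w_fuse=1.0):
    """Equation (1): weighted side losses plus the weighted fused loss.

    The default weights of 1.0 are an assumption, not values from the text.
    """
    if w_side is None:
        w_side = [1.0] * len(side_preds)
    side_total = sum(w * bce_loss(gt, p) for w, p in zip(w_side, side_preds))
    return side_total + w_fuse * bce_loss(gt, fused_pred)


# Toy 1x2 "image": one fire pixel, one no-fire pixel, uninformative predictions.
gt = [[1.0, 0.0]]
uncertain = [[0.5, 0.5]]
print(bce_loss(gt, uncertain))                                    # 2 * ln(2)
print(deep_supervision_loss(gt, [uncertain] * 6, uncertain))      # 7 * 2 * ln(2)
```

With S = 6 identical side outputs and unit weights, the total is seven times the single-map loss, which makes the contribution of each term in Equation (1) easy to see.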

4. Experiments

4.1. Datasets and Experimental Setup

Data were collected from the work of the Google Research team [22], who presented the Next Day Wildfire Spread dataset. As shown in Figure 3, the dataset collects satellite data on wildfires in six categories: Elevation Data, Weather Data, Drought Data, Vegetation Data, Population Density Data, and Wildfire Masks. Specifically, the input data include elevation, wind direction (th), wind speed (vs), minimum temperature (tmmn), maximum temperature (tmmx), humidity (sph), precipitation (pr), drought index (pdsi), vegetation (NDVI), population density (population), energy release component (ERC), and firemask at time t (pfm), while the firemask at time t + 1 (fm) is used as the output. Each sample is a 64 × 64 image with a resolution of 1 km, and each 1 km × 1 km pixel of the firemask is labeled either ‘fire’ or ‘no fire’. We filled the missing pixels in each image with 0. The dataset contains a total of 18,545 samples, and we divided the data into training, validation, and test sets in a ratio of 8:1:1.
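As a rough illustration of the 8:1:1 division, the sketch below splits sample indices into three disjoint sets. The shuffling, the fixed seed, and the choice to give the remainder to the test set are assumptions; the text states only the ratio and the total of 18,545 samples, and the function name `split_indices` is hypothetical.

```python
import random


def split_indices(n_samples, ratios=(8, 1, 1), seed=0):
    """Shuffle sample indices and split them into train/validation/test sets.

    Shuffling, the seed, and remainder handling are assumptions made for
    this sketch; any samples left over by integer division go to the test set.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    total = sum(ratios)
    n_train = n_samples * ratios[0] // total
    n_val = n_samples * ratios[1] // total
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train, val, test = split_indices(18545)
print(len(train), len(val), len(test))
```

Under these assumptions the 18,545 samples divide into 14,836 training, 1854 validation, and 1855 test samples.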

For the task of forecasting the spread of forest fires, we take the influencing factors and the firemask at moment t (denoted ‘previous firemask’) as input features and the firemask at moment t + 1 (denoted ‘firemask’) as the label. For all input features other than the firemask, we performed clipping and normalization, which remove the effects of extreme values and of differing magnitudes, respectively. Figure 3 shows that factors such as elevation and wind direction vary greatly within the 64 km × 64 km area. The last two columns represent the firemask at moment t and the firemask at moment t + 1, respectively, where gray denotes “NO FIRE” and red denotes “FIRE”.
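The clipping and normalization step can be sketched as follows. The text does not give the clipping bounds or the normalization formula, so both are assumptions here: the bounds in the usage line are made-up values, and min-max scaling to [0, 1] stands in for whatever scheme the authors used. The function name `clip_and_normalize` is hypothetical.

```python
def clip_and_normalize(values, lower, upper):
    """Clip values to [lower, upper], then min-max scale them to [0, 1].

    Clipping limits the influence of extreme values; scaling removes
    differences in magnitude between features. The bounds are supplied
    by the caller and are not taken from the article.
    """
    span = upper - lower
    if span <= 0:
        raise ValueError("upper must be greater than lower")
    return [(min(max(v, lower), upper) - lower) / span for v in values]


# Hypothetical wind-speed readings with one extreme value, clipped to [0, 10].
print(clip_and_normalize([-10.0, 0.0, 5.0, 20.0], lower=0.0, upper=10.0))
```

The firemask channels would be left untouched by such a step, since they are already categorical.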

The network architecture was implemented in Python and executed on an NVIDIA GeForce RTX 3070 graphics processing unit (GPU) with 16 GB of memory. The initial learning rate was set to 0.001 and was adaptively decreased during training using the Adam optimizer. In addition, we conducted comparative experiments with U-Net [20], Attention U-Net [20], ResUNet [22], U2-Net [28], and Attention U2-Net (our method). For the first four models, we ran the trained models in the environments proposed by their authors, while for our method, the hyperparameters were selected by grid search.
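A grid search of the kind mentioned above can be sketched generically. Everything specific in this example is hypothetical: the text does not list the hyperparameters or candidate values that were searched, so the grid below is made up (only the learning rate 0.001 appears in the text), and the `evaluate` callable is a toy stand-in for training a model and returning its validation F1 score.

```python
from itertools import product


def grid_search(grid, evaluate):
    """Try every combination in the grid and keep the best-scoring one.

    grid:     dict mapping a hyperparameter name to its candidate values.
    evaluate: callable that takes a dict of hyperparameters and returns a
              score to maximize (for example, a validation F1 score).
    """
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid and a toy objective that peaks at lr=0.001, batch_size=32.
hypothetical_grid = {"lr": [0.01, 0.001, 0.0001], "batch_size": [16, 32]}


def toy_score(params):
    return -(abs(params["lr"] - 0.001) + abs(params["batch_size"] - 32) / 1000)


print(grid_search(hypothetical_grid, toy_score))
```

The cost grows with the product of the candidate counts, so in practice each call to `evaluate` is a full training run and the grid is kept small.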

4.2. Model Evaluation Metrics and Baselines

Several indicators are used to evaluate the generalization ability of the model, including Precision, Recall, the F-measure ($F_{\beta}$), and Overall Accuracy (OA). Precision denotes the proportion of samples predicted to be positive that are indeed positive. Recall denotes the proportion of actual positive samples that are correctly predicted as positive. In Equations (3) and (4), $F$ is the binary feature map of the final fusion output and $T$ is the true value. $F_{\beta}$ is used for the combined assessment of Precision and Recall; with $\beta = 1$ it gives the F1 score. OA denotes the proportion of correctly predicted pixels among all pixels, where $P_{True}$ denotes the number of correctly classified pixels and $P_{total}$ denotes the total number of pixels. The neural networks in this study were trained using Python’s PyTorch 2.2.2 and TensorFlow 2.16.1 libraries.

$$\mathrm{Precision} = \frac{|F \cap T|}{|F|} \tag{3}$$

$$\mathrm{Recall} = \frac{|F \cap T|}{|T|} \tag{4}$$

$$F_{\beta} = \frac{(1 + \beta^{2}) \times \mathrm{Precision} \times \mathrm{Recall}}{\beta^{2} \times \mathrm{Precision} + \mathrm{Recall}} \tag{5}$$

$$\mathrm{OA} = \frac{P_{True}}{P_{total}} \tag{6}$$
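Equations (3) to (6) translate directly into pixel counts on binary masks. The sketch below is a plain-Python illustration with hypothetical function names. It treats |F ∩ T| as the number of true-positive pixels, |F| as all predicted fire pixels, and |T| as all ground-truth fire pixels, and it returns 0 when a denominator is zero, which is a convention chosen for this sketch.

```python
def f_beta(precision, recall, beta=1.0):
    """Equation (5); returns 0.0 when the denominator is zero."""
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom


def segmentation_metrics(pred, truth, beta=1.0):
    """Precision, Recall, F_beta, and OA for two binary masks (2D lists)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # pixel in both F and T
            elif p:
                fp += 1      # predicted fire, actually no fire
            elif t:
                fn += 1      # missed fire pixel
            else:
                tn += 1      # correctly predicted no fire
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0   # |F ∩ T| / |F|
    recall = tp / (tp + fn) if tp + fn else 0.0      # |F ∩ T| / |T|
    return {
        "precision": precision,
        "recall": recall,
        "f_beta": f_beta(precision, recall, beta),
        "oa": (tp + tn) / total if total else 0.0,   # P_True / P_total
    }


print(segmentation_metrics([[1, 1, 0, 0]], [[1, 0, 1, 0]]))
print(round(f_beta(0.275, 0.563), 2))
```

As a consistency check, the Precision (0.275) and Recall (0.563) reported for U-Net in Table 1 give an F1 of about 0.37 under Equation (5) with β = 1, matching the value listed there.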

4.3. Results and Discussion

In this section, we analyze the generalization capabilities of the U-Net [20], Attention U-Net [20], ResUNet [22], U2-Net [28], and Attention U2-Net models. The predictions of the baseline models and of the Attention U2-Net model on the test set are shown in Figure 4 and Figure 5, respectively. From Figure 4b,c, we can see that the prediction results of the Attention U-Net model and the ResUNet model tend to be smoother. From Figure 5, we can see that the Attention U2-Net model tends to connect small fires in the wildfire spread prediction, which is more in line with the actual wildfire spread trend, but its ability to recognize small fires is limited.

In Table 1, we can see that the accuracy of all models exceeds 97%. Because the class labels in our training data are imbalanced, accuracy is easily inflated, so we prefer to use the F1 score to compare the models. For the U-Net model, the introduction of the attention module improves the F1 score by 5.1%. Interestingly, the F1 score of the U2-Net model is the lowest among all models, indicating that simply increasing the depth of the model architecture does not improve the generalization performance and may cause overfitting. Compared with the U2-Net model, introducing the attention module both inside and outside the nested U-shaped structure improves the F1 score by 10.3%, which supports introducing the attention module into the U2-Net structure. Compared with the ResUNet model, the F1 score of our model is improved by 2.8% because the Precision index is improved by 30.3%. For the Attention U2-Net model, the difference between Precision and Recall is 0.19, indicating that our model pays more attention to the prediction accuracy of the fire area.

4.4. Ablation Study

In order to analyze the impact of various input features on the prediction results, we perform ablation experiments using the model with the best F1 score, the Attention U2-Net model. We consider deleting each input feature and retraining the model using the remaining features. The metrics of the model are shown in Table 2. It can be found that the F1 score of the model decreases the most when the Prefiremask feature is deleted. Firemask has a large correlation with Prefiremask, indicating its importance as a predictor variable of the model. There is a small increase in the model’s F1 score when the Elevation and ERC features are removed, which we attribute to the stochastic nature of the model.
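The leave-one-feature-out procedure described above can be sketched as a simple loop. This is illustrative only: `train_and_score` is a hypothetical callable that stands in for retraining the model on the given features and returning its F1 score, and the toy scorer in the usage example returns made-up numbers, not results from Table 2. The feature names are the abbreviations listed in Section 4.1.

```python
def leave_one_out_ablation(features, train_and_score):
    """Retrain once per removed feature and report the change in score.

    features:        list of input feature names.
    train_and_score: hypothetical callable that trains a model on the given
                     features and returns a score such as F1.
    Returns the baseline score and a dict of score changes per removed feature.
    """
    baseline = train_and_score(list(features))
    deltas = {}
    for removed in features:
        remaining = [name for name in features if name != removed]
        deltas[removed] = train_and_score(remaining) - baseline
    return baseline, deltas


FEATURES = ["elevation", "th", "vs", "tmmn", "tmmx", "sph",
            "pr", "pdsi", "NDVI", "population", "ERC", "pfm"]


def toy_scorer(feature_subset):
    # Made-up scores for illustration: the previous firemask dominates.
    return 0.4 if "pfm" in feature_subset else 0.1


baseline, deltas = leave_one_out_ablation(FEATURES, toy_scorer)
print(min(deltas, key=deltas.get))
```

Because each retraining run is stochastic, small positive or negative changes are hard to interpret without repeated runs, which is consistent with the caution expressed above about the Elevation and ERC results.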

5. Conclusions

Based on the U2-Net architecture, we propose an Attention U2-Net network for predicting the spread of wildfires, which introduces an attention module both inside and outside the nested U-shaped structure, allowing the model to suppress activation in irrelevant regions and thus reduce redundant information. The optimal hyperparameter combination determined through the experiments contributed significantly to the F1 score of the model in predicting wildfire spread. The experiments show that the Attention U2-Net network improves the F1 score on the wildfire spread test set by at least 2.8% compared with benchmark models such as U-Net, and the experimental results indicate that the model can provide an important reference for fire managers formulating effective suppression strategies. In addition, through ablation experiments, we found that the key influencing factor of the model was Prefiremask, indicating its importance as a predictor variable of the model.

Author Contributions

Conceptualization, H.X. and Y.Z.; methodology, H.X. and Z.G.; software, Y.S.; validation, H.X., Y.Z. and Y.S.; formal analysis, Z.G.; investigation, G.Z. and Y.S.; resources, Y.Z.; data curation, H.X.; writing—original draft preparation, H.X.; writing—review and editing, Y.Z. and Y.S.; visualization, Z.G.; supervision, Y.Z.; project administration, G.Z.; funding acquisition, G.Z. and Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under Grant 32271879 and the Forestry Reform and Development Fund of the National Forestry and Grassland Administration of China.

Data Availability Statement

The data used in this study were provided by Huot et al. and can be accessed on Kaggle.

Acknowledgments

We thank Huot et al. for their contribution to data collection. We also thank Wenkang Li and Rong Wang for their valuable suggestions for the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Ding, Y.; Wang, M.; Fu, Y.; Zhang, L.; Wang, X. A Wildfire Detection Algorithm Based on the Dynamic Brightness Temperature Threshold. Forests 2023, 14, 477.
2. Pereira, J.; Mendes, J.; Júnior, J.S.S.; Viegas, C.; Paulo, J.R. A Review of Genetic Algorithm Approaches for Wildfire Spread Prediction Calibration. Mathematics 2022, 10, 300.
3. Akhloufi, M.A.; Couturier, A.; Castro, N.A. Unmanned Aerial Vehicles for Wildland Fires: Sensing, Perception, Cooperation and Assistance. Drones 2021, 5, 15.
4. Mell, W.; Jenkins, M.; Gould, J.; Cheney, P. A Physics-Based Approach to Modeling Grassland Fires. Int. J. Wildland Fire 2007, 16, 1–22.
5. Rothermel, R.C. A Mathematical Model for Predicting Fire Spread in Wildland Fuels; Intermountain Forest & Range Experiment Station, Forest Service, US Department of Agriculture: Missoula, MT, USA, 2017.
6. Van Wagner, C.; Stocks, B.; Lawson, B.; Alexander, M.; Lynham, T.; McAlpine, R. Development and Structure of the Canadian Forest Fire Behavior Prediction System; Information Report No. ST-X-3; Canadian Forestry Service: Ottawa, ON, Canada, 1992; 67p.
7. Ren, M.L.; Guo, Y.; Chen, B.X.; Fan, J.L.; Hu, T.X.; Sun, L. Prediction models of fire spread rate of Pinus koraiensis plantation’s surface fuel. Chin. J. Appl. Ecol. 2023, 34, 2091–2100.
8. Srivas, T.; Artés, T.; de Callafon, R.A.; Altintas, I. Wildfire Spread Prediction and Assimilation for FARSITE Using Ensemble Kalman Filtering. Procedia Comput. Sci. 2016, 80, 897–908.
9. Sun, L.; Xu, C.; He, Y.; Zhao, Y.; Xu, Y.; Rui, X.; Xu, H. Adaptive Forest Fire Spread Simulation Algorithm Based on Cellular Automata. Forests 2021, 12, 1431.
10. Rui, X.; Hui, S.; Yu, X.; Zhang, G.; Wu, B. Forest fire spread simulation algorithm based on cellular automata. Nat. Hazards 2018, 91, 309–319.
11. Ghali, R.; Akhloufi, M.A. Deep Learning Approaches for Wildland Fires Remote Sensing: Classification, Detection, and Segmentation. Remote Sens. 2023, 15, 1821.
12. Ghali, R.; Akhloufi, M.A. Deep Learning Approaches for Wildland Fires Using Satellite Remote Sensing Data: Detection, Mapping, and Prediction. Fire 2023, 6, 192.
13. Majid, S.; Alenezi, F.; Masood, S.; Ahmad, M.; Gündüz, E.S.; Polat, K. Attention based CNN model for fire detection and localization in real-world images. Expert Syst. Appl. 2022, 189, 116114.
14. Muhammad, K.; Ahmad, J.; Lv, Z.; Bellavista, P.; Yang, P.; Baik, S.W. Efficient Deep CNN-Based Fire Detection and Localization in Video Surveillance Applications. IEEE Trans. Syst. Man Cybern. Syst. 2019, 49, 1419–1434.
15. Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651.
16. Chen, L.-C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017, arXiv:1706.05587.
17. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
18. Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation; Springer International Publishing: Cham, Switzerland, 2018; pp. 833–851.
19. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597.
20. Shah, K.; Pantoja, M. Wildfire Spread Prediction Using Attention Mechanisms in U-NET. In Proceedings of the 2023 3rd International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME), Tenerife, Canary Islands, Spain, 19–21 July 2023; pp. 1–6.
21. Zhang, Z.; Liu, Q.; Wang, Y. Road Extraction by Deep Residual U-Net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753.
22. Huot, F.; Hu, R.L.; Goyal, N.; Sankar, T.; Ihme, M.; Chen, Y.F. Next Day Wildfire Spread: A Machine Learning Dataset to Predict Wildfire Spreading from Remote-Sensing Data. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–13.
23. Li, S.; Dong, M.; Du, G.; Mu, X. Attention Dense-U-Net for Automatic Breast Mass Segmentation in Digital Mammogram. IEEE Access 2019, 7, 59037–59047.
24. Khryashchev, V.; Larionov, R. Wildfire Segmentation on Satellite Images using Deep Learning. In Proceedings of the 2020 Moscow Workshop on Electronic and Networking Technologies (MWENT), Moscow, Russia, 11–13 March 2020; pp. 1–5.
25. Wang, Z.; Yang, P.; Liang, H.; Zheng, C.; Yin, J.; Tian, Y.; Cui, W. Semantic Segmentation and Analysis on Sensitive Parameters of Forest Fire Smoke Using Smoke-Unet and Landsat-8 Imagery. Remote Sens. 2022, 14, 45.
26. Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, 20 September 2018, Proceedings 4; Springer International Publishing: Berlin/Heidelberg, Germany, 2018; pp. 3–11.
27. Bochkov, V.S.; Kataeva, L.Y. wUUNet: Advanced Fully Convolutional Neural Network for Multiclass Fire Segmentation. Symmetry 2021, 13, 98.
28. Qin, X.; Zhang, Z.; Huang, C.; Dehghan, M.; Zaiane, O.R.; Jagersand, M. U2-Net: Going deeper with nested U-structure for salient object detection. Pattern Recognit. 2020, 106, 107404.
29. Zhang, L.; Shen, Z.; Lin, W.; Zhang, D. U2Net-based Single-pixel Imaging Salient Object Detection. Curr. Opt. Photon. 2022, 6, 463–472.
30. Vaswani, A.; Shazeer, N.M.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 2017; pp. 5998–6008.
31. Zhang, J.; Zhu, H.; Wang, P.; Ling, X. ATT Squeeze U-Net: A Lightweight Network for Forest Fire Detection and Recognition. IEEE Access 2021, 9, 10858–10870.
32. Shirvani, Z.; Abdi, O.; Goodman, R.C. High-Resolution Semantic Segmentation of Woodland Fires Using Residual Attention UNet and Time Series of Sentinel-2. Remote Sens. 2023, 15, 1342.
33. Alsrehin, N.O.; Gupta, M.; Alsmadi, I.; Alrababah, S.A. U2-Net: A Very-Deep Convolutional Neural Network for Detecting Distracted Drivers. Appl. Sci. 2023, 13, 11898.
34. Nadeem, S.A.; Hoffman, E.A.; Sieren, J.C.; Comellas, A.P.; Bhatt, S.P.; Barjaktarevic, I.Z.; Abtin, F.; Saha, P.K. A CT-Based Automated Algorithm for Airway Segmentation Using Freeze-and-Grow Propagation and Deep Learning. IEEE Trans. Med. Imaging 2020, 40, 405–418.
35. Zhang, L.; Lin, W.; Shen, Z.; Zhang, D.; Xu, B.; Wang, K.; Chen, J. CA-U2-Net: Contour Detection and Attention in U2-Net for Infrared Dim and Small Target Detection. IEEE Access 2023, 11, 88245–88257.
36. Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. In Proceedings of the 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, 14–16 April 2014.
37. Abbas, S.F.; Duc, N.T.; Song, Y.-O.; Kim, K.; Lee, B. CV-Attention UNet: Attention-based UNet for 3D Cerebrovascular Segmentation of Enhanced TOF-MRA Images. arXiv 2023, arXiv:2311.10224.
38. A Detailed Explanation of the Attention U-Net. Available online: https://towardsdatascience.com/a-detailed-explanation-of-the-attention-u-net-b371a5590831 (accessed on 1 May 2020).
39. Wu, Y.; Wu, Y. Application of Split Coordinate Channel Attention Embedding U2Net in Salient Object Detection. Algorithms 2024, 17, 109.

Figure 1. The architecture diagram of Attention U2-Net.

Figure 2. The attention mechanism in Attention U2-Net.

Figure 3. Sample data visualization. Each row represents a sample. In addition to the label value and the predicted value, we also show the weather conditions in the area. The brighter the color, the larger the value.

Figure 4. Results of each baseline model. Panels (a–d) show the results of the U-Net, Attention U-Net, ResUNet, and U2-Net models on test set samples. Each row represents a sample. In addition to the label value and the predicted value, we also show the weather conditions in the area. The brighter the color, the larger the value.

Figure 5. Results of our model. Each row represents a sample. In addition to the label value and the predicted value, we also show the weather conditions in the area. The brighter the color, the larger the value.

Table 1. Comparison of model performance.

Model	OA	Precision	Recall	F1
U-Net [20]	0.977	0.275	0.563	0.37
Attention U-Net [20]	0.978	0.294	0.574	0.389
ResUNet [22]	0.985	0.4	0.389	0.394
U2-Net [28]	0.986	0.377	0.358	0.367
Attention U2-Net	0.982	0.521	0.331	0.405
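
The F1 column in Table 1 can be cross-checked against the precision and recall columns. The following is a minimal sketch, assuming F1 is the usual harmonic mean of precision and recall and that the "at least 2.8%" improvement quoted in the abstract is a relative gain over the strongest baseline; the names `TABLE_1`, `f1_score`, and `relative_gain` are illustrative and not taken from the authors' code.

```python
# Values copied from Table 1: (precision, recall, reported F1).
TABLE_1 = {
    "U-Net": (0.275, 0.563, 0.370),
    "Attention U-Net": (0.294, 0.574, 0.389),
    "ResUNet": (0.400, 0.389, 0.394),
    "U2-Net": (0.377, 0.358, 0.367),
    "Attention U2-Net": (0.521, 0.331, 0.405),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recompute F1 from the reported precision and recall.
recomputed = {name: f1_score(p, r) for name, (p, r, _) in TABLE_1.items()}

for name, (_, _, reported) in TABLE_1.items():
    print(f"{name:17s} reported={reported:.3f} recomputed={recomputed[name]:.3f}")

# Relative F1 gain of Attention U2-Net over the best baseline (assumed reading
# of the improvement figure quoted in the abstract).
ours = TABLE_1["Attention U2-Net"][2]
best_baseline = max(f1 for name, (_, _, f1) in TABLE_1.items()
                    if name != "Attention U2-Net")
relative_gain = (ours - best_baseline) / best_baseline
print(f"relative F1 gain over best baseline: {relative_gain:.1%}")
```

Under these assumptions the recomputed values agree with the reported F1 scores to three decimal places, and the gain over ResUNet, the strongest baseline by F1, rounds to 2.8%.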

Table 2. Ablation study.

Removed Feature	Accuracy	Precision	Recall	F1
Elevation	0.98	0.519	0.343	0.413
ERC	0.982	0.474	0.364	0.412
PDSI	0.98	0.526	0.33	0.406
tmmn	0.981	0.492	0.343	0.404
sph	0.982	0.474	0.351	0.403
th	0.983	0.48	0.345	0.401
Population	0.981	0.504	0.332	0.400
pr	0.978	0.508	0.330	0.400
vs	0.979	0.523	0.323	0.399
tmmx	0.981	0.492	0.336	0.399
NDVI	0.98	0.48	0.338	0.397
Prefiremask	0.982	0.424	0.325	0.368
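
One way to read Table 2 is to express each row as a change in F1 relative to the model trained on all features. The sketch below assumes that reference is the Attention U2-Net F1 of 0.405 from Table 1, which this section does not state explicitly; `FULL_MODEL_F1`, `ABLATION_F1`, and `deltas` are illustrative names.

```python
# Assumed reference: F1 of Attention U2-Net with all input features (Table 1).
FULL_MODEL_F1 = 0.405

# F1 after removing each input feature, copied from Table 2.
ABLATION_F1 = {
    "Elevation": 0.413,
    "ERC": 0.412,
    "PDSI": 0.406,
    "tmmn": 0.404,
    "sph": 0.403,
    "th": 0.401,
    "Population": 0.400,
    "pr": 0.400,
    "vs": 0.399,
    "tmmx": 0.399,
    "NDVI": 0.397,
    "Prefiremask": 0.368,
}

# Change in F1 when a feature is removed; negative means the model got worse.
deltas = {feature: round(f1 - FULL_MODEL_F1, 3)
          for feature, f1 in ABLATION_F1.items()}

# Sort from the most harmful removal to the most beneficial one.
ranked = sorted(deltas.items(), key=lambda item: item[1])

for feature, delta in ranked:
    print(f"{feature:12s} {delta:+.3f}")
```

Read this way, removing Prefiremask costs the most F1 (0.037), while removing Elevation, ERC, or PDSI leaves F1 slightly above the reference value.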

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
