Integrated Anomaly Detection and Early Warning System for Forest Fires in the Odisha Region

Hiremath, Hrishita; Kannan, Srinivasa Ramanujam

doi:10.3390/atmos15111284


by Hrishita Hiremath and Srinivasa Ramanujam Kannan *

School of Mechanical Sciences, Indian Institute of Technology Bhubaneswar, Bhubaneswar 752050, Odisha, India

* Author to whom correspondence should be addressed.

Atmosphere 2024, 15(11), 1284; https://doi.org/10.3390/atmos15111284

Submission received: 23 September 2024 / Revised: 22 October 2024 / Accepted: 23 October 2024 / Published: 27 October 2024

(This article belongs to the Section Biosphere/Hydrosphere/Land–Atmosphere Interactions)

Abstract:

The present study aims to develop a random forest-based classifier that predicts the occurrence of fire events a day in advance using observed meteorological parameters. We considered the skin temperature, the air temperature close to the surface, the humidity close to the surface, and soil moisture as the important meteorological factors influencing forest fire occurrence. Twenty additional parameters were derived from these four to account for the energy exchanged in sensible and latent forms and for recent trends in the parameters. We used the mutual information approach to identify the critical meteorological parameters that carry significant information about fire occurrence the next day. The top nine parameters were then fed as input to the random forest algorithm to predict fire/no fire the next day. Weighted data sampling and SMOTE were employed to address the class imbalance in the fire data. Both techniques classified fire incidents well, given the meteorological input from the previous days. This study also showed that as the class imbalance increases to 1:9, the performance measured by precision, recall, F1 score, and accuracy reaches its maximum, showing the model’s ability to perform under class imbalance. Both techniques helped the random forest algorithm forecast fire instances as the data sample size increased.

Keywords:

forest fire; weighted data sampling; SMOTE; class imbalance; random forest

1. Introduction

Forest fires occur regularly in our country during the winter (post-monsoon) to summer (pre-monsoon) seasons. These events occur on a large spatial and temporal scale; for example, between November 2020 and June 2021, the Moderate Resolution Imaging Spectroradiometer (MODIS) detected 345,989 forest fires across various districts in India [1]. The Forest Survey of India (FSI) estimates that nearly 4% of our forest cover is highly prone to frequent forest fires, whereas 6% is very highly prone to fire [2]. Forest fires are a global concern due to the extensive damage they cause to ecosystems and communities. Further, they are an important cause of land degradation, leading to biodiversity loss. While most fire events are controlled or restricted within the forest area, some extend beyond the forest floor and cause damage to adjoining civilian areas.

Odisha has 52,156 square kilometres of forest coverage, and forest land covers 33.5% of the overall geographical area, providing habitat for a diverse range of flora and fauna [3]. Odisha ranks ninth among Indian states in terms of the number of districts highly vulnerable to fire events. A total of 31–47% of the population lives in these vulnerable districts, reflecting their dependence on forests. Forest ecosystems face a substantial threat involving the loss of timber, fruit-bearing trees, medicinal plants, and wildlife habitats. Among the most active fire spots, Odisha recorded a remarkable 659 events in the short span between 1 and 8 February 2023. The complex relationship between forest growth and wildfires is particularly noticeable in this region, where ground fires endanger the seedlings essential for regeneration during the monsoon season. This disrupts the natural cycle, hindering the forest’s ability to regenerate and posing a long-lasting ecological problem.

The FSI conducted a study based on spatial analysis of forest fire sites recorded between 2004 and 2021 to identify fire-prone forest areas in the country. The classification of central Odisha (which accounts for 10.66% of the fire-prone forest area in India) as being at exceptionally high risk of fire underscores the region’s susceptibility. The need to address this issue is even more urgent considering the recent incidents in Odisha, where forest fires have been widespread. Since October 2022, an extended drought has worsened the situation by promoting the rapid propagation of wildfires. The magnitude of the issue becomes evident when examining the data from the FSI, which, as of 7 March 2023, had recorded 391 fire occurrences across the country [4] (https://fsiforestfire.gov.in/index.php, last accessed on 26 May 2024).

A complex interaction of meteorological factors and environmental circumstances determines the catastrophic consequences of wildfires. Understanding how these factors influence wildfire initiation and spread is critical for building effective early warning systems. This research investigates the complex interaction between meteorological data, environmental conditions, and wildfire incidences, explicitly focusing on forest fire incidences in Odisha, India. The temperature of air close to the surface, the surface or skin temperature, the relative humidity of air close to the surface, soil moisture, and precipitation are the dominant meteorological parameters that influence the occurrence of a fire event.

Numerous studies have reported the connection between meteorological parameters and the occurrence of fire events. In one of the earliest works involving remote sensing and geographic information systems (GISs), a study evaluated the likelihood of fires using GIS and spatial data [5]. The authors created a thorough database and mapped out areas with high fire risk. Another similar study examined the impact of climatic conditions on wildfires [6]. The authors used remote sensing data to analyse how climate variables, vegetation, and fire size interact. Their findings report that certain precipitation levels can lead to an increased risk of wildfires, highlighting the importance of geographical factors. A study specifically examined fires caused by human activities and examined the pre-monsoon conditions and the influence of different species on the fire patterns in the Himalayan region [7]. This study considered factors such as the gathering of biomass, the conditions before the monsoon season, and the levels of moisture present.

The need to use real-time monitoring, science process algorithms, and photography to improve forest fire preparedness and response was highlighted by [8]. Plant and land cover changes also significantly impact wildfires [9]. This was shown utilising digital photo categorisation and ISODATA to analyse these changes. Further, the Normalised Difference Vegetation Index (NDVI) and GISs were used to detect regions susceptible to drought and create a detailed representation of the fire damage intensity of the forest fire near Karabaglur, Turkey [10].

A time series analysis of remote sensing data to analyse fire disturbance and forest recovery across Canada shows that accounting for the temporal variability of the NDVI within unburned areas aided the definition of recovery times to pre-burn levels, which typically took five years or more following a fire [11]. An extensive knowledge of fire dynamics is obtained using remote sensing methods, especially in the summer, for wind pattern analysis and precipitation data to measure vegetation susceptibility. Furthermore, the awareness of drier weather patterns and human-induced fire hazards highlights the need for proactive forest fire control plans [9]. More studies suggest integrating burnt area estimates with ground operator data, further improving the reaction and emphasising the need for multidisciplinary approaches to battle wildfires. Geospatial techniques and metrics such as the NDVI and Differenced Normalised Burn Ratio (dNBR) are also used to detect drought-prone areas, i.e., high-temperature areas with low precipitation, using the two indices for burn severity mapping [12].

A predictive classifier was developed using machine learning to categorise rain and no-rain conditions from remote sensing observations [13]. In this study, the authors employed a dataset from remote sensing observations to train and evaluate the machine-learning models. A proactive approach to forecasting wildfires in Chapada das Mesas National Park achieved high accuracy using artificial neural networks and data-mining techniques [14]. Recently, an ensemble method for forest fire susceptibility modelling was developed, focusing on the Western Ghats section [15], in which the authors show that the land use land cover is an important factor having a significant role in explaining fire severity.

Forest fires are a global issue due to their destruction of ecosystems, lives, and economies. Recent studies were directed towards early forest fire detection to limit their spread using cutting-edge remote sensing and machine learning technology. A novel early-stage forest fire detection method was developed using Himawari-8 Advanced Himawari Imager (AHI) images, a modified MOD14 algorithm, and a random forest classifier [16]. With further satellite sensors, this method can precisely spot fires, especially in Australia. Improved monitoring systems to predict grassland fires in Inner Mongolia can be developed using remote sensing and random forest models [17]. A better fire control strategy is to utilise remote sensing data to monitor climatic elements and vegetation indices.

Using spectral reflectance data, random forest algorithms were shown to identify early drought in greenhouse tomatoes, addressing another drought stress issue [18]. The algorithm achieved over 85% accuracy by addressing collinearity and class imbalance, making it a reasonably affordable greenhouse irrigation technique. Data mining in education reveals that uneven data make prediction models challenging. To solve class imbalance in educational datasets, random and synthetic minority oversampling techniques were evaluated [19]. Oversampling performed better for mild imbalance and hybrid resampling for extreme imbalance. The spatial variability of Swedish forest fires using a random forest model connecting the topography, temperature, and socioeconomic factors to fire incidence was examined [20]. Their results help focused fire protection initiatives by offering vital fresh insights into the causes of forest fires in Swedish biogeographical zones.

Understanding the behaviour of forest fires and their consequences depends on the accuracy of wildfire prediction and modelling, as fire events could seriously endanger the surrounding areas and people. The synoptic atmospheric conditions at the surface and free troposphere are found to be associated with active fire months in the south central Chile region [21]. Using CiteSpace, research trends to identify gaps in wildfire forecasts were evaluated [22], revealing the significance of specific keywords such as “wildfire,” “prediction,” and “model,” in relation to trends in land use, precipitation, and vegetation. The authors advocated for adopting new data sources and advanced approaches to address these research gaps.

The geographical and temporal distribution of lightning-induced wildfires in Australia using ISS LIS and MODIS data revealed that lightning ignitions were infrequent, and thunderstorms did not influence peak wildfire activity [23]. During the dry Australian season, thunderstorms were found to have minimal impact on wildfires, with other factors playing a more significant role. A comparison of mid-latitude California’s wildfire emissions alongside high-latitude Krasnoyarsk Krai using a multi-dataset approach shows that high-latitude wildfires generated more pollutants including black carbon than mid-latitude ones [24]. This study underlined the need for thick vegetation to produce significant emissions and demanded more studies for better understanding.

Stochastic wind vectors were included in modified wildfire spread models to account for environmental uncertainty [25]. Including wind variability in wildfire models produced more accurate forecasts than deterministic models, which usually underestimate wildfire spread risks. Their results show that accounting for environmental uncertainties helps increase forecast accuracy and improve the control of wildfire risk.

Several studies were conducted on the impact of forest fires on the flora and fauna of the forest location. The study on the nature of the pyrogenic transformation of ecosystems to evaluate the success of the forest reproduction indicates the success of reforestation and, hence, a favourable forecast of post-fire recovery of light coniferous forests [26]. Understanding the relationship between the nature of damage and the response of the ecosystem components can allow us to predict the response of an ecosystem after forest fires [27].

A thorough survey of the literature shows that various approaches have been used to detect patterns or predictive factors behind the occurrence of forest fire events. These approaches include remote sensing observations coupled with GIS and meteorological observations. However, although forest fires are a recurrent phenomenon, most of the earlier work adopts a diagnostic mode of investigation that analyses the causes and patterns of past events, and very few attempts have been made to forecast them. This study identifies a critical gap in the current literature: a proactive and predictive model for forest fires in Odisha. Hence, the present study aims to develop an integrated anomaly detection and early warning system that utilises meteorological parameters and historical wildfire data to anticipate and predict future incidents. The objectives of this study are as follows:

To consider basic and derived meteorological parameters on a daily scale and study their effect on forest fire occurrence at different time lags using the mutual information approach.
To identify, based on the mutual information approach, the most important parameters for training a random forest model that predicts the occurrence of forest fire one day in advance.
To conduct a detailed study of the performance and robustness of the random forest algorithm with respect to class imbalance and sample size.

2. Problem Description

The current study aims to predict forest fire occurrence when the meteorological observation data are available at the forest area a day in advance. When solar radiation is incident on a land surface, it is absorbed in the form of sensible heat, thereby causing the land surface temperature to increase. Elevated temperatures contribute to greater evaporation, dry out plants, and make them more flammable. Further, when the surface gets hot enough, a part of that sensible heat is transferred to the surrounding air through convection, which causes the air temperature to increase. If there is moisture at the surface, the surface absorbs part of the incident solar radiation as sensible heat, while the remainder is taken up as latent heat. The exchange of latent heat results in the evaporation of water and its subsequent mixing with the ambient air, leading to an increase in humidity near the surface. This complex interaction of land and atmosphere regulates the local weather conditions that govern the energy and moisture transport.

Another essential metric is soil moisture, which indicates the water in the soil. Dry soil indicates lesser availability of water to trees, making them dry and hence a good fuel source. Therefore, low soil moisture levels suggest a higher likelihood of ignition and prolonged fire spread. Adequate soil moisture, on the other hand, functions as a natural firebreak, slowing the spread of wildfires. The prolonged effect of drought conditions results in a lack of moisture availability at the surface, which causes the sensible heat component to dominate. Such a situation results in a very high surface temperature, which is favourable for fire incidents to occur in the presence of dry vegetation. However, precipitation in an area could mitigate this effect by reducing the sensible heat and overall land surface temperature. The present study considers the surface temperature, air temperature close to the surface, relative humidity, and soil moisture as critical meteorological parameters to forecast forest fire occurrence one day in advance. Though the literature suggests the inclusion of additional parameters such as topography, vegetation area, etc., we focus only on these four meteorological parameters due to their wide availability across all weather stations.

Since the relationship between the meteorological parameters and the occurrence of fire events is considered to have a time lag, the complex relationship between the two could best be “learned” by employing a non-parametric-based machine learning algorithm. Given the meteorological observations a day in advance, we considered the random forest model as a classifier to predict the future event as fire or no fire.

2.1. Data

The meteorological parameters such as the surface temperature, soil moisture, air temperature near the surface, and relative humidity were downloaded from the Reanalysis Data Services (RDS), provided by the National Centre for Medium-Range Weather Forecasting (NCMRWF), under the aegis of the Ministry of Earth Sciences. The RDS service provides the regional atmospheric reanalysis data over the Indian subcontinent obtained from the Indian Monsoon Data Assimilation and Analysis (IMDAA). The IMDAA system [28] is based on the UK Met Office’s four-dimensional variational data assimilation (4DVAR) and unified model. The IMDAA system provides the reanalysis data at a regional scale of 12 km with 63 vertical levels up to a height of 40 km, updated hourly. The meteorological data were obtained from the RDS from January to June between 2014 and 2020 for the Odisha region bounded between 17.49 N–22.34 N latitude and 81.27 E–87.29 E longitude. The study area showing the distribution of forest areas across the state of Odisha is shown in Figure 1. Data at six intervals, collected every four hours, are averaged to obtain the daily average temperature, soil moisture, and humidity values.

We considered the data provided by the Forest Survey of India (FSI) for the spatial occurrence of fire events. The FSI functions under the Ministry of Environment, Forest, and Climate Change, Government of India. It carries a principal mandate to conduct surveys and assess forest resources in the country. As part of various forest survey and assessment activities, the FSI has developed a Fire Alert System using near-real-time satellite data from the Moderate Resolution Imaging Spectroradiometer (MODIS) on the Aqua and Terra satellites [29]. The daily fire alerts are issued to users automatically at 1 km × 1 km spatial resolution, at about 10:30 a.m. and 10:30 p.m. from the Terra satellite and 1:30 a.m. and 1:30 p.m. from the Aqua satellite. The fire pixels identified by the MODIS platform generate a spatial database of archival forest fire events in the form of a fire flag matrix.

The original dataset consists of four parameters, as discussed in the above section: the surface temperature, air temperature close to the surface, soil moisture, and relative humidity. The daily average values of the four parameters and their daily maximum and minimum values are considered in our analysis. This results in twelve original parameters under consideration. However, the complex relationship between the meteorological variables and the occurrence of a fire event with a time lag requires additional derived parameters based on the original dataset. As such, twelve more variables were derived and added to the total variables list as shown in Table 1.

Data pre-processing ensures a comprehensive exploration of the dynamical aspect of the meteorological parameters, establishing a foundation for a thorough understanding of how they interact in wildfire incidents.

2.2. Fire Incident Data and Fire Flag Matrix

The archival fire data are downloaded from the FSI website, specifically from the MODIS database, for January to June of 2014 to 2020. While the MODIS fire data are available at a very high resolution of 1 km × 1 km, the meteorological parameters from the RDS are available at a coarser resolution of 12 km × 12 km. Hence, an RDS pixel is considered as a fire pixel if a MODIS-identified fire incident occurs within a distance of 6 km from the centre point. The threshold distance ensures that the fire pixel is within the bounds of a given RDS pixel. Latitude and longitude distances were also set to a threshold of 0.06° each in the x and y directions to ensure the closest point was flagged. This approach results in the generation of a spatial Boolean matrix, henceforth called the fire flag matrix, wherein the spatial location in which fire is detected is set as ‘1’, and other place values are set to ‘0’.
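The flagging step described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `build_fire_flag`, the toy grid, and the toy fire coordinates are hypothetical, and the nearest-cell search is one simple way to apply the 0.06° threshold, not necessarily the procedure used in the study.

```python
def build_fire_flag(grid_lats, grid_lons, fire_points, tol_deg=0.06):
    """Return a 2-D list of 0/1 flags, one per reanalysis grid cell.

    grid_lats, grid_lons: cell-centre coordinates in degrees (hypothetical inputs).
    fire_points: iterable of (lat, lon) fire detections for one day.
    tol_deg: latitude/longitude threshold in degrees (0.06, as in the text).
    """
    flags = [[0] * len(grid_lons) for _ in grid_lats]
    for lat, lon in fire_points:
        # Find the nearest cell centre along each axis.
        i = min(range(len(grid_lats)), key=lambda k: abs(grid_lats[k] - lat))
        j = min(range(len(grid_lons)), key=lambda k: abs(grid_lons[k] - lon))
        # Flag the cell only if the detection lies within the threshold.
        if abs(grid_lats[i] - lat) <= tol_deg and abs(grid_lons[j] - lon) <= tol_deg:
            flags[i][j] = 1
    return flags


# Toy 3 x 3 grid with 0.12 degree spacing (illustrative values only).
grid_lats = [20.00, 20.12, 20.24]
grid_lons = [84.00, 84.12, 84.24]
fires = [(20.13, 84.01), (20.50, 84.24)]  # the second point is outside the grid
flags = build_fire_flag(grid_lats, grid_lons, fires)
```

In this toy case only the cell centred at (20.12, 84.00) is flagged, because the second detection is more than 0.06° away from every cell centre in latitude.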

3. Method

3.1. Mutual Information

The degree of statistical dependency or information sharing between two variables is measured by a metric called mutual information, or MI. MI is defined based on the information gain proposed by [30]. Put otherwise, MI measures the extent to which knowledge of one variable can be used to infer the value of another. Lower mutual information denotes less dependence between the variables, whereas higher mutual information indicates a stronger link or dependence. The meteorological parameters explained in Section 2.1 along with the fire flag matrix are utilised to find the mutual information between the continuous (meteorological) and discrete (fire flag) variables. The mutual information between these environmental parameters and wildfire occurrence will be larger if the two are strongly related. MI is estimated from a histogram of the joint probability distribution, from which the marginal probability distributions are then computed, as shown by the following equation:

$$MI(X, Y) = \sum_{y \in Y} \sum_{x \in X} P(x, y) \log \left( \frac{P(x, y)}{P(x) P(y)} \right) \qquad (1)$$

The resulting MI values for each input variable listed in Table 1 against the discrete fire flag matrix are stored and compared. The MI values obtained using Equation (1) provide valuable information on the predictive capabilities of these continuous variables, assisting in selecting features for the development of models or early warning systems for wildfire prediction.
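As a rough illustration of Equation (1), the histogram estimate can be written in plain Python as below. The function name `mutual_information`, the equal-width binning between the minimum and maximum, and the use of the natural logarithm (so the result is in nats) are assumptions made for this sketch; the text does not state the logarithm base.

```python
import math
from collections import Counter


def mutual_information(x, y, n_bins=35):
    """Histogram estimate of MI (in nats) between continuous x and discrete y."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
    # Assign each value to an equal-width bin between min(x) and max(x).
    bins = [min(int((v - lo) / width), n_bins - 1) for v in x]
    n = len(x)
    joint = Counter(zip(bins, y))  # joint counts of (bin, class)
    count_x = Counter(bins)        # marginal counts of the binned feature
    count_y = Counter(y)           # marginal counts of the class label
    mi = 0.0
    for (b, c), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((count_x[b] / n) * (count_y[c] / n)))
    return mi


# Toy data: the label is fully determined by the feature, so MI equals ln 2.
mi_dependent = mutual_information(
    [0.0, 0.1, 0.2, 0.3, 0.7, 0.8, 0.9, 1.0], [0, 0, 0, 0, 1, 1, 1, 1], n_bins=2
)
# Toy data: the label is unrelated to the feature, so MI is zero.
mi_independent = mutual_information([0.0, 1.0, 0.0, 1.0], [0, 0, 1, 1], n_bins=2)
```

Empty bins contribute nothing to the sum, which matches the usual convention that terms with zero joint probability are skipped.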

3.2. Synthetic Profiles

Synthetic profile generation is performed to balance the class distribution of the dataset for model training and testing. Class imbalance occurs when the data of one class dominate over the other classes, for example, the number of no-fire pixels outnumbers the fire pixels significantly. In a highly imbalanced forest fire dataset, adding the synthetic profiles ensures that the random forest model receives a more representative set of examples for both classes, namely, the fire and no-fire classes, preventing bias towards the majority class. In the present study, we generate synthetic samples using two different approaches. The first approach is using the weighted sampling without replacement method, in which the algorithm draws samples from a finite set with associated weights, accommodating different probabilities for each element [31]. Key steps include initialising a binary search tree, associating values with nodes, and efficiently updating the tree during sampling. This, in turn, improves the model’s ability to generalise and make accurate predictions for underrepresented cases, such as forest fire incidents.
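Weighted sampling without replacement can be illustrated with a much simpler, slower procedure than the tree-based algorithm of [31]: draw one item at a time with probability proportional to its weight, then remove it from the pool. The sketch below is this simplified substitute, not the cited method; the function name and the seed argument are hypothetical.

```python
import random


def weighted_sample_without_replacement(items, weights, k, seed=0):
    """Draw k distinct items, each draw proportional to the remaining weights."""
    if k > len(items):
        raise ValueError("k cannot exceed the number of items")
    rng = random.Random(seed)
    pool = list(zip(items, weights))
    chosen = []
    for _ in range(k):
        total = sum(w for _, w in pool)
        r = rng.uniform(0.0, total)
        cumulative = 0.0
        pick = len(pool) - 1  # fallback in case of floating-point round-off
        for idx, (_, w) in enumerate(pool):
            cumulative += w
            if r < cumulative:
                pick = idx
                break
        # Remove the selected item so it cannot be drawn again.
        chosen.append(pool.pop(pick)[0])
    return chosen


rows = list(range(10))
picked = weighted_sample_without_replacement(rows, [1.0] * 10, k=4)
```

Each draw here costs time linear in the pool size; a binary search tree over cumulative weights, as in the cited approach, reduces the lookup and update to logarithmic time.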

The second method is called the Synthetic Minority Over-sampling Technique (SMOTE). This approach is based on data augmentation designed to address class imbalance in machine learning datasets [32]. Mathematically, the process begins by identifying the minority class instances. For each data point $x_{i}$ in the minority class, SMOTE selects k-nearest neighbours, denoted as $x_{i, 1}, x_{i, 2}, \dots, x_{i, k}$, within the feature space. A random neighbour, say $x_{i, j}$, is then chosen, and a synthetic sample $x_{synth}$ is generated using the formula:

$$x_{synth} = x_{i} + λ (x_{i, j} - x_{i}) \qquad (2)$$

Here, in Equation (2), $λ$ is a random value that varies between 0 and 1, introducing controlled randomness in the interpolation process. The desired oversampling ratio determines the number of synthetic samples created. Figure 2 shows the flowchart of generating a sample using the SMOTE technique with an illustrative example for k = 4.
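The interpolation in Equation (2) can be sketched in plain Python as follows. This is a minimal illustration, assuming Euclidean distance for the neighbour search and a minority set with at least two points; the function name `smote`, the seed argument, and the toy data are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=4, seed=0):
    """Generate synthetic minority samples by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x_i = rng.choice(minority)
        # k nearest neighbours of x_i (Euclidean distance), excluding x_i itself.
        others = [p for p in minority if p is not x_i]
        neighbours = sorted(others, key=lambda p: math.dist(p, x_i))[:k]
        x_ij = rng.choice(neighbours)
        lam = rng.random()  # lambda drawn uniformly from [0, 1)
        # Equation (2): move from x_i towards x_ij by a fraction lambda.
        synthetic.append([a + lam * (b - a) for a, b in zip(x_i, x_ij)])
    return synthetic


# Toy two-feature minority class (illustrative values only).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
new_points = smote(minority, n_synthetic=10, k=4)
```

Because each synthetic point lies on the line segment between two existing minority samples, it always stays inside the range spanned by the original minority data.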

3.3. Random Forest Classifier

The random forest technique is utilised to classify fire events using meteorological factors by employing a collection of decision trees. Each decision tree aims to minimise the impurity of its nodes by quantifying the entropy of data points. The first random forest algorithm was developed by [33] following the principles of stochastic modelling. The algorithm employs multiple decision trees in randomly selected subspaces of the feature space. The number of decision trees is not known a priori, and it depends on the complexity of the data. Using a systematic approach, we arrived at a random forest model with 300 decision trees—a parameter vital for the prediction ability of the ensemble. This iterative approach repeatedly divides the dataset according to feature requirements, to maximise the uniformity within the resulting subsets. The random forest algorithm combines predictions from several trees, guaranteeing reliable learning even when individual trees have limited predictive power. Regarding fire classification, the algorithm’s implementation entails combining data, balancing classes, and carefully calculating the training and testing samples. The model’s performance is evaluated using precision, recall, and F1-score metrics, which consider the dataset’s intrinsic imbalance caused by the limited number of fire events. These evaluation metrics are described in the next section. The algorithm addresses the class imbalance by selectively selecting synthetic profiles, providing a sophisticated method for reflecting the complexities of fire incidents in meteorological settings. “Selectively selecting” in this sense is the ability of the algorithm to carefully select synthetic profiles most fitting for the features of fire events. This guarantees that the training data capture the several and complicated circumstances under which fires arise, therefore enhancing the accuracy and resilience of the model in forecasting actual fire incidents.
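Two ingredients of the description above, entropy-based node impurity and the combination of tree predictions, can be illustrated with a short sketch. The helper names and the toy soil-moisture values are hypothetical, and majority voting is assumed as the combination rule. In scikit-learn, a comparable configuration would be `RandomForestClassifier(n_estimators=300, criterion="entropy")`, although the text does not say which implementation was used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting on values <= threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # the split separates nothing
    n = len(labels)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children


def majority_vote(tree_predictions):
    """Combine per-tree class predictions into one ensemble prediction."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Toy example: low soil moisture coincides with fire (label 1).
soil_moisture = [0.10, 0.12, 0.30, 0.35]
fire = [1, 1, 0, 0]
gain = information_gain(soil_moisture, fire, threshold=0.2)
vote = majority_vote([1, 0, 1])
```

A split that separates the two classes perfectly, as in this toy case, removes all of the parent node's entropy, so the gain equals the parent entropy of one bit.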

3.4. Evaluation Metrics

A classifier trained on a class-imbalanced dataset is often challenging to evaluate. Often, the accuracy of the classifier is used as a metric. However, if over 95% of the data belong to one class, it is possible to achieve high accuracy even when the classifier misses all the true incidents. Hence, in the evaluation phase, the random forest classifier’s performance is assessed focusing on the precision, recall, accuracy, and F1-score metrics. These metrics are explained briefly in what follows:

Precision ($Pr$) quantifies the ratio of correctly classified instances as fire (True Positives, TP) to the total predicted fire instances, including the wrong predictions. Mathematically,

$$Pr = \frac{TP}{TP + FP} \qquad (3)$$

Recall ($Re$) assesses the model’s ability to identify all actual fire occurrences correctly,

$$Re = \frac{TP}{TP + FN} \qquad (4)$$

The F1-score is a comprehensive metric that balances precision and recall through their harmonic mean, as given by the following expression

$$F1 = \frac{2 \times Pr \times Re}{Pr + Re} \qquad (5)$$

These metrics, derived from the confusion matrix, provide a nuanced understanding of the classifier’s effectiveness in delineating fire incidents. The accuracy, calculated as the ratio of correctly predicted instances to the total cases, offers a broader perspective but must be interpreted cautiously in imbalanced datasets. Mathematically, accuracy is

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \qquad (6)$$

The metrics collectively, as shown in Equations (3)–(6), constitute a rigorous quantitative assessment, ensuring a comprehensive evaluation of the random forest classifier’s performance in discerning fire patterns from the meteorological parameters.
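Equations (3)–(6) translate directly into code. The sketch below also reproduces the accuracy pitfall mentioned earlier, using made-up confusion-matrix counts; the function name and the convention of returning 0 when a denominator is zero are assumptions of this example.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# A classifier that always predicts "no fire" on 950 no-fire and 50 fire samples.
always_no_fire = classification_metrics(tp=0, fp=0, fn=50, tn=950)

# A classifier that finds 40 of the 50 fires with 10 false alarms.
useful = classification_metrics(tp=40, fp=10, fn=10, tn=940)
```

The first classifier reaches 95% accuracy while its recall and F1-score are zero, which is why accuracy alone is not a sufficient measure for imbalanced fire data.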

4. Results

4.1. Mutual Information Analysis to Find Significant Parameters

Using the meteorological parameters, namely, soil moisture, the air temperature close to the surface, relative humidity, and the skin temperature, we considered 24 different parameters as listed in Table 1. The meteorological parameters such as soil moisture (SM), relative humidity (RH), and temperatures (T2 and TS) taken on the ‘i’th day provide critical information about environmental conditions that is likely to affect the future weather scenario. In addition, parameters based on the difference in temperature between two consecutive days carry critical information about the rate at which the soil surface becomes heated up, while the difference between the maximum and minimum values indicates the heat built up in sensible and latent forms over a given day.
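The two kinds of derived parameters described above, the change between consecutive days and the daily maximum-minus-minimum range, could be computed as in the following sketch. The function names and the skin temperature values are hypothetical, and the exact definitions in Table 1 may differ.

```python
def day_to_day_change(daily_mean):
    """Difference between each day's value and the previous day's value."""
    return [today - yesterday for yesterday, today in zip(daily_mean, daily_mean[1:])]


def daily_range(daily_max, daily_min):
    """Difference between the daily maximum and daily minimum."""
    return [hi - lo for hi, lo in zip(daily_max, daily_min)]


# Toy skin temperature series in degrees Celsius (illustrative values only).
ts_mean = [32, 34, 37]
ts_max = [40, 42, 45]
ts_min = [25, 26, 27]

ts_change = day_to_day_change(ts_mean)   # one value fewer than the input
ts_range = daily_range(ts_max, ts_min)
```

The day-to-day change has no value for the first day of a series, so any model input built from it has to start from the second day.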

These parameters directly influence the fire behaviour and are crucial for accurately predicting wildfire occurrence and spread. We examined the information content of each derived variable within the framework of the Mutual Information (MI) analysis about fire incidents. The rationale behind the choice of various derived parameters is to capture detailed and temporal connections that could enhance the reliability of the predictive model.

Finding mutual information for the parameters observed for several years is more important than seeing it for individual years. This is because combining data from multiple years makes the analysis less susceptible to the influence of outliers or anomalies in any particular year, such as excess rainfall or severe drought. Additionally, it enables the identification of long-term trends and seasonal variations in wildfire occurrence, which may not be apparent when analysing data from individual years. This comprehensive approach enhances the accuracy and reliability of wildfire detection models based on random forest or other machine learning techniques.

The MI values are calculated for each of the 24 variables and the fire flag matrix. Since the meteorological parameters are continuous, they are binned against their minimum and maximum values with a user-specified number of bins. This allows us to calculate the joint and individual probabilities based on the frequency of occurrence of the respective values in a particular bin. Based on a trial-and-error approach, we found that a number of bins equal to 35 consistently provided greater values for nearly all parameters. So, we fixed the bins to 35 while calculating the individual and joint probabilities of various input parameters. The mutual information between each variable and the fire flag matrix for different combinations of years’ data is shown in Figure 3.

From Figure 3, it can be seen that almost all meteorological variables carry significant information when the data for 2015 to 2017 are considered. From this, we conclude that the data from 2015–2017 are sufficient to capture the dynamical relationship between the meteorological factors and the fire flag, accounting for intra- and inter-annual effects. By setting a threshold MI of 0.001 for the data from 2015–2017, we conclude that the first nine parameters alone contain sufficient information for developing a good classifier. The mutual information values for each of the parameters are shown in Figure 4.
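The thresholding step can be expressed as a short ranking routine. The sketch below assumes that parameters with an MI at or above the threshold are retained; the parameter names and MI values are invented for illustration and are not the values shown in Figure 4.

```python
def select_features(mi_scores, threshold=0.001):
    """Rank parameters by MI (highest first) and keep those at or above the threshold."""
    ranked = sorted(mi_scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]


# Hypothetical MI values for four parameters.
mi_scores = {"TS_mean": 0.0040, "SM_mean": 0.0025, "RH_min": 0.0008, "T2_max": 0.0015}
selected = select_features(mi_scores)
```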

4.2. Random Forest

Based on the previous result, we consider the nine significant parameters that can help us to detect the occurrence of a fire event a day in advance reliably. The following section shows the effect of imbalance on the performance of the random forest classifier.

4.2.1. Effect of Class Imbalance on Random Forest Classifier Performance

This case presents the impact of varying the ratio of positive (fire, class 1) to negative (non-fire, class 0) instances during training and testing. The synthetic fire flag values having 1s were kept fixed at 10,000 samples, and the number of actual 0s (no fire incidents) was increased to achieve imbalance ratios of fire to no fire ranging from 1:1 to 1:15, where 1:1 is an equal representation of both classes, while 1:15 represents 1 fire flag sample for every 15 no-fire events. Testing was conducted with the entire set of actual 1s from 2015 to 2017, as selected after the MI analysis, while the synthetic 0s were varied to maintain the various imbalance ratios. Figure 5 and Figure 6 plot, for both methods of generating synthetic profiles, the classification model’s performance metrics (average accuracy, precision, recall, and F1 score) on the test data, which were not used for training the classifier model.
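As an illustration of how a dataset at a given fire to no-fire ratio could be assembled, the sketch below keeps every fire sample and draws the required number of no-fire samples. The function name `build_imbalanced_set`, the random draw without replacement, and the toy data are assumptions made for this example; the study's own procedure may differ.

```python
import random


def build_imbalanced_set(fire_samples, no_fire_samples, ratio, seed=0):
    """Return (features, labels) with `ratio` no-fire samples per fire sample.

    All fire samples are kept; no-fire samples are drawn at random without
    replacement (the drawing scheme is an assumption for illustration).
    """
    n_no_fire = ratio * len(fire_samples)
    if n_no_fire > len(no_fire_samples):
        raise ValueError("not enough no-fire samples for the requested ratio")
    rng = random.Random(seed)
    chosen = rng.sample(no_fire_samples, n_no_fire)
    features = list(fire_samples) + chosen
    labels = [1] * len(fire_samples) + [0] * n_no_fire
    return features, labels


# Toy data: 10 fire and 200 no-fire feature vectors (hypothetical values).
fire = [[1.0, float(i)] for i in range(10)]
no_fire = [[0.0, float(i)] for i in range(200)]
for ratio in (1, 9, 15):
    X, y = build_imbalanced_set(fire, no_fire, ratio)
    print(f"1:{ratio} -> {sum(y)} fire, {len(y) - sum(y)} no-fire samples")
```

Fixing the seed keeps the draw reproducible, which makes runs at different ratios easier to compare.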

Figure 5 and Figure 6 show that the performance of the random forest classifier depends on how the class imbalance is handled, and that the weighted data sampling and SMOTE approaches follow distinct trends in random forest (RF) models. With data sampling, the average accuracy and precision increase consistently as the imbalance ratio goes from 1:1 to 1:15, so the model categorises events as fire or no fire effectively even under imbalance. With SMOTE, the performance also rises at first as the imbalance ratio increases, but the recall and F1 scores decrease significantly once the ratio exceeds 1:6. This implies that although SMOTE improves the model’s classification performance at the beginning, it becomes difficult to balance recall against precision as the imbalance ratio rises. Recall bears the cost of this trade-off and worsens as the ratio increases. Thus, even as the model becomes better at categorising members of the majority class, it can have trouble identifying members of the minority class.
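The trade-off described above follows from how the four metrics are defined. The sketch below uses the standard confusion-matrix definitions for the fire (positive) class; the function name `classification_metrics` and the counts in the example are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 score for the fire (positive) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts at a 1:15 ratio: 1,000 fire and 15,000 no-fire samples.
# The classifier misses 400 fires but raises only 50 false alarms.
metrics = classification_metrics(tp=600, fp=50, fn=400, tn=14950)
```

With these made-up counts, accuracy stays above 0.97 because the majority class dominates the total, while recall is only 0.6. This is why accuracy alone is not a sufficient indicator when the classes are imbalanced.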

4.2.2. Effect of Training Sample Size on Random Forest Classifier Performance

In this second case, we maintain a fixed training and testing ratio of 1:1 while varying the number of synthetic 1s in the training set to match the imbalance in the dataset. The number of actual 1s used for testing remained constant at 4117 samples, which is the number of all fire incidents in the database. The training set comprised between 2500 and 120,000 synthetic 1s, whereas testing used the entire set of actual 1s and 0s from 2015 to 2017. Figure 7 and Figure 8 highlight the model’s robustness and its capability to adapt to different quantities of synthetic data. As the number of synthetic 1s increased, the model achieved higher accuracy, precision, recall, and F1 scores. This case therefore provides insight into the impact of the synthetic sample size on model outcomes while maintaining a balanced training and testing ratio.

Comparing the weighted sampling and SMOTE techniques in random forest (RF) models across different training sample sizes yields further insights. With data sampling, the average accuracy, precision, recall, and F1 score increase consistently from 5000 to 240,000 training samples. This superior performance across every metric demonstrates the model’s robustness and its capacity to generalise effectively with larger training volumes. With SMOTE, on the other hand, the performance improves initially as the training sample size grows but then saturates. The model still shows good accuracy, precision, recall, and F1 score results; however, the improvement is smaller than with data sampling.

4.2.3. Testing Performance of Random Forest Algorithm Using Best Imbalance Ratio and Sample Size

From Section 4.2.1 and Section 4.2.2, we observed that the performance of the random forest algorithm peaked when the imbalance ratio was 1:9 with a sample size of 120,000. The trained random forest algorithm is then used to predict the occurrence of forest fire over the entire dataset covering 2014 to 2020, using actual meteorological data without synthetic profiles. This allows us to integrate the random forest classifier to detect anomalies in the meteorological data observed at a local weather station, which can act as an early warning system for the forest fire control unit.

Figure 9 shows the performance metrics of the random forest classifier when all the real 1s and 0s are used. The model itself is trained using synthetic 1s and 0s generated using the weighted data sampling and SMOTE techniques as explained in the earlier sections. We can see that the random forest algorithm predicted the forest fire occurrence very well. From Figure 9, we also observe that addressing the imbalance ratio using the weighted data sampling approach shows a much better performance than the SMOTE technique.

5. Discussion

Forest fire prediction has been approached in the literature using various techniques, often relying on remote sensing, GIS, and meteorological data. Nevertheless, the majority of these studies adopt a diagnostic approach, examining historical events without a significant emphasis on predicting future occurrences. To address this gap, our work sought to create a predictive model utilising fundamental meteorological parameters and derived variables to forecast forest fire incidents in Odisha. The mutual information method was utilised to determine critical characteristics affecting fire behaviour, which were subsequently employed to train a random forest model for predicting fire outbreaks one day in advance. Furthermore, we concentrated on addressing issues associated with class imbalance and identifying ideal sample sizes for model training, which were essential for improving the resilience and dependability of the predictions. We illustrate the main distinctions in methodology, data, and outcomes by contrasting our study’s methods and findings with those of a number of well-known forest fire prediction studies.

For comparison, Ref. [34] suggested using the relative humidity and cumulative precipitation as input features to predict fire/no-fire events. The authors used the artificial neural network (ANN) and support vector machine (SVM) approaches to predict the occurrence of fire events. Their results showed a prediction accuracy of 93–94% using the SVM technique and 89–91% using ANN. In [35], the authors considered the NDVI, land surface temperature, and thermal anomaly as input to predict the occurrence of forest fires using ANN and SVM. The input parameters were acquired from MODIS satellite observations. Their model was able to detect forest fires with 98.32% accuracy. Neither of these works discussed the effect of class imbalance or evaluated the model performance with other metrics such as precision, recall, and F1 score. Ref. [36] used nine spatially explicit explanatory variables, namely, elevation, slope angle, aspect, average annual temperature, drought index, river density, land cover, and distance from roads and residential areas. The authors evaluated four different models, namely, Bayes Network, Naive Bayes, Decision Trees, and Multivariate Logistic Regression, for prediction and mapping of fire susceptibility areas across the Pu Mat National Park, Vietnam. Their results show that the Bayes Network model outperformed the other models with an area under the receiver operating characteristic curve of 0.96.

The methods of [14] and our investigation diverge considerably in approach and emphasis. The authors of [14] apply ANN and Classification Rules (CR) for wildfire prediction, utilising a three-layer ANN with 13 hidden neurones and rule-based models to categorise occurrences as “wildfire active” or “not active.” The ANN attained an accuracy of 84.79% and a precision of 40.01%, whereas the CR model exhibited a slightly lower accuracy of 83.71% but a significantly higher precision of 66.06%. Conversely, our study evaluates data sampling and SMOTE, two synthetic oversampling methodologies, to improve classification performance across multiple metrics including accuracy, precision, recall, and F1 score. Data sampling consistently surpassed SMOTE, attaining an average accuracy between 0.92 and 0.99 and a precision from 0.88 to 0.98, whereas SMOTE’s accuracy varied from 0.88 to 0.97 and its precision from 0.85 to 0.96. In contrast to de Souza et al. [14], who concentrate on rule-based classifications and artificial neural networks, our research highlights the impact of sampling approaches on classification metrics, specifically demonstrating that data sampling performs better in precision and F1 score, although SMOTE is superior in recall. Our study’s methodology demonstrates superior techniques for managing imbalanced datasets, while [14] emphasises the significance of interpretability and rule-based models in predictive tasks.

In contrast, the authors of [37] create a forest fire danger forecasting system (FFDFS) to forecast fire danger in northern Canada by utilising precipitable water, NDVI, NMDI, and surface temperature taken from MODIS. With a 95.51% classification accuracy for fires falling into “moderate” to “extremely high” hazard categories, their algorithm predicts fire danger across five categories (from extremely high to low). Their approach offers useful information about overall fire risk over a longer period of time, but it is not intended to forecast the exact frequency of fire occurrences on a daily basis, which is the main goal of our research. Additionally, the problem of class imbalance—a crucial component in raising the predictive accuracy of our random forest model—is not addressed in their study.

In a similar study [38], the wildfire susceptibility in Irkutsk Oblast, Russia, was mapped using random forest models. In order to create risk maps for regions that are prone to fire, their study takes into account a variety of factors, such as vegetation type, human activity, and meteorological data. Similar to our findings, they claim a high accuracy (0.89), F1-score (0.88), and AUC (0.96). However, instead of accurately forecasting the likelihood of a fire, they continue to map susceptibility. Furthermore, even though their model is strong, it does not specifically address class imbalance, a problem that our study thoroughly tackles using data sampling strategies, producing a more balanced performance across precision, recall, and F1 scores.

A different strategy is used by [39] based on the LightGBM model to predict fires in China’s Central–South region. With accuracy, precision, and F1 scores above 85% and AUC values above 89%, their work focuses on spatial analysis utilising GIS to predict the likelihood of fire. They attain great predictive accuracy, as in the present study, but instead of concentrating on short-term event prediction, the authors prioritise risk zoning and large-scale spatial grouping. Unlike us, they do not employ comprehensive meteorological data for daily forecasts, and their model does not thoroughly address the problem of class imbalance.

In order to assess fire risk characteristics and forecast fire occurrence in central China, [40] employs a deep learning methodology, combining convolutional neural networks (CNNs) with geographic information systems (GISs). The scores we attain are comparable to the high accuracy (86.00%), precision (88.00%), recall (87.00%), and AUC (90.50%) displayed by their model. Their research, however, is less concerned with daily forecasts based on meteorological anomalies and more with zoning management techniques and the extraction of spatial features. CNNs are excellent for spatial analysis, but our method, which combines mutual information with a random forest model, performs better in short-term forecasting and gives us an advantage for real-time fire event forecasting.

In contrast to past studies, our research focuses on incorporating meteorological parameters into a predictive model that anticipates forest fire occurrences and proactively addresses data issues such as class imbalance. This allows the model to perform more evenly across a range of metrics, with precision and F1 scores consistently above 0.85. Our model is distinct in that it can forecast fire events one day ahead of time, giving early warning systems a crucial lead time, whereas many other studies concentrate on susceptibility mapping, fire danger classification, or spatial analysis.

Despite the exclusion of additional parameters such as the topography, aspect ratio, vegetation index, etc., the techniques developed in the present study are capable of integrating any anomaly detection in the atmospheric parameters to predict the occurrence of forest fire and take suitable actions to mitigate them. However, further studies can help us to increase the lead time by up to a few days in advance.

6. Conclusions

This study aims to predict the occurrence of a wildfire event rather than classifying such events for diagnostic purposes. For this, we considered four essential meteorological parameters that influence the occurrence of fire events, namely, the surface and air temperatures, relative humidity, and soil moisture. These parameters were chosen because they are widely available with reduced uncertainty across many places around the world. Using the four essential parameters, we derived 20 additional parameters that represent the heat energy in various forms. Our first objective was to identify key meteorological parameters that exhibit significant correlations with the occurrence of a fire event a day in advance. The mutual information approach was used for this purpose to study the predictive capability of each meteorological variable with respect to the fire event. Based on the mutual information study, nine meteorological parameters were identified that carry significant information about the fire event.

The nine meteorological parameters observed between 2015 and 2017 are used to train the random forest classifier model. The forecast is achieved by feeding the meteorological data from the ith day as input to the random forest classifier, while the output is the classification in the form of fire or no fire on the (i+1)th day. The performance of the classifier depends on the class imbalance in the dataset. To address the imbalance between the two classes, we used the weighted data sampling and SMOTE approaches. Both approaches show consistent improvement in balancing the two classes.
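The one-day-ahead pairing of inputs and outputs described above can be sketched as follows. The function name `make_lagged_pairs` and the list-based data layout are assumptions made for illustration; the sketch only shows the alignment of day i features with the day i+1 fire flag, not the classifier itself.

```python
def make_lagged_pairs(daily_features, daily_fire_flags):
    """Pair the features of day i with the fire flag of day i+1.

    Both inputs are lists ordered by consecutive days. The final day has no
    next-day label and the first day's flag has no preceding features, so
    each output list is one element shorter than the inputs.
    """
    if len(daily_features) != len(daily_fire_flags):
        raise ValueError("inputs must cover the same days")
    inputs = daily_features[:-1]    # days 0 .. n-2
    targets = daily_fire_flags[1:]  # days 1 .. n-1
    return inputs, targets


# Three consecutive days with one hypothetical feature per day.
features = [[30.1], [31.5], [33.0]]
flags = [0, 0, 1]
inputs, targets = make_lagged_pairs(features, flags)
```

Here the features of the second day are paired with the fire flag of the third day, which is the arrangement a next-day classifier would be trained on.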

The last objective was realised by conducting an extensive study on the effect of class imbalance and sample size on the performance of the random forest algorithm. Using four different metrics, namely, the accuracy, precision, recall, and F1 score, we found that as the class imbalance ratio increases, the performance of the classifier improves in both weighted data sampling as well as the SMOTE approach. The study on the effect of sample size shows that as the sample size for training the classifier increases, both the sampling techniques show consistently good performance. The evaluation metric values in both cases approach close to 1 as the sample size increases. The results clearly demonstrate model durability and adaptation to synthetic data amounts. SMOTE and weighted data sampling addressed class imbalance and increased model performance with higher training sizes, but the weighted data sampling technique satisfied the project goals better with the real dataset.

Author Contributions

H.H.: Methodology, Software, Validation, Formal analysis, Resources, Writing—original draft preparation. S.R.K.: Conceptualization, Writing—review and editing, Supervision, Data Curation, Project Administration. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All the data used in the present study are available as open source. The RDS data can be downloaded from https://rds.ncmrwf.gov.in/, accessed on 6 June 2024.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Parza, P.S.A.; Zehra, K. Forest Fires and Climate Change: Causes, Effect, and Management. Disaster Dev. 2022, 11, 107–123.
2. Kumar, S.; Chaudhary, A.; Biswas, T.; Ghosh, S. Identification of Fire Prone Forest Areas Based on GIS analysis of Archived Forest Fire Points Detected in the last Thirteen Years. Tech. Inf. Ser. 2019, 1, 7.
3. Diswal, D. Highlights of Odisha Forestry Sector 2024. Technical Report. 2024, p. 2-II. Available online: https://odishaforest.in/admin/data/documents/publication_file_1682200290.pdf (accessed on 26 May 2024).
4. Forest Fire Alerts System 3.0. Forest Survey of India. Available online: https://fsiforestfire.gov.in/index.php (accessed on 26 May 2024).
5. Jain, A.; Ravan, S.A.; Singh, R.K.; Das, K.K.; Roy, P.S. Forest fire risk modelling using remote sensing and geographic information system. Curr. Sci. 1996, 70, 928–933.
6. Sati, S.P.; Juyal, N. Recent forest fire in Uttarakhand. Curr. Sci. 2016, 111, 1893.
7. Singh, R.D.; Gumber, S.; Sundriyal, R.C.; Ram, J.; Singh, S.P. Chir pine forest and pre-monsoon drought determine spatial, and temporal patterns of forest fires in Uttarakhand Himalaya. Trop. Ecol. 2024, 65, 32–42.
8. Jha, C.S.; Gopalakrishnan, R.; Thumaty, K.C.; Singhal, J.; Reddy, C.S.; Singh, J.; Pasha, S.V.; Middinti, S.; Praveen, M.; Murugavel, A.R.; et al. Monitoring of forest fires from space–ISRO’s initiative for near real-time monitoring of the recent forest fires in Uttarakhand, India. Curr. Sci. 2016, 110, 2057.
9. Reddy, C.; Sudhakar, K.; Navatha, B.; Rachel, M.S.R.; Murthy, P. Manikya Reddy. Forest Fire Monitoring in Sirohi District, Rajasthan Using Remote Sensing Data. Curr. Sci. 2009, 97, 1287–1290.
10. Öncü, G.; Çorumluoğlu, Ö. Assessment of Forest Fire Damage Severity By Remote Sensing Techniques. Int. J. Environ. Geoinform. 2023, 10, 151–158.
11. Goetz, S.J.; Fiske, G.J.; Bunn, A.G. Using satellite time-series data sets to analyze fire disturbance and forest recovery across Canada. Remote Sens. Environ. 2006, 101, 352–365.
12. Rakholia, S.; Mehta, A.; Suthar, B. Forest fire monitoring of Shoolpaneshwar Wildlife Sanctuary, Gujarat, India using geospatial techniques. Curr. Sci. 2020, 119, 1974–1981.
13. Anand, A.; Kannan, S.R. Rain/no-rain classification from combined radar—Radiometer data using machine learning. Remote Sens. Appl. Soc. Environ. 2022, 25, 100682.
14. de Souza, F.T.; Koerner, T.C.; Chlad, R. A data-based model for predicting wildfires in Chapada das Mesas National Park in the State of Maranhao. Environ. Earth Sci. 2015, 74, 3603–3611.
15. Babu, K.N.; Gour, R.; Ayushi, K.; Ayyappan, N.; Parthasarathy, N. Environmental drivers and spatial prediction of forest fires in the Western Ghats biodiversity hotspot, India: An ensemble machine learning approach. For. Ecol. Manag. 2023, 540, 121057.
16. Maeda, N.; Tonooka, H. Early Stage Forest Fire Detection from Himawari-8 AHI Images Using a Modified MOD14 Algorithm Combined with Machine Learning. Sensors 2023, 23, 210.
17. Chang, C.; Chang, Y.; Xiong, Z.; Ping, X.; Zhang, H.; Guo, M.; Hu, Y. Predicting Grassland Fire-Occurrence Probability in Inner Mongolia Autonomous Region, China. Remote Sens. 2023, 15, 2999.
18. Fang, S.-L.; Cheng, Y.-J.; Tu, Y.-K.; Yao, M.-H.; Kuo, B.-J. Exploring Efficient Methods for Using Multiple Spectral Reflectance Indices to Establish a Prediction Model for Early Drought Stress Detection in Greenhouse Tomato. Horticulturae 2023, 9, 1317.
19. Wongvorachan, T.; He, S.; Bulut, O. A Comparison of Undersampling, Oversampling, and SMOTE Methods for Dealing with Imbalanced Classification in Educational Data Mining. Information 2023, 14, 54.
20. Cimdins, R.; Krasovskiy, A.; Kraxner, F. Regional Variability and Driving Forces behind Forest Fires in Sweden. Remote Sens. 2022, 14, 5826.
21. McWethy, D.B.; Garreaud, R.D.; Holz, A.; Pederson, G.T. Broad-Scale Surface and Atmospheric Conditions During Large Fires in South-Central Chile. Fire 2021, 4, 28.
22. Pan, M.; Zhang, S. Visualization of Prediction Methods for Wildfire Modeling Using CiteSpace: A Bibliometric Analysis. Atmosphere 2023, 14, 1009.
23. Safronov, A.N. Spatio-Temporal Assessment of Thunderstorms’ Effects on Wildfire in Australia in 2017–2020 Using Data from the ISS LIS and MODIS Space-Based Observations. Atmosphere 2022, 13, 662.
24. Shikwambana, L.; Habarulema, J.B. Analysis of Wildfires in the Mid and High Latitudes Using a Multi-Dataset Approach: A Case Study in California and Krasnoyarsk Krai. Atmosphere 2022, 13, 428.
25. Masoudian, S.; Sharples, J.; Jovanoski, Z.; Towers, I.; Watt, S. Incorporating Stochastic Wind Vectors in Wildfire Spread Prediction. Atmosphere 2023, 14, 1609.
26. Atutova, Z.V. Post-fire restoration of pine forests in the Badary area, Tunkinskiy National Park, Russia. Nat. Conserv. Res. 2023, 8, 22–32.
27. Vilkova, V.V.; Kazeev, K.S.; Privizentseva, D.A.; Nizhelsky, M.S.; Kolesnikov, S.I. Activity in post-pyrogenic soils in the Utrish State Nature Reserve (Russia) in the early succession stages. Nat. Conserv. Res. 2023, 8, 10–23.
28. Rani, S.I.; Arulalan, T.; George, J.P.; Rajagopal, E.N.; Renshaw, R.; Maycock, A.; Barker, D.; Rajeevan, M. IMDAA: High Resolution Satellite-Era Reanalysis for the Indian Monsoon Region. J. Clim. 2021, 34, 5109–5133.
29. Justice, C.O.; Giglio, L.; Roy, D.; Boschetti, L.; Csiszar, I.; Davies, D.; Korontzi, S.; Schroeder, W.; O’Neal, K.; Morisette, J. MODIS-Derived Global Fire Products. In Land Remote Sensing and Global Environmental Change; Remote Sensing and Digital Image Processing; Springer: New York, NY, USA, 2010; Volume 11.
30. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423.
31. Wong, C.K.; Easton, M.C. An Efficient Method for Weighted Sampling without Replacement. SIAM J. Comput. 1980, 9, 111–113.
32. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-Sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
33. Ho, T.K. Random Decision Forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; pp. 278–282.
34. Sakr, G.E.; Elhajj, I.H.; Mitri, G. Efficient forest fire occurrence prediction for developing countries using two weather parameters. Eng. Appl. Artif. Intell. 2011, 24, 888–894.
35. Sayad, Y.O.; Mousannif, H.; Moatassime, H.A. Predictive modeling of wildfires: A new dataset and machine learning approach. Fire Saf. J. 2019, 104, 130–146.
36. Pham, B.T.; Jaafari, A.; Avand, M.; Al-Ansari, N.; Du, T.D.; Yen, H.P.H.; Phong, T.V.; Nguyen, D.H.; Le, H.V.; Mafi-Gholami, D.; et al. Performance Evaluation of Machine Learning Methods for Forest Fire Modeling and Prediction. Symmetry 2020, 12, 1022.
37. Chowdhury, E.H.; Hassan, Q.K. Development of a New Daily-Scale Forest Fire Danger Forecasting System Using Remote Sensing Data. Remote Sens. 2015, 7, 2431–2448.
38. Nikolaychuk, O.; Pestova, J.; Yurin, A. Wildfire Susceptibility Mapping in Baikal Natural Territory Using Random Forest. Forests 2024, 15, 170.
39. Hai, Q.; Han, X.; Vandansambuu, B.; Bao, Y.; Gantumur, B.; Bayarsaikhan, S.; Chantsal, N.; Sun, H. Predicting the Occurrence of Forest Fire in the Central-South Region of China. Forests 2024, 15, 844.
40. Guo, Y.; Hai, Q.; Bayarsaikhan, S. Utilizing Deep Learning and Spatial Analysis for Accurate Forest Fire Occurrence Forecasting in the Central Region of China. Forests 2024, 15, 1380.

Figure 1. The forest map of the state of Odisha (reproduced from https://www.gisodisha.nic.in/Statem/Forest.pdf, last accessed on 26 May 2024).

Figure 2. Flowchart for SMOTE method with an example.

Figure 3. Mutual information between each meteorological variable with respect to fire flag for various combinations of years.

Figure 4. MI of various meteorological parameters for 2015–2017 data.

Figure 5. Effect of imbalance in classes on the performance of random forest classifier by weighted sampling technique.

Figure 6. Effect of imbalance in classes on the performance of random forest classifier by SMOTE method.

Figure 7. Effect of sample size for a fixed imbalance ratio on random forest classifier by weighted sampling technique.

Figure 8. Effect of sample size for a fixed imbalance ratio on random forest classifier by SMOTE technique.

Figure 9. Testing the performance of random forest classifier with real dataset from 2014 to 2020 time period.

Table 1. List of variables along with the time lag considered for the complex variables.

Parameter	Identifier	Brief Description of the Parameter
$RH_{max,i}$	maxRH	Maximum relative humidity on the ith day, in %
$RH_{min,i}$	minRH	Minimum relative humidity on the ith day, in %
$RH_{ave,i}$	og_avgRH	Average relative humidity on the ith day, in %
$SM_{max,i}$	maxSM	Maximum soil moisture on the ith day
$SM_{min,i}$	minSM	Minimum soil moisture on the ith day
$SM_{ave,i}$	og_avgSM	Average soil moisture on the ith day
$T2_{max,i}$	maxT2m	Maximum temperature at 2 m on the ith day
$T2_{min,i}$	minT2m	Minimum temperature at 2 m on the ith day
$T2_{ave,i}$	og_avgT2m	Average temperature at 2 m on the ith day
$TS_{max,i}$	maxTsk	Maximum skin temperature on the ith day
$TS_{min,i}$	minTsk	Minimum skin temperature on the ith day
$TS_{ave,i}$	og_avgTsk	Average skin temperature on the ith day
$\Delta RH_{ave}$	avg_RH	$RH_{ave,i-1} - RH_{ave,i-2}$
$RH_{sh}$	RH_sh	$RH_{max,i-1} - RH_{min,i-1}$
$\Delta RH_{max}$	RHmaxDelta	$RH_{max,i-1} - RH_{max,i-2}$
$\Delta SM_{ave}$	Avg_SM	$SM_{ave,i-1} - SM_{ave,i-2}$
$SM_{sh}$	SM_sh	$SM_{max,i-1} - SM_{min,i-1}$
$\Delta SM_{max}$	SMmaxDelta	$SM_{max,i-1} - SM_{max,i-2}$
$T2_{sh}$	sh	$T2_{max,i-1} - T2_{min,i-1}$
$\Delta T2_{ave}$	td	$T2_{ave,i-1} - T2_{ave,i-2}$
$\Delta T2_{max}$	T2mmaxDelta	$T2_{max,i-1} - T2_{max,i-2}$
$TS_{sh}$	Tsksh	$TS_{max,i-1} - TS_{min,i-1}$
$\Delta TS_{ave}$	Tsktd	$TS_{ave,i-1} - TS_{ave,i-2}$
$\Delta TS_{max}$	TskmaxDelta	$TS_{max,i-1} - TS_{max,i-2}$
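
The derived rows of Table 1 are simple differences of the daily statistics. As a minimal sketch, the relative humidity rows could be computed as shown below; the function name `derived_rh_variables` and the list-based layout are assumptions, while the dictionary keys reuse the identifiers from the table. The same pattern would apply to the soil moisture, 2 m temperature, and skin temperature rows.

```python
def derived_rh_variables(rh_max, rh_min, rh_ave, i):
    """Derived relative-humidity variables for day i, following Table 1.

    Each argument is a list of daily values indexed by day. The definitions
    use days i-1 and i-2, so i must be at least 2.
    """
    if i < 2:
        raise ValueError("i must be >= 2 because days i-1 and i-2 are needed")
    return {
        "avg_RH": rh_ave[i - 1] - rh_ave[i - 2],      # day-to-day change in mean RH
        "RH_sh": rh_max[i - 1] - rh_min[i - 1],       # daily range on day i-1
        "RHmaxDelta": rh_max[i - 1] - rh_max[i - 2],  # day-to-day change in max RH
    }


# Hypothetical daily values in %, for illustration only.
rh_max = [80.0, 75.0, 70.0]
rh_min = [40.0, 35.0, 30.0]
rh_ave = [60.0, 55.0, 50.0]
derived = derived_rh_variables(rh_max, rh_min, rh_ave, i=2)
```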

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Hiremath, H.; Kannan, S.R. Integrated Anomaly Detection and Early Warning System for Forest Fires in the Odisha Region. Atmosphere 2024, 15, 1284. https://doi.org/10.3390/atmos15111284

