Mapping Homogeneous Response Areas for Forest Fuel Management Using Geospatial Data, K-Means, and Random Forest Classification

Chávez-Durán, Álvaro Agustín; Olvera-Vargas, Miguel; Figueroa-Rangel, Blanca; García, Mariano; Aguado, Inmaculada; Ruiz-Corral, José Ariel

doi:10.3390/f13121970


by Álvaro Agustín Chávez-Durán ¹,², Miguel Olvera-Vargas ¹,*, Blanca Figueroa-Rangel ¹, Mariano García ², Inmaculada Aguado ² and José Ariel Ruiz-Corral ³

¹ Centro Universitario de la Costa Sur, Universidad de Guadalajara, Avenida Independencia Nacional 151, Autlán de Navarro, Jalisco 48900, Mexico

² Universidad de Alcalá, Departamento de Geografía, Geografía y Medio Ambiente, Environmental Remote Sensing Research Group, Calle Colegios 2, 28801 Alcalá de Henares, Spain

³ Centro Universitario de Ciencias Biológicas y Agropecuarias, Universidad de Guadalajara, Calle Ramón Padilla Sánchez 2100, Zapopan, Jalisco 45110, Mexico

* Author to whom correspondence should be addressed.

Forests 2022, 13(12), 1970; https://doi.org/10.3390/f13121970

Submission received: 21 October 2022 / Revised: 8 November 2022 / Accepted: 12 November 2022 / Published: 22 November 2022

(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)


Abstract

Accurate description of forest fuels is necessary for developing appropriate fire management strategies aimed at reducing fire risk. Although field surveys provide accurate measurements of forest fuel loads, they are time consuming and expensive, and may fail to capture the inherent spatial heterogeneity of forest fuels. Previous efforts addressed this issue by estimating homogeneous response areas (HRAs), which represent a promising alternative. However, previous methods suffer from a high degree of subjectivity and are difficult to validate. This paper presents a method that eliminates subjectivity in estimating the spatial distribution of HRAs by using machine learning techniques. The proposed method was developed in the natural protected area of “Sierra de Quila,” Jalisco, and was replicated in “Sierra de Álvarez,” San Luis Potosí and “Selva El Ocote,” Chiapas, Mexico, to prove its robustness. Input data encompassed a set of environmental variables including altitude, average annual precipitation, enhanced vegetation index (EVI), and forest canopy height. Four, three, and five HRAs with overall accuracy of 97.78%, 98.06%, and 98.92% were identified at “Sierra de Quila,” “Sierra de Álvarez,” and “Selva El Ocote,” respectively. Altitude and average annual precipitation were identified as the most explanatory variables in all locations, with mean decrease in impurity values of at least 52.51% for altitude and up to 36.02% for average annual precipitation. HRAs showed statistically significant differences in all study sites according to the Kruskal–Wallis test (p-value < 0.05). Differences among groups were also significant based on the Wilcoxon–Mann–Whitney test (p-value < 0.05) for all variables but EVI in “Selva El Ocote.” These results show the potential of our approach to objectively identify distinct homogeneous areas in terms of their fuel properties. This supports the adequate management of fire and forest fuels in decision-making processes.

Keywords: fire management; forest fuels; homogeneous response areas; machine learning


1. Introduction

Forest fires are an ecological factor of great importance, not only as agents of destruction, but also as ecosystem shapers. For a forest fire to occur, three basic factors are required: forest fuels, favorable environmental conditions, and a starting factor [1]. Forest fuels are the core component of fire management, as they are the key elements of the fire environment triangle that can be managed in fire risk reduction strategies [2].

A forest fuel complex consists of live and dead material that interact with each other and are spatially distributed in three dimensions [3]. In countries such as Mexico, fuels are usually divided according to the United States Department of Agriculture, Forest Service classification into ground (duff), surface (biomass within 2 m above the ground surface: litter, herb, and woody), and canopy (biomass above the surface fuel layer: shrub and tree) fuels [4]. Field data are of great importance in the generation of statistical models for the mapping of fuels; however, they must be collected in situ [5]. In situ data collection requires the development of field strategies to represent a broad variation in study areas. At a global or continental level, forest fuel estimation is carried out by selecting areas by ecoregion type, dominant land cover, and potential vegetation [4,5]. However, at a country or regional level, this activity is more complex due to the inherent fuel diversity [4].

In recent decades, many forest fuel characterization efforts focused directly on variables that feed fuel classification and fire behavior modeling systems [6,7,8,9]; however, a shift in the approach is necessary to improve forest fuel estimations. Ecosystems’ biophysical characteristics are used to describe the ecological phenomena governing fuel dynamics, where most environmental variables are scale-independent and can be useful to predict fuel characteristics across many spatial scales. Complex biophysical processes occur in ecosystems, determining the dynamics of fuel production, deposition, and decomposition [4]. Therefore, it is necessary to develop methods that allow the use of variables related to these processes for fuel load estimations.

Some techniques use sampling designs to discern the intra-class variability in order to estimate the appropriate number of samples [10]. Considering that fuel characteristics are associated with environmental variables, it is advisable to determine strata with a relatively homogeneous distribution based on stratified sampling design. For this purpose, the following two aspects must be considered: (a) the classification criteria and (b) the methodologies to implement them spatially [11].

Studies in different disciplines have implemented a strategy called homogeneous response areas (HRAs), in which site characterization surveyed in the field can be extrapolated to other areas with similar conditions. For instance, Pinzari et al. (2018) [12] used it in the socioeconomic field to analyze health care variation using population census data. Ullah et al. (2019) [13] used it to construct homogeneous climatic regions from meteorological and climatic data. In Mexico, HRAs have been used to assess technological change in corn cultivation using climatological, topographic, and edaphic data [14]. HRAs have also been implemented to define forest management zonation using vegetation, topography, and soil type data [15].

However, the use of HRAs for forest fuel management is poorly documented. One of the few works was carried out to support forest fuel sampling design and to evaluate the response of different ecosystems to the impact of forest fires in Mexico, using vegetation types and altitudinal ranges data [6]. Furthermore, Velasco et al. (2018) implemented this practice for the sampling of forest fuels in Chiapas State, Mexico, also based on vegetation types and altitudinal ranges.

In all cases, HRAs’ identification was based upon map algebra operations. However, the main limitation of this methodology is the subjectivity in the classification criteria, since the number of HRAs is not substantiated with quantitative data [15]. The layers are classified using a priori ranges [6], where vegetation characteristics response and their homogeneity level are unknown. Moreover, class borders are rigid and the extension of transition zones among classes is also unknown, despite being of great importance to design appropriate sampling strategies. Despite the shortfalls described, HRA identification is important to provide representative descriptions of forest fuels in the study areas. Moreover, current powerful computers enable the application of machine learning techniques to provide objective identification of HRAs, substantiated with quantitative data.

Recent developments in machine learning have produced many algorithms designed to solve a wide range of classification problems [16]. Algorithms such as K-means, K-nearest neighbor (KNN), support vector machine (SVM), and random forest are widely implemented to solve classification problems [17,18,19,20,21]. However, their use in HRA estimation for forest fuel management is practically nonexistent.

The goal of this paper was to develop a methodology eliminating subjectivity in the process of mapping HRAs’ spatial distribution, based on geospatial data using a hybrid method that combines unsupervised and supervised classification machine learning techniques. The working hypothesis proposed that once HRAs are identified, it will be possible to structure sampling designs to characterize forest fuels, enabling its extrapolation within the HRAs, which represent similar conditions, to provide a representative description of the forest fuels for the study sites. The results will allow researchers, academics, and forestry managers to streamline decision-making processes for forest fuels and fire management.

2. Materials and Methods

2.1. Study Area

The area selected for the development of the proposed methodology was the natural protected area of “Sierra de Quila,” Jalisco, which extends over 15,192.50 ha [22] and is located in west-central Mexico; bounding coordinates are 20°14.65′ N to 20°21.67′ N and 103°56.79′ W to 104°7.98′ W (Figure 1), with an altitudinal range from 1357 to 2544 m [23]. (A) “Sierra de Quila” hosts a large number of flora and fauna species, maintaining important ecological processes and biological connectivity [24]. The climate is mostly temperate sub-humid and semi-warm sub-humid [25], with cambisol, regosol, and feozem soils [26]. Vegetation is mainly represented by mixed temperate forest with different species of pines and oaks, such as Pinus douglasiana, Pinus devoniana, Pinus lumholtzii, Quercus magnoliifolia, Quercus rugosa, and Quercus castanea [27]. Frequent surface fires of low severity characterize the potential fire regime of the area [28]. Nevertheless, fire suppression and firefighting policies implemented for many decades have altered the potential fire regime. Devastating crown fires have occurred, with strong implications for the ecosystem and the loss of forest firefighters [29]. In order to restore fire regimes, fire management actions are being implemented, reinforcing the need for studies regarding fuel loads [30]. Studies have estimated loads of dead fuels, which range from 4.32 to 130.53 Mg ha⁻¹ [6,31]. The area is also under constant human activities, such as timber harvesting and firewood production [32].

To verify its robustness, the method was implemented and evaluated in two additional locations (Figure 1): (B) “Sierra de Álvarez,” located in San Luis Potosí State, and (C) “Selva El Ocote” Biosphere Reserve, located in Chiapas State. “Sierra de Álvarez” encloses 16,900 ha [33], has an altitudinal range from 1877 to 2716 m [23], and the climate is temperate semi-arid [25] with lithosol, rendzina, and feozem soils [26]. Vegetation mainly comprises temperate oak forest with species such as Quercus laeta, Quercus mexicana, and Quercus diversifolia [27]. The potential fire regime is frequent surface fires of moderate severity [28]. “Selva El Ocote” encompasses 101,288.15 ha [34], has an altitudinal range from 167 to 1544 m [23], and its climate is mainly warm humid and warm sub-humid [25], with mostly luvisol and lithosol soils [26]. Vegetation is mainly constituted by tropical forest with species such as Swietenia macrophylla, Cedrela odorata, and Brosimum alicastrum [27]. The potential fire regime is moisture-limited infrequent fires [28]. The contrasting climate, vegetation type, and potential fire regime of these two locations (“Sierra de Álvarez” and “Selva El Ocote”) with respect to our study area (“Sierra de Quila”) allow for testing of the proposed methodology and confirming its replicability in different environments.

The proposed methodology consists of the development of a hybrid method that combines the benefits of unsupervised cluster analysis with supervised classification techniques. Data input includes climate, altitude, and canopy characteristics. A sample is drawn using 1% of the input data to estimate the optimal number of classes using an unsupervised clustering approach, and then the spatial classification is performed over the whole dataset based on a random forest framework. Figure 2 shows the flowchart of the methodology.

2.2. Data Description

The following geospatial variables were used to define the HRAs:

I. Climate data: Climate is considered an important element determining vegetation characteristics, and it is fundamental for forest fuel production [35]. Climate information regarding average annual precipitation and average annual temperature was obtained from the Agroclimatic Information System for Mexico-Central America (SIAMEXCA), which consists of a historical series of climatic databases from 1961 to 2010 with a spatial resolution of 185 m [36].
II. Altitudinal gradient: Altitude is an important element that influences the diversity and species composition of ecosystems [37]. Altitude information was derived from the 30 m spatial resolution digital elevation model (DEM) provided by the Shuttle Radar Topography Mission (SRTM), which used single-pass C-band interferometric synthetic aperture radar (InSAR) techniques [23].
III. Canopy characteristics: Forest canopy height and forest canopy cover are highly important to estimate forest fuel loads because they describe the structure of the fuel complex [38,39] and potential crown fire propagation [7,40]. High values of tree height and canopy cover likely lead to scarce vegetation in the understory, mainly as a result of the reduced solar radiation transmitted through the canopy [41,42].

Forest canopy height was obtained from the 30 m spatial resolution global forest canopy height (GFCH) map produced by Global Land Analysis and Discovery (GLAD), which integrates the Global Ecosystem Dynamics Investigation lidar forest structure measurements [43] and Landsat data analysis [44]. The enhanced vegetation index (EVI) was used as a proxy of forest canopy cover. Landsat 8 Level 2, Collection 2, Tier 1 surface reflectance data with a spatial resolution of 30 m were processed and downloaded [44] using Google Earth Engine (GEE) [45]. Dry season imagery was used, as it provided a greater contrast between the largest trees and the understory [46]. The selected scenes and their dates were: path 29, row 46, dated 29 April 2021 for “Sierra de Quila”; path 28, row 45, dated 22 April 2021 for “Sierra de Álvarez”; and path 22, row 48, dated 12 April 2021 for “Selva El Ocote”. Furthermore, a topographic correction was carried out using the sun-canopy-sensor correction with the c parameter (SCS + C) [47], according to Equation (1).

L_{n,b} = L_{b} \frac{\cos \alpha \cos \theta + C_{b}}{\cos i + C_{b}}

(1)

where L_{b} is the reflectance of each Landsat 8 band; \alpha is the terrain slope; \theta is the solar zenith angle; i is the illumination angle; and C_{b} is the ratio of the intercept to the slope of the linear regression between L_{b} and \cos i.
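Equation (1) can be illustrated with a minimal NumPy sketch. This is not the authors' Google Earth Engine workflow; the function name `scs_c_correction` and its array arguments are hypothetical, and the sketch assumes that the slope, the solar zenith angle, and cos i have already been derived from the DEM and the scene metadata.

```python
import numpy as np

def scs_c_correction(band, cos_i, slope_rad, solar_zenith_rad):
    """Apply the SCS+C correction of Equation (1) to one reflectance band.

    band, cos_i and slope_rad are 2D arrays of equal shape (hypothetical inputs);
    solar_zenith_rad is a scalar or an array broadcastable to that shape.
    """
    valid = np.isfinite(band) & np.isfinite(cos_i)
    # Linear regression L_b = m * cos(i) + b fitted over the valid pixels.
    m, b = np.polyfit(cos_i[valid], band[valid], deg=1)
    # C parameter: intercept divided by slope (undefined if m is close to zero).
    c_b = b / m
    numerator = np.cos(slope_rad) * np.cos(solar_zenith_rad) + c_b
    return band * numerator / (cos_i + c_b)
```

When reflectance depends linearly on cos i, the corrected band no longer varies with the illumination angle, which is the behavior the correction is meant to achieve.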

After applying the topographic correction, the enhanced vegetation index (EVI) was computed, according to Equation (2) [48].

EVI = G \left( \frac{\rho_{NIR} - \rho_{R}}{\rho_{NIR} + C_{1} \times \rho_{R} - C_{2} \times \rho_{B} + L} \right)

(2)

where \rho_{B}, \rho_{R}, and \rho_{NIR} are the blue, red, and near infrared reflectance, respectively; G is a gain factor; C_{1} and C_{2} are the aerosol resistance coefficients; and L is a soil-adjustment factor. The values of G, C_{1}, C_{2}, and L are 2.5, 6, 7.5, and 1, respectively.
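As a worked illustration of Equation (2), the following sketch computes EVI from reflectance arrays with NumPy. The function name `compute_evi` is hypothetical, and the inputs are assumed to be surface reflectance values already scaled to the 0 to 1 range.

```python
import numpy as np

def compute_evi(nir, red, blue, G=2.5, C1=6.0, C2=7.5, L=1.0):
    """Enhanced vegetation index following Equation (2).

    nir, red and blue are reflectance values (scalars or arrays of equal shape).
    """
    nir, red, blue = (np.asarray(a, dtype=float) for a in (nir, red, blue))
    denominator = nir + C1 * red - C2 * blue + L
    with np.errstate(divide="ignore", invalid="ignore"):
        evi = G * (nir - red) / denominator
    # Pixels with a zero denominator are flagged as missing instead of infinite.
    return np.where(denominator == 0, np.nan, evi)
```

For example, reflectances of 0.5 (NIR), 0.1 (red), and 0.05 (blue) give a denominator of 1.725 and an EVI of 1/1.725, or about 0.58.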

The acquired variables were stacked into a set of raster layers, using the 30 m spatial resolution of the Landsat data. Climate layers were resampled applying the nearest neighbor method [49]. Finally, environmental variable values were extracted from a sample representing 1% of the study site; the sample was randomly collected within the forested areas from each location using land use and vegetation data from INEGI (2021).
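The sampling step can be sketched as follows. This is an illustrative assumption rather than the authors' code: the function name `sample_forest_pixels` is hypothetical, the layers are assumed to be already co-registered on the 30 m grid as a NumPy array, and the 1% fraction is applied here to the forested pixels.

```python
import numpy as np

def sample_forest_pixels(stack, forest_mask, fraction=0.01, seed=0):
    """Draw a random sample of pixels inside the forested area.

    stack: array of shape (n_layers, rows, cols) holding the stacked variables.
    forest_mask: boolean array of shape (rows, cols), True for forested pixels.
    Returns the sampled values (n_samples, n_layers) and their row/column indices.
    """
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(forest_mask)
    n_samples = max(1, int(round(fraction * rows.size)))
    # Sampling without replacement so that no pixel is selected twice.
    chosen = rng.choice(rows.size, size=n_samples, replace=False)
    r, c = rows[chosen], cols[chosen]
    return stack[:, r, c].T, r, c
```

Returning the row and column indices alongside the values keeps each sample traceable to its location in the raster.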

2.3. Identification of HRAs in Each Study Area

Before carrying out the unsupervised classification of the HRAs, data normality was tested using the Anderson–Darling test [50]. In addition, the Spearman correlation coefficient was computed to check for collinearity among the variables. The criteria for eliminating correlated variables were spatial resolution, ease of implementing forest fuel sampling designs in the field, and the ecological context surrounding forest fires, such as atmospheric pressure, wind speed, relative humidity, and dew–rainfall probability [51].
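The two checks above can be sketched with SciPy. The authors worked in R (nortest), so this Python version is a stand-in; the function names and the 0.9 correlation threshold are hypothetical choices for illustration, since the text does not state a cutoff.

```python
import numpy as np
from scipy import stats

def is_non_normal(values, level_index=2):
    """Anderson-Darling normality check.

    level_index=2 selects the 5% critical value that SciPy reports for the
    normal distribution (significance levels 15, 10, 5, 2.5 and 1%).
    """
    result = stats.anderson(values, dist="norm")
    return result.statistic > result.critical_values[level_index]

def correlated_pairs(data, names, threshold=0.9):
    """List variable pairs whose absolute Spearman rho exceeds the threshold.

    data: array of shape (n_samples, n_variables); names: one label per column.
    The threshold is an assumption made for this sketch.
    """
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            rho, _ = stats.spearmanr(data[:, i], data[:, j])
            if abs(rho) > threshold:
                pairs.append((names[i], names[j], float(rho)))
    return pairs
```

Which variable of a flagged pair is dropped remains a judgment based on the criteria listed above; the sketch only identifies the candidates.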

Subsequently, an unsupervised classification using the K-means clustering method was carried out. This method is one of the most popular machine learning algorithms [52]. The algorithm seeks to group data sets to discover potential underlying patterns. The clustering is carried out iteratively, minimizing the sum of squared distances between each observation and the centroid of its cluster. The algorithm requires the number of groups to be specified in advance [53].

In order to identify the optimal number of classes, K-means clustering was run on normalized data using Euclidean distance as the similarity measure [54,55]. Normalization prevents variables with large numeric ranges from dominating the distance computation, which improves the classification. The optimal number of clusters was verified using the percentage of variance explained (PVE) and the average silhouette value (ASV). The ASV is the mean of the silhouette coefficients (ranging from −1 to 1) computed for each observation, which measure how similar an observation is to its own cluster compared with the other clusters [56,57]. Ideally, the goal is to achieve the highest PVE before the ASV starts to drop off.
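The selection of the number of clusters can be sketched with scikit-learn. The authors performed this step in R, so the code below is an equivalent-in-spirit illustration, not their implementation. The function name `evaluate_cluster_counts`, the z-score normalization, the candidate range of k, and the definition of PVE as the between-cluster share of the total sum of squares are assumptions of this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

def evaluate_cluster_counts(samples, k_values=range(2, 9), seed=0):
    """Compute PVE and ASV for each candidate number of clusters.

    samples: array of shape (n_samples, n_variables).
    Returns a dict keyed by k with the PVE, the ASV and the cluster labels.
    """
    # z-score normalization (assumed; the text only says "data normalization").
    scaled = StandardScaler().fit_transform(samples)
    total_ss = np.sum((scaled - scaled.mean(axis=0)) ** 2)
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(scaled)
        # inertia_ is the within-cluster sum of squared Euclidean distances.
        pve = 1.0 - model.inertia_ / total_ss
        asv = silhouette_score(scaled, model.labels_, metric="euclidean")
        scores[k] = {"pve": pve, "asv": asv, "labels": model.labels_}
    return scores
```

Inspecting the PVE and ASV side by side for each k then supports the rule stated above: pick the k with the highest PVE reached before the ASV begins to fall.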

According to the Levene test [58], there was no homogeneity of variances among HRAs for any variable in the three locations (p-value < 0.05), and their distributions were not normal. To verify the existence of significant differences among HRAs, the non-parametric Kruskal–Wallis test was used. This test is based on ranks and is valid for verifying differences in data that are not normally distributed or that lack homogeneity of variances between groups [59]. Finally, to make pairwise comparisons between groups, the Wilcoxon–Mann–Whitney test was used. This is a non-parametric test, valid for verifying differences between pairs of groups for data that are not normally distributed [60].
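The sequence of tests described above can be sketched with SciPy for a single variable. The function name `compare_groups` is hypothetical, and no multiple-comparison correction is applied because the text does not mention one.

```python
from itertools import combinations

import numpy as np
from scipy import stats

def compare_groups(values, labels):
    """Run Levene, Kruskal-Wallis and pairwise Mann-Whitney tests.

    values: 1D array with one variable (e.g., altitude) for every sample.
    labels: 1D array with the HRA label of every sample.
    Returns the Levene p-value, the Kruskal-Wallis p-value and a dict of
    pairwise Wilcoxon-Mann-Whitney p-values keyed by (group_a, group_b).
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    groups = {g: values[labels == g] for g in np.unique(labels)}
    levene_p = stats.levene(*groups.values()).pvalue
    kruskal_p = stats.kruskal(*groups.values()).pvalue
    pairwise = {}
    for a, b in combinations(groups, 2):
        result = stats.mannwhitneyu(groups[a], groups[b], alternative="two-sided")
        pairwise[(a, b)] = result.pvalue
    return levene_p, kruskal_p, pairwise
```

A p-value below 0.05 in the Kruskal–Wallis test indicates that at least one HRA differs, and the pairwise dictionary shows which pairs account for the difference.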

Subsequently, descriptive statistics were computed for each group to characterize them. Because the data were not normally distributed, the median was used as the measure of central tendency. Once the samples were classified, they were labeled according to their corresponding K-means cluster, where each cluster corresponds to a distinct HRA. Labeled samples were used to develop a model to estimate the spatial distribution of HRAs using a supervised classification machine learning technique. For training, 70% of the samples were used, and the remaining 30% were used as validation data.

Statistical analyses and K-means classifications were carried out using the following R libraries: cluster, factoextra, mclust, clustertend, readr, caret, and nortest [61,62,63,64,65,66,67], available in R-project [68].

2.4. Mapping the Spatial Distribution of HRAs in the Study Areas

The previous analysis allowed us to determine the optimal number of clusters, that is, the number of HRAs to be identified in each study area. In order to map the spatial distribution of HRAs, we carried out a supervised classification. Specifically, a random forest (RF) machine learning supervised classification [69] was carried out using the Python programming language [70]. Several studies have compared different machine learning techniques, concluding that no method is universally better than another and that the decision to use a particular method depends on the data and the study goals [19,21]. The RF method is one of the most efficient machine learning supervised classification algorithms [71,72]. It consists of an ensemble of hundreds of decision trees to assign a class to each pixel. Each decision tree assigns a class to each observation, and the class with the highest frequency is the prediction of the algorithm [73].

A RF classification model was built using the “Sierra de Quila” training data. Five hundred decision trees, bootstrap samples, and Gini as the function to measure the quality of a split, were used in its training [74]. The minimum number of samples required to split an internal node was set to 2 and the minimum number of samples in a leaf node was set to 1, without weights associated with classes [75]. One important characteristic of the RF algorithm is its ability to provide information on the importance of each variable used to classify the data. The importance of each predictor variable was evaluated through the calculation of mean decrease in impurity index [76].
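The configuration described above maps directly onto scikit-learn's `RandomForestClassifier`. The sketch below is an illustration under assumptions, not the authors' script: the function names are hypothetical, the 70/30 split is drawn at random without stratification, and the fixed random seed is added only for reproducibility.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def train_hra_classifier(features, hra_labels, seed=0):
    """Train an RF model on 70% of the labeled samples; hold out 30%.

    features: array (n_samples, n_variables); hra_labels: K-means cluster labels.
    """
    X_train, X_val, y_train, y_val = train_test_split(
        features, hra_labels, test_size=0.30, random_state=seed
    )
    model = RandomForestClassifier(
        n_estimators=500,        # five hundred decision trees
        criterion="gini",        # Gini impurity to measure split quality
        bootstrap=True,          # bootstrap samples for each tree
        min_samples_split=2,
        min_samples_leaf=1,
        class_weight=None,       # no weights associated with classes
        random_state=seed,
    )
    model.fit(X_train, y_train)
    return model, X_val, y_val

def importance_percent(model, names):
    """Mean decrease in impurity of each variable, expressed as a percentage."""
    return {n: 100.0 * v for n, v in zip(names, model.feature_importances_)}
```

In scikit-learn, `feature_importances_` is the impurity-based importance normalized to sum to one, so the percentages returned here add up to 100.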

The algorithm was calibrated and applied using the following libraries: numpy [77], pandas [78], matplotlib [79], scikit-learn [80], and the GDAL OSGeo library [81] for spatial data management. The classification probability was mapped in order to examine the spatial distribution of the confidence with which each class was assigned [74]. Classification accuracy was verified with a confusion matrix, using the 30% validation data from K-means clustering [82].
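The accuracy measures derived from a confusion matrix can be sketched as follows. The function name `accuracy_report` is hypothetical; the sketch assumes every class appears at least once among both the reference and the predicted labels, otherwise a division by zero occurs.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def accuracy_report(y_true, y_pred):
    """Overall, producer's and user's accuracy from validation labels.

    Rows of the matrix are reference classes; columns are predicted classes.
    """
    cm = confusion_matrix(y_true, y_pred)
    correct = np.diag(cm).astype(float)
    overall = correct.sum() / cm.sum()
    producer = correct / cm.sum(axis=1)  # 1 - omission error, per class
    user = correct / cm.sum(axis=0)      # 1 - commission error, per class
    return cm, overall, producer, user
```

Producer's accuracy answers how much of each reference HRA was correctly mapped, whereas user's accuracy answers how reliable each mapped HRA is.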

3. Results

According to the Anderson–Darling normality tests, none of the variables followed a normal distribution in any location (p-value < 0.05). The Spearman correlation coefficient showed a very high correlation only between altitude and temperature (rho = −0.98; p-value < 0.05) in the three locations. Therefore, according to the criteria for eliminating correlated variables, altitude was kept and temperature was discarded.

3.1. Identification of HRAs in Each Study Area

Different HRAs were obtained for each location, according to the optimal number of clusters based on ASV and PVE. We identified four HRAs for “Sierra de Quila” (ASV = 0.56 and PVE = 91%), three HRAs for “Sierra de Álvarez” (ASV = 0.58 and PVE = 88%), and five HRAs for “Selva El Ocote” (ASV = 0.54 and PVE = 94%). Evident differences among HRAs in all locations are observed according to input variables (Figure 3).

K-means HRAs “Sierra de Quila.” A clear gradient in altitude, precipitation, canopy height, and EVI was evident among HRAs. Median values for altitude ranged from 1598 to 2205 m asl, with slight interquartile variation among samples inside every HRA; precipitation median values ranged from 861 to 896 mm, and canopy height from 10 to 20 m. Canopy cover, represented by EVI, displayed several outliers in all HRAs; median values for EVI varied from 0.23 to 0.44, with HRA1 and HRA2 presenting higher variation than HRA3 and HRA4 (Figure 3A).

K-means HRAs “Sierra de Álvarez.” In this location, only altitude and precipitation presented a gradient similar to that of “Sierra de Quila”; median values for canopy height and EVI were mostly similar among the three HRAs, with many outliers in EVI. Median values for altitude ranged from 2073 to 2381 m asl and for precipitation from 496 to 544 mm (Figure 3B).

K-means HRAs “Selva El Ocote.” Only altitude showed a gradient in median values among HRAs, ranging from 411 to 1208 m asl. The rest of the variables, particularly EVI, were very similar among HRAs (Figure 3C).

Results from the Kruskal–Wallis test yielded p-value < 0.05 for all variables in all locations, indicating the existence of significant differences among HRAs. The Wilcoxon–Mann–Whitney test also reported p-value < 0.05 for almost all variables, indicating the existence of differences between pairs of HRAs, except for EVI in “Selva El Ocote,” where HRA1, HRA2, and HRA3 did not reach significant differences (p-value > 0.30). These test results provide an objective, quantitative basis for the statistically significant differences among HRAs, eliminating the use of a priori ranges and class numbers in the estimations [6].

3.2. Spatial Distribution of HRAs

Implementation of the trained RF classification model over the set of raster layers for each location allowed accurate estimations of HRA spatial distribution (Figure 4). The overall accuracy of the RF supervised classification was 97.78%, 98.06%, and 98.42% for “Sierra de Quila,” “Sierra de Álvarez,” and “Selva El Ocote,” respectively (Table 1, Table 2 and Table 3). Average producer accuracy by HRA was 97.99% for all locations, achieving 100% in HRA1 of “Sierra de Álvarez.” Therefore, the average omission errors were less than 2.01% in all locations. On the other hand, the average user accuracy by HRA was 98.07% for all locations, achieving 100% in HRA1 of “Sierra de Quila,” HRA3 of “Sierra de Álvarez,” and HRA1 of “Selva El Ocote.” In this way, the average commission errors were less than 1.93% in all locations (Table 1, Table 2 and Table 3).

In “Sierra de Quila,” the predominant HRA was HRA2, with 40.88% of the pixels assigned to it, followed by HRA1 (26.66%), HRA3 (20.28%), and HRA4 (12.18%). The highest average annual precipitation, as well as the greatest tree height and canopy cover, are located in the HRA with the highest altitude (HRA1). These woodland features can lead to scarce vegetation in the understory. Tree height and canopy cover decrease in the rest of the HRAs, which can increase vegetation in the understory [41,42], favoring the presence of ladder fuels. On the other hand, in “Sierra de Álvarez” and “Selva El Ocote,” average annual precipitation increases as altitude decreases. Tree height and canopy cover vary as precipitation changes, but not in a well-defined pattern.

Results from the variable importance analysis reveal that the most important variable for the three study sites was altitude, with 74.09%, 52.51%, and 80.34% for “Sierra de Quila,” “Sierra de Álvarez,” and “Selva El Ocote,” respectively. Average annual precipitation was the second most important variable, with 17.19%, 36.02%, and 14.98% for “Sierra de Quila,” “Sierra de Álvarez,” and “Selva El Ocote,” respectively. EVI and forest canopy height were the least important variables in all locations (Table 4).

The spatial distribution of probabilities for the classification analysis complemented the HRAs (Figure 5). According to the validation dataset, the probability values presented a median of 0.64, 0.99, and 0.99 for “Sierra de Quila,” “Sierra de Álvarez,” and “Selva El Ocote,” respectively. In “Sierra de Quila,” 19.05% of the data presented a probability value higher than 0.90; 62.12% between 0.6 and 0.9; and 18.83% less than 0.6. In “Sierra de Álvarez,” 75.62% were higher than 0.90; 21.21% between 0.6 and 0.9; and 3.17% less than 0.6. In “Selva El Ocote,” 86.35% were higher than 0.90; 11.90% between 0.6 and 0.9; and 1.75% less than 0.6. In all cases, the lowest probabilities were found at the boundaries of HRAs, due to the smooth transitions of the input data among the classes.
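The probability layer and the percentages reported above can be derived as sketched below. The function names are hypothetical, `model` stands for any fitted scikit-learn classifier exposing `predict_proba` and `classes_`, and the raster stack is assumed to be a NumPy array with the same layer order used in training.

```python
import numpy as np

def classify_stack(model, stack):
    """Predict an HRA label and its class probability for every pixel.

    stack: array of shape (n_layers, rows, cols).
    Returns two (rows, cols) arrays: predicted labels and maximum probability.
    """
    n_layers, rows, cols = stack.shape
    pixels = stack.reshape(n_layers, -1).T        # one row per pixel
    proba = model.predict_proba(pixels)           # (n_pixels, n_classes)
    labels = model.classes_[proba.argmax(axis=1)]
    return labels.reshape(rows, cols), proba.max(axis=1).reshape(rows, cols)

def probability_summary(max_prob, low=0.6, high=0.9):
    """Median probability and share of values in the three reported ranges."""
    p = np.asarray(max_prob, dtype=float).ravel()
    p = p[np.isfinite(p)]
    return {
        "median": float(np.median(p)),
        "pct_high": 100.0 * float(np.mean(p > high)),
        "pct_mid": 100.0 * float(np.mean((p >= low) & (p <= high))),
        "pct_low": 100.0 * float(np.mean(p < low)),
    }
```

Mapping the maximum probability alongside the labels makes the transition zones visible, since pixels near HRA boundaries tend to split their votes among neighboring classes.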

4. Discussion

The spatial distribution of HRAs for the forest areas in “Sierra de Quila,” “Sierra de Álvarez,” and “Selva El Ocote” was obtained with high accuracy levels. The number of HRAs is defined from the data according to the site characteristics of each location, based on the identified optimal number of K-means clusters rather than being user defined. If more clusters are identified, it means that the study area is more heterogeneous [13]. The spatial distribution of the HRAs was estimated through the RF method, using training data obtained by K-means. As a result, the classification ranges are not rigid or established a priori; on the contrary, they are flexible according to the site characteristics of each location. Moreover, the class probability complements the classes themselves, which makes it possible to identify the transition zones that nature presents and softens the abrupt changes of the classifications. The methodology proposed here contrasts with those used by Flores-Garnica et al. (2008) and Velasco-Herrera et al. (2018), who used map algebra for the spatial distribution of HRAs, where the ranges and number of classes are rigid and established a priori. Moreover, those works present abrupt changes among HRAs without the possibility of identifying the natural transitions among classes.

4.1. Identification of HRAs in Each Study Area

K-means clustering is an unsupervised classification method widely used in data analysis; however, the algorithm requires the previous establishment of the number of clusters [53]. In this research, K-means classification was carried out using cluster multivariate analysis techniques [83] with a detailed process; therefore, each location achieved different cluster numbers according to their different characteristics. A combination of average silhouette value and percentage of variance explained made it possible to identify the optimal number of clusters for each location and achieve the maximum possible precision in each classification, eliminating subjectivity in selecting the number of classes to use [55].

On the other hand, the Kruskal–Wallis test showed that the clustering approach distinguished significantly different groups, with different environmental and vegetation characteristics. This eliminates the subjectivity in the selection of the classification ranges of variables. The most significant differences were found on the environmental variables precipitation and altitude, both with important influence on vegetation and fuels characteristics [35,84,85].

According to climatic and forest canopy characteristics of HRAs, the relationship between altitude and vegetation structure differed among the three study areas. For “Sierra de Quila,” zones of higher altitude corresponded with higher precipitation, canopy height, and canopy cover vegetation. This pattern corresponds to the typical effect of altitude on temperature and rainfall due to adiabatic effects [86,87]. However, in “Sierra de Álvarez,” an opposite relationship between altitude and vegetation conditions was found, which can be explained considering that “Sierra de Álvarez” is a dry tropical mountain with low cloudiness and low rainfall, but high solar radiation, which induces higher leaf-level transpiration rates in highlands than in lowlands [88]. In “Selva El Ocote,” no trend of climate and vegetation conditions was found as a function of altitude variation. The different behavior found, despite using the same environmental variables as input, reinforces the applicability of our method to different areas with different site characteristics, not being limited by a subjective number of classes and thresholds to delimit the HRAs.

4.2. Spatial Distribution of HRAs in the Study Areas

The development and implementation of RF classification models enabled the accurate mapping of HRA spatial distribution across the study areas. Models trained with site characteristics at each location attained overall accuracy greater than 97% in all cases, supporting the robustness and replicability of the hybrid method. The achieved accuracy was satisfactory for the data and study goals.

According to the mean decrease in impurity index, altitude was the most important variable, yielding the highest levels in all locations. This coincides with results reported by Rzedowski (2006) [89], who describes altitude as one of the main factors related to climate variation in Mexico. In the same way, Figueroa-Rangel and Olvera-Vargas (2022) [90] found that altitude explained plant community species composition at both regional and local scales. Moreover, average annual precipitation reached the second highest levels in all locations. This may be explained because precipitation is the most important factor driving the soil water balance [91], which, together with the permanent wilting point of the plants, entails differences in vegetation structure characteristics [92].

Accurate information on the spatial distribution of HRAs enables the efficient identification of areas with similar characteristics, making it possible to properly structure field sampling designs for forest fuels. Moreover, the spatially explicit information on classification probability helps to identify and discriminate those areas that do not meet established levels. As expected, the lowest probability values were found at the boundaries of the HRAs, owing to the gradual, rather than abrupt, changes in fuel characteristics observed in the field. This agrees with Foody (2002) [93], who attributes low probability values to the fact that the environmental data are continuous variables without sharp boundaries and are therefore difficult to assign to a single class. These ecological transition zones could be subclassified as ecotones, opening the possibility of altitudinal distribution studies based on time series of environmental changes [94]. Establishing sampling sites in HRAs with high classification probability will help ensure homogeneity within the HRAs. In a study carried out in “Sierra de Quila,” dead fuel loads ranged from 13.92 to 130.53 Mg ha⁻¹, and sampling sites only 72.11 m apart differed by 116.61 Mg ha⁻¹ [31]. However, those sampling sites are located in an area with low HRA classification probability, which could explain the heterogeneity of the fuel loads obtained. In addition, the significant differences among HRAs allow sampling sites to be established in HRAs with different forest fuel characteristics.
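
The use of classification probability for site selection can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: the per-class probabilities, the pixel names, and the 0.80 cut-off are hypothetical, and the paper does not state a threshold.

```python
# Hypothetical per-pixel class probabilities, e.g., as returned by a random
# forest classifier (one value per HRA, summing to 1 for each pixel).
pixel_probabilities = {
    "core_pixel": [0.96, 0.03, 0.01],
    "inner_pixel": [0.10, 0.85, 0.05],
    "boundary_pixel": [0.48, 0.47, 0.05],
}

MIN_PROBABILITY = 0.80  # assumed cut-off, not a value from the study


def classify(probabilities):
    """Return (index of the most probable HRA, its probability)."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return best, probabilities[best]


# Keep only pixels whose winning class is sufficiently probable.
candidate_sites = []
for name, probs in pixel_probabilities.items():
    hra_index, confidence = classify(probs)
    if confidence >= MIN_PROBABILITY:
        candidate_sites.append((name, f"HRA{hra_index + 1}", confidence))

print(candidate_sites)
```

The boundary pixel, whose two leading classes are almost equally probable, is excluded, mirroring the low probabilities observed at HRA boundaries.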

The developed methodology provides an ecologically grounded approach to understand, explore, and design sampling strategies for fuel load estimation according to the natural dynamics of fuels. The relationship between biophysical processes and organic matter accumulation can be used to estimate forest fuel characteristics. The variables used are part of complex biophysical processes, so it is difficult to measure their individual relationships with fuels, since they involve water–soil–plant–atmosphere interactions, such as water balance or nutrient flux, mainly in the production, deposition, and decomposition of forest fuels [4]. Sampling designs with an ecological context become relevant in the face of a new era in forest fuel load estimation.

5. Conclusions

The proposed methodology uses a hybrid machine learning method that combines the strengths of unsupervised cluster analysis with supervised classification techniques. It yielded accurate classifications of the HRAs for the different study areas and eliminated subjectivity in the classification criteria. The spatial distribution of HRAs for the forest areas in “Sierra de Quila,” “Sierra de Álvarez,” and “Selva El Ocote” was obtained with overall accuracy greater than 97%. It was possible to estimate the optimal number of HRAs in each location, their spatial distribution, their classification probability values, and the importance of the variables used in the model. The proposed methodology enables the establishment of field sampling sites for forest fuel characterization whose results could be extrapolated within each HRA, as conditions there are similar. This will allow researchers, academics, and forest managers to streamline objective and robust decision-making processes for forest fuel and fire management.
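
The hybrid idea (cluster first, then train a classifier on the cluster labels) can be sketched with scikit-learn, which the paper cites for the RF step. This is a minimal illustration, not the authors' implementation: the data are synthetic, the number of clusters is fixed at three rather than estimated, and the standardization step, the 70/30 split, and all parameter values are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for pixel data (assumed values, not from the study):
# altitude (m), average annual precipitation (mm), EVI, canopy height (m).
centers = np.array([
    [1500.0, 700.0, 0.30, 8.0],
    [1900.0, 850.0, 0.40, 12.0],
    [2300.0, 1000.0, 0.50, 16.0],
])
spread = np.array([40.0, 20.0, 0.02, 1.0])
X = np.vstack([c + rng.normal(size=(200, 4)) * spread for c in centers])

# Standardize so that no variable dominates the Euclidean distance (assumption).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 1 (unsupervised): k-means assigns each pixel to a candidate HRA.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_std)

# Step 2 (supervised): a random forest learns to reproduce the cluster labels.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, random_state=0, stratify=labels
)
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Validation on held-out pixels and impurity-based variable importance.
overall_accuracy = accuracy_score(y_test, rf.predict(X_test))
importances = rf.feature_importances_
print(f"Overall accuracy: {overall_accuracy:.4f}")
print(dict(zip(["altitude", "AAP", "EVI", "FCH"], importances.round(3))))
```

In a real application the trained classifier would then be applied to every pixel of the raster stack to produce the HRA map and, via `predict_proba`, the probability map.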

Author Contributions

Conceptualization, M.G., I.A., M.O.-V., B.F.-R., J.A.R.-C. and Á.A.C.-D.; data curation, J.A.R.-C. and Á.A.C.-D.; formal analysis, M.G., I.A., M.O.-V., B.F.-R., J.A.R.-C. and Á.A.C.-D.; funding acquisition, M.O.-V.; methodology M.G., I.A., M.O.-V., B.F.-R. and Á.A.C.-D.; software, Á.A.C.-D.; validation, M.G., I.A., M.O.-V., B.F.-R., J.A.R.-C. and Á.A.C.-D.; visualization, M.G., I.A. and Á.A.C.-D.; writing—original draft, Á.A.C.-D.; writing—review and editing, M.G., I.A., M.O.-V., B.F.-R., J.A.R.-C. and Á.A.C.-D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Committee of Science and Technology (CONACyT) of Mexico, Unique Curriculum Vitae Scholarship (CVU): 167647, and by the Grant for Excellence in Teaching Staff of the Community of Madrid (EPU-DPTO/2020/008).

Acknowledgments

We would like to thank the anonymous reviewers for their constructive suggestions, which helped to improve our paper. We also thank the Department for its support through the Grant for Excellence in Teaching Staff of the Community of Madrid (EPU-DPTO/2020/008); the Natural Protected Area of “Sierra de Quila,” Jalisco, Mexico, for its active participation and operational support in carrying out this work; the National Institute of Forestry, Agricultural and Livestock Research of Mexico (INIFAP) for the facilities granted for the research tasks; the National Committee of Science and Technology (CONACyT) of Mexico for its support in carrying out this work; and Oscar Reyes Cárdenas for his suggestions and comments, which helped to improve our paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

Pyne, S.J.; Andrews, P.L.; Laven, R.D. Introduction to Wildland Fire, 2nd ed.; Wiley: New York, NY, USA, 1996. [Google Scholar]
Sullivan, A.L. Wildland Surface Fire Spread Modelling, 1990–2007. 3: Simulation and Mathematical Analogue Models. Int. J. Wildland Fire 2009, 18, 387–403. [Google Scholar] [CrossRef]
Rothermel, R.C. A Mathematical Model for Predicting Fire Spread in Wildland Fuels; Intermountain Forest & Range Experiment Station, Forest Service, US Department of Agriculture: Ogden, UT, USA, 1972.
Keane, R.E. Wildland Fuel Fundamentals and Applications; Springer International: New York, NY, USA, 2015. [Google Scholar]
McKenzie, D.; Raymond, C.L.; Kellogg, L.K.B.; Norheim, R.A.; Andreu, A.G.; Bayard, A.C.; Kopper, K.E.; Elman, E. Mapping Fuels at Multiple Scales: Landscape Application of the Fuel Characteristic Classification System. Can. J. For. Res. 2007, 37, 2421–2437. [Google Scholar] [CrossRef]
Flores-Garnica, J.G.; Chávez-Durán, A.A.; Rubio-Camacho, E.A.; Villela Gaytán, S.A.; Xelhuantzi-Carmona, J.; Frías-Gómez, J.G. Evaluación de La Respuesta de Diferentes Ecosistemas Forestales a Los Incendios Forestales. In Informe Técnico y Financiero Segunda Etapa; Clave CONACyT: 71400; Instituto Nacional de Investigaciones Forestales, Agrícolas y Pecuarias: Guadalajara, Mexico, 2008. [Google Scholar]
Prichard, S.J.; Andreu, A.G.; Ottmar, R.D.; Eberhardt, E. Fuel Characteristic Classification System (FCCS) Field Sampling and Fuelbed Development Guide; Forest Service, US Department of Agriculture: Portland, OR, USA, 2019. [Google Scholar] [CrossRef]
Chávez-Durán, Á.A.; Flores-Garnica, J.G.; Luna-Luna, M.; Centeno-Erguera, L.R.; Alarcón-Bustamante, M.P. Caracterización y Clasificación de Camas de Combustibles Prioritarias En México Para Planificar El Manejo Del Fuego. Informe Técnico Fondo Sectorial CONACyT-CONAFOR. Referencia: CONAFOR-2012-C01-175523; Instituto Nacional de Investigaciones Forestales, Agrícolas y Pecuarias: Tepatitlán de Morelos, México, 2014.
Morfin-Rios, J.E.; Alvarado-Celestino, E.; Jardel-Pelaez, E.J.; Vihnanek, R.E.; Wright, D.K.; Michel-Fuentes, J.M.; Wright, C.S.; Ottmar, R.D.; Sandberg, D.V.; Najera-Diaz, A. Photo Series for Quantifying Forest Fuels in Mexico: Montane Subtropical Forests of the Sierra Madre Del Sur and Temperate Forests and Montane Shrubland of the Northern Sierra Madre Oriental; Pacific Wildland Fire Sciences Laboratory; University of Washington, College of Forest Resources: Seattle, WA, USA, 2008; Volume, Special Pub. No. 1. [Google Scholar]
Taherdoost, H. Sampling Methods in Research Methodology; How to Choose a Sampling Technique for Research. Int. J. Acad. Res. Manag. 2016, 5, 18–27. [Google Scholar] [CrossRef]
Velasco-Herrera, J.A.; Flores-Garnica, J.G.; Márquez-Azúa, B.; López, S. Áreas de Respuesta Homogénea Para El Muestreo de Combustibles Forestales. Rev. Mex. Cienc. For. 2018, 4, 41–54. [Google Scholar] [CrossRef]
Pinzari, L.; Mazumdar, S.; Girosi, F. A Framework for the Identification and Classification of Homogeneous Socioeconomic Areas in the Analysis of Health Care Variation. Int. J. Health Geogr. 2018, 17, 42. [Google Scholar] [CrossRef]
Ullah, H.; Akbar, M.; Khan, F. Construction of Homogeneous Climatic Regions by Combining Cluster Analysis and L-moment Approach on the Basis of Reconnaissance Drought Index for Pakistan. Int. J. Climatol. 2019, 40, 324–341. [Google Scholar] [CrossRef]
Palacios-Corona, V.; Vázquez-García, M.; González-Eguiarte, D.R.; Villarreal-Farías, E.; Byerly-Murphy, K.F. Technical Diagnosis for Technology Change in the Corn Crop. TERRA Latinoam. 2007, 25, 321–332. [Google Scholar]
Reyes-Cárdenas, O.; Flores-Garnica, J.G.; Treviño-Garza, E.J.; Aguirre-Calderón, O.A.; Cárdenas-Tristán, A. Zonificación Forestal Bajo El Concepto de Áreas de Respuesta Homogénea En El Centro de México. Investig. Geográficas. 2019, 98. [Google Scholar] [CrossRef]
Ghayour, L.; Neshat, A.; Paryani, S.; Shahabi, H.; Shirzadi, A.; Chen, W.; Al-Ansari, N.; Geertsema, M.; Pourmehdi Amiri, M.; Gholamnia, M.; et al. Performance Evaluation of Sentinel-2 and Landsat 8 OLI Data for Land Cover/Use Classification Using a Comparison between Machine Learning Algorithms. Remote Sens. 2021, 13, 1349. [Google Scholar] [CrossRef]
Kwan, C.; Ayhan, B.; Budavari, B.; Lu, Y.; Perez, D.; Li, J.; Bernabe, S.; Plaza, A. Deep Learning for Land Cover Classification Using Only a Few Bands. Remote Sens. 2020, 12, 2000. [Google Scholar] [CrossRef]
Zhu, L.; Spachos, P. Towards Image Classification with Machine Learning Methodologies for Smartphones. Mach. Learn. Knowl. Extr. 2019, 1, 59. [Google Scholar] [CrossRef]
Yuvalı, M.; Yaman, B.; Tosun, Ö. Classification Comparison of Machine Learning Algorithms Using Two Independent CAD Datasets. Mathematics 2022, 10, 311. [Google Scholar] [CrossRef]
Xie, G.; Niculescu, S. Mapping and Monitoring of Land Cover/Land Use (LCLU) Changes in the Crozon Peninsula (Brittany, France) from 2007 to 2018 by Machine Learning Algorithms (Support Vector Machine, Random Forest, and Convolutional Neural Network) and by Post-Classification Comparison (PCC). Remote Sens. 2021, 13, 3899. [Google Scholar] [CrossRef]
Zagajewski, B.; Kluczek, M.; Raczko, E.; Njegovec, A.; Dabija, A.; Kycko, M. Comparison of Random Forest, Support Vector Machines, and Neural Networks for Post-Disaster Forest Species Mapping of the Krkonoše/Karkonosze Transboundary Biosphere Reserve. Remote Sens. 2021, 13, 2581. [Google Scholar] [CrossRef]
Comisión Nacional de Áreas Naturales Protegidas (CONANP). Recategorización Del Área de Protección de Flora y Fauna “Sierra de Quila”; Diario Oficial: Mexico City, Mexico, 2000; pp. 1–5. [Google Scholar]
Farr, T.G.; Rosen, P.A.; Caro, E.; Crippen, R.; Duren, R.; Hensley, S.; Kobrick, M.; Paller, M.; Rodriguez, E.; Roth, L.; et al. The Shuttle Radar Topography Mission. Rev. Geophys. 2007, 45, RG2004. [Google Scholar] [CrossRef]
Santiago-Pérez, A.L.; Ayón-Escobedo, A.; Rosas-Espinoza, V.C.; Rodríguez-Zaragoza, F.A.; Toledo-González, S.L. Estructura Del Bosque Templado de Galería En La Sierra de Quila, Jalisco. Rev. Mex. Cienc. For. 2014, 5, 92–109. [Google Scholar]
García, E. Climas, Clasificación de Köppen Modificado Por García; Comisión Nacional para el Conocimiento y Uso de la Biodiversidad (CONABIO): Mexico City, Mexico, 1998. [Google Scholar]
Instituto Nacional de Estadística y Geografía (INEGI). Conjunto de Datos Vectorial Edafológico. Serie II Continuo Nacional; INEGI Aguascalientes: Aguascalientes, Mexico, 2013.
Instituto Nacional de Estadística y Geografía (INEGI). Conjunto de Datos Vectoriales de Uso Del Suelo y Vegetación, Serie VII; INEGI: Aguascalientes, Aguascalientes, Mexico, 2021.
Jardel-Pelaez, E.J.; Pérez-Salicrup, D.; Alvarado-Celestino, E.; Morfin-Rios, J.E. Principios y Criterios Para El Manejo Del Fuego En Ecosistemas Forestales: Guía de Campo; Comisión Nacional Forestal: Guadalajara, Mexico, 2014. [Google Scholar]
Jiménez-Luquín, E. Sierra de Quila: ¿Cómo ha ido cambiando los últimos 25 años desde la tragedia? In Memorias. I Foro de conocimiento, uso y gestión del Área Natural Protegida Sierra de Quila; Villavicencio-García, R., Santiago-Pérez, A.L., Rosas-Espinoza, V.C., Hernández-López, L., Eds.; Universidad de Guadalajara. Centro Universitario de Ciencias Biológicas y Agropecuarias. Departamento de Producción Forestal: Guadalajara, Mexico, 2011; pp. 1–134. [Google Scholar]
Secretaría del Medio Ambiente y Desarrollo Territorial (SEMADET). Plan Estatal de Manejo Del Fuego En El Estado de Jalisco Primera Etapa; SEMADET: Guadalajara, Mexico, 2018.
Chávez-Durán, Á.A.; Bustos-Santana, A.; Chávez-Durán, H.M.; Flores-Garnica, J.G.; Rubio-Camacho, E.A.; Xelhuantzi-Carmona, J. Distribución espacial de cargas de combustibles en una parcela de muestreo de Pino–Encino. Rev. Mex. Cienc. For. 2021, 12, 1–22. [Google Scholar] [CrossRef]
Comisión Nacional Forestal (CONAFOR). Unidad de Manejo Forestal 1407, Sierra de Quila. Estudio Regional Forestal; CONAFOR: Guadalajara, Mexico, 2007.
Comisión Nacional de Áreas Naturales Protegidas (CONANP). Recategorización Del Área de Protección de Flora y Fauna “Sierra de Álvarez”; Diario Oficial: Mexico City, Mexico, 2000; pp. 1–5. [Google Scholar]
Comisión Nacional de Áreas Naturales Protegidas (CONANP). Recategorización de La Reserva de La Biosfera “Selva El Ocote”; Diario Oficial: Mexico City, Mexico, 2000; pp. 1–13. [Google Scholar]
Rodríguez-Trejo, D.A. Incendios de Vegetación. Su Ecología Manejo e Historia. Volumen 1; Biblioteca Básica De Agricultura (BBA): Colegio de Postgraduados, Mexico, 2014. [Google Scholar]
Ruiz-Corral, J.A.; Medina-García, G.; García-Romero, G.E. Sistema de Información Agroclimático Para México-Centroamérica (SIAMEXCA). Rev. Mex. Cienc. Agrícolas 2018, 9, 1–10. [Google Scholar] [CrossRef]
Stevens, G.C. The Elevational Gradient in Altitudinal Range: An Extension of Rapoport’s Latitudinal Rule to Altitude. Am. Nat. 1992, 140, 893–911. [Google Scholar] [CrossRef]
García-Cimarras, A.; Manzanera, J.A.; Valbuena, R. Analysis of Mediterranean Vegetation Fuel Type Changes Using Multitemporal LiDAR. Forests 2021, 12, 335. [Google Scholar] [CrossRef]
Bajocco, S.; Dragoz, E.; Gitas, I.; Smiraglia, D.; Salvati, L.; Ricotta, C. Mapping Forest Fuels through Vegetation Phenology: The Role of Coarse Resolution Satellite Time-Series. PLoS ONE 2015, 10, e0119811. [Google Scholar] [CrossRef] [PubMed]
Keane, R.E.; Reinhardt, E.D.; Scott, J.; Gray, K.; Reardon, J. Estimating Forest Canopy Bulk Density Using Six Indirect Methods. Can. J. For. Res. 2005, 35, 724–739. [Google Scholar] [CrossRef]
Mestre, L.; Toro-Manríquez, M.; Soler, R.; Huertas-Herrera, A.; Martínez-Pastur, G.; Lencinas, M.V. The Influence of Canopy Layer Composition on Understory Plant Diversity in Southern Temperate Forests. For. Ecosyst. 2017, 4, 1–13. [Google Scholar] [CrossRef]
Casals, P.; Valor, T.; Besalú, A.; Molina-Terrén, D. Understory Fuel Load and Structure Eight to Nine Years after Prescribed Burning in Mediterranean Pine Forests. For. Ecol. Manag. 2016, 362, 156–168. [Google Scholar] [CrossRef]
Global Ecosystem Dynamics Investigation (GEDI). Ecosystem Lidar. Available online: https://gedi.umd.edu/ (accessed on 27 June 2022).
United States Geological Survey (USGS). Landsat Missions. Available online: https://www.usgs.gov/core-science-systems/nli/landsat (accessed on 28 June 2022).
Google Earth Engine (GEE). A Planetary Scale Platform for Earth Science Data and Analysis. Available online: https://earthengine.google.com/ (accessed on 28 June 2022).
Tun-Dzul, F.; Vester, H.; García, R.; Schmook, B. Estructura Arbórea y Variabilidad Temporal Del NDVI En Los “Bajos Inundables” de La Península de Yucatán, México. Polibotánica 2008, 25, 69–90. [Google Scholar]
Soenen, S.A.; Peddle, D.R.; Coburn, C.A. SCS+C: A Modified Sun-Canopy-Sensor Topographic Correction in Forested Terrain. IEEE Trans. Geosci. Remote Sens. 2005, 43, 2148–2159. [Google Scholar] [CrossRef]
Huete, A.; Didan, K.; Miura, T.; Rodriguez, E.P.; Gao, X.; Ferreira, L.G. Overview of the Radiometric and Biophysical Performance of the MODIS Vegetation Indices. Remote Sens. Environ. 2002, 83, 195–213. [Google Scholar] [CrossRef]
Baboo, S.; Devi, R. An Analysis of Different Resampling Methods in Coimbatore, District. Glob. J. Comput. Sci. Technol. 2010, 10, 61–66. [Google Scholar]
Thode, H.C. Testing for Normality; Chemical Rubber Company Press: New York, NY, USA, 2002. [Google Scholar]
Xiong, Q.; Luo, X.; Liang, P.; Xiao, Y.; Xiao, Q.; Sun, H.; Pan, K.; Wang, L.; Li, L.; Pang, X. Fire from Policy, Human Interventions, or Biophysical Factors? Temporal–Spatial Patterns of Forest Fire in Southwestern China. For. Ecol. Manag. 2020, 474, 118381. [Google Scholar] [CrossRef]
Lv, Z.; Liu, T.; Shi, C.; Benediktsson, J.A.; Du, H. Novel Land Cover Change Detection Method Based on K-Means Clustering and Adaptive Majority Voting Using Bitemporal Remote Sensing Images. IEEE Access 2019, 7, 34425–34437. [Google Scholar] [CrossRef]
Meng, Y.; Liang, J.; Cao, F.; He, Y. A New Distance with Derivative Information for Functional K-Means Clustering Algorithm. Inf. Sci. 2018, 463–464, 166–185. [Google Scholar] [CrossRef]
Anderson, M.J.; Ellingsen, K.E.; McArdle, B.H. Multivariate Dispersion as a Measure of Beta Diversity. Ecol. Lett. 2006, 9, 683–693. [Google Scholar] [CrossRef] [PubMed]
Murtagh, F.; Legendre, P. Ward’s Hierarchical Agglomerative Clustering Method: Which Algorithms Implement Ward’s Criterion? J. Classif. 2014, 31, 274–295. [Google Scholar] [CrossRef]
Rousseeuw, P.J. Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis. J. Comput. Appl. Math. 1987, 20, 53–65. [Google Scholar] [CrossRef]
Garcia-Lopes, H.E.; De-Sevilha-Gosling, M. Cluster Analysis in Practice: Dealing with Outliers in Managerial Research. Rev. Adm. Contemp. 2021, 25. [Google Scholar] [CrossRef]
Fox, J.; Weisberg, S. An R Companion to Applied Regression, 3rd ed.; Sage: Thousand Oaks, CA, USA, 2018. [Google Scholar]
Vargha, A.; Delaney, H.D. The Kruskal-Wallis Test and Stochastic Homogeneity. J. Educ. Behav. Stat. 1998, 23, 170–192. [Google Scholar] [CrossRef]
Bonamente, M. Statistics and Analysis of Scientific Data; Springer Science and Business Media: New York, NY, USA, 2017. [Google Scholar]
Maechler, M.; Rousseeuw, P.; Struyf, A.; Hubert, M.; Hornik, K.; Studer, M.; Roudier, P.; González, J.; Kozlowski, K.; Schubert, E.; et al. Package ‘cluster.’ Finding Groups in Data. Available online: https://cran.r-project.org/web/packages/cluster/cluster.pdf (accessed on 28 June 2022).
Kassambara, A.; Mundt, F. Package “factoextra”. Extract and Visualize the Results of Multivariate Data Analyses. Available online: https://cran.r-project.org/web/packages/factoextra/factoextra.pdf (accessed on 28 June 2022).
Fraley, C.; Raftery, A.E.; Scrucca, L.; Murphy, T.B.; Fop, M. Package “mclust”. Title Gaussian Mixture Modelling for Model Based Clustering, Classification, and Density Estimation. Available online: https://cran.r-project.org/web/packages/mclust/mclust.pdf (accessed on 27 June 2022).
Wright, K.; YiLan, L.; RuTong, Z. Package ‘clustertend.’ Check the Clustering Tendency. Available online: https://cran.r-project.org/web/packages/clustertend/clustertend.pdf (accessed on 28 June 2022).
Wickham, H.; Hester, J.; Francois, R.; Bryan, J.; Bearrows, S.; Jylänki, J.; Jørgensen, M. Package ‘readr.’ Read Rectangular Text Data. Available online: https://cran.r-project.org/web/packages/readr/readr.pdf (accessed on 28 June 2022).
Kuhn, M.; Wing, J.; Weston, S.; Williams, A.; Keefer, C.; Engelhardt, A.; Cooper, T.; Mayer, Z.; Kenkel, B.; Benesty, M.; et al. Package “caret”. Classification and Regression Training. Available online: https://cran.r-project.org/web/packages/caret/caret.pdf (accessed on 28 June 2022).
Gross, J.; Ligges, U. Package ‘nortest’. Tests for Normality. Available online: https://cran.r-project.org/web/packages/nortest/nortest.pdf (accessed on 28 June 2022).
R Core Team. R: A Language and Environment for Statistical Computing; Version 4.1.2; R Foundation for Statistical Computing: Vienna, Austria; Available online: https://www.R-project.org/ (accessed on 28 June 2022).
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Python. Python Software Foundation. Available online: https://www.python.org/ (accessed on 28 June 2022).
Biau, G. Analysis of a Random Forests Model. J. Mach. Learn. Res. 2012, 13, 1063–1095. [Google Scholar]
Pal, M. Random Forest Classifier for Remote Sensing Classification. Int. J. Remote Sens. 2005, 26, 217–222. [Google Scholar] [CrossRef]
Prasad, A.M.; Iverson, L.R.; Liaw, A. Newer Classification and Regression Tree Techniques: Bagging and Random Forests for Ecological Prediction. Ecosystems 2006, 9, 181–199. [Google Scholar] [CrossRef]
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn Machine Learning in Python. Random Forest Classifier. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html (accessed on 28 June 2022).
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn Machine Learning in Python. Forests of Randomized Trees. Available online: https://scikit-learn.org/stable/modules/ensemble.html#forest (accessed on 28 June 2022).
Boonprong, S.; Cao, C.; Chen, W.; Bao, S. Random Forest Variable Importance Spectral Indices Scheme for Burnt Forest Recovery Monitoring Multilevel RF-VIMP. Remote Sens. 2018, 10, 807. [Google Scholar] [CrossRef]
Numpy. The Fundamental Package for Scientific Computing with Python. Available online: https://numpy.org (accessed on 28 June 2022).
Pandas. Pandas: Powerful Python Data Analysis Toolkit. Available online: https://pandas.pydata.org/ (accessed on 28 June 2022).
Matplotlib. Matplotlib: Visualization with Python. Available online: https://matplotlib.org/ (accessed on 28 June 2022).
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
Geospatial Data Abstraction (GDAL). Translator Library for Raster and Vector Geospatial Data Formats. Available online: https://gdal.org/ (accessed on 27 June 2022).
Comber, A.; Fisher, P.; Brunsdon, C.; Khmag, A. Spatial Analysis of Remote Sensing Image Classification Accuracy. Remote Sens. Environ. 2012, 127, 237–246. [Google Scholar] [CrossRef]
Palacio, F.X.; Apodaca, M.J.; Crisci, J.V. Análisis Multivariado Para Datos Biológicos: Teoría y Su Aplicación Utilizando El Lenguaje R; Fundación de Historia Natural Félix de Azara: Buenos Aires, Argentina, 2020. [Google Scholar]
Ruiz-Corral, J.A.; Medina-García, G.; González-Acuña, I.J.; Flores-López, H.E.; Ramírez-Ojeda, G.; Ortiz-Trejo, C.; Byerly-Murphy, K.F.; Martínez-Parra, R.A. Requerimientos Agroecológicos de Cultivos, 2nd ed.; Instituto Nacional de Investigaciones Forestales Agrícolas y Pecuarias. INIFAP CIRPAC. Campo Experimental Centro Altos de Jalisco: Tepatitlán de Morelos, Mexico, 2013.
Ulukan, H. Agronomic Adaptation of Some Field Crops: A General Approach. J. Agron. Crop Sci. 2008, 194, 169–179. [Google Scholar] [CrossRef]
Ambaum, M.H.P. Thermal Physics of the Atmosphere. A Volume in Developments in Weather and Climate Science, 2nd ed.; Royal Meteorological Society Elsevier: Amsterdam, The Netherlands, 2020. [Google Scholar] [CrossRef]
Chen, B.X.; Sun, Y.F.; Zhang, H.B.; Han, Z.H.; Wang, J.S.; Li, Y.K.; Yang, X.L. Temperature Change along Elevation and Its Effect on the Alpine Timberline Tree Growth in the Southeast of the Tibetan Plateau. Adv. Clim. Chang. Res. 2018, 9, 185–191. [Google Scholar] [CrossRef]
Leuschner, C. Are High Elevations in Tropical Mountains Arid Environments for Plants? Ecology 2000, 81, 1425–1436. [Google Scholar] [CrossRef]
Rzedowski, J. Vegetación de México, 1st digital ed.; Comisión Nacional para el Conocimiento y Uso de la Biodiversidad (CONABIO): Pátzcuaro, Mexico, 2006. [Google Scholar]
Figueroa-Rangel, B.L.; Olvera-Vargas, M. Environmental and Spatial Processes Shaping Quercus dominated Forest Communities in the Neotropics. Ecosphere 2022, 13, e4103. [Google Scholar] [CrossRef]
Sabaruddin, L.; Arafah, N.; Syaf, H.; Leomo, S.; Corina-Rak, T.; la Fua, J. Analysis of Soil Water Balance to Determine Planting Time of Crops on Dryland, Indonesia. Pak. J. Biol. Sci. 2021, 24, 241–251. [Google Scholar] [CrossRef]
Romero, R. Relaciones Agua Planta En El Sistema Suelo-Planta-Atmósfera. In Manejo y Fertilidad de Suelos; Moron, A., Martino, D., Sawchik, J., Eds.; Instituto Nacional de Investigación Agropecuaria, Uruguay: Montevideo, Uruguay, 1996. [Google Scholar]
Foody, G.M. Status of Land Cover Classification Accuracy Assessment. Remote Sens. Environ. 2002, 80, 185–201. [Google Scholar] [CrossRef]
Alfaro-Ramírez, F.U.; Arredondo-Moreno, J.T.; Pérez-Suárez, M.; Endara-Agramont, Á.R. Pinus Hartwegii Lindl. Treeline Ecotone: Structure and Altitudinal Limits at Nevado de Toluca, Mexico. Rev. Chapingo Ser. Cienc. For. Y Del Ambiente 2017, 23, 261–273. [Google Scholar] [CrossRef]

Figure 1. Map showing the locations of the study areas: (A) “Sierra de Quila,” Jalisco; (B) “Sierra de Álvarez,” San Luis Potosí; and (C) “Selva El Ocote,” Chiapas, Mexico.

Figure 2. Methodology flow chart to estimate the spatial distribution of HRAs.

Figure 3. Boxplots showing quartiles of altitude (m asl), average annual precipitation (mm), enhanced vegetation index (EVI), and forest canopy height (m) by HRA in (A) “Sierra de Quila,” (B) “Sierra de Álvarez,” and (C) “Selva El Ocote”; p-value < 0.05 according to the Kruskal–Wallis test.

Figure 4. HRAs spatial distribution; (A) “Sierra de Quila”; (B) “Sierra de Álvarez” and (C) “Selva El Ocote”.

Figure 5. Probability maps resulting from the classification analysis: (A) “Sierra de Quila,” (B) “Sierra de Álvarez,” and (C) “Selva El Ocote”.

Table 1. Confusion matrix of the validation dataset in “Sierra de Quila.” Prod.Accu = Producer Accuracy and User.Accu = User Accuracy.

	HRA1	HRA2	HRA3	HRA4	Total	Prod.Accu
HRA1	119	3	0	0	122	97.54%
HRA2	0	175	2	0	177	98.87%
HRA3	0	0	91	3	94	96.81%
HRA4	0	0	2	55	57	96.49%
Total	119	178	95	58	97.78%	Overall
User.Accu	100.00%	98.31%	95.79%	94.83%
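
The accuracy figures in Table 1 can be recomputed from the counts. The sketch below is illustrative and is not the authors' code. It follows the table's own layout, in which Prod.Accu is the diagonal count divided by the row total and User.Accu is the diagonal count divided by the column total.

```python
# Counts copied from Table 1 ("Sierra de Quila"); rows and columns are HRA1..HRA4.
matrix = [
    [119, 3, 0, 0],
    [0, 175, 2, 0],
    [0, 0, 91, 3],
    [0, 0, 2, 55],
]


def accuracy_summary(m):
    """Return overall, per-row, and per-column accuracy as percentages."""
    n = len(m)
    diagonal = [m[i][i] for i in range(n)]
    row_totals = [sum(row) for row in m]
    col_totals = [sum(m[i][j] for i in range(n)) for j in range(n)]
    overall = 100.0 * sum(diagonal) / sum(row_totals)
    by_row = [100.0 * d / t for d, t in zip(diagonal, row_totals)]
    by_col = [100.0 * d / t for d, t in zip(diagonal, col_totals)]
    return overall, by_row, by_col


overall, prod_accu, user_accu = accuracy_summary(matrix)
print(f"Overall: {overall:.2f}%")
print("Prod.Accu:", [f"{v:.2f}%" for v in prod_accu])
print("User.Accu:", [f"{v:.2f}%" for v in user_accu])
```

The same function applies unchanged to the counts in Tables 2 and 3.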

Table 2. Confusion matrix of the validation dataset in “Sierra de Álvarez.” Prod.Accu = Producer Accuracy and User.Accu = User Accuracy.

	HRA1	HRA2	HRA3	Total	Prod.Accu
HRA1	132	0	0	132	100.00%
HRA2	3	117	0	120	97.50%
HRA3	0	4	104	108	96.30%
Total	135	121	104	98.06%	Overall
User.Accu	97.78%	96.69%	100.00%

Table 3. Confusion matrix of the validation dataset in “Selva El Ocote.” Prod.Accu = Producer Accuracy and User.Accu = User Accuracy.

	HRA1	HRA2	HRA3	HRA4	HRA5	Total	Prod.Accu
HRA1	232	3	0	0	0	235	98.72%
HRA2	0	686	10	0	0	696	98.56%
HRA3	0	3	725	6	0	734	98.77%
HRA4	0	0	12	544	4	560	97.14%
HRA5	0	0	0	3	366	369	99.19%
Total	232	692	747	553	370	98.42%	Overall
User.Accu	100.00%	99.13%	97.05%	98.37%	98.92%

Table 4. Variable importance according to the mean decrease in impurity index by location. Altitude (m); average annual precipitation (AAP in mm); enhanced vegetation index (EVI) and forest canopy height (FCH in m).

	Altitude	AAP	EVI	FCH
“Sierra de Quila”	74.09%	17.19%	7.54%	1.18%
“Sierra de Álvarez”	52.51%	36.02%	8.12%	3.35%
“Selva El Ocote”	80.34%	14.98%	2.37%	2.31%

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Chávez-Durán, Á.A.; Olvera-Vargas, M.; Figueroa-Rangel, B.; García, M.; Aguado, I.; Ruiz-Corral, J.A. Mapping Homogeneous Response Areas for Forest Fuel Management Using Geospatial Data, K-Means, and Random Forest Classification. Forests 2022, 13, 1970. https://doi.org/10.3390/f13121970

