A Robust Dual-Mode Machine Learning Framework for Classifying Deforestation Patterns in Amazon Native Lands

Rodrigues, Julia; Dias, Mauricio Araújo; Negri, Rogério; Hussain, Sardar Muhammad; Casaca, Wallace

doi:10.3390/land13091427

by Julia Rodrigues ¹, Mauricio Araújo Dias ²,*, Rogério Negri ³, Sardar Muhammad Hussain ⁴ and Wallace Casaca ¹

¹ São Paulo State University (UNESP), Institute of Biosciences, Humanities and Exact Sciences (IBILCE), São José do Rio Preto 15054-000, Brazil

² São Paulo State University (UNESP), Faculty of Science and Technology (FCT), Presidente Prudente 19060-900, Brazil

³ São Paulo State University (UNESP), Science and Technology Institute (ICT), São José dos Campos 12245-000, Brazil

⁴ Balochistan University of Information Technology, Engineering and Management Sciences (BUITEMS), Faculty of Basic Sciences (FBS), Quetta 87300, Pakistan

* Author to whom correspondence should be addressed.

Land 2024, 13(9), 1427; https://doi.org/10.3390/land13091427

Submission received: 6 August 2024 / Revised: 1 September 2024 / Accepted: 2 September 2024 / Published: 4 September 2024

(This article belongs to the Special Issue GeoAI for Land Use Observations, Analysis and Forecasting)

Abstract:

The integrated use of remote sensing and machine learning stands out as a powerful and well-established approach for dealing with various environmental monitoring tasks, including deforestation detection. In this paper, we present a tunable, data-driven methodology for assessing deforestation in the Amazon biome, with a particular focus on protected conservation reserves. In contrast to most existing works from the specialized literature that typically target vast forest regions or privately used lands, our investigation concentrates on evaluating deforestation in particular, legally protected areas, including indigenous lands. By integrating the open data and resources available through the Google Earth Engine, our framework is designed to be adaptable, employing either anomaly detection methods or artificial neural networks for classifying deforestation patterns. A comprehensive analysis of the classifiers’ accuracy, generalization capabilities, and practical usage is provided, with a numerical assessment based on a case study in the Amazon rainforest regions of São Félix do Xingu and the Kayapó indigenous reserve.

Keywords:

anomaly detection; Google Earth Engine; machine learning; neural networks

1. Introduction

Covering an extensive area of over 8.5 million km², Brazil ranks as the fifth-largest nation globally. Due to its continental size, the country is home to six highly biodiverse biomes: the Amazon Rainforest, Atlantic Forest, Cerrado, Caatinga, Pampas and Pantanal [1]. Among these, the Amazon biome stands out as the world’s largest tropical forest, widely recognized for its immense biodiversity and global environmental significance. Indeed, it plays a crucial role in the global climate, contributing to key ecological processes such as carbon dioxide sequestration, climate regulation, and the distribution of rainfall and air masses [2].

Despite its importance in regulating the Earth’s climate and ecological systems, the Amazon rainforest has been threatened by harmful human activities, ranging from large wildfires to massive deforestation, primarily driven by cattle ranching and intensive mining [3,4]. According to the Brazilian Amazon Rainforest Monitoring Program by Satellite (PRODES) [5,6], the tree cover loss from 1 August 2021 to 31 July 2022 was 11,568 km², with most of it occurring in legally protected lands. These lands, divided into indigenous territories and conservation units, represent 44% of the Brazilian Amazon rainforest [7].

Thanks to the existence of public policies and these reserved areas, deforestation remained relatively stable in such areas until 2018. However, from 2018 to 2021, the annual percentage rate of gross forest loss in these territories was twice as high as that in non-designated lands, with part of this loss attributed to the weakening of forest policies in Brazil by the federal government from 2019 to 2022 [8]. It is no surprise that reserved areas have continued to be under pressure, with many local conflicts involving illegal activities such as slash-and-burn and logging. These anthropic actions have been the precursors to deforestation, preceding the introduction of cattle to mark territories [9], as well as illegal mining camps [10]. For example, deforestation within Brazilian indigenous territories increased by 129% from 2013 to 2021, with 59% of CO₂ emissions (around 96 million tons) produced during this period being emitted between 2019 and 2021, highlighting the uncontrolled desertification process of the Amazon biome [11].

Given the recurrent human interference in the Amazon biome, including its law-protected areas, the systematic monitoring of deforestation is indispensable. One way to accomplish this on a large scale is by applying Remote Sensing (RS) technologies, which allow for the continuous tracking of forest degradation. Representative examples include near-real-time detection of tree loss, precise quantification of affected areas, and early identification of deforestation [12,13,14,15,16]. Another effective tool is Machine Learning (ML) [17], as it enables the design of new data-driven methods to identify potential changes in forest zones by analyzing intrinsic features and trends in remotely sensed data [18]. Neural Networks (NN) [19,20] and popular classification algorithms such as Random Forest (RF) [21] and Support Vector Machines (SVM) [19,22] are among the most effective methods used for environmental analysis. In contrast, Anomaly Detection (AD) [23], although a less utilized class of machine learning techniques, has found applications across various fields, including health sciences [24], social monitoring [25] and fault detection [26]. More recently, AD has also been applied in remote sensing to detect temporal changes on the Earth’s surface [27], map fires [28] and monitor algae proliferation [29].

Over the last decade, prior research has employed ML with fresh RS data taken from multiple sources of government-maintained repositories, such as PRODES [5] and the Institute of People and the Environment of the Amazon (IMAZON) [30]. Camara et al. [31] assessed deforestation in the Brazilian Amazon from 2008 to 2021, by unifying different government datasets. They found that a significant portion of the forest degradation occurred in private lands, the majority of which was illegal. Das Neves et al. [32] applied the RF algorithm on public data acquired from IMAZON to analyze the impact of both hidden and official road networks on forest loss in the state of Pará, Brazil, from 1988 to 2018. They discovered that clandestine roads were the most critical contributors to forest exploitation, with official roads following closely behind. Jakimow et al. [33] also applied an RF-based approach to study the relationship between forest loss and fire occurrences in a large area of the Brazilian Amazon from 2014 to 2020. They highlighted an atypical rise in burned areas and forest loss post-2018, particularly in agrarian settlements, conservation units and medium/large rural properties. Dallaqua et al. [34] utilized the SVM classifier for detecting deforestation in the Brazilian Legal Amazon. In contrast to previous works, their approach relied on the ForestEyes Project [35], which involved volunteers classifying remote sensing images, resulting in annotated data that were taken as the training set for classification. Lastly, deep neural networks have also been successfully used for mapping deforestation. A good representative of this is the work by Adarme et al. [20], which aimed to compare the performance of three convolutional networks: Early Fusion [36], Siamese Network [37] and Convolutional Support Vector Machine [38].

Despite the immeasurable importance of protected conservation units in sustaining the Amazon rainforest, most existing works do not assess the impact of deforestation on them. Instead, they focus on evaluating vast forest areas, such as country-sized municipalities and even states, as well as privately used lands like pasture and agriculture fields. The lack of research directly correlating the impact of deforestation in Amazon cities and their neighboring areas, including legally protected reserves, also reveals a critical gap in conservation research in the Amazon biome.

To overcome the issues raised above, this paper introduces a trainable and flexible data-driven methodology for mapping deforestation, focusing on the Amazon biome. In more technical terms, the current approach allows for free customization by incorporating and training different classification strategies, specifically those from two machine learning domains: anomaly detection and neural networks. Our methodology uses the Google Earth Engine (GEE) API to obtain fresh and accurate data, enabling the training of either AD- or NN-based models to obtain representative features from remotely sensed data. In addition to integrating the GEE and two branches of ML strategies, a comprehensive analysis of their applicability, generalization, accuracy and tuning aspects is provided, highlighting their strengths and limitations in the context of deforestation classification. Another contribution is the exploration of recent applications of anomaly detection for remote sensing classification, adapting and evaluating this ML strategy to discriminate deforestation patterns. Unlike existing methods in the specialized literature that usually employ AD to select specific targets or create susceptibility maps for specific environmental incidents [18], our approach aims to explore the fitting capabilities of anomaly detection for typical machine learning tasks by customizing it for deforestation classification.

To quantitatively validate our approach while still filling the gap in targeted research for law-protected areas, the deforestation problem is explored through a case study of forest loss in the Brazilian municipality of São Félix do Xingu (SFX) and its surrounding areas, including the Kayapó indigenous park. The Kayapó’s conservation unit spans 32,000 km², exceeding the size of many countries worldwide such as Armenia and Belgium. For decades, the Brazilian Kayapó people have defended their territory from loggers, miners, farmers and land grabbers. Now, with a newly constructed highway encroaching on their lands [39], this indigenous community faces further obstacles in keeping its ancestral territory and preserving biodiversity in the Amazon biome.

2. Protected Areas and Indigenous Peoples in the Brazilian Legal Amazon

According to the Brazilian Institute of Geography and Statistics (IBGE) [40], the Legal Amazon (LA) is a political designation that includes nine Brazilian states and spans across three biomes: Amazon, Cerrado and Pantanal. In the Brazilian LA, Protected Areas (PAs) are legally defined as clearly marked geographical spaces designated by the government to preserve ecosystems, biodiversity and essential environmental services such as soil conservation, watershed protection, pollination, nutrient recycling and climate regulation. They also uphold the rights and cultural heritage of traditional and indigenous communities that have historically inhabited these areas [41]. The PAs can be grouped into Conservation Units (CUs) and Indigenous Lands (ILs), which together cover approximately 52% of the forested areas in the Brazilian LA [8]. Comparisons of deforestation rates reveal that PAs experience deforestation at rates up to ten times lower than those in non-PAs [42].

Violating restrictions in PAs in the Brazilian Amazon carries serious legal consequences, including substantial fines, imprisonment and the confiscation of tools and equipment used in these illegal activities. Illegal actions such as the suppression of vegetation, illegal fishing and hunting are among the most common infractions in these areas, and enforcement mechanisms are designed to deter such activities by imposing stringent penalties [43].

In the Brazilian Amazon, indigenous lands are crucial for forest preservation. For example, between 2000 and 2021, these territories, which include PAs, were responsible for just 5% of the total net forest loss, underscoring their critical role in reducing deforestation compared to non-PAs [8]. Despite this vital importance, the relationship between traditional peoples and PAs has been intricate over the past decades, as it involves the transformation of historically unbounded territories into legally recognized “Indigenous Lands”. However, once these areas were officially designated as PAs by the Brazilian government, the native communities developed forms of “sustainable development” that simultaneously respect their own values and meet external demands for the conservation of the PAs [44].

Despite their conservation success, PAs are under increasing pressure from extractive industries, and even the government, which seek to convert these areas into zones for economic expansion [45]. This ongoing challenge emphasizes the need for robust legal protections and policies that prioritize the rights and traditional practices of indigenous peoples while ensuring the preservation of the Amazon’s biodiversity.

3. Materials and Methods

Our methodological framework consists of five main steps, as shown in Figure 1. First, data collection is conducted, followed by data categorization and labeling. Next, both anomaly detection- and neural network-based models, adjusted for the classification task, are implemented, trained and then applied to the test data to determine the presence of deforested areas in remote sensing images. After that, a comprehensive assessment of the results is performed, where several evaluation metrics and qualitative analyses are carried out to check the performance of the classification models. Finally, the last step consists of drawing a conclusion based on the findings from the previous steps.

3.1. Study Area and Data Acquisition

To proceed with the classification task in the Amazon rainforest, the municipality of São Félix do Xingu, located in the state of Pará, Brazil, along with its vicinity, was selected. The municipality encompasses a vast expanse of Amazon rainforest, including the Taboca district and Kayapó’s Indigenous Park. Figure 2 depicts the study area location.

The examined data were collected and processed using the GEE API [46], which offers robust cloud-managed features, and an extensive catalog of geospatial data, including images captured using remote sensors. In particular, remote sensing images were obtained from the integrated Landsat-8 sensor with 30 m spatial resolution. This catalog was selected because it provides high-resolution images of the Earth’s surface, covering the visible spectrum, near-infrared and thermal infrared. Such availability comes from its Operational Land Imager and the Thermal Infrared Sensor. While the former captures spectral bands that enhance sensitivity to chlorophyll and suspended materials in coastal areas, the latter acquires infrared bands, crucial for accurate vegetation assessment [47]. Figure 3 illustrates two samples cropped from the study area, with the first highlighting the urbanized area of São Félix do Xingu, and the second illustrating the Kayapó indigenous park.

The images were acquired between 1 January 2020 and 31 December 2021 so that the growth and representative differences during such a period can be analyzed. Additionally, ground-truth samples were generated through a careful visual interpretation of the study area. This involved manually inspecting the reference polygons made available from the annual deforestation inventory provided by the PRODES project [6]. Lastly, the images were cropped to create the definitive dataset, totaling 1852 samples of $200 \times 200$ pixels, with 620 samples from the Kayapó indigenous area. All the images include georeferenced coordinates in GeoJSON format for input into the GEE API. For details regarding our training and parameter calibration approach, see Section 3.3.6.

3.2. Data Curation and Preprocessing

To enhance the discrimination power of the ML classification models, feature masks and spectral indices were computed from the raw images. These included a water mask computed with the Normalized Difference Water Index (NDWI) [48], a no-cloud mask obtained via the FMask algorithm [49] and the well-established Normalized Difference Vegetation Index (NDVI) [50]. In short, this step focused on identifying and subsequently removing undesirable objects from the remote sensing images.

3.2.1. Water Mask Detection

A straightforward yet effective way to generate water masks is by applying suitable spectral indices, isolating water-like pixels. In our approach, the NDWI [48] index is computed to capture water bodies in the study area.

To mathematically accomplish this, let $I(s) = x$ be an attribute vector representing the input image $I$, positioned at pixel $s \in S \subset \mathbb{N}^{2}$, where $x$ contains the components $x_{Green}$ and $x_{NIR}$, denoting the radiometric responses at the green and near-infrared bands, respectively. The resulting NDWI image is then computed as follows:

$$I_{NDWI} = \frac{x_{Green} - x_{NIR}}{x_{Green} + x_{NIR}}. \tag{1}$$

Figure 4a illustrates an Amazon subregion with the Xingu River highlighted, where the water mask-based filtering was applied with $I_{NDWI} > 0.3$.
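As a minimal sketch of Equation (1) and the 0.3 threshold, the snippet below computes the NDWI per pixel with NumPy and derives a boolean water mask. The function names, the small `eps` guard against division by zero and the toy reflectance values are illustrative assumptions; they are not taken from the authors' GEE pipeline.

```python
import numpy as np

def ndwi(green, nir, eps=1e-12):
    """Eq. (1): (Green - NIR) / (Green + NIR), evaluated pixel by pixel."""
    green = np.asarray(green, dtype=float)
    nir = np.asarray(nir, dtype=float)
    # eps is an assumed safeguard for pixels where both bands are zero
    return (green - nir) / (green + nir + eps)

def water_mask(green, nir, threshold=0.3):
    """True where the NDWI exceeds the threshold, i.e. water-like pixels."""
    return ndwi(green, nir) > threshold

# Made-up 2x2 reflectances: the left column mimics water (low NIR),
# the right column mimics vegetation (high NIR).
green = np.array([[0.10, 0.05], [0.12, 0.04]])
nir = np.array([[0.02, 0.40], [0.03, 0.35]])
mask = water_mask(green, nir)
```

Pixels flagged `True` in `mask` would be the ones discarded before the vegetation analysis.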

3.2.2. Cloud Mask Detection

In this step, a cloud mask generation method is applied to identify and exclude undesirable segments formed by clouds, which can degrade image quality. To achieve this task, the FMask algorithm [49] has been adopted to locate and label cloud-like pixels in the images.

After getting the cloud mask, the operation is inverted using the “not” operator, resulting in a no-cloud mask that effectively removes clouds from the region of interest. In the resulting no-cloud mask, a value of zero indicates the presence of clouds, while a value of one indicates their absence. An illustrative example of cloud removal by using this approach can be seen in Figure 4b.

3.2.3. NDVI Composition Computation

The NDVI [50] is a popular and accurate spectral index for vegetation assessment, as it captures the specific spectra that chlorophyll reflects and absorbs. Since chlorophyll absorbs more red light and reflects more near-infrared light, the NDVI can be used to determine the difference in reflectance between these two bands. This difference image composition is normalized by dividing it by the sum of these values, and the calculation is performed on a pixel-by-pixel basis [51].

In more mathematical terms, the NDVI is obtained as follows:

$$I_{NDVI} = \frac{x_{NIR} - x_{Red}}{x_{NIR} + x_{Red}}, \tag{2}$$

where $x_{Red}$ denotes the radiometric response in the red band.

Figure 4c presents the NDVI composition after discarding the water and clouds by applying the masks computed in the previous steps (Section 3.2.1 and Section 3.2.2), and replacing the missing data by the median values observed over the considered period. In this visual representation, darker areas indicate regions with less vegetation, while the brighter ones correspond to segments with healthier or denser vegetation.
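The masking and median steps can be sketched as follows. This is one plausible reading of the procedure, not the authors' implementation: the cloud mask is assumed to be boolean with `True` marking clouds, it is inverted with a logical "not", masked pixels are set to NaN, and a per-pixel temporal median is taken over the valid observations. All names and values are hypothetical.

```python
import numpy as np

def ndvi(nir, red, eps=1e-12):
    """Eq. (2): (NIR - Red) / (NIR + Red), evaluated pixel by pixel."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)

def masked_median_ndvi(nir_stack, red_stack, water_stack, cloud_stack):
    """Stacks have shape (time, height, width); both masks are boolean."""
    values = ndvi(nir_stack, red_stack)
    no_cloud = np.logical_not(cloud_stack)           # invert the cloud mask
    valid = no_cloud & np.logical_not(water_stack)   # keep land, cloud-free pixels
    values = np.where(valid, values, np.nan)         # drop masked observations
    return np.nanmedian(values, axis=0)              # per-pixel temporal median

# Made-up series: 3 dates, 1x2 pixels; pixel (0, 0) is cloudy on the last date.
nir_stack = np.array([[[0.5, 0.4]], [[0.6, 0.4]], [[0.7, 0.4]]])
red_stack = np.full_like(nir_stack, 0.1)
water_stack = np.zeros(nir_stack.shape, dtype=bool)
cloud_stack = np.zeros(nir_stack.shape, dtype=bool)
cloud_stack[2, 0, 0] = True
composite = masked_median_ndvi(nir_stack, red_stack, water_stack, cloud_stack)
```

A pixel that is masked on every date would come out as NaN in this sketch, so a real pipeline would need a fallback for such cases.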

3.3. Anomaly Detection and Neural Network-Based Classification Models

This section describes the anomaly detection- and neural network-based classifiers used as part of our methodology.

3.3.1. Preliminary Notations and Background

In remote sensing applications, image classification relies on classifiers, which usually take probabilistic calculations to assign pixels or segments to specific classes [52]. Mathematically, let us consider an image $I = I(s)$ obtained remotely using a sensor, where each pixel $s \in S \subset \mathbb{N}^{2}$ is associated with an attribute vector $x_{s} = (x_{1}^{(s)}, x_{2}^{(s)}, \dots, x_{n}^{(s)}) \in X \subset \mathbb{R}^{n}$.

The image classification problem aims at determining a class $\omega_{k} \in \Omega = \{\omega_{1}, \omega_{2}, \dots, \omega_{c}\}$ for each pixel $s$ by applying a mapping function $F : X \to Y$, where $Y = \{1, \dots, c\}$. To enable effective mapping, a training set $D = \{(x_{i}, y_{i}) \in X \times Y : i = 1, \dots, m\}$ is necessary, where $(x_{i}, y_{i})$ indicates that $x_{i}$ is associated with a specific class $\omega_{a}$ if $y_{i} = a \in Y$.

In practice, classes embed elements sharing similar characteristics and patterns. Thus, function F is applied to associate each pixel in the image with a specific class, tailored to different modeling strategies. Among these, anomaly detection stands out as a simple and effective alternative, as it can detect rare phenomena with signatures that differ from surrounding pixels, such as potential deforested areas [53]. Another highly effective category of classification algorithms is artificial neural networks.

Next, two anomaly detection methods, particularly the Isolation Forest (IF) and the One-Class Support Vector Machine (OC-SVM), as well as two popular neural network-based classifiers—Multilayer Perceptron (MLP) and a Convolutional Neural Network (CNN)—are described.

3.3.2. Isolation Forest

The IF algorithm is characterized by its relatively recent use in the context of anomaly detection. It has been an attractive choice to model classification tasks due to its simplicity and efficiency in handling high-dimensional datasets, leading to its increased utilization [54]. IF utilizes isolation structures to segregate instances from others in a given dataset, identifying anomalies based on features that deviate from normal data.

In the IF formulation, binary trees, termed Isolation Trees (IT), are employed to isolate data, with partitions randomly generated within the data. The deeper a data point is in the tree, the more challenging it is to isolate, indicating a lower likelihood of being an anomaly [55]. The IT starts with a sample set $\{x_{1}, \dots, x_{m}\}$, where each $x_{i} \in \mathbb{R}^{d}$ represents specific attributes. This set can be represented as a matrix $X$. The IT splits $X$ by randomly selecting a value $p$ from the $q$-th attribute, recursively continuing until one of three conditions is met: the maximum length is reached, $|X| = 1$, or $X$ contains only identical columns.

In order to adapt the IF method as an anomaly detection-based classifier, the goal was to explore the available data based on the length of paths from the root to leaf nodes in the IT structure, represented here as $h(x_{i})$. The anomaly score, $S(x_{i}, m)$, quantifies the anomalousness of each data point relative to others in the dataset using $E(h(x_{i}))$, the mean path length [28]. As a result, scores nearing 1 indicate highly anomalous data points where $E(h(x_{i}))$ is low, while scores near 0 suggest typical or normal data points, especially when $h(x_{i})$ approaches its maximum value. Scores around 0.5 denote uncertainty in distinguishing anomalies when $E(h(x_{i}))$ approximates a specific threshold value.
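The behaviour of the score can be illustrated with the expression from the original Isolation Forest formulation, $S(x, m) = 2^{-E(h(x))/c(m)}$, where $c(m)$ is the average path length of an unsuccessful binary search tree lookup over $m$ points. The text above does not spell this formula out, so its use here is an assumption, and the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

def average_path_length(m):
    """c(m) = 2 H(m - 1) - 2 (m - 1) / m, with H(i) approximated by ln(i) + gamma."""
    if m <= 1:
        return 0.0
    if m == 2:
        return 1.0
    harmonic = math.log(m - 1) + EULER_GAMMA
    return 2.0 * harmonic - 2.0 * (m - 1) / m

def anomaly_score(mean_path_length, m):
    """S(x, m) = 2 ** (-E(h(x)) / c(m)); requires m >= 2."""
    return 2.0 ** (-mean_path_length / average_path_length(m))

m = 256                                                    # assumed subsample size
short_path = anomaly_score(1.0, m)                         # isolated quickly
typical_path = anomaly_score(average_path_length(m), m)    # E(h) equals c(m)
long_path = anomaly_score(3 * average_path_length(m), m)   # hard to isolate
```

Under this formulation, the threshold at which the score equals 0.5 is $E(h(x)) = c(m)$; shorter mean paths push the score toward 1 and longer ones toward 0.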

3.3.3. One-Class Support Vector Machines

SVM-based methods are highly effective, mathematically inspired approaches for binary classification, designed to find a hyperplane in the feature space that separates data into distinct classes. They rely on kernels to handle nonlinear data representation and are applicable to various domains [56].

The OC-SVM algorithm is a variant inspired by the classic SVM method, which can be adapted to classify specific samples while rejecting others, making it valuable for analyzing remote sensing data. To properly apply the OC-SVM as an authentic anomaly detection-based classifier, mathematical modeling is applied to identify “regular objects” within Ƶ with a probability $\nu$ of false positives. The “kernelized” OC-SVM function, $F : X \to \{+1, -1\}$, returns $+1$ if the input data belongs to Ƶ, and $-1$ otherwise, and is defined as follows:

$$F(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_{i} K(x, x_{i}) - b\right), \tag{3}$$

where $b = \sum_{j=1}^{n} \alpha_{j} K(x_{i}, x_{j})$, $x_{i} \in$ Ƶ and $K(\cdot, \cdot)$ is a pre-specified kernel operator.

The coefficients $\alpha_{i}$, $i = 1, \dots, n$, are determined by solving the following optimization problem:

$$\min_{\alpha_{1}, \dots, \alpha_{n}} \sum_{i, j = 1}^{n} \alpha_{i} \alpha_{j} K(x_{i}, x_{j}) \quad \text{s.t.} \quad \begin{cases} \alpha_{i} \in \left[0, \frac{1}{\nu n}\right] \\ \sum_{i=1}^{n} \alpha_{i} = 1 \end{cases} \tag{4}$$

It is important to point out that the OC-SVM is parameterized by $\nu \in (0, 1]$ (the bound $\frac{1}{\nu n}$ in Equation (4) is undefined for $\nu = 0$), and its performance varies according to the selected kernel function. Table 1 lists the kernel operators explored in our analysis. For a comprehensive theoretical analysis of kernel functions, see [57].
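To make Equation (3) concrete, the sketch below evaluates the decision function for given coefficients. It does not solve the optimization problem in Equation (4): the coefficients are simply set to the uniform values $\alpha_{i} = 1/n$, which satisfy both constraints for any $\nu \le 1$. The RBF kernel, the `gamma` value, the toy points and the function names are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    """K(a, b) = exp(-gamma * ||a - b||^2) for every pair of rows of a and b."""
    a = np.atleast_2d(a)
    b = np.atleast_2d(b)
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)

def ocsvm_decide(x, train, alpha, ref_index, gamma=0.5):
    """Eq. (3): sgn(sum_i alpha_i K(x, x_i) - b), with b taken at train[ref_index]."""
    b = rbf_kernel(train[ref_index], train, gamma)[0] @ alpha
    scores = rbf_kernel(x, train, gamma) @ alpha - b
    return np.where(scores >= 0, 1, -1)

# Made-up "regular" samples at the corners of the unit square.
train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
alpha = np.full(len(train), 1.0 / len(train))   # feasible, assumed coefficients
queries = np.array([[0.5, 0.5], [5.0, 5.0]])    # one central point, one far away
labels = ocsvm_decide(queries, train, alpha, ref_index=0)
```

In practice the coefficients would come from a solver, for example the one behind Scikit-Learn's `OneClassSVM`, rather than being fixed by hand.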

3.3.4. Multilayer Perceptron Networks

The MLP is a popular artificial neural network that is composed of one or more hidden layers, characterized by their high connectivity and defined in terms of their synaptic weights [58]. In an MLP network, signals propagate forward as function signals, starting as input stimuli and emerging as output, and propagate backward as error signals, computed from the output and moving backward through the network. The hidden layers act as feature detectors, gradually processing information while uncovering significant patterns through nonlinear transformations during the model’s training [59].

In our MLP formulation for deforestation classification, the input signal $z_{j}(n)$ for a neuron $j$ is computed by taking into account the synaptic weights, neuron outputs from the previous layer and the biases, allowing for nonlinear transformations in the feature space. In more mathematical terms, $z_{j}(n)$ is given by Equation (5):

$$z_{j}(n) = \sum_{i=1}^{N} w_{ij}^{(o)} y_{i}(n) + b_{j}^{(o)}, \tag{5}$$

where $w_{ij}^{(o)}$ represents the weight of the connection between neurons $i$ and $j$, $y_{i}(n)$ is the output of neuron $i$ from the previous layer and $b_{j}^{(o)}$ defines the bias of neuron $j$, which is added to the input before applying the activation function, $f_{a}$, computed according to Equation (6):

$$y_{j}(n) = f_{a}(z_{j}(n)). \tag{6}$$

After this, the loss function is calculated by first determining the instantaneous error, computed using the difference between the expected and actual values. Table 2 lists the MLP hyperparameters used to run the experiments.
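Equations (5) and (6) amount to a matrix-vector product followed by an element-wise activation, which the short NumPy sketch below illustrates for a single layer. The sigmoid activation, the weight values and the function names are illustrative assumptions; the actual settings are those listed in Table 2.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, used here as an assumed choice for f_a."""
    return 1.0 / (1.0 + np.exp(-z))

def dense_layer(y_prev, weights, bias, activation=sigmoid):
    """Eq. (5): z_j = sum_i w_ij * y_i + b_j, then Eq. (6): y_j = f_a(z_j)."""
    z = y_prev @ weights + bias   # weights[i, j] links neuron i to neuron j
    return activation(z)

# Made-up layer with two inputs and two neurons.
y_prev = np.array([1.0, 2.0])
weights = np.array([[0.5, -1.0],
                    [0.25, 0.5]])
bias = np.array([0.0, 0.0])
y = dense_layer(y_prev, weights, bias)   # z = [1.0, 0.0] before activation
```

Stacking several such calls, each with its own weights and biases, gives the forward pass of a full MLP.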

3.3.5. Convolutional Neural Networks

CNNs consist of a powerful and robust family of neural network architectures designed to process grid-like data, such as images and videos. A functional advantage of CNNs is their scalability for improved accuracy, accommodating larger datasets and more complex model architectures effectively [60]. The CNN’s convolutional layer extracts local features using convolutional operations with kernels, generating feature maps that highlight discovered patterns. The pooling layer reduces the dimensionality of the feature map, retaining only essential information and removing ambiguities [61].

In our trainable CNN-based model, the input image at each convolutional layer is processed using a set of learnable filters. Each filter convolves across the width and height of the input volume, computing the dot product between the filter entries and the input, producing a two-dimensional activation map. Mathematically, the processed image $z_{ij}(n)$ for a neuron located at $(i, j)$ in a given convolutional layer can be expressed by Equation (7):

$$z_{ij}(n) = \sum_{p=1}^{H} \sum_{q=1}^{W} w_{pq}^{(c)} x_{i+p, j+q}(n) + b_{ij}^{(c)}, \tag{7}$$

where $w_{pq}^{(c)}$ accounts for the filter weights, $x_{i+p, j+q}$ represents the input pixels and $b_{ij}^{(c)}$ gives the bias term for the convolutional filter.

The activation function $f_{a}$ applied to the convolutional layer outputs for neuron $(i, j)$ is given by Equation (8):

$$y_{ij}(n) = f_{a}(z_{ij}(n)). \tag{8}$$

In the implemented CNN architecture, max-pooling layers, as computed by Equation (9), are employed so as to perform downsampling operations by selecting the maximum value within a pre-defined window, thereby preserving the most salient features while reducing the spatial dimensions of the feature maps. This operation is crucial for achieving spatial invariance, which is highly beneficial for the image classification task.

$$y_{ij}^{pool}(n) = \max_{\substack{1 \leq p \leq P \\ 1 \leq q \leq Q}} \left(y_{i+p, j+q}(n)\right). \tag{9}$$

Finally, the outputs from the last pooling layer are flattened and fed into a fully connected (dense) layer. This layer acts similarly to those in a conventional MLP network, integrating high-level features and making the final prediction through a sigmoid activation function. Table 2 lists the CNN hyperparameters taken to run the experiments.
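The convolution, activation and pooling steps of Equations (7) to (9) can be sketched with plain NumPy as below. This is a didactic single-channel version, not the TensorFlow model used in the experiments: it assumes no padding, a stride of one for the convolution, a ReLU activation and non-overlapping pooling windows, and every name and value is hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel, bias=0.0):
    """Eq. (7) with zero-based indices: weighted sum over an H x W window plus bias."""
    H, W = kernel.shape
    out_h = image.shape[0] - H + 1
    out_w = image.shape[1] - W + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(kernel * image[i:i + H, j:j + W]) + bias
    return out

def relu(z):
    """Eq. (8) with an assumed ReLU as the activation f_a."""
    return np.maximum(z, 0.0)

def max_pool(feature_map, P=2, Q=2):
    """Eq. (9) over non-overlapping P x Q windows (edges that do not fit are trimmed)."""
    out_h = feature_map.shape[0] // P
    out_w = feature_map.shape[1] // Q
    trimmed = feature_map[:out_h * P, :out_w * Q]
    return trimmed.reshape(out_h, P, out_w, Q).max(axis=(1, 3))

# Made-up 5x5 "image" and a 2x2 averaging filter.
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.full((2, 2), 0.25)
feature_map = relu(conv2d_valid(image, kernel))   # shape (4, 4)
pooled = max_pool(feature_map)                    # shape (2, 2)
```

Flattening `pooled` with `pooled.ravel()` would give the vector passed on to the dense layer described above.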

3.3.6. Hyperparameters Tuning and Calibration

After implementation was finished, the four above-described ML methods were extensively trained and evaluated under various parameter settings, resulting in classifiers that are able to determine two classes: non-deforested areas and deforested areas (i.e., a binary classification). One of the main challenges faced in using these classification methods is the need to fine-tune multiple hyperparameters to optimize model performance. The number of valid hyperparameter settings can increase exponentially, thus complicating the optimization process. To overcome this issue and enhance the effectiveness of the implemented models, the Grid Search strategy was applied to select optimal hyperparameters for the IF-, OC-SVM-, MLP- and CNN-based models.

By using Grid Search (GS) for hyperparameter tuning [62,63], a thorough exploration of the parameter space is ensured, aiming to achieve the best possible performance for each model. This approach, although potentially time-consuming, provides a systematic and reliable means of optimizing the ML models for the deforestation classification task. Table 1 and Table 2 present the experimental design for hyperparameters tuning.
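A grid search of this kind can be sketched in a few lines of standard-library Python: every combination of candidate values is scored and the best one is kept. The parameter names, the candidate values and the scoring function below are stand-ins chosen for illustration; they are not the grids of Table 1 and Table 2, and a real run would score each setting by training and validating a model.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Score every combination in param_grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical grid: 3 x 3 = 9 combinations; each extra parameter multiplies this count.
param_grid = {"nu": [0.05, 0.1, 0.2], "gamma": [0.1, 1.0, 10.0]}

def toy_score(params):
    """Stand-in for a validation score; it peaks at nu = 0.1 and gamma = 1.0."""
    return -abs(params["nu"] - 0.1) - abs(params["gamma"] - 1.0)

n_combinations = len(list(product(*param_grid.values())))
best_params, best_score = grid_search(param_grid, toy_score)
```

The multiplicative growth of `n_combinations` with each added hyperparameter is what makes exhaustive search time-consuming.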

3.3.7. Training, Testing and Computational Aspects

To train and test the classifiers, a total of 1852 image samples, each measuring $200 \times 200$ pixels, were taken. These were split into 1232 images (around two-thirds) for training, which included images from the Kayapó indigenous park as well as non-indigenous areas, and 620 images (around one-third) for testing purposes. The test collection was evenly divided into two sets: 310 images were used for Experiment #1, and the complementary 310 samples, all from the Kayapó indigenous reserve, were reserved for Experiment #2. The model training was conducted considering the parameter spaces and the strategies for optimal parameter search discussed in Section 3.3.6.

The data acquisition (Section 3.1) and pre-processing (Section 3.2) steps were performed using Python with the support of the GEE API, while the remaining computational steps of our framework were fully implemented in Python 3.11. Specifically, the complete pipeline included running the GEE API to retrieve the time-series of RS images for 2020 and 2021, as well as to compute the feature masks and spectral indices, followed by implementing, training and tuning the IF, OC-SVM, MLP and CNN models in Python. The ML algorithms and image processing routines were implemented with Python libraries, including OpenCV [64], TensorFlow [65] and Scikit-Learn [66].

4. Results and Discussion

This section presents a comprehensive battery of tests involving almost two thousand real-world remotely sensed images collected from the Amazon rainforest study area, including the Kayapó’s indigenous park. Specifically, the proposed methodology was applied by taking as classifiers two anomaly detection-based methods, IF and OC-SVM, as well as two popular neural networks, MLP and CNN. The resulting classifiers were then assessed and compared using ten popular image classification metrics, including ROC curve analysis. Qualitative assessments, including visual inspections of the results, are also provided.

4.1. Quantitative Assessments

In this section, the performance of all trained classifiers was quantitatively evaluated and compared across different evaluation metrics to assess their effectiveness in classifying deforested areas. These metrics provide popular quantitative indicators of the models’ capabilities to correctly identify deforestation instances while minimizing false positives and false negatives. In particular, the following evaluation metrics were computed:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{10}$$

$$\text{Specificity} = \frac{TN}{TN + FP}, \tag{11}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \tag{12}$$

$$\text{Precision} = \frac{TP}{TP + FP}, \tag{13}$$

$$\text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}, \tag{14}$$

where $TP$ (true positive) denotes the portion of correctly identified deforested pixels, $FP$ (false positive) gives the portion of non-deforested pixels misclassified as deforested, $FN$ (false negative) refers to the portion of deforested pixels misclassified as non-deforested, and $TN$ (true negative) represents the portion of non-deforested pixels correctly identified. These indices are computed by comparing detection results with the ground-truth segment masks, collected and visually inspected from the PRODES project [5].
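As a worked check of Equations (10)–(14), the sketch below evaluates the five metrics from the four confusion-matrix entries. The function name `classification_metrics` is hypothetical; the input values are the IF percentages reported in Table 3. Because every metric is a ratio, percentages can be passed directly in place of raw counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return the five metrics of Equations (10)-(14) from confusion-matrix entries."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    specificity = tn / (tn + fp)
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


# Entries of the IF confusion matrix (Table 3), given as percentages.
if_metrics = classification_metrics(tp=75.16, fp=2.90, fn=15.49, tn=6.45)
for name, value in if_metrics.items():
    print(f"{name}: {value:.4f}")
```

The printed values agree with the IF row of Table 7 up to rounding of the tabulated percentages. The function assumes that each denominator is non-zero, that is, both classes are present and at least one positive prediction is made.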

The machine-intelligent models were first compared in terms of their confusion matrices for the whole study area, as shown in Tables 3–6. The IF anomaly detection-based classifier produced around 15% false negatives and 3% false positives, revealing a notable number of instances where deforestation goes undetected, while still reaching a satisfactory 75% of true positives for the target class. Similarly, OC-SVM reached about 78% true positives, with no false positives and 13% false negatives. In contrast, the tuned MLP and CNN neural networks exhibited superior performance: they reached 89% and 90% true positives, respectively, with false positives and false negatives each below 2%. The exception lies in true negatives, which are comparable to those of the anomaly detection-based models.

After constructing the confusion matrices, ROC curves were generated using five different thresholds, giving a more visual perspective of each model’s ability to distinguish between positive instances (presence of deforestation) and negative ones (absence of deforestation). Additionally, ROC plots allow for calculating the AUC metric, which aggregates the classification performance across all thresholds. Figure 5 shows the ROC curves for the IF, OC-SVM, MLP and CNN classifiers, with corresponding AUC values of 76%, 93%, 96% and 98%. The AUC of IF indicates moderate performance: better than random guessing, but well short of the other models. In contrast, the AUC scores of OC-SVM, MLP and CNN exceed 90%, highlighting their good performance. Moreover, the NN-based models MLP and CNN exhibited faster curve rises, which can be attributed to their complex architectures and hyperparameters that, when adequately optimized, are capable of capturing intricate data patterns.
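To make the ROC construction concrete, the following sketch computes one (FPR, TPR) point per threshold and integrates the resulting piecewise-linear curve with the trapezoidal rule. It is illustrative only: the scores and labels are synthetic, the five threshold values are assumptions, and the helper names `roc_points` and `auc_trapezoid` are hypothetical.

```python
def roc_points(scores, labels, thresholds):
    """Return (FPR, TPR) pairs, one per threshold, sorted by increasing FPR."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = {(0.0, 0.0), (1.0, 1.0)}  # trivial end points of every ROC curve
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.add((fp / negatives, tp / positives))
    return sorted(points)


def auc_trapezoid(points):
    """Area under the piecewise-linear ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Synthetic scores (higher = more likely deforested) and labels (1 = deforested).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1, 1, 1, 0, 1, 0, 1, 0]
thresholds = [0.1, 0.3, 0.5, 0.7, 0.9]  # five thresholds, values assumed

points = roc_points(scores, labels, thresholds)
auc = auc_trapezoid(points)
print(points)
print(round(auc, 4))  # 0.8
```

With only five thresholds the curve is a coarse polyline, so the resulting AUC approximates the value that a full sweep over all distinct scores would give.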

The classifiers were also numerically assessed for the Kayapó indigenous park by computing the evaluation metrics listed in Table 7. Starting with accuracy, the deep learning CNN-based model outperformed all competitors with a high score of 99.35%, followed by MLP (97.74%), OC-SVM (87.10%) and IF (81.61%). Regarding specificity, OC-SVM achieved an ideal score of 100%, indicating that it correctly assigned all negative instances without any false positives, while CNN produced a high score of 96.55%, followed by MLP (93.10%) and IF (68.96%). In terms of sensitivity, CNN generated the best score (99.64%), closely trailed by MLP (98.22%) and followed by OC-SVM (85.76%) and IF (82.92%). The precision metric, which reflects the correctness of positive predictions, reached an ideal score of 100% for OC-SVM, followed by CNN (99.64%), MLP (99.28%) and IF (96.28%). Lastly, taking into account the F1-score, which balances precision and sensitivity, CNN delivered the best output (99.64%), closely followed by MLP (98.74%), and then by OC-SVM (92.33%) and IF (89.10%). This reinforces CNN’s overall superior performance in accurately classifying instances of deforestation compared to the remaining three classifiers, including the anomaly detection-based methods, which were specifically designed to map forest losses. In conclusion, while all classifiers demonstrated satisfactory performance, the CNN neural network, when adequately tuned, achieved the best score in three of the five evaluation metrics computed (accuracy, sensitivity and F1-score), making it the most effective classifier for identifying deforestation.
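The per-metric ranking can be verified directly from Table 7. The short sketch below, with values copied from that table and variable names chosen for illustration, selects the best classifier for each metric and counts the wins per classifier.

```python
from collections import Counter

# Values copied from Table 7.
table7 = {
    "IF":     {"accuracy": 0.8161, "specificity": 0.6896, "sensitivity": 0.8292,
               "precision": 0.9628, "f1": 0.8910},
    "OC-SVM": {"accuracy": 0.8710, "specificity": 1.0000, "sensitivity": 0.8576,
               "precision": 1.0000, "f1": 0.9233},
    "MLP":    {"accuracy": 0.9774, "specificity": 0.9310, "sensitivity": 0.9822,
               "precision": 0.9928, "f1": 0.9874},
    "CNN":    {"accuracy": 0.9935, "specificity": 0.9655, "sensitivity": 0.9964,
               "precision": 0.9964, "f1": 0.9964},
}
metrics = ["accuracy", "specificity", "sensitivity", "precision", "f1"]

# Best classifier for each metric, then the number of wins per classifier.
winners = {m: max(table7, key=lambda clf: table7[clf][m]) for m in metrics}
wins = Counter(winners.values())

print(winners)
print(wins)
```

Under these values, CNN leads in accuracy, sensitivity and F1-score, while OC-SVM leads in specificity and precision.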

4.2. Qualitative Assessments

The visual quality of the deforestation maps produced by both groups of models, anomaly detection and neural networks, was evaluated across various real-world remote sensing images. These samples included not only regions within the expansive Amazon rainforest but also cropped images from the legally protected Brazilian indigenous park of Kayapó. For enhanced visual inspection, segments classified by the ML methods were highlighted in pink.

4.2.1. Visual Evaluation of Classification in Multiple Amazon Rainforest Regions

From the images depicted in Figure 6, it can be observed that the IF AD-based model generated smaller and more scattered segments of exposed soil compared to the others. This scattered behavior may be attributed to the mathematical IF design, which maps anomalies within the data using tree-like structures. Moreover, pixels with darker grayscale values, as defined by the NDVI spectrum, are considered anomalies, leading to the deforestation labeling as granular rather than continuous. In contrast, the outputs from the OC-SVM, which is another anomaly detection-based approach, were closer to the deforested areas. However, its classification was still granular because the algorithm sought to separate normal data points from outliers. As a result, pixels representing deforestation, particularly those exhibiting an anomalous nature in the feature space, are highlighted as outliers.

Now, considering the results from the neural network models, the tuned MLP delivered a more continuous classification, but produced an excessive number of false negatives and poorly delineated segment boundaries, possibly due to the non-convolutional format of the network architecture. Finally, the last column shows the classifications generated by the tuned convolutional neural model. As the CNN can effectively capture spatial dependencies and correlations within the image data, it better delineates the contours of the targets, resulting in a more natural and smooth outcome. The only exception occurred in the middle image, where the CNN produced false detections in the central part of the image.

4.2.2. Visual Evaluation of Classification in the Amazon Indigenous Park of Kayapó

The Amazon Indigenous Park of Kayapó, which is a vast and ecologically vital reserve within the São Félix do Xingu municipality, in Brazil, poses challenges for deforestation mapping via remote sensing images. In Figure 7, the models’ ability to accurately identify deforested areas within this protected conservation unit is assessed. By investigating the Kayapó indigenous park, a better understanding of the effectiveness of these machine-intelligent models in monitoring ecologically sensitive and culturally important areas can be reached.

The outputs for the IF, OC-SVM, MLP and CNN classifiers exhibited distinct characteristics and performance levels across the images provided. For instance, IF produced generally good classifications, effectively capturing a significant portion of exposed soil amidst the dense forest vegetation. However, it also presented a tendency to misclassify certain non-deforested areas as deforested. OC-SVM resulted in excessively granular classifications, failing to capture many true positives. This granularity likely stems from the formulation’s approach to separating normal data points from outliers, which can lead to fragmented and less accurate results. The neural network-tuned models, MLP and CNN, delivered similar and better outcomes compared to the anomaly detection models. They consistently and continuously captured the regions of interest, delineating the deforested areas with greater accuracy. The CNN, in particular, exhibited a slight edge in performance, yielding more refined classifications. Overall, these results corroborate the previous discussion, with the anomaly detection-based classifiers identifying substantial portions of the deforested areas but falling short compared to the neural networks.

5. Conclusions and Future Work

This paper presents a robust and adaptable data-driven methodology capable of accommodating either anomaly detection or neural network models within a ML framework for deforestation detection. In particular, the current framework can be applied from various perspectives of machine learning designs, including anomaly detection and neural network-based modeling, thus enhancing the accuracy and generalization capability of deforestation mapping while streamlining the classification process in remote sensing applications. To experimentally validate its use, a comprehensive analysis of anomaly detection methods adapted for deforestation detection was conducted, alongside two popular neural networks, MLP and CNN.

In addition to presenting a comparative evaluation of both formulation strategies for deforestation mapping, our analysis addressed a critical gap by applying and evaluating potential classifiers for detecting deforestation within a conservation unit, such as the Kayapó indigenous park. Spanning 32,000 km², this area is vital for biodiversity preservation but has faced ongoing threats from activities like agriculture and infrastructure development.

Through an extensive battery of tests involving nearly two thousand remotely sensed images of the Amazon rainforest, it was found that anomaly detection-driven classifiers such as IF and OC-SVM are suitable for detecting deforested areas but exhibit limitations in the form of granular outputs and scattered false detections. In contrast, the MLP- and CNN-based models, when adequately tuned, outperformed the others by capturing deforestation signatures with greater precision and continuity, leveraging their capability to learn complex spatial dependencies from the data. For instance, the AUC for the IF and OC-SVM models was 76% and 93%, while MLP and CNN achieved 96% and 98%.

As shown by the experiments, CNNs have proven to be effective in detecting deforestation in the Amazon rainforest, including indigenous reserves. Their ability to extract representative features from satellite imagery enables the identification of land cover changes ranging from critical to subtle, which can offer clues for uncovering illegal activities on the ground, such as mining and illicit harvesting. Furthermore, CNNs can be successfully extended to map other native forest regions, as well as to detect other objects of interest, ranging from farmland [67] to plant [68] and tree [69] species. However, despite their effectiveness and robustness, CNNs may encounter challenges that warrant attention. For instance, they depend on the quality and resolution of the available data, and low-resolution datasets can adversely affect the model’s performance [70]. Another aspect to consider is the need for large training datasets to ensure the model’s adequate generalization and accuracy.

In conclusion, this study not only addresses the classification task for sensitive ecological areas but also validates the applicability of two advanced categories of machine learning techniques in environmental monitoring. By bridging the gap between anomaly detection and neural networks in terms of comparative performance and generalization capability, our work aims to contribute to advancing the field of environmental analysis, including the large-scale monitoring of the Amazon rainforest.

For future work, further exploration will include the following: (i) the integration of data from multiple satellite sensors to enhance the robustness and accuracy of the deforestation detection task; (ii) the application of other machine learning methods to develop more sophisticated models capable of capturing intricate deforestation patterns across different environmental contexts.

Author Contributions

Conceptualization—J.R., R.N., S.M.H. and W.C.; investigation—J.R., M.A.D. and W.C.; methodology—J.R., M.A.D., R.N. and W.C.; validation—J.R. and W.C.; resources—J.R., M.A.D. and S.M.H.; funding acquisition—R.N. and W.C.; writing—original draft—J.R. and W.C.; supervision—R.N. and W.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the São Paulo State University (UNESP), the São Paulo Research Foundation (FAPESP), grants #2013/07375-0, #2022/13665-0 and #2023/14427-8, and the National Council for Scientific and Technological Development (CNPq), grants #316228/2021-4 and #305220/2022-5.

Data Availability Statement

Dataset available on request from the authors. The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

Myers, N.; Mittermeier, R.A.; Mittermeier, C.G.; Da Fonseca, G.A.; Kent, J. Biodiversity hotspots for conservation priorities. Nature 2000, 403, 853–858.
Müller, C. Brazil and the Amazon Rainforest: Deforestation, Biodiversity and Cooperation with the EU and International Forums; European Parliamentary Research Service: Brussels, Belgium, 2020.
Nobre, C.A.; Sampaio, G.; Borma, L.S.; Castilla-Rubio, J.C.; Silva, J.S.; Cardoso, M. Land-use and climate change risks in the Amazon and the need of a novel sustainable development paradigm. Proc. Natl. Acad. Sci. USA 2016, 113, 10759–10768.
Pivello, V.; Vieira, I.; Christianini, A.; Ribeiro, D.; Menezes, L.; Berlinck, C.; Melo, F.; Marengo, J.A.; Tornquist, C.G.; Tomas, W.M.; et al. Understanding Brazil’s catastrophic fires: Causes, consequences and policy needed to prevent future tragedies. Perspect. Ecol. Conserv. 2021, 19, 233–255.
PRODES. Prodes and Deter: Get to Know These Strategic Systems in the Fight against Deforestation in the Amazon. 2022. Available online: https://infoamazonia.org/en/2022/02/15/prodes-and-deter-systems-against-deforestation-amazon (accessed on 3 March 2024).
FG Assis, L.F.; Ferreira, K.R.; Vinhas, L.; Maurano, L.; Almeida, C.; Carvalho, A.; Rodrigues, J.; Maciel, A.; Camargo, C. TerraBrasilis: A spatial data analytics infrastructure for large-scale thematic mapping. ISPRS Int. J. Geo-Inf. 2019, 8, 513.
Imazon. Imazon’s Deforestation Alert System. 2024. Available online: https://imazon.org.br/wp-content/uploads/2024/02/SAD-Janeiro-2024.pdf (accessed on 9 June 2024).
Qin, Y.; Xiao, X.; Liu, F.; de Sa e Silva, F.; Shimabukuro, Y.; Arai, E.; Fearnside, P.M. Forest conservation in Indigenous territories and protected areas in the Brazilian Amazon. Nat. Sustain. 2023, 6, 295–305.
Donoso, V.G.; Hirye, M.C.; Gerwenat, C.; Reicher, C. Amazon Deforestation and Global Meat Consumption Trends: An Assessment of Land Use Change and Market Data from Rondônia That Shows Why We Should Consider Changing Our Diets. Sustainability 2024, 16, 4526.
Carvalho, W.D.; Mustin, K.; Hilário, R.R.; Vasconcelos, I.M.; Eilers, V.; Fearnside, P.M. Deforestation control in the Brazilian Amazon: A conservation struggle being lost as agreements and regulations are subverted and bypassed. Perspect. Ecol. Conserv. 2019, 17, 122–130.
Silva-Junior, C.H.; Silva, F.B.; Arisi, B.M.; Mataveli, G.; Pessôa, A.C.; Carvalho, N.S.; Reis, J.B.; Silva Júnior, A.R.; Motta, N.A.; E Silva, P.V.M.; et al. Brazilian Amazon indigenous territories under deforestation pressure. Sci. Rep. 2023, 13, 5851.
Ometto, J.P.; Aguiar, A.P.D.; Martinelli, L.A. Amazon deforestation in Brazil: Effects, drivers and challenges. Carbon Manag. 2011, 2, 575–585.
Souza, C.M., Jr.; Siqueira, J.V.; Sales, M.H.; Fonseca, A.V.; Ribeiro, J.G.; Numata, I.; Cochrane, M.A.; Barber, C.P.; Roberts, D.A.; Barlow, J. Ten-year Landsat classification of deforestation and forest degradation in the Brazilian Amazon. Remote Sens. 2013, 5, 5493–5513.
Brovelli, M.A.; Sun, Y.; Yordanov, V. Monitoring forest change in the amazon using multi-temporal remote sensing data and machine learning classification on Google Earth Engine. ISPRS Int. J. Geo-Inf. 2020, 9, 580.
Silva, C.A.; Guerrisi, G.; Del Frate, F.; Sano, E.E. Near-real time deforestation detection in the Brazilian Amazon with Sentinel-1 and neural networks. Eur. J. Remote Sens. 2022, 55, 129–149.
Assunção, J.; Gandour, C.; Rocha, R. DETER-ing deforestation in the Amazon: Environmental monitoring and law enforcement. Am. Econ. J. Appl. Econ. 2023, 15, 125–156.
Holloway, J.; Mengersen, K. Statistical machine learning methods and remote sensing for sustainable development goals: A review. Remote Sens. 2018, 10, 1365.
Gino, V.L.; Negri, R.G.; Souza, F.N.; Silva, E.A.; Bressane, A.; Mendes, T.S.; Casaca, W. Integrating unsupervised machine intelligence and anomaly detection for spatio-temporal dynamic mapping using remote sensing image series. Sustainability 2023, 15, 4725.
Babu, J.S. Analysis and Detection of Deforestation Using Novel Remote-Sensing Technologies with Satellite Images. In Proceedings of the IADS International Conference on Computing, Communications & Data Engineering (CCODE), Tirupati, Andhra Pradesh, India, 7–8 February 2018; pp. 1–10.
Adarme, M.O.; Feitosa, R.Q.; Happ, P.N.; De Almeida, C.A.; Gomes, A.R. Evaluation of Deep Learning Techniques for Deforestation Detection in the Brazilian Amazon and Cerrado Biomes from Remote Sensing Imagery. Remote Sens. 2020, 12, 910.
Santos, F.; Graw, V.; Bonilla, S. A geographically weighted random forest approach for evaluate forest change drivers in the Northern Ecuadorian Amazon. PLoS ONE 2019, 14, e0226224.
Sheykhmousa, M.; Mahdianpari, M.; Ghanbari, H.; Mohammadimanesh, F.; Ghamisi, P.; Homayouni, S. Support vector machine versus random forest for remote sensing image classification: A meta-analysis and systematic review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 6308–6325.
Parimala, V.K. Anomaly Detection: Recent Advances, AI and ML Perspectives and Applications; IntechOpen: London, UK, 2024.
Bijlani, N.; Nilforooshan, R.; Kouchaki, S. An unsupervised data-driven anomaly detection approach for adverse health conditions in people living with dementia: Cohort study. JMIR Aging 2022, 5, e38211.
Marzuoli, A.; Liu, F. Monitoring of natural disasters through anomaly detection on mobile phone data. In Proceedings of the IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 4089–4098.
Chaudhary, A.; Agarwal, R. Machine Learning Techniques for Anomaly Detection Application Domains. In Paradigms of Smart and Intelligent Communication, 5G and Beyond; Springer: Berlin/Heidelberg, Germany, 2023; pp. 129–147.
Guo, Q.; Pu, R.; Cheng, J. Anomaly detection from hyperspectral remote sensing imagery. Geosciences 2016, 6, 56.
Luz, A.E.O.; Negri, R.G.; Massi, K.G.; Colnago, M.; Silva, E.A.; Casaca, W. Mapping fire susceptibility in the Brazilian Amazon forests using multitemporal remote sensing and time-varying unsupervised anomaly detection. Remote Sens. 2022, 14, 2429.
Ananias, P.H.M.; Negri, R.G.; Bressane, A.; Dias, M.A.; Silva, E.A.; Casaca, W. ABF: A data-driven approach for algal bloom forecasting using machine intelligence and remotely sensed data series. Softw. Impacts 2023, 17, 100518.
Stanimirova, R.; Tarrio, K.; Turlej, K.; McAvoy, K.; Stonebrook, S.; Hu, K.T.; Arévalo, P.; Bullock, E.L.; Zhang, Y.; Woodcock, C.E.; et al. A global land cover training dataset from 1984 to 2020. Sci. Data 2023, 10, 879.
Camara, G.; Simoes, R.; Ruivo, H.M.; Andrade, P.R.; Soterroni, A.C.; Ramos, F.M.; Ramos, R.G.; Scarabello, M.; Almeida, C.; Sanches, I.; et al. Impact of land tenure on deforestation control and forest restoration in Brazilian Amazonia. Environ. Res. Lett. 2023, 18, 065005.
das Neves, P.B.T.; Blanco, C.J.C.; Duarte, A.A.A.M.; das Neves, F.B.S.; das Neves, I.B.S.; dos Santos, M.H.d.P. Amazon rainforest deforestation influenced by clandestine and regular roadway network. Land Use Policy 2021, 108, 105510.
Jakimow, B.; Baumann, M.; Salomão, C.; Bendini, H.; Hostert, P. Deforestation and agricultural fires in South-West Pará, Brazil, under political changes from 2014 to 2020. J. Land Use Sci. 2023, 18, 176–195.
Dallaqua, F.B.; Faria, F.A.; Fazenda, A.L. ForestEyes Project-Citizen Science and Machine Learning to detect deforested areas in tropical forests. In Proceedings of the XXXIV Conference on Graphics, Patterns and Images (SIBGRAPI), SBC, Porto Alegre, RS, Brazil, 18–22 October 2021; pp. 14–20.
Dallaqua, F.B.J.R.; Fazenda, Á.L.; Faria, F.A. ForestEyes project: Can citizen scientists help rainforests? In Proceedings of the 15th International Conference on eScience (eScience), San Diego, CA, USA, 24–27 September 2019; pp. 18–27.
Daudt, R.C.; Le Saux, B.; Boulch, A.; Gousseau, Y. Urban Change Detection for Multispectral Earth Observation Using Convolutional Neural Networks. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Valencia, Spain, 22–27 July 2018; pp. 2115–2118.
Caye Daudt, R.; Le Saux, B.; Boulch, A. Fully Convolutional Siamese Networks for Change Detection. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 4063–4067.
Bazi, Y.; Melgani, F. Convolutional SVM Networks for Object Detection in UAV Imagery. IEEE Trans. Geosci. Remote Sens. 2018, 56, 3107–3118.
Kayapo. Kayapo Project; International Conservation Fund of Canada (ICFC): 11 June 2024. Available online: https://kayapo.org/territory/ (accessed on 3 September 2024).
IBGE. Legal Amazon. 2014. Available online: https://www.ibge.gov.br/en/geosciences/maps/regional-maps/17927-legal-amazon.html?edicao=18047 (accessed on 28 August 2024).
Imazon. Protected Areas in the Brazilian Amazon: Challenges and Opportunities. 2011. Available online: https://imazon.org.br/en/publicacoes/protected-areas-in-the-brazilian-amazon-challenges-opportunities-2 (accessed on 28 August 2024).
Ricketts, T.H.; Soares-Filho, B.; da Fonseca, G.A.; Nepstad, D.; Pfaff, A.; Petsonk, A.; Anderson, A.; Boucher, D.; Cattaneo, A.; Conte, M.; et al. Indigenous lands, protected areas, and slowing climate change. PLoS Biol. 2010, 8, e1000331.
Kauano, É.E.; Silva, J.M.; Michalski, F. Illegal use of natural resources in federal protected areas of the Brazilian Amazon. PeerJ 2017, 5, e3902.
Albert, B.; de Robert, P.; Laques, A.É.; Le Tourneau, F.M. From Amerindian territorialities to indigenous lands in the Brazilian Amazon: The Yanomami and Kayapó cases. In Protected Areas, Sustainable Land? Routledge: London, UK, 2016; pp. 123–142.
Silva, V.V.d.; Silva, R.G.d.C. Amazon, Frontier and Protected Areas: Dialectic between economic expansion and nature conservation. Ambiente Soc. 2022, 25, e02241.
Yang, L.; Driscol, J.; Sarigai, S.; Wu, Q.; Chen, H.; Lippitt, C.D. Google Earth Engine and Artificial Intelligence (AI): A Comprehensive Review. Remote Sens. 2022, 14, 3253.
Roy, D.P.; Wulder, M.A.; Loveland, T.R.; Woodcock, C.E.; Allen, R.G.; Anderson, M.C.; Helder, D.; Irons, J.R.; Johnson, D.M.; Kennedy, R.; et al. Landsat-8: Science and product vision for terrestrial global change research. Remote Sens. Environ. 2014, 145, 154–172.
Gao, B.C. Normalized difference water index for remote sensing of vegetation liquid water from space. In Proceedings of the Imaging Spectrometry, Orlando, FL, USA, 17–21 April 1995; Volume 2480, pp. 225–236.
Zhu, Z.; Woodcock, C.E. Object-based cloud and cloud shadow detection in Landsat imagery. Remote Sens. Environ. 2012, 118, 83–94.
Jin, C.; Kim, M.; Kim, C.; Lee, Y.; Lee, K.D.; Ryu, J.H.; Choi, C. Accuracy evaluation of reflectance, normalized difference vegetation index, and normalized difference water index using corrected unmanned aerial vehicle multispectral images by bidirectional reflectance distribution function and solar irradiance. J. Appl. Remote Sens. 2023, 17, 044512.
Gandhi, G.M.; Parthiban, S.; Thummalu, N.; Christy, A. Ndvi: Vegetation Change Detection Using Remote Sensing and Gis—A Case Study of Vellore District. Procedia Comput. Sci. 2015, 57, 1199–1210.
Negri, R.G.; Frery, A.C.; Casaca, W.; Azevedo, S.; Dias, M.A.; Silva, E.A.; Alcântara, E.H. Spectral-Spatial-Aware Unsupervised Change Detection With Stochastic Distances and Support Vector Machines. IEEE Trans. Geosci. Remote Sens. 2021, 59, 2863–2876.
Decuyper, M.; Chávez, R.O.; Lohbeck, M.; Lastra, J.A.; Tsendbazar, N.; Hackländer, J.; Herold, M.; Vågen, T.G. Continuous monitoring of forest change dynamics with satellite time series. Remote Sens. Environ. 2022, 269, 112829.
Al Farizi, W.S.; Hidayah, I.; Rizal, M.N. Isolation Forest Based Anomaly Detection: A Systematic Literature Review. In Proceedings of the 8th International Conference on Information Technology, Computer and Electrical Engineering (ICITACEE), Semarang, Indonesia, 23–24 September 2021; pp. 118–122.
Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation-based anomaly detection. ACM Trans. Knowl. Discov. Data 2012, 6, 1–39.
Boswell, D. Introduction to Support Vector Machines. 2002. Available online: https://home.work.caltech.edu/~boswell/IntroToSVM.pdf (accessed on 21 November 2023).
Shawe-Taylor, J.; Cristianini, N. Kernel Methods for Pattern Analysis; Cambridge University Press: Cambridge, UK, 2004.
Jiang, W.; He, G.; Long, T.; Ni, Y.; Liu, H.; Peng, Y.; Lv, K.; Wang, G. Multilayer perceptron neural network for surface water extraction in Landsat 8 OLI satellite images. Remote Sens. 2018, 10, 755.
Haykin, S. Neural Networks and Learning Machines; Pearson Education: London, UK, 2009.
Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2019, arXiv:1905.11946.
Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
Shekar, B.; Dagnew, G. Grid search-based hyperparameter tuning and classification of microarray cancer data. In Proceedings of the International Conference on Advanced Computational and Communication Paradigms (ICACCP), Gangtok, India, 25–28 February 2019; pp. 1–8.
Yang, L.; Shami, A. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing 2020, 415, 295–316.
Open Source Computer Vision Library. OpenCV. 2024. Available online: https://opencv.org (accessed on 30 August 2024).
Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems; Tensorflow: Mountain View, CA, USA, 2015; Available online: https://www.tensorflow.org (accessed on 1 November 2023).
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
Reese, M.; Dasgupta, A.; Waske, B. Farmland quality assessment using deep learning and UAVs. Remote Sens. Appl. Soc. Environ. 2024, 35, 101235.
Guirado, E.; Tabik, S.; Alcaraz-Segura, D.; Cabello, J.; Herrera, F. Deep-learning versus OBIA for scattered shrub detection with Google earth imagery: Ziziphus Lotus as case study. Remote Sens. 2017, 9, 1220.
Veras, H.F.P.; Ferreira, M.P.; da Cunha Neto, E.M.; Figueiredo, E.O.; Dalla Corte, A.P.; Sanquetta, C.R. Fusing multi-season UAS images with convolutional neural networks to map tree species in Amazonian forests. Ecol. Inform. 2022, 71, 101815.
Shafaey, M.A.; Salem, M.A.M.; Ebeid, H.; Al-Berry, M.; Tolba, M.F. Comparison of CNNs for remote sensing scene classification. In Proceedings of the 13th International Conference on Computer Engineering and Systems (ICCES), Cairo, Egypt, 18–19 December 2018; pp. 27–32.

Figure 1. Flowchart of the proposed methodology.

Figure 2. Study area delimitation.

Figure 3. Images acquired using the Landsat-8 satellite from specific parts of the study area. (a) São Félix do Xingu urban region, and (b) Kayapó indigenous land. Representations in natural color composition.

Figure 4. (a) NDVWI-based water mask, (b) non-cloud mask from FMask algorithm, and (c) NDVI after water detection and cloud removal.

Figure 5. ROC curves for classifying images with or without deforestation using different anomaly detection and neural network models.

Figure 6. Deforestation (in purple) captured by the anomaly detection and neural network methods. (a) Input, (b) IF, (c) OC-SVM, (d) MLP and (e) CNN.

Figure 7. Deforestation (in purple) in indigenous park captured by the anomaly detection and neural network methods. (a) Input, (b) IF, (c) OC-SVM, (d) MLP and (e) CNN.

Table 1. Optimal hyperparameters and their search spaces for IF and OC-SVM models.

Model	Parameter	Description	Tuning Universe	Optimal
IF	n_estimators	Number of trees	100, 150, 200, 250, 300	200
	max_samples	Proportion of samples for each tree	0.5, 0.75, 1.0, 1.25, 1.5	1.0
OC-SVM	K	Kernel function	linear, poly, rbf	rbf
	$ν$	Training error margin parametrization	0.01, 0.05, 0.1, 0.2	0.05
	$γ$	Kernel coefficient for rbf	scale, auto	scale
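For reference, the optimal settings in Table 1 map onto Scikit-Learn estimators as sketched below. This is an illustration under stated assumptions, not the authors' code: the training data are synthetic one-dimensional NDVI-like values, vegetation is assumed to be the "normal" class, and `random_state` and the variable names are arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic 1-D "NDVI-like" training features clustered around dense vegetation.
X_train = rng.normal(loc=0.8, scale=0.05, size=(500, 1))

# Optimal values from Table 1: 200 trees, all samples drawn for each tree.
iforest = IsolationForest(n_estimators=200, max_samples=1.0, random_state=0)
iforest.fit(X_train)

# Optimal values from Table 1: rbf kernel, nu = 0.05, gamma = "scale".
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
ocsvm.fit(X_train)

# One typical vegetation value and one clearly anomalous (low) value.
X_new = np.array([[0.80], [0.20]])
if_pred = iforest.predict(X_new)    # +1 = inlier, -1 = anomaly
svm_pred = ocsvm.predict(X_new)     # +1 = inlier, -1 = anomaly
print(if_pred, svm_pred)
```

Both estimators return +1 for inliers and -1 for anomalies, so in this toy setting a pixel flagged -1 would be a deforestation candidate.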

Table 2. Optimal hyperparameters and their search spaces for MLP and CNN models.

Model	Parameter	Description	Tuning Universe	Optimal
MLP	n_layers	Number of layers	2, 3, 4	3
	neuron_layer	Neurons per layer	64, 128, 256	128
	$f_{a}$	Activation function	relu, tanh	relu
	$η$	Learning rate	0.001, 0.01, 0.1	0.001
	batch	Batch size	32, 64, 128	32
	epochs	Number of epochs	50, 100, 200	100
CNN	n_conv	Number of conv layers	2, 3, 4	2
	filter_layer	Filters per layer	32, 64, 128	32
	kernel	Kernel size	(3, 3), (5, 5)	(3, 3)
	pooling	Pooling size	(2, 2), (3, 3)	(2, 2)
	$f_{a}$	Activation function	relu, tanh	relu
	$η$	Learning rate	0.001, 0.01, 0.1	0.001
	batch	Batch size	32, 64, 128	32
	epochs	Number of epochs	50, 100, 200	100
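To give a sense of the size of the search spaces listed in Tables 1 and 2, the sketch below enumerates every combination of the tabulated values. It assumes an exhaustive grid over those values, which is an illustrative assumption and not necessarily the search strategy used by the authors; the dictionary keys and the helper `iter_configs` are names introduced here for illustration only.

```python
from itertools import product

# Search spaces transcribed from Tables 1 and 2 (key names are illustrative).
search_spaces = {
    "IF": {
        "n_estimators": [100, 150, 200, 250, 300],
        "max_samples": [0.5, 0.75, 1.0, 1.25, 1.5],
    },
    "OC-SVM": {
        "kernel": ["linear", "poly", "rbf"],
        "nu": [0.01, 0.05, 0.1, 0.2],
        "gamma": ["scale", "auto"],
    },
    "MLP": {
        "n_layers": [2, 3, 4],
        "neuron_layer": [64, 128, 256],
        "activation": ["relu", "tanh"],
        "learning_rate": [0.001, 0.01, 0.1],
        "batch": [32, 64, 128],
        "epochs": [50, 100, 200],
    },
    "CNN": {
        "n_conv": [2, 3, 4],
        "filter_layer": [32, 64, 128],
        "kernel": [(3, 3), (5, 5)],
        "pooling": [(2, 2), (3, 3)],
        "activation": ["relu", "tanh"],
        "learning_rate": [0.001, 0.01, 0.1],
        "batch": [32, 64, 128],
        "epochs": [50, 100, 200],
    },
}


def iter_configs(space):
    """Yield one dict per combination of the listed hyperparameter values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


# Number of candidate configurations per model under an exhaustive grid.
grid_sizes = {
    model: sum(1 for _ in iter_configs(space))
    for model, space in search_spaces.items()
}

for model, size in grid_sizes.items():
    print(f"{model}: {size} configurations")
```

Under this assumption the neural network grids are far larger than those of the anomaly detectors, which is worth keeping in mind when comparing tuning costs across the two modes.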

Table 3. IF confusion matrix.

Actual/Predicted	Negative	Positive
Negative	TN = 6.45%	FP = 2.90%
Positive	FN = 15.49%	TP = 75.16%

Table 4. OC-SVM confusion matrix.

Actual/Predicted	Negative	Positive
Negative	TN = 9.36%	FP = 0.00%
Positive	FN = 12.90%	TP = 77.74%

Table 5. MLP confusion matrix.

Actual/Predicted	Negative	Positive
Negative	TN = 8.70%	FP = 0.65%
Positive	FN = 1.62%	TP = 89.03%

Table 6. CNN confusion matrix.

Actual/Predicted	Negative	Positive
Negative	TN = 9.03%	FP = 0.32%
Positive	FN = 0.32%	TP = 90.33%

Table 7. Quantitative evaluation metrics for all trained classifiers.

Classifier	Accuracy	Specificity	Sensitivity	Precision	F1-Score
IF	0.8161	0.6896	0.8292	0.9628	0.8910
OC-SVM	0.8710	1.0000	0.8576	1.0000	0.9233
MLP	0.9774	0.9310	0.9822	0.9928	0.9874
CNN	0.9935	0.9655	0.9964	0.9964	0.9964
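The metrics in Table 7 follow from the confusion matrices in Tables 3 to 6 through the standard definitions of accuracy, specificity, sensitivity, precision, and F1-score. The sketch below recomputes them from the tabulated percentages; the function and variable names are illustrative and not taken from the authors' code. Because the confusion matrix entries are rounded to two decimals, the recomputed values agree with Table 7 only up to small rounding differences.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Standard binary classification metrics from confusion matrix entries."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    return {
        "accuracy": (tp + tn) / total,
        "specificity": tn / (tn + fp),
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# (TN, FP, FN, TP) as percentages of the test set, from Tables 3 to 6.
confusion = {
    "IF": (6.45, 2.90, 15.49, 75.16),
    "OC-SVM": (9.36, 0.00, 12.90, 77.74),
    "MLP": (8.70, 0.65, 1.62, 89.03),
    "CNN": (9.03, 0.32, 0.32, 90.33),
}

metrics = {
    name: metrics_from_confusion(*entries)
    for name, entries in confusion.items()
}

for name, values in metrics.items():
    row = "  ".join(f"{key}={value:.4f}" for key, value in values.items())
    print(f"{name}: {row}")
```

Since the positive class accounts for roughly 90% of the test samples, specificity is computed from a small number of negatives and is therefore the metric most sensitive to individual misclassifications.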


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

