1. Introduction
Covering an extensive area of over 8.5 million km², Brazil ranks as the fifth-largest nation globally. Due to its continental size, the country is home to six highly biodiverse biomes: the Amazon Rainforest, Atlantic Forest, Cerrado, Caatinga, Pampas and Pantanal [1]. Among these, the Amazon biome stands out as the world’s largest tropical forest, widely recognized for its immense biodiversity and global environmental significance. Indeed, it plays a crucial role in the global climate, contributing significantly to carbon dioxide sequestration, climate regulation, and the distribution of rainfall and air masses [2].
Despite its importance in regulating the Earth’s climate and ecological systems, the Amazon rainforest has been threatened by harmful human activities, ranging from large wildfires to massive deforestation, primarily driven by cattle ranching and intensive mining [3,4]. According to the Brazilian Amazon Rainforest Monitoring Program by Satellite (PRODES) [5,6], the tree cover loss from 1 August 2021 to 31 July 2022 was 11,568 km², with most of it occurring in legally protected lands. These lands, divided into indigenous territories and conservation units, represent 44% of the Brazilian Amazon rainforest [7].
Thanks to public policies and the existence of these reserved areas, deforestation within them remained relatively stable until 2018. However, from 2018 to 2021, the annual percentage rate of gross forest loss in these territories was twice as high as that in non-designated lands, with part of this loss attributed to the weakening of forest policies in Brazil by the federal government from 2019 to 2022 [8]. Unsurprisingly, reserved areas have continued to be under pressure, with many local conflicts involving illegal activities such as slash-and-burn and logging. These anthropogenic actions have been the precursors to deforestation, preceding the introduction of cattle to mark territories [9], as well as illegal mining camps [10]. For example, deforestation within Brazilian indigenous territories increased by 129% from 2013 to 2021, with 59% of the CO₂ emissions (around 96 million tons) produced during this period being emitted between 2019 and 2021, highlighting the uncontrolled desertification process of the Amazon biome [11].
Given the recurrent human interference in the Amazon biome, including its law-protected areas, the systematic monitoring of deforestation is indispensable. One way to accomplish this on a large scale is by applying Remote Sensing (RS) technologies, which allow for the continuous tracking of forest degradation. Representative examples include near-real-time detection of tree loss, precise quantification of affected areas, and early identification of deforestation [12,13,14,15,16]. Another effective tool is Machine Learning (ML) [17], as it enables the design of new data-driven methods to identify potential changes in forest zones by analyzing intrinsic features and trends in remotely sensed data [18]. Neural Networks (NN) [19,20] and popular classification algorithms such as Random Forest (RF) [21] and Support Vector Machines (SVM) [19,22] are among the most effective methods used for environmental analysis. In contrast, Anomaly Detection (AD) [23], although a less utilized class of machine learning techniques, has found applications across various fields, including health sciences [24], social monitoring [25] and fault detection [26]. More recently, AD has also been applied in remote sensing to detect temporal changes on the Earth’s surface [27], map fires [28] and monitor algae proliferation [29].
Over the last decade, prior research has employed ML with up-to-date RS data taken from multiple government-maintained repositories, such as PRODES [5] and the Institute of People and the Environment of the Amazon (IMAZON) [30]. Camara et al. [31] assessed deforestation in the Brazilian Amazon from 2008 to 2021 by unifying different government datasets. They found that a significant portion of the forest degradation occurred on private lands, the majority of which was illegal. Das Neves et al. [32] applied the RF algorithm to public data acquired from IMAZON to analyze the impact of both hidden and official road networks on forest loss in the state of Pará, Brazil, from 1988 to 2018. They discovered that clandestine roads were the most critical contributors to forest exploitation, with official roads following closely behind. Jakimow et al. [33] also applied an RF-based approach to study the relationship between forest loss and fire occurrences in a large area of the Brazilian Amazon from 2014 to 2020. They highlighted an atypical rise in burned areas and forest loss post-2018, particularly in agrarian settlements, conservation units and medium/large rural properties. Dallaqua et al. [34] utilized the SVM classifier for detecting deforestation in the Brazilian Legal Amazon. In contrast to previous works, their approach relied on the ForestEyes Project [35], in which volunteers classified remote sensing images, producing annotated data that served as the training set for classification. Lastly, deep neural networks have also been successfully used for mapping deforestation. A good representative of this is the work by Adarme et al. [20], which compared the performance of three convolutional networks: Early Fusion [36], Siamese Network [37] and Convolutional Support Vector Machine [38].
Despite the immeasurable importance of protected conservation units in sustaining the Amazon rainforest, most existing works do not assess the impact of deforestation on them. Instead, they focus on evaluating vast forest areas, such as country-sized municipalities and even states, as well as privately used lands like pasture and agriculture fields. The lack of research directly correlating the impact of deforestation in Amazon cities and their neighboring areas, including legally protected reserves, also reveals a critical gap in conservation research in the Amazon biome.
To overcome the issues raised above, this paper introduces a trainable and flexible data-driven methodology for mapping deforestation, focusing on the Amazon biome. In more technical terms, the approach can be freely customized by incorporating and training different classification strategies, specifically those from two machine learning domains: anomaly detection and neural networks. Our methodology uses the Google Earth Engine (GEE) API to obtain fresh and accurate data, enabling the training of either AD- or NN-based models to learn representative features from remotely sensed data. In addition to integrating the GEE with two branches of ML strategies, a comprehensive analysis of their applicability, generalization, accuracy and tuning aspects is provided, highlighting their strengths and limitations in the context of deforestation classification. Another contribution is the exploration of recent applications of anomaly detection for remote sensing classification, adapting and evaluating this ML strategy to discriminate deforestation patterns. Unlike existing methods in the specialized literature, which usually employ AD to select specific targets or create susceptibility maps for specific environmental incidents [18], our approach aims to explore the fitting capabilities of anomaly detection for typical machine learning tasks by customizing it for deforestation classification.
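As a rough illustration of how an anomaly detector can be turned into a binary deforestation classifier, the sketch below fits a reference profile on pixels assumed to be intact forest and flags pixels that deviate strongly from it. It is not the IF or OC-SVM models evaluated in this paper: a simple z-score detector stands in for them, and the two-band reflectance values, the function names and the threshold of 3.0 are all hypothetical choices made only for this example.

```python
import math
from statistics import mean, stdev


def fit_profile(samples):
    """Per-band mean and standard deviation of the reference (forest) pixels."""
    bands = list(zip(*samples))
    return [(mean(b), stdev(b)) for b in bands]


def anomaly_score(pixel, profile):
    """Root-mean-square z-score of a pixel against the reference profile."""
    z2 = [((x - mu) / sd) ** 2 for x, (mu, sd) in zip(pixel, profile)]
    return math.sqrt(sum(z2) / len(z2))


def classify(pixels, profile, threshold=3.0):
    """Label 1 (candidate deforestation) when the score exceeds the threshold."""
    return [1 if anomaly_score(p, profile) > threshold else 0 for p in pixels]


# Toy (red, near-infrared) reflectance pairs, invented for illustration only.
forest = [(0.04, 0.45), (0.05, 0.48), (0.03, 0.50), (0.04, 0.47), (0.05, 0.46)]
profile = fit_profile(forest)

# First pixel resembles the forest profile; second is brighter in red, darker in NIR.
new_pixels = [(0.04, 0.47), (0.18, 0.25)]
labels = classify(new_pixels, profile)
print(labels)
```

The design point is that the detector is fitted on one class only, so the decision reduces to thresholding a continuous score; the same score can also be ranked to compute threshold-free metrics such as the AUC.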
To quantitatively validate our approach while filling the gap in targeted research on legally protected areas, the deforestation problem is explored through a case study of forest loss in the Brazilian municipality of São Félix do Xingu (SFX) and its surrounding areas, including the Kayapó indigenous park. The Kayapó’s conservation unit spans 32,000 km², exceeding the size of many countries worldwide such as Armenia and Belgium. For decades, the Brazilian Kayapó people have defended their territory from loggers, miners, farmers and land grabbers. Now, with a newly constructed highway encroaching on their lands [39], this indigenous community faces further obstacles in keeping its ancestral territory and preserving biodiversity in the Amazon biome.
2. Protected Areas and Indigenous Peoples in the Brazilian Legal Amazon
According to the Brazilian Institute of Geography and Statistics (IBGE) [40], the Legal Amazon (LA) is a political designation that includes nine Brazilian states and spans across three biomes: Amazon, Cerrado and Pantanal. In the Brazilian LA, Protected Areas (PAs) are legally defined as clearly marked geographical spaces designated by the government to preserve ecosystems, biodiversity and essential environmental services such as soil conservation, watershed protection, pollination, nutrient recycling and climate regulation. They also uphold the rights and cultural heritage of traditional and indigenous communities that have historically inhabited these areas [41]. The PAs can be grouped into Conservation Units (CUs) and Indigenous Lands (ILs), which together cover approximately 52% of the forested areas in the Brazilian LA [8]. Comparisons of deforestation rates reveal that PAs experience deforestation at rates up to ten times lower than those in non-PAs [42].
Violating restrictions in PAs in the Brazilian Amazon carries serious legal consequences, including substantial fines, imprisonment and the confiscation of tools and equipment used in these illegal activities. Illegal actions such as the suppression of vegetation, illegal fishing and hunting are among the most common infractions in these areas, and enforcement mechanisms are designed to deter such activities by imposing stringent penalties [43].
In the Brazilian Amazon, indigenous lands are crucial for forest preservation. For example, between 2000 and 2021, these territories, which include PAs, were responsible for just 5% of the total net forest loss, underscoring their critical role in reducing deforestation compared to non-PAs [8]. Despite this vital importance, the relationship between traditional peoples and PAs has been intricate over the past decades, as it involves the transformation of historically unbounded territories into legally recognized “Indigenous Lands”. However, once these areas were officially designated as PAs by the Brazilian government, the native communities developed forms of “sustainable development” that simultaneously respect their own values and meet external demands for the conservation of the PAs [44].
Despite their conservation success, PAs are under increasing pressure from extractive industries, and even from the government, which seek to convert these areas into zones for economic expansion [45]. This ongoing challenge emphasizes the need for robust legal protections and policies that prioritize the rights and traditional practices of indigenous peoples while ensuring the preservation of the Amazon’s biodiversity.
5. Conclusions and Future Work
This paper presents a robust and adaptable data-driven methodology capable of accommodating either anomaly detection or neural network models within an ML framework for deforestation detection. In particular, the framework supports different machine learning designs, including anomaly detection and neural network-based modeling, thus enhancing the accuracy and generalization capability of deforestation mapping while streamlining the classification process in remote sensing applications. To experimentally validate its use, a comprehensive analysis of anomaly detection methods adapted for deforestation detection was conducted, alongside two popular neural networks, MLP and CNN.
In addition to presenting a comparative evaluation of both formulation strategies for deforestation mapping, our analysis addressed a critical gap by applying and evaluating potential classifiers for detecting deforestation within a conservation unit, namely the Kayapó indigenous park. Spanning 32,000 km², this area is vital for biodiversity preservation but has faced ongoing threats from activities like agriculture and infrastructure development.
Through an extensive battery of tests involving nearly two thousand remotely sensed images of the Amazon rainforest, it was found that anomaly detection-driven classifiers such as IF and OC-SVM are suitable for detecting deforested areas but exhibit limitations in granularity and produce scattered false detections. In contrast, the MLP- and CNN-based models, when consistently tuned, outperform the AD-based classifiers by capturing deforestation signatures with greater precision and continuity, leveraging their capability to learn complex spatial dependencies from the data. For instance, the AUC for the IF and OC-SVM models was 76% and 93%, respectively, while MLP and CNN achieved 96% and 98%, respectively.
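For readers less familiar with the AUC figures quoted above, the following minimal sketch computes the area under the ROC curve from its rank-based definition: the probability that a randomly chosen positive (deforested) sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the four toy labels and scores are assumptions for illustration; they are not data or code from this study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 1 = deforested, 0 = forest; scores are hypothetical model outputs.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_score)
print(f"AUC = {auc:.2f}")  # 3 of the 4 positive/negative pairs are ordered correctly
```

Because the metric depends only on the ordering of the scores, it applies equally to anomaly scores and to neural network probabilities, which is what makes it a convenient common yardstick for comparing the two model families.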
As shown by the experiments, CNNs have proven to be effective in detecting deforestation in the Amazon rainforest, including indigenous reserves. Their ability to extract representative features from satellite imagery enables the identification of both pronounced and subtle changes in land cover, which can offer clues for uncovering illegal activities on the ground, such as mining and illicit harvesting. Furthermore, CNNs can be extended to map other native forest regions, as well as to detect other objects of interest, ranging from farmland [67] to plant [68] and tree [69] species. However, despite their effectiveness and robustness, CNNs may encounter challenges that warrant attention. For instance, they depend on the quality and resolution of the available data, and low-resolution datasets can adversely affect the model’s performance [70]. Another aspect to consider is the need for large training datasets to ensure the model’s adequate generalization and accuracy.
In conclusion, this study not only addresses the classification task for sensitive ecological areas but also validates the applicability of two advanced categories of machine learning techniques in environmental monitoring. By bridging the gap between anomaly detection and neural networks in terms of comparative performance and generalization capability, our work aims to contribute to advancing the field of environmental analysis, particularly the large-scale monitoring of the Amazon rainforest.
Future work will explore the following: (i) the integration of data from multiple satellite sensors to enhance the robustness and accuracy of the deforestation detection task; and (ii) the application of other machine learning methods to develop more sophisticated models capable of capturing intricate deforestation patterns across different environmental contexts.