1. Introduction
Climate change refers to long-term changes in temperature and weather patterns. Historically, these changes were due to natural phenomena. However, human activities, particularly the increase in greenhouse gas emissions, have driven these changes since the eighteenth century. Forest management is a crucial strategy for reducing greenhouse gas emissions and mitigating the impacts of climate change. This study integrates the Random Forest and Markov chain models to develop a high-precision land use prediction model.
The Random Forest model is a powerful ensemble learning method that constructs multiple decision trees during training and provides robust performance for classification tasks. It is particularly effective in managing large datasets with numerous variables, making it suitable for land use classification. The Markov chain model, on the other hand, describes a sequence of possible events in which the probability of each event depends solely on the state attained in the previous event. Combining these two models enables the analysis of spatial and temporal land use dynamics, providing substantial scientific support for sustainable land management policies.
We developed this model by analyzing several existing studies. For instance, Talukdar et al. [1] examined six machine learning algorithms, including Random Forest, for land use classification. Their study concluded that the Random Forest algorithm demonstrated the highest accuracy among those assessed, indicating its effectiveness in handling complex datasets and improving prediction accuracy, which makes it an ideal choice for land use analysis. Jalayer et al. [2] investigated spatial and temporal changes in land use and land cover within the Chalus watershed in Iran, utilizing multitemporal Landsat satellite imagery. Their research employed the support vector machine algorithm and the Markov chain model to predict land use and land cover maps for 2021 and 2040. They highlighted the significance of considering temporal dynamics in forecasting models, which provides a more comprehensive and accurate view of land use changes.
Amin et al. [3] evaluated the effectiveness of various machine learning algorithms for land use classification in complex mountainous regions, such as Gilgit-Baltistan in Pakistan. Using high-resolution satellite imagery from Sentinel-2 and Google Earth Engine, they assessed algorithms such as CART, MaxEnt, minDistance, SVM, and Random Forest (RF). Their results showed that the RF algorithm achieved the best performance, with an overall accuracy of 79%. The final RF-based mapping revealed the following percentage distribution: arid lands (46.7%), snow cover (22.9%), glaciers (7.9%), grassland (7.2%), water (4.7%), wetlands (2.9%), built-up areas (2.7%), agriculture (1.9%), and forests (1.2%).
Marey et al. [4] experimented with an integrated model combining patch-generating land use simulation (PLUS) with Markov chain analysis (MC) to simulate changes in land use and land cover (LULC) for Montreal Island, Canada. Their findings indicated that sustainable policies favor more contiguous green spaces, enhancing ecological connectivity. In contrast, industrial policies promote the concentration of commercial and industrial areas, often at the expense of green spaces. This model reported an accuracy value of 0.970, underlining its potential as a decision support tool in urban planning, facilitating scenario-driven exploration of LULC dynamics with high spatial accuracy.
Ibrahim [5] also employed a Random Forest approach to select features from Sentinel-1, Sentinel-2, and Shuttle Radar Topographic Mission (SRTM) data. The results indicated that this combination improved land use classification accuracy, demonstrating the importance of leveraging the synergy between optical, radar, and elevation data. Finally, Liping et al. [6] predicted future land use trends for 2025 and 2036 in Jiangle County based on dynamic changes in land use patterns using remote sensing and geographic information systems. The predictive model they used was CA-Markov.
2. Role of Forests in Climate Change Mitigation
Mediterranean forests, characterized by their unique climate and specialized flora, are crucial for mitigating climate change and protecting soil. These forests capture carbon dioxide (CO₂), regulate temperatures, and influence rainfall patterns. They also stabilize soil, maintain groundwater levels, and support biodiversity, enhancing ecosystems [7,8,9,10,11,12,13,14]. Forests are responsible for sequestering approximately 33% of human-induced carbon emissions; however, increasing tree mortality raises concerns about their long-term sustainability [15,16,17].
Both climate change and human activities threaten the resilience of forests, leading to a rise in wildfires and storms. A study titled “Emerging Signals of Declining Forest Resilience under Climate Change” [18] reveals that increased natural and anthropogenic factors can undermine forests’ capacity to adapt. Using satellite data and machine learning, researchers discovered that 23% of natural forests are nearing a critical resilience threshold [19]. Tropical and temperate forests show declining resilience due to climate variability and diminished water resources, while boreal forests exhibit greater resilience due to warming and CO₂ fertilization.
Global and European climate goals aim to reduce emissions and achieve climate neutrality by 2050 [19,20]. Italy is enacting various strategies to facilitate decarbonization, including the Integrated National Energy and Climate Plan (PNIEC) and a dedicated climate law. Furthermore, the National Strategy for Adaptation to Climate Change emphasizes resilience and adaptation initiatives [21,22,23,24]. The EU aims to reduce net greenhouse gas emissions by at least 55% by 2030 and improve the net carbon sink [25].
Urban planning significantly contributes to climate mitigation by fostering the creation of green spaces, energy-efficient structures, and ecological corridors. Establishing parks and designing energy-efficient buildings that utilize renewable energy resources can decrease reliance on fossil fuels, helping combat climate change. Urban green spaces and ecological corridors enhance the quality of urban life, conserve biodiversity [26], and mitigate urban heat islands (UHIs) through shading and transpiration. Car traffic, industrial emissions, and air conditioning contribute to rising temperatures. Green corridors and continuous green spaces can modify the microclimate, improve urban ventilation, conserve energy, enhance public health, manage stormwater effectively, and reduce noise pollution. Thus, urban design and planning are essential for addressing climate change through mitigation and adaptation strategies [27,28,29].
For instance, a study by Noha Hefnawy on the New Administrative Capital of Egypt emphasizes the importance of urban green spaces in mitigating climate change, demonstrating their effectiveness in reducing vulnerability to heatwaves and boosting urban resilience [30]. Additionally, the essay “Urban Planning Tools for Climate Change Mitigation” explores various urban planning instruments that can be employed to lower carbon emissions and improve the resilience of cities [31]. In collaboration with other national parks in southern Italy, projects such as “System Action” at Aspromonte Park aim to establish a network of ancient forests to enhance forest management and climate change mitigation potential [32]. Finally, the FORMIT project investigates how forest management strategies can bolster the mitigation potential of European forests [33].
The European Commission’s CORINE program, initiated in the 1980s, provides standardized data on land cover and land use across Europe. The CORINE Land Cover dataset, developed in 1990, has become a key element of the European Environment Agency’s Copernicus Land Monitoring Service, offering comprehensive information on land cover and use throughout Europe [34].
Figure 1 outlines the evolution of CORINE Land Cover.
Figure 2 illustrates a portion of the Aspromonte area, highlighting its rich tree species diversity. On the far right, marked in purple, is the dam’s site on the Menta River, which was under construction at the time. It is important to note that we only have the fourth and fifth levels of detail for specific years in the CORINE Land Cover project, and 2000 is one of those years. The details at the fourth level pertain to class 3 (Forests and Seminatural Areas), while those at the fifth level correspond to class 3.1.3 (Mixed Forest).
To address climate change, greenhouse gas emissions must be reduced, CO₂ must be prevented from being released from ecosystems, and CO₂ uptake must be enhanced (IPCC, 2018) [35]. Natural solutions, such as maintaining and restoring forests, are crucial in improving carbon storage. The Forest Stewardship Council (FSC) advocates for sustainable forest management. Enhancing ecosystems’ carbon sequestration capacity is vital for mitigating climate change.
This study examines changes in forest use and their impact on carbon storage. Ecosystems, natural processes, and human activities influence land use and land cover change (LULC). Various models, including the Markov model [36], artificial neural networks (ANNs) [37], and logistic regression [38], can predict changes in land use. Integrated models can explain multifactorial impacts [39,40,41,42]. Sustainable policies and practices are essential for forest conservation [43].
This study proposes an AI-based solution to analyze land use transitions, specifically focusing on forests.
3. Materials and Methods
The software used was the open-source QGIS, version 3.34.14. Our approach involved combining two models: the Random Forest (RF) model and the Markov chain model. Previous studies have demonstrated that the RF model is a powerful machine learning classifier for land use studies due to its non-parametric nature, high classification accuracy, and resistance to overfitting. In this study, we leveraged the accuracy of the RF model to establish transformation rules that predict the likelihood of one land use category converting to another based on various driving factors.
By integrating the RF model with the Markov chain model, we were able to evaluate spatiotemporal predictions with good accuracy. This innovative combined model allows us to predict the evolution of land use under different scenarios and offers useful insights for sustainable land management. We utilized the RF model’s capability for land cover and land use classification alongside the Markov chain’s ability to track temporal changes related to both natural and human activities. This integration enabled us to develop a comprehensive model that effectively predicts land use evolution and its impact on carbon storage.
The Random Forest model can handle extensive datasets with multiple features for land cover and land use classification. It is structured as a collection of decision trees, each built from a random subset of the training data. Each tree independently makes predictions based on a separate random subset of the data, and the final prediction is determined by the majority vote of all the trees. This ensemble learning method is illustrated in Figure 3.
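The voting step can be illustrated with a minimal Python sketch. The function name and the example votes are hypothetical and serve only to show how a majority vote turns individual tree outputs into one prediction.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class label predicted by the largest number of trees."""
    # most_common(1) yields [(label, count)] for the most frequent label;
    # ties fall to the label seen first, a choice made for this sketch only.
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical votes cast by five trees for a single pixel
votes = ["woodland", "woodland", "urban", "woodland", "farmland"]
print(majority_vote(votes))
```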
A key feature of the Random Forest model is the randomness in selecting characteristics for each tree. This randomness reduces the correlation among the trees and enhances the overall robustness of the model. The process can be broken down into several stages:
Generating Subsets: For each node in a tree, a random selection of features is derived from the original dataset. This subset is typically much smaller than the total number of available features.
Selecting the Optimal Feature: Among the randomly selected features, the one that most effectively separates the data according to a quality metric, such as entropy or the Gini index, is chosen for the split.
Iterative Procedure: This process is repeated for each node of every tree in the forest, ensuring that each tree is constructed using a distinct set of characteristics.
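The three stages above can be sketched for a single node as follows. This is an illustrative simplification in plain Python, not the implementation used in the study: the function names, the threshold search, and the toy data are assumptions, and the Gini index is used as the quality metric.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def split_impurity(rows, labels, feature, threshold):
    """Weighted Gini impurity after splitting on rows[i][feature] <= threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(rows, labels, n_candidates, rng):
    """Stage 1: draw a random feature subset. Stage 2: keep the best split."""
    candidates = rng.sample(range(len(rows[0])), n_candidates)
    best = None
    for feature in candidates:
        for threshold in sorted({x[feature] for x in rows}):
            score = split_impurity(rows, labels, feature, threshold)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best  # (impurity, feature index, threshold)


# Toy data with three hypothetical features per sample
rows = [[1.0, 5.0, 0.2], [2.0, 5.0, 0.9], [3.0, 5.0, 0.1], [4.0, 5.0, 0.8]]
labels = ["woodland", "woodland", "urban", "urban"]
print(best_split(rows, labels, n_candidates=3, rng=random.Random(0)))
```

In a full forest this search would be repeated at every node of every tree (stage 3), usually with `n_candidates` well below the total number of features.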
Before training the model, it is essential to configure specific critical hyperparameters (see Table 1).
To optimize the parameters of the Random Forest model, we employed a hyperparameter search with five-fold cross-validation. Cross-validation is a reliable method for evaluating a machine learning model’s performance and its generalizability to unseen data. The dataset was randomly divided into five equal-sized subsets, known as “folds”. The model was then trained and validated five times, each time using a different fold as the validation set and the remaining four as the training set. This approach guarantees that each data point is used for validation exactly once and for training in the other four iterations.
Model performance was assessed during each iteration using metrics such as F1 score and mean squared error (MSE). The F1 score is the harmonic mean of precision and recall, which helps balance false positives and false negatives. The MSE measures the average of the squares of the errors, indicating the model’s accuracy in predictions.
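For reference, the two metrics can be written out explicitly. The symbols are notation introduced here rather than taken from the original text: TP, FP, and FN denote the true positives, false positives, and false negatives of a class, y_i and ŷ_i are the actual and predicted values for sample i, and n is the number of samples.

```latex
\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]
\[
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
\]
```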
These metrics help evaluate the model’s accuracy and generalization to new data. The results of the five iterations were then averaged to obtain a single estimate of the model’s performance. Averaging over multiple iterations gives a more reliable estimate and reduces the dependence on any single train–test split. Because the model is always validated on data it was not trained on, cross-validation also helps detect overfitting, where the model performs well on training data but poorly on unseen data. Finally, every data point contributes to both training and validation, making full use of the dataset.
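The fold bookkeeping can be sketched in a few lines of Python. This is a minimal illustration, assuming a simple shuffled split; the function name, the seed, and the sample count of 20 are hypothetical. It shows that each sample is validated once and used for training in the other four rounds.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Taking every k-th index gives k folds of (nearly) equal size
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation


splits = list(k_fold_indices(n_samples=20, k=5))

# Count how often each sample is used for validation and for training
val_counts = Counter(j for _, val in splits for j in val)
train_counts = Counter(j for train, _ in splits for j in train)
print(set(val_counts.values()), set(train_counts.values()))
```

The per-fold scores (for example the F1 score) would then be averaged to give the single performance estimate described above.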
During the cross-validation process, we evaluated different combinations of hyperparameters to find the optimal set. The parameters considered were as outlined in Table 2.
Each combination was evaluated using five-fold cross-validation, and the combination that yielded the best performance, based on the F1 score and mean squared error (MSE), was selected as the optimal parameter set.
The optimal parameter set identified included 300 trees, a maximum depth of 20, a minimum of 5 samples required to split a node, and a minimum of 2 samples needed to be in a leaf node. This thorough evaluation process ensured that the model was well-optimized and capable of making accurate predictions.
The identified optimal parameters were as outlined in Table 3.
The total number of iterations conducted while searching for the optimal parameters was 540, corresponding to the number of combinations specified in the grid.
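How a grid produces that count can be illustrated with a short sketch. The candidate values below are hypothetical and are not the values of Table 2; they were chosen only so that the grid contains the reported optimum and yields 540 combinations.

```python
from itertools import product

# Hypothetical grid: illustrative values only, NOT the contents of Table 2
param_grid = {
    "n_estimators": [100, 200, 300, 400, 500],
    "max_depth": [5, 10, 15, 20, 25, 30],
    "min_samples_split": [2, 5, 10, 15, 20, 25],
    "min_samples_leaf": [1, 2, 4],
}

# Every combination is one candidate model configuration
combinations = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]
print(len(combinations))  # 5 * 6 * 6 * 3 candidate configurations

reported_optimum = {
    "n_estimators": 300,
    "max_depth": 20,
    "min_samples_split": 5,
    "min_samples_leaf": 2,
}
print(reported_optimum in combinations)
```

Under five-fold cross-validation, each of these combinations would be fitted five times, once per fold.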
The influence of each hyperparameter on model performance can be summarized as follows:
n_estimators: This parameter determines the number of trees in the forest. A larger number of trees tends to improve model performance by reducing variance, but it also increases computational cost and training time. In this case, 300 trees provided a good balance between performance and efficiency, indicating a point at which the model is robust enough without being too expensive in terms of resources.
max_depth: The maximum depth of the trees controls how complex they can become. Deeper trees can capture more complex patterns, but they also risk overfitting. A max_depth of 20 was optimal in this case, suggesting that the model is able to capture the necessary complexity without overfitting the training data.
min_samples_split: This parameter specifies the minimum number of samples required to split an internal node. Higher values prevent the model from learning patterns that are too specific, thus reducing overfitting. A value of 5 was optimal, indicating that the model generalizes well enough to avoid overfitting the training data.
min_samples_leaf: This parameter determines the minimum number of samples required to be in a leaf node. Setting this parameter helps simplify the model, especially in regression tasks. A value of 2 was optimal, ensuring that the model is not too sensitive to noise in the data.
The Markov chain is a valuable modeling tool that helps us understand how ecosystems change over time due to environmental and anthropogenic influences. Each state in the model corresponds to a category of land use, and the transition probability matrix is a crucial element that defines the likelihood of transitioning from one state to another. This understanding is essential for predicting changes in land allocation.
The model allows for the simulation of various scenarios regarding territorial shifts, each of which provides unique insights into the system’s evolution. The transition matrix, fundamental to Markov chains, specifies the probabilities of moving from one state to another within a given time step, offering a clear and comprehensive understanding of the system’s temporal dynamics.
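Each element P_{ij} of the transition matrix P represents the probability of transitioning from state i to state j, and the probabilities in each row sum to 1. For a chain of n states, the matrix takes the following form:
\[
P = \begin{pmatrix}
P_{11} & P_{12} & \cdots & P_{1n} \\
P_{21} & P_{22} & \cdots & P_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
P_{n1} & P_{n2} & \cdots & P_{nn}
\end{pmatrix},
\qquad \sum_{j=1}^{n} P_{ij} = 1 \quad (i = 1, \dots, n).
\]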
The main feature of a Markov chain is that the probability of transitioning from one state to another depends solely on the current state rather than on the sequence of previous events. This characteristic is known as the “Markov property”.
Markov chains are beneficial for modeling land use changes because they capture temporal dynamics and transition probabilities between different states. This capability is essential for predicting how forest areas might evolve into urban or agricultural zones over time. The key components of a Markov chain include the states, which represent the various categories of land use (e.g., forest, urban, agricultural), the transition probabilities that indicate the likelihood of moving from one state to another within a specified time interval, and the transition matrix, which contains all the probabilities of transitioning between states. However, despite their advantages, Markov chains have significant limitations when managing complex land-use dynamics. The Markov property assumes that a system’s future state is determined solely by its current state without considering the sequence of preceding events. This simplification can create challenges, particularly in modeling intricate systems such as land use change.
The Markov property simplifies the modeling process by focusing only on the current state. While this makes calculations more manageable, it may overlook important historical context that can influence future states. Relying solely on transition probabilities between states may be limiting, especially when these probabilities change over time due to unconsidered external factors. This can result in less accurate predictions in dynamic environments.
Additionally, real-world scenarios often involve complex and multifaceted interactions among different factors. The Markov property may not fully capture these interactions, leading to oversimplified models that miss critical nuances.
To address these limitations, the Markov model can be combined with a Random Forest, which can take driving factors and information about previous conditions as input features. The Random Forest’s ability to draw on this broader context helps mitigate the limitations of Markov chains. This combination allows for more accurate and reliable forecasts, particularly in complex scenarios where multiple interconnected factors influence dynamics.
In summary, while the Markov property provides a useful framework for modeling certain systems, its limitations in managing long-term dependencies and complex interactions can be effectively addressed by integrating it with models like Random Forest, which offer improved prediction and adaptability.
The integration of Markov chains with Random Forest combines the strengths of both methods to enhance prediction accuracy. The integration process is carried out in the following phases:
Initial Classification with Random Forest: Using Random Forest, a model is created to classify current land use based on explanatory variables, such as land features, socio-economic information, and other geospatial data.
Calculating Transition Probabilities with Markov Chains: Markov chains are employed to calculate the transition probabilities between different land states which are derived from historical data.
Future Prediction: By integrating the results from the Random Forest classification with the transition probabilities obtained from the Markov chains, it becomes possible to predict future changes in land use. This approach enables the creation of forecast maps that illustrate the likely transformations of the
Figure 4 illustrates the developmental stages of the RF–Markov model.
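The second phase, deriving transition probabilities from historical data, can be sketched as a cross-tabulation of two classified maps of the same pixels at two dates. This is a minimal illustration under stated assumptions: the function name and the eight-pixel example are hypothetical, and the class coding (0 woodland, 1 farmland, 2 urban) follows the coding used in the Results.

```python
def transition_matrix(labels_t0, labels_t1, n_states):
    """Estimate row-stochastic transition probabilities from two label maps."""
    counts = [[0] * n_states for _ in range(n_states)]
    for before, after in zip(labels_t0, labels_t1):
        counts[before][after] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Class absent at the first date: assume it simply persists
            matrix.append([1.0 if j == i else 0.0 for j in range(n_states)])
        else:
            matrix.append([c / total for c in row])
    return matrix


# Hypothetical classes of the same eight pixels at two dates
t0 = [0, 0, 0, 0, 1, 1, 2, 2]
t1 = [0, 0, 0, 1, 1, 1, 2, 0]
P = transition_matrix(t0, t1, n_states=3)
for row in P:
    print(row)
```

In practice the two label lists would be the flattened Random Forest classifications of the earlier and later images.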
Study Area
Our study area is part of the Aspromonte area, which is in the province of Reggio Calabria in the municipalities of Roccaforte del Greco, San Luca, and Samo.
Figure 5 shows the position of the Aspromonte in Calabria.
The research concentrated on the Calabria Region, encompassing around 15,000 km². Approximately 49.2% of its terrain is hilly, 41.8% is mountainous, and only 9% is flat. The Lucanian Apennines, a segment of the Pollino Massif, and the Calabrian Apennines constitute the mountainous regions. The area consists of five provinces: Catanzaro, Cosenza, Reggio Calabria, Crotone, and Vibo Valentia. The population is approximately 2 million individuals. The population density is 129 people per square kilometer. The region possesses a substantial forest cover, comprising around 67% of its territory (Figure 6 and Figure 7).
Figure 8 shows our study area in the province of Reggio Calabria, displaying land cover from CORINE Land Cover 2018.
It is a territory that has several reserves and protected areas, some of which overlap. In particular, the Alto Aspromonte State Forest is a State property reserve established in 1911, managed by the Carabinieri Biodiversity Department of Reggio Calabria and extending over 2870 ha (distributed across two distinct areas, the first covering about 900 ha in the countryside of Roccaforte del Greco and the second covering about 1940 ha in the countryside of Samo and about 30 ha in the countryside of San Luca).
There are overlapping protected areas:
Aspromonte National Park (Zone A of Integral Reserve);
ZSC IT9350155 ‘Montalto’ (area included in the State Forest: about 117 ha), ZSC IT9350154 ‘Torrente Menta’ (area included: about 142 ha), ZSC IT9350180 ‘Contrada Scala’ (area included: about 94 ha), ZSC IT9350157 ‘Torrente Ferraina’ (entirely included, about 416 ha), and ZSC IT9350178 ‘Serro d’Ustra e Fiumara Butramo’ (area included: about 36 ha);
SPA IT9310069 (ZPS) ‘Calabria National Park’.
The Aspromonte State Forest, which covers about 2870 ha, is located in the heart of the Aspromonte National Park, near the summit of Montalto. Acquired by the state during the last century, it was managed to reforest bare land and protect existing wooded areas, preventing hydrogeological instability.
Included in the Calabria National Park in 1968 and the Aspromonte National Park in 1994, the forest is rich in biodiversity and perennial springs. It has several habitats protected by the EU, including those outlined in
Table 4.
In the forest, permanent monitoring areas and botanical surveys are conducted to study its evolution. Among the notable sites are the forest of ‘Mancuso’ in San Luca, the mixed forest of Ferullà, and the Calabrian pine forest near Acatti. The Infernal Valley is a beech forest rich in habitats for mammals and insects.
The diverse flora of the forest includes ancient oaks, larch pines, and various sporadic species such as maple and alder. In spring, the undergrowth is colored with orchids, violets, mountain lilies, and wild carnations. Recent research estimates the presence of about 1800 species and subspecies of flora, highlighting the significant biodiversity of the area.
Aspromonte National Park is engaged in numerous conservation efforts to protect and enhance its natural heritage. These include the creation of integral reserves that protect the natural life cycles of trees, making forests more resilient to global change. In addition, the park is part of the UNESCO Global Geoparks Network, an initiative that aims to safeguard and enhance the unique geological heritage of the central Mediterranean.
4. Results
Changes in land use are driven mainly by natural and socio-economic factors. It is essential to identify these driving forces accurately and to compile a dataset with which to simulate and predict future land use changes. This study utilizes hourly measurements of precipitation, temperature, pressure, and humidity collected over the past decade, together with socio-economic data represented by population density (number of people per square kilometer). Land use was grouped into three primary classes: woodland, agricultural land, and urban areas. The data cover the period from 2000 to 2018.
We used Excel for sensitivity analysis, systematically varying the input variables to observe how the model’s results changed. We increased and decreased key variables by 10% and verified that the model’s output corresponded proportionally.
The Random Forest (RF) model was employed to investigate the conversion of woodland into agricultural and urban areas. The predicted changes in land use were subsequently validated using a Markov chain. The model included a preprocessing phase involving data cleaning, normalization, and managing missing information.
We utilized a confusion matrix (Figure 9) to evaluate the model’s performance. The confusion matrix is a tabular representation that shows how the model’s predictions align with the actual data.
Its entries are used to compute several evaluation measures, including precision, recall, F1 score, and accuracy (Table 5).
Mean Accuracy: This denotes the proportion of accurate predictions relative to the total forecasts. A result of 0.9888 indicates that the model accurately identified 98.88% of the samples.
Mean F1 Score: This represents the harmonic mean of precision and recall. A result of 0.9878 signifies an excellent balance between precision and recall, with both positive and negative classes categorized accurately.
Mean Recall Score: This quantifies the model’s proficiency in accurately recognizing all positive events. A result of 0.9877 indicates that the model accurately recognized 98.77% of the positive events.
Mean Precision Score: This quantifies the ratio of accurately identified positive instances to the total instances classified as positive. A value of 0.9877 signifies that 98.77% of cases identified as positive are indeed positive.
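The way such measures follow from a confusion matrix can be sketched as below. The counts are hypothetical and are not those of Figure 9. The sketch assumes that rows hold the actual classes, that columns hold the predicted classes, and that the class scores are macro-averaged. The reported “mean” values may instead be averages over the cross-validation folds.

```python
def macro_metrics(cm):
    """Accuracy and macro-averaged precision, recall, and F1.

    cm[i][j] is the number of samples of actual class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precisions, recalls, f1_scores = [], [], []
    for i in range(n):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(n)) - tp  # predicted i, actually other
        fn = sum(cm[i]) - tp                       # actually i, predicted other
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(p)
        recalls.append(r)
        f1_scores.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


# Hypothetical counts for woodland, farmland, and urban (30 samples)
cm = [[8, 1, 1], [0, 10, 0], [2, 0, 8]]
metrics = macro_metrics(cm)
print({name: round(value, 4) for name, value in metrics.items()})
```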
The ROC/AUC curve was also produced (Figure 10) to evaluate the reliability of the model.
The graph shows the three ROC curves for the different classes with their respective AUC (area under the curve) values. All curves have an AUC value of 1. This result indicates that the model has a perfect classification capability, justified by the classes’ separability, data quality, and the effectiveness of the classification algorithm used.
Table 6 shows the transition probabilities from one state to another produced by the Markov model, which are reproduced in graphic form in Figure 11.
The graphical representation makes the transition probabilities between the different land use classes immediately visible. For example, forest areas have a very high probability of remaining forested, while urban areas have a slightly lower probability of remaining stable. This representation provides visual support for urban planning and forest management decisions. Decision-makers can use this information to develop evidence-based conservation and sustainable development strategies.
The transition matrix delineates the probabilities of moving from one state to another, as predicted by the model. In this representation, a value of 0 denotes woodland, 1 signifies farmland, and 2 represents urban area. Each entry P_ij of the matrix represents the likelihood of transitioning from state i to state j. The following transitions arise from the matrix analysis:
State 0 (Forest) has a probability of 0.99865 of remaining forest, a probability of 0.000712 of transitioning to farmland, and a probability of 0.000712 of evolving into an urban area.
State 1 (Farmland) has a probability of 0.000000 of transitioning to woodland, a probability of 1.000000 of remaining farmland, and a probability of 0.000000 of becoming an urban area.
State 2 (Urban Area) has a probability of 0.011111 of transitioning to woodland, a probability of 0.000000 of converting to farmland, and a probability of 0.988889 of remaining an urban area.
Consequently, woodlands possess a remarkably high likelihood (99.86%) of persisting as woodlands, with a negligible possibility of being transformed into agricultural or urban zones.
Urban regions possess a 98.89% likelihood of remaining urban, with a minimal chance (1.11%) of transitioning to woodland. Agricultural lands, with a 100% probability of remaining in that state, demonstrate significant stability.
For greater visibility, a graphical representation of the Markov chain is also provided. Each node represents a state of the Markov chain, and the arcs between the nodes represent the probability of transition from one state to another (Figure 12).
Quantitative Results
This matrix was employed to construct a Markov chain. The Markov chain employs these probabilities to predict a system’s future state based on its present state.
The state probabilities after 15 time steps are as follows:
Woodland: 0.9796
Farmland: 0.0106
Urban: 0.0098
After 15 iterations, there is a 97.96% probability that the system is in the “Woodland” state.
This state exhibits significant stability and dominance over the long term. The “Farmland” state has a probability of 1.06%, and the “Urban” state, at 0.98%, is the least likely of the three.
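The 15-step projection can be reproduced approximately with a short sketch. It uses the transition probabilities quoted above and makes two assumptions not stated in the text: the system starts entirely in the woodland state, and each row is renormalized to sum exactly to 1 because the quoted values are rounded. The function names are hypothetical.

```python
def normalize_rows(matrix):
    """Rescale each row so that its probabilities sum exactly to 1."""
    return [[value / sum(row) for value in row] for row in matrix]


def step(distribution, matrix):
    """Advance a state distribution by one time step: d_next = d * P."""
    n = len(matrix)
    return [
        sum(distribution[i] * matrix[i][j] for i in range(n)) for j in range(n)
    ]


# Transition probabilities as quoted in the text
# (state 0 = woodland, 1 = farmland, 2 = urban)
P = normalize_rows([
    [0.99865, 0.000712, 0.000712],
    [0.0, 1.0, 0.0],
    [0.011111, 0.0, 0.988889],
])

# Assumption: every unit of land starts as woodland
distribution = [1.0, 0.0, 0.0]
for _ in range(15):
    distribution = step(distribution, P)

print([round(p, 4) for p in distribution])
```

Under these assumptions the output should lie close to the probabilities listed above. A different initial distribution would give different values.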
The results reveal that the system stabilizes predominantly in the “Woodland” condition with a significantly high likelihood. Comparing the findings from the two models, it is evident that both concur on “Woodland” as the predominant class and designate “Urban” as the class with the lowest probability. The Markov model demonstrates a greater likelihood for the agricultural class than the random forest model.
The model’s predictions indicate that urban areas have a very high probability (98.89%) of remaining so, with a minimal chance of transition to forest areas (1.11%). This suggests that urban areas are stable and can be further developed with adequate infrastructure without significant land use change risks. The stability of urban areas allows planners to develop strategies to manage urban growth sustainably, avoiding overlap with ecologically sensitive or agricultural areas.
As indicated by the model’s predictions, the stability of urban and agricultural areas allows planners to optimize land use, ensuring that urban areas are developed efficiently and that agricultural areas are protected for food production. Using the model’s predictions, planners can ensure that urban areas are developed efficiently, with adequate infrastructure and public services, without compromising quality of life.
The models can be used to simulate different urban development scenarios, allowing planners to assess the impact of their long-term decisions. For example, they can test how different zoning policies affect population distribution and land use. Simulations can help predict the impact of urban development policies and make informed decisions to promote sustainable development.
Again, the model’s predictions show that forest areas are highly likely (99.86%) to remain forested and have a minimal chance of being transformed into agricultural or urban areas. This suggests that current conservation policies are effective, but it is important to continue monitoring and protecting these areas to maintain their stability. The stability of forest areas indicates that conservation policies must be maintained and strengthened to ensure the long-term protection of forests.
Using the model’s predictions, forest management policies can be geared towards the reforestation of degraded areas. Not only does this help restore ecosystems, but it also contributes to climate change mitigation through carbon absorption. Model predictions can help identify areas needing reforestation and plan reforestation activities effectively.
Therefore, the model’s predictions can inform sustainable forest management practices, ensuring that forest exploitation activities do not compromise the biodiversity and resilience of forest ecosystems. This includes promoting forestry practices that are closer to nature.
Overall, the model’s predictions help develop forest management policies that balance the use of forest resources with biodiversity conservation and ecosystem protection.
Because it can handle large volumes of data and many variables, the developed model adapts readily to complex contexts. Its ability to manage both continuous and categorical data makes it appropriate for forecasting land use change. Measuring the importance of each variable makes it possible to identify the key factors driving ecosystem transformation. Moreover, combining many decision trees reduces the likelihood of overfitting.
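As an illustration of how variable importance can be read from a Random Forest, the sketch below uses scikit-learn's `RandomForestClassifier` and its `feature_importances_` attribute. The library choice, the predictor names (`elevation`, `slope`, `soil_class`), and the synthetic data are all assumptions made for this example; they are not the variables or the software used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_samples = 600

# Synthetic, hypothetical predictors (NOT the study's variables).
elevation = rng.uniform(0, 1500, n_samples)   # continuous
slope = rng.uniform(0, 45, n_samples)         # continuous
soil_class = rng.integers(0, 4, n_samples)    # categorical, integer-encoded

# Synthetic label that depends only on elevation, so elevation should
# dominate the importance ranking. Classes 0, 1, 2 are arbitrary codes.
labels = np.where(elevation > 900, 0, np.where(elevation > 300, 1, 2))

X = np.column_stack([elevation, slope, soil_class])
feature_names = ["elevation", "slope", "soil_class"]

# An ensemble of 100 trees; random_state fixes the result for repeatability.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, labels)

# Impurity-based importances are normalized to sum to 1 across features.
importances = dict(zip(feature_names, model.feature_importances_))
for name, value in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.3f}")
```

Impurity-based importances can favor continuous or high-cardinality variables, so a permutation-based measure may be a useful cross-check when continuous and categorical predictors are mixed.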
The RF–Markov chain model makes predictions but does not itself generate a map. Once the areas had been identified, they were mapped in the GIS: Figure 13 displays a map showing the few forest areas most subject to change.
5. Discussion
This study demonstrated the effectiveness of integrating Random Forest and Markov chain models for predicting changes in forest status in urban and agricultural areas. The results highlight the high predictive performance of the combined model, which achieved an average accuracy of 98.88% and an average F1 score of 98.78%. Because the F1 score is the harmonic mean of precision and recall, these metrics indicate an excellent balance between the two.
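For reference, accuracy and an averaged F1 score can be computed from a multi-class confusion matrix as sketched below. The confusion matrix is hypothetical, and the use of macro averaging (an unweighted mean of the per-class F1 scores) is an assumption, since the averaging scheme is not specified here.

```python
def accuracy(confusion):
    """Share of correctly classified samples (diagonal over total)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total


def per_class_f1(confusion):
    """F1 per class; confusion[i][j] = samples of true class i predicted as j."""
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # column minus diagonal
        fn = sum(confusion[k]) - tp                       # row minus diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores


# Hypothetical counts for three classes (NOT the study's results).
confusion = [
    [95, 3, 2],
    [4, 90, 6],
    [1, 5, 94],
]

acc = accuracy(confusion)
f1_scores = per_class_f1(confusion)
macro_f1 = sum(f1_scores) / len(f1_scores)  # macro average: an assumption

print(f"accuracy: {acc:.4f}")
print(f"macro F1: {macro_f1:.4f}")
```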
The transition probabilities calculated using Markov chains revealed that forests have a 99.86% chance of remaining forests from one step to the next, while urban areas have a 98.89% chance of remaining urban. Forests also show remarkable long-term stability: after 15 iterations, the system has a 97.96% probability of being in the “Woodland” state.
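One common way to obtain such transition probabilities is to cross-tabulate the class of each cell at two dates and normalize each row of counts. The sketch below illustrates that idea under stated assumptions: the function name `estimate_transition_matrix`, the tiny label lists, and the identity-row fallback for unobserved states are all hypothetical, and the procedure is not claimed to be the one used in this study.

```python
from collections import Counter


def estimate_transition_matrix(earlier, later, states):
    """Row-normalized transition counts between two aligned label sequences."""
    if len(earlier) != len(later):
        raise ValueError("both label sequences must cover the same cells")
    counts = Counter(zip(earlier, later))
    matrix = []
    for src in states:
        row_total = sum(counts[(src, dst)] for dst in states)
        if row_total == 0:
            # State never observed at the earlier date: keep it unchanged
            # (identity row). This fallback is a modelling assumption.
            row = [1.0 if dst == src else 0.0 for dst in states]
        else:
            row = [counts[(src, dst)] / row_total for dst in states]
        matrix.append(row)
    return matrix


states = ["Woodland", "Farmland", "Urban Area"]

# Hypothetical classes of eight cells at two dates (NOT the study's data).
earlier = ["Woodland"] * 4 + ["Farmland"] * 2 + ["Urban Area"] * 2
later = [
    "Woodland", "Woodland", "Woodland", "Farmland",  # one woodland cell converted
    "Farmland", "Woodland",                          # one farmland cell reforested
    "Urban Area", "Urban Area",                      # urban cells unchanged
]

P_hat = estimate_transition_matrix(earlier, later, states)
for src, row in zip(states, P_hat):
    print(src, [round(p, 2) for p in row])
```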
Integrating the two models allows for capturing both temporal dynamics and explanatory variables, providing robust and accurate prediction of land use changes. This combined approach is particularly beneficial for urban planning and sustainable land management, as it helps identify areas at risk of deforestation or urbanization and enables the development of strategies to mitigate these impacts.
The use of the Random Forest algorithm in predicting changes in forest status is well-supported by numerous studies. For instance, Amini et al. [44] explored using the Random Forest algorithm to analyze land use and urban land cover changes using temporal data from Landsat. They implemented classification and change analysis on the Google Earth Engine cloud platform, utilizing satellite imagery from Landsat 5, 7, and 8 from 1985 to 2019. Additionally, another study applied Random Forest to identify areas at risk of deforestation, effectively pinpointing the most vulnerable locations and aiding in planning conservation measures [45]. Lastly, Dalmonech et al. [46,47] used Random Forest to model the impact of climate change on forests, predicting how different tree species might respond to various future climate scenarios. These examples demonstrate the power of Random Forest as a forecasting tool for managing forest resources, justifying its choice in the present study. The combination of Random Forest and Markov chains offers several advantages over other machine learning methods, such as neural networks. While neural networks can be powerful, they often require extensive computational resources and large datasets to achieve high accuracy. For example, a study by Zhang et al. [48] utilized a deep learning model for land use change prediction, achieving an accuracy of 96.5%. However, the training process was significantly time-consuming and required specialized hardware.
Random Forest models are generally faster to train and can handle smaller datasets more effectively. They also provide better interpretability, allowing for easy assessment of the importance of each variable. This makes Random Forests particularly suitable for applications where understanding the influence of different factors is crucial.
Comparing different predictive models highlights their strengths and weaknesses, showing how the Random Forest model can address many limitations found in other models. While decision trees are known for their simplicity in interpretation and visualization, they often suffer from overfitting and exhibit high variance. The Random Forest model mitigates these issues by combining multiple decision trees, which reduces variance and improves overall robustness. Support vector machines (SVMs) are effective in high-dimensional spaces and work well with clear separation margins. However, they can be computationally expensive and less interpretable. In contrast, the Random Forest model is generally faster to train and offers greater interpretability by allowing for easy assessment of feature importance. The K-nearest neighbor (KNN) model is simple to implement and does not require a formal training process, but it is sensitive to outliers and can be inefficient with large datasets. Random Forest, by comparison, manages outliers more effectively and scales better with larger datasets. Neural networks are powerful tools capable of modeling complex relationships, but they require large amounts of data and significant computational resources, and they are difficult to interpret. In contrast, the Random Forest model demands less data and fewer computational resources and is easier to interpret.
The Random Forest model offers several advantages over other predictive models, including robustness, accuracy, the ability to handle noisy data, versatility, and the ability to evaluate the importance of different features. These benefits make it a viable choice for providing reliable and interpretable forecasts.
Various studies have demonstrated the applicability of the combined Random Forest and Markov chain model in different regions. For example, Badshah et al. [49] applied this model to analyze metropolitan urban growth in Beijing, China, achieving an overall accuracy exceeding 90%. Similarly, Liu et al. [50] employed a similar approach to forecast land use changes in Southeast Asia, showcasing the model’s adaptability to diverse geographical contexts. These studies illustrate that the model can effectively be applied in areas with different socio-economic and environmental conditions. Incorporating local data and adjusting model parameters ensures that predictions remain accurate and relevant across various settings.
In conclusion, the combined Random Forest and Markov chain model represents a powerful and flexible tool for predicting land use change, with significant implications for climate change mitigation and biodiversity conservation. Its capacity to manage large volumes of data and variables and its adaptability to complex contexts make it particularly suitable for analyzing and predicting landscape transformations.
6. Conclusions
The RF–Markov model exhibits high accuracy in predicting land use dynamics, providing essential theoretical support for efforts in forest conservation and climate change mitigation. The results highlight the model’s capability to accurately predict transitions between different land use classes, which is crucial for informed and sustainable management of natural resources.
The model’s predictions indicate a strong likelihood of lasting stability in the “Woodland” ecosystem, suggesting that the Calabria Region possesses significant carbon capture and storage potential. This underscores the importance of sustainable forest management in combating climate change and preserving biodiversity. Additionally, the model can identify areas at risk of conversion, enabling the implementation of targeted conservation measures to prevent the loss of vital forest habitats.
Moreover, the model shows that urban areas have a very high probability (98.89%) of remaining as such, with only a minimal chance of transitioning to forest areas (1.11%). This stability suggests that urban areas can be further developed with appropriate infrastructure without significant risks of land use change. The model’s capacity to predict urban stability allows planners to optimize land use and devise sustainable urban growth strategies.
However, this study does have some limitations that need consideration. The accuracy of the forecasts heavily depends on the quality and quantity of historical data available. Incomplete or inaccurate data can negatively impact the predictions. Additionally, critical variables such as extreme climate change, land management policies, and unplanned human activities may not have been fully incorporated into the model, potentially reducing its accuracy. The assumptions of the Markov model, which state that transitions between states rely solely on the current state and not on previous ones, may not accurately represent the complexities of land use dynamics.
To enhance the model and address these limitations, the following future developments can be considered:
- Utilize high-resolution remote sensing data to better capture detailed land use changes;
- Extend sensitivity analysis to assess the combined importance of all variables simultaneously;
- Apply the model to various geographical regions and environmental contexts to test its generalizability and robustness.
In conclusion, the RF–Markov model is a powerful and reliable tool for predicting land use dynamics, which supports forest conservation and climate change mitigation. By integrating these forecasts into decision-making processes, we can promote sustainable development that balances human needs with environmental protection. This approach ensures a more resilient and sustainable future for generations to come.