2.3.1. Habitat Quality Assessment
In the Habitat Quality module of the InVEST model, environmental stressors and habitat suitability are parameterized from expert input and published guidelines. Stressors are land use types that reduce habitat suitability. Each stressor is assigned a maximum influence distance and a weight according to its assessed risk level. Common stressors include developed land such as urban areas, rural settlements, and other construction land. In highly urbanized areas, some researchers also treat cultivated land, such as paddy fields and dry farmland, as a stressor [45]. The model evaluates the spatial distribution and intensity of these pressures to quantify habitat degradation caused by human activities: the more intense a human activity and the closer it is to a habitat, the greater the threat it poses to habitat quality. This reflects the critical role of anthropogenic pressure in habitat degradation [46]. Thus, habitat quality depends not only on land use type but also on proximity to human activities, capturing the cumulative impact of anthropogenic pressures.
Habitat suitability scores in the InVEST model are assigned by land use type and represent the capacity of a land unit to support species under its ecological conditions. These scores range from 0 to 1, with 1 indicating optimal conditions for species to thrive. Suitability reflects land cover type, vegetation, and other landscape characteristics that determine whether species' ecological requirements, including food, shelter, and breeding sites, are met. Higher scores correspond to areas that meet the essential needs of species, such as undisturbed natural habitats, whereas lower scores correspond to degraded or heavily human-influenced land, such as urban or agricultural areas. Finally, the habitat quality index is calculated by combining habitat suitability with the impact of stressors through the following formulas:
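$$ i_{rxy} = 1 - \frac{d_{xy}}{d_{r\max}} \quad \text{(linear decay)}, \qquad i_{rxy} = \exp\left(-\frac{2.99}{d_{r\max}}\, d_{xy}\right) \quad \text{(exponential decay)} $$
where $i_{rxy}$ represents the influence of stressor $r$ on grid cell $x$ from grid cell $y$, $d_{xy}$ is the linear distance between grid cells $x$ and $y$, and $d_{r\max}$ is the maximum effective range of stressor $r$.
$$ D_{xj} = \sum_{r=1}^{R} \sum_{y=1}^{Y_r} \left( \frac{w_r}{\sum_{r=1}^{R} w_r} \right) r_y \, i_{rxy} \, \beta_x \, S_{jr} $$
where $D_{xj}$ represents the level of stress experienced by grid cell $x$ for LUCC type $j$; $R$ is the number of stressors; $Y_r$ is the total number of grid cells of stressor $r$, and $y$ indexes the grid cells in the stressor map; $w_r$ is the weight of stressor $r$; $r_y$ is the stressor value of grid cell $y$ (from 0 to 1, where 0 means the stressor is absent); $S_{jr}$ represents the sensitivity of land cover type $j$ to stressor $r$; and $\beta_x$ indicates the accessibility of grid cell $x$ to disturbance, that is, the habitat's resistance to stress (e.g., through legal protection).
$$ Q_{xj} = H_j \left( 1 - \frac{D_{xj}^{z}}{D_{xj}^{z} + k^{z}} \right) $$
where $Q_{xj}$ represents the habitat quality of grid cell $x$ for land cover type $j$; $H_j$ is the habitat suitability of LUCC type $j$; $k$ is the half-saturation constant; and $z$ is a default scaling parameter of the model ($z = 2.5$).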
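To make the calculation concrete, the distance decay of a stressor's influence ($i_{rxy}$), the weighted threat level ($D_{xj}$), and the half-saturation transform into habitat quality ($Q_{xj}$) can be sketched for a single grid cell in plain Python. This is a minimal, hypothetical illustration of the InVEST equations, not the InVEST implementation, which works on whole rasters. The stressor names, weights, distances, sensitivities, and the value of `k` are invented for illustration, and cell coordinates are assumed to be in the same units as the maximum distances.

```python
import math

def decay(d_xy, d_max, kind="linear"):
    """Influence i_rxy of a stressor at distance d_xy (same units as d_max)."""
    if kind == "linear":
        return max(0.0, 1.0 - d_xy / d_max)  # falls to 0 at d_max
    if kind == "exponential":
        return math.exp(-(2.99 / d_max) * d_xy)  # about 5% of full influence at d_max
    raise ValueError(f"unknown decay type: {kind}")

def threat_level(x, stressors, beta_x, sensitivity):
    """D_xj for cell x: weighted sum over stressors r and their source cells y.

    stressors: {name: {"weight": w_r, "max_dist": d_rmax, "decay": kind,
                       "cells": [((row, col), r_y), ...]}}
    sensitivity: {name: S_jr} for the land cover type j of cell x.
    """
    total_w = sum(s["weight"] for s in stressors.values())
    d_xj = 0.0
    for name, s in stressors.items():
        w_norm = s["weight"] / total_w  # w_r / sum of all weights
        for y, r_y in s["cells"]:
            i_rxy = decay(math.dist(x, y), s["max_dist"], s["decay"])
            d_xj += w_norm * r_y * i_rxy * beta_x * sensitivity[name]
    return d_xj

def habitat_quality(h_j, d_xj, k=0.5, z=2.5):
    """Q_xj = H_j * (1 - D^z / (D^z + k^z)); assumes k > 0."""
    return h_j * (1.0 - d_xj**z / (d_xj**z + k**z))

# Hypothetical example: a forest cell (H_j = 1.0) near one urban and one cropland source
stressors = {
    "urban": {"weight": 1.0, "max_dist": 10.0, "decay": "exponential",
              "cells": [((0, 3), 1.0)]},
    "cropland": {"weight": 0.5, "max_dist": 4.0, "decay": "linear",
                 "cells": [((2, 0), 1.0)]},
}
sens_forest = {"urban": 0.8, "cropland": 0.6}
d = threat_level((0, 0), stressors, beta_x=1.0, sensitivity=sens_forest)
print(round(d, 4), round(habitat_quality(1.0, d), 4))
```

When $D_{xj} = k$, the quality is exactly half of $H_j$. The InVEST user guide suggests choosing `k` at roughly half of the largest degradation value in the landscape, so the constant used here is only a placeholder.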
Habitat quality is typically classified into four categories [47]: high-, medium-, low-, and poor-quality habitat. These categories reflect the degree to which a habitat meets the ecological requirements of species. High-quality habitats provide abundant resources such as food, water, and shelter, fully supporting species' life-history needs and resulting in high biodiversity and reproductive success. Medium-quality habitats still support species survival but may have limitations, such as suboptimal breeding sites or environmental instability, that constrain reproductive success. Low-quality habitats offer basic resources but are often affected by habitat fragmentation, pollution, or human disturbance, which reduce their ability to support species' growth and reproduction. Finally, poor-quality habitats are severely degraded environments that lack essential resources, where species struggle to survive and reproduce. This study assigns stressor parameters and habitat sensitivities to each stressor based on the InVEST user guide and previous research [48] (see Supplementary Materials, Tables S3 and S4). The results are classified into these four levels, with thresholds set by the quartile method.
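A quartile-based classification into four levels can be sketched as follows. This assumes that "quartile method" means the sample quartiles (Q1, median, Q3) of the habitat quality values. If fixed intervals of 0.25 on the 0 to 1 scale were intended instead, only the thresholds would change. The level names and the rule that a value equal to a threshold goes to the upper class are choices made for this sketch.

```python
import statistics
from bisect import bisect_right

LEVELS = ("poor", "low", "medium", "high")

def quartile_cuts(values):
    """Three thresholds (Q1, median, Q3) of the habitat quality values."""
    return statistics.quantiles(values, n=4, method="inclusive")

def classify_quartiles(values):
    """Map each value to one of four levels; a value equal to a threshold goes to the upper class."""
    cuts = quartile_cuts(values)
    return [LEVELS[bisect_right(cuts, v)] for v in values]

# Hypothetical habitat quality values for eight grid cells
hq = [0.05, 0.18, 0.32, 0.41, 0.55, 0.63, 0.78, 0.91]
print(classify_quartiles(hq))
```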
The global Moran's I index was used to assess spatial autocorrelation in habitat quality at three time points: 2010, 2015, and 2020. Moran's I is a widely used measure of spatial dependence that quantifies the extent to which similar habitat quality values are spatially clustered. Because habitat quality is often unevenly distributed across landscapes, spatial autocorrelation is critical for understanding how changes in habitat quality are spatially related [49]. By applying Moran's I at each time point, this study evaluates how the spatial pattern of habitat quality evolves under land use and environmental change, and in particular whether improvements or deteriorations in habitat quality are spatially clustered rather than randomly distributed, which is crucial for understanding broader ecological trends. Global Moran's I is calculated as follows:
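$$ I = \frac{n \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\left(\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\right)\sum_{i=1}^{n}(x_i - \bar{x})^2} $$
where $I$ is the global Moran's I; $n$ is the total number of spatial units; $x_i$ and $x_j$ are the habitat quality values at locations $i$ and $j$, respectively; $\bar{x}$ is the mean habitat quality value across all locations; and $w_{ij}$ is the spatial weight between locations $i$ and $j$, defined as 1 if $i$ and $j$ are neighbors and 0 otherwise.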
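The global Moran's I statistic with binary neighbor weights can be sketched directly in Python. This sketch assumes rook contiguity on a regular grid, meaning two cells are neighbors if they share an edge. The source only says "neighbors", so queen contiguity or distance bands are equally possible. The dense n × n weight matrix is only practical for toy grids. For full study-area rasters, a spatial statistics library such as PySAL's `esda` package would be the usual choice.

```python
def rook_weights(rows, cols):
    """Binary contiguity weights for a rows x cols grid: w_ij = 1 if cells share an edge."""
    n = rows * cols
    w = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c  # row-major cell index
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    w[i][rr * cols + cc] = 1
    return w

def morans_i(values, w):
    """Global Moran's I for values x_1..x_n and an n x n weight matrix w."""
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    s0 = sum(map(sum, w))  # sum of all weights w_ij
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

# A 2 x 2 checkerboard is perfectly dispersed, so I = -1
print(morans_i([1, 0, 0, 1], rook_weights(2, 2)))
```

Positive values indicate clustering of similar habitat quality values, values near 0 indicate a random pattern, and negative values indicate dispersion.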
2.3.2. BNs Construction Method
We employed a BN model to represent and analyze uncertain relationships among ecological variables. The model uses Bayes' theorem to infer the causal influence of explanatory variables on habitat quality [51]. Although BNs rely on parameters, namely conditional probability tables (CPTs), they are often described as 'non-parametric' because their structure is flexible and they can represent a wide range of probability distributions without assuming a specific parametric form for the data. Bayes' theorem is expressed as follows:
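$$ P(A \mid B) = \frac{P(AB)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B)} $$
where $P(A \mid B)$ is the conditional probability of event $A$ given that event $B$ has occurred; $P(AB)$ is the joint probability of events $A$ and $B$, meaning the probability that both events occur together; $P(B)$ is the prior probability of event $B$; and $P(A)$ and $P(B \mid A)$ are the prior probability of $A$ and the likelihood of $B$ given $A$, respectively.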
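A small worked example shows that the conditional-probability form and the Bayes' rule form of the theorem give the same value. It also shows how the same rule runs "backward" from an observed outcome to a likely cause, which is the idea behind the diagnostic analysis used later. The two events and their joint probabilities are hypothetical. Exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction

# Hypothetical joint distribution of two binary events:
# A = "cell is degraded", B = "cell is near a built-up area".
joint = {
    (True, True): Fraction(3, 20),
    (True, False): Fraction(1, 20),
    (False, True): Fraction(5, 20),
    (False, False): Fraction(11, 20),
}

def p_a(a):
    """Marginal P(A = a), summing the joint over B."""
    return sum(p for (ai, _), p in joint.items() if ai == a)

def p_b(b):
    """Marginal P(B = b), summing the joint over A."""
    return sum(p for (_, bi), p in joint.items() if bi == b)

def p_a_given_b(a, b):
    """P(A | B) = P(AB) / P(B)."""
    return joint[(a, b)] / p_b(b)

def p_b_given_a(b, a):
    """P(B | A) = P(AB) / P(A)."""
    return joint[(a, b)] / p_a(a)

forward = p_a_given_b(True, True)                               # P(A|B) from the joint
bayes = p_b_given_a(True, True) * p_a(True) / p_b(True)         # P(B|A) P(A) / P(B)
diagnostic = p_a_given_b(True, True) * p_b(True) / p_a(True)    # P(B|A): cause given outcome
print(forward, bayes, diagnostic)
```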
In a BN model, nodes represent random variables, while arcs denote dependencies between variables, quantified by conditional probability tables (CPTs) [52]. The network parameters are the conditional probabilities of the nodes. They describe causal or random relationships between nodes and can be represented by CPTs [53], such as the table $P(X_i \mid \mathrm{Pa}(X_i))$ for node $X_i$, where $\mathrm{Pa}(X_i)$ denotes the parent nodes of $X_i$. The joint probability $P(X_1, X_2, \ldots, X_n)$ represents the probability that each variable takes a specific value. Joint probability is the basis for computing all other probabilities in a Bayesian network [54]; it is calculated as follows:
$$ P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P\left(X_i \mid \mathrm{Pa}(X_i)\right) $$
where $n$ is the number of nodes in the network.
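The chain-rule factorization of the joint probability can be illustrated with a toy three-node network in which slope and distance to built-up areas are both parents of habitat degradation. The structure, states, and CPT values are hypothetical and are not the study's network. The sketch multiplies one CPT entry per node to obtain a joint probability. It then sums joint probabilities to obtain a marginal, which is how other probabilities are derived from the joint.

```python
from itertools import product

# Hypothetical network: slope -> degr <- dist
parents = {"slope": [], "dist": [], "degr": ["slope", "dist"]}
cpt = {
    "slope": {(): {"steep": 0.3, "gentle": 0.7}},
    "dist": {(): {"near": 0.4, "far": 0.6}},
    "degr": {
        ("steep", "near"): {"yes": 0.6, "no": 0.4},
        ("steep", "far"): {"yes": 0.2, "no": 0.8},
        ("gentle", "near"): {"yes": 0.5, "no": 0.5},
        ("gentle", "far"): {"yes": 0.1, "no": 0.9},
    },
}
# States of each variable, read from the first row of its CPT
states = {v: list(next(iter(table.values()))) for v, table in cpt.items()}

def joint_probability(assignment):
    """P(X1, ..., Xn) = product over nodes of P(Xi | Pa(Xi))."""
    p = 1.0
    for var, pa in parents.items():
        key = tuple(assignment[u] for u in pa)
        p *= cpt[var][key][assignment[var]]
    return p

def all_assignments():
    """Every combination of states across all variables."""
    names = list(parents)
    for values in product(*(states[v] for v in names)):
        yield dict(zip(names, values))

# Marginal P(degr = yes): sum the joint over the other variables
p_degr_yes = sum(joint_probability(a) for a in all_assignments() if a["degr"] == "yes")
print(round(p_degr_yes, 4))
```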
Available BN modeling software includes the Bayes Net Toolbox (BNT), Netica, Hugin, BayesBuilder, GeNIe, and JavaBayes. This study uses Netica 7.01, developed by Norsys, because its intuitive and user-friendly interface makes it more suitable than BNT for this analysis [55]. Netica also provides sensitivity and diagnostic analysis features for assessing the strength of causal relationships between variables.
Constructing a BN model involves two key steps: defining the model structure and learning the node parameters. The structure captures the qualitative relationships between variables, and the parameters capture their quantitative strength. Structure-learning algorithms, such as three-phase dependency analysis and the SGS algorithm, can establish statistical connections between variables [57]. However, they cannot reliably identify true causal relationships or the direction of influence [38]. Therefore, this study constructs the BN structure from expert knowledge, focusing on causal relationships between factors.
Parameter learning in a BN model determines the conditional probability distributions of the nodes, given a known network structure. When the data are complete, maximum likelihood estimation (MLE) or Bayesian estimation can be applied; this study uses MLE. Sensitivity and diagnostic analyses are then used to assess the relationships between factors [58]. Sensitivity analysis evaluates the influence of input variables on a target variable by observing changes in the target's probability distribution. This influence is expressed as variance reduction, and larger values indicate greater influence [59]. Diagnostic analysis uses backward reasoning: it sets a given outcome as evidence and examines the resulting probability changes in the other variables, with larger changes indicating stronger effects on the target variable [60]. In this study, Netica is used to perform both analyses on the BN model.
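Variance reduction is calculated as follows:
$$ VR = V(Q) - V(Q \mid F) $$
$$ V(Q) = \sum_{q} P(q)\left[X_q - E(Q)\right]^2, \qquad V(Q \mid F) = \sum_{f} P(f) \sum_{q} P(q \mid f)\left[X_q - E(Q \mid f)\right]^2 $$
where $VR$ represents variance reduction; $V(Q)$ represents the variance of variable $Q$; $V(Q \mid F)$ represents the variance of variable $Q$ given variable $F$; $q$ represents the state of the output variable $Q$, with numerical value $X_q$; $f$ represents a state of the input variable $F$; and $E(\cdot)$ denotes the expected value.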
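Variance reduction can be computed from a joint distribution of an output variable Q and an input variable F. The sketch below does this by subtracting the expected conditional variance of Q from its marginal variance. The numeric values $X_q$ assigned to the output states and the joint probabilities are hypothetical. This is a sketch of the definition, not Netica's code.

```python
def variance_reduction(joint, x):
    """VR = V(Q) - sum_f P(f) * V(Q | f).

    joint: {(q, f): P(q, f)} over output states q and input states f.
    x: {q: X_q}, the numeric value assigned to each output state (assumed).
    """
    q_states = {q for q, _ in joint}
    f_states = {f for _, f in joint}
    p_q = {q: sum(joint.get((q, f), 0.0) for f in f_states) for q in q_states}

    def variance(dist):
        mean = sum(p * x[q] for q, p in dist.items())
        return sum(p * (x[q] - mean) ** 2 for q, p in dist.items())

    v_q = variance(p_q)
    v_q_given_f = 0.0
    for f in f_states:
        p_f = sum(joint.get((q, f), 0.0) for q in q_states)
        if p_f == 0:
            continue  # states of F with zero probability contribute nothing
        cond = {q: joint.get((q, f), 0.0) / p_f for q in q_states}
        v_q_given_f += p_f * variance(cond)
    return v_q - v_q_given_f

# Hypothetical: Q = degraded (1) or not (0); F = distance class "a" or "b"
joint = {(1, "a"): 0.3, (0, "a"): 0.1, (1, "b"): 0.1, (0, "b"): 0.5}
print(round(variance_reduction(joint, {0: 0.0, 1: 1.0}), 4))
```

If Q is independent of F, knowing F does not change the distribution of Q and the variance reduction is zero. This is why larger values rank an input as more influential.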
To ensure the accuracy and practicality of model learning, we first generate a sample dataset by randomly selecting 25% of the total area within the city boundary using GIS spatial overlay. The variables in this dataset are then discretized into two to four levels (see Supplementary Materials, Table S4). With GIS integration, the geospatial data are used to generate cases, and the attribute values of the variables serve as the data source for case learning. In this study, all risk variables are characterized for each assessment unit (30 m × 30 m), and the spatial data provide the evidence for the relationships between variables. Together, the cases across the study area form the evidence for case learning. Training the BN model consists of populating the CPTs from this evidence: Netica generates the cases and learns the CPTs from them. The model structure is constructed from expert knowledge (Figure 2) and adjusted according to relevant research [61]. During training, the dataset, which includes the InVEST habitat quality results, is used together with the other variables to populate the CPTs and to perform parameter learning and analysis.
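The sampling and MLE parameter-learning steps can be sketched in a simplified form. The sketch draws a random 25% of assessment units, then estimates each CPT entry as a relative frequency in the complete, discretized cases. The function names, variable names, and toy cases are hypothetical. Netica's own case learning counts cases in a similar way but can also combine them with prior "experience" values, which this sketch omits.

```python
import random
from collections import Counter, defaultdict

def sample_units(units, fraction=0.25, seed=42):
    """Randomly draw a fraction of assessment units (stand-in for the GIS sampling step)."""
    rng = random.Random(seed)
    return rng.sample(units, round(len(units) * fraction))

def learn_cpt(cases, child, parents):
    """Maximum likelihood CPT from complete cases:
    P(child = s | parents = u) = N(child = s, parents = u) / N(parents = u).
    Unobserved parent configurations and states are simply absent (probability 0)."""
    counts = defaultdict(Counter)
    for case in cases:
        counts[tuple(case[p] for p in parents)][case[child]] += 1
    return {
        config: {state: n / sum(c.values()) for state, n in c.items()}
        for config, c in counts.items()
    }

# Hypothetical discretized cases, one per 30 m x 30 m assessment unit
cases = [
    {"dist": "near", "degr": "yes"},
    {"dist": "near", "degr": "no"},
    {"dist": "near", "degr": "yes"},
    {"dist": "far", "degr": "no"},
]
print(learn_cpt(cases, "degr", ["dist"]))
```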
To validate the model, we first evaluate its performance using three metrics built into Netica: logarithmic loss, quadratic loss, and spherical payoff. Logarithmic loss measures the accuracy of probabilistic predictions by penalizing low predicted probabilities for the observed outcomes; a lower value indicates better predictive accuracy. Quadratic loss (also known as the Brier score) measures the squared difference between the predicted probabilities and the observed outcome; a lower value indicates better agreement between the model's predictions and the true values. Spherical payoff measures how well the predicted probability distribution matches the observed outcomes; a higher value (closer to 1) indicates better model performance.
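The three scoring rules can be written per case in terms of $P_c$, the probability predicted for the state that actually occurred, and $P_j$, the probabilities predicted for all states. Logarithmic loss is $-\ln P_c$, quadratic loss is $1 - 2P_c + \sum_j P_j^2$, and spherical payoff is $P_c / \sqrt{\sum_j P_j^2}$, each averaged over the test cases. These are the standard definitions of these scoring rules. We assume here that they match Netica's reported values, and the example distributions are hypothetical.

```python
import math

def scores(pred, actual):
    """Per-case (log loss, quadratic loss, spherical payoff) for a predicted
    distribution `pred` ({state: probability}) and the observed state `actual`."""
    p_c = pred.get(actual, 0.0)
    sum_sq = sum(p * p for p in pred.values())
    log_loss = math.inf if p_c == 0 else -math.log(p_c)
    quadratic = 1 - 2 * p_c + sum_sq
    spherical = p_c / math.sqrt(sum_sq)
    return log_loss, quadratic, spherical

def mean_scores(preds, actuals):
    """Average each score over all test cases."""
    per_case = [scores(p, a) for p, a in zip(preds, actuals)]
    n = len(per_case)
    return tuple(sum(s[i] for s in per_case) / n for i in range(3))

# Hypothetical predictions for two cases
preds = [{"degraded": 0.8, "undegraded": 0.2}, {"degraded": 0.3, "undegraded": 0.7}]
print(mean_scores(preds, ["degraded", "undegraded"]))
```

A perfectly confident correct prediction scores 0, 0, and 1, respectively, which matches the "lower is better" and "closer to 1 is better" readings in the text.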
Following the evaluation in Netica, we further assess the model's robustness using 10-fold cross-validation [14]. In this method, the entire dataset is randomly divided into k = 10 subsets. The model is trained on k − 1 subsets, and the remaining subset is used as the test set to evaluate the model's performance [62]. This process is repeated k times, each time with a different test subset, which helps assess the model's robustness and detect overfitting. Performance in each fold is measured by the error rate ($ER$), calculated from the confusion matrix generated by Netica; the lower the error rate, the better the model's performance. The error rate is computed as follows:
$$ ER = \frac{N_e}{N_e + N_c} $$
where $N_e$ is the number of cases whose predicted values are inconsistent with the actual values, and $N_c$ is the number of cases whose predicted values are consistent with the actual values.
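The k-fold procedure and the per-fold error rate can be sketched independently of any BN library. `train` and `predict` are pluggable callables. In the sketch they are filled by a hypothetical majority-class stand-in rather than a trained BN, only to keep the example self-contained. The shuffling seed and fold assignment are also choices made for this sketch.

```python
import random
from collections import Counter

def kfold_indices(n, k=10, seed=0):
    """Shuffle case indices and deal them into k nearly equal folds."""
    if n < k:
        raise ValueError("need at least k cases")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def error_rate(predicted, actual):
    """ER = N_e / (N_e + N_c): share of cases whose prediction differs from the observed state."""
    n_e = sum(p != a for p, a in zip(predicted, actual))
    return n_e / len(actual)

def cross_validate(cases, labels, train, predict, k=10, seed=0):
    """Train on k - 1 folds, test on the held-out fold, and return each fold's error rate."""
    rates = []
    for test_idx in kfold_indices(len(cases), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(cases)) if i not in held_out]
        model = train([cases[i] for i in train_idx], [labels[i] for i in train_idx])
        preds = [predict(model, cases[i]) for i in test_idx]
        rates.append(error_rate(preds, [labels[i] for i in test_idx]))
    return rates

# Hypothetical stand-in model: always predicts the majority class of the training folds.
def train_majority(cases, labels):
    return Counter(labels).most_common(1)[0][0]

def predict_majority(model, case):
    return model
```

The spread of the ten fold error rates, not only their mean, indicates how stable the model is across different subsets of the data.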
The ecological sensitivity variables in the network are elevation (DEM), slope, relief, and proximity to water, each classified using the natural breaks method. These four variables are treated as key ecological sensitivity parameters because they strongly shape habitat quality, species distribution, and ecosystem functioning. The DEM describes the physical structure of the landscape and its influence on hydrological and climatic conditions: variations in elevation affect water flow, temperature, and soil characteristics, all of which determine habitat suitability for different species. Slope, an indicator of terrain steepness, influences soil stability, water runoff, and vegetation distribution, which are critical for assessing ecosystem resilience and erosion potential. Relief, the variation in surface elevation, adds ecological complexity and is often associated with higher biodiversity, because more varied terrain can support a wider range of species and ecological niches. Finally, proximity to water bodies is a key determinant of habitat quality, as water is essential for the survival and reproduction of many species; water bodies also regulate microclimatic conditions and foster biodiversity in surrounding areas. Together, these variables shape ecological sensitivity and provide insight for habitat conservation and resource management.
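The natural breaks (Jenks) criterion chooses class boundaries that minimize the total within-class sum of squared deviations. The sketch below solves that criterion exactly with Fisher's dynamic-programming approach, which is practical for small samples such as a few thousand values. GIS software that offers "natural breaks" implements the same criterion but may differ in implementation details, so this is an illustration of the technique rather than a reproduction of the study's tool.

```python
def jenks_breaks(values, n_classes):
    """Return [min, upper bound of class 1, ..., upper bound of last class]."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")
    # Prefix sums give the squared deviation of any run data[i:j] in O(1)
    s1, s2 = [0.0] * (n + 1), [0.0] * (n + 1)
    for i, v in enumerate(data):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):
        """Sum of squared deviations of data[i:j] from its mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: minimal within-class error splitting data[:j] into c classes
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    back = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for c in range(1, n_classes + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # last class is data[i:j]
                cand = cost[c - 1][i] + ssd(i, j)
                if cand < cost[c][j]:
                    cost[c][j], back[c][j] = cand, i
    # Walk back through the optimal split points
    uppers, j = [], n
    for c in range(n_classes, 0, -1):
        uppers.append(data[j - 1])
        j = back[c][j]
    return [data[0]] + uppers[::-1]

# Hypothetical slope values (degrees) with three obvious groups
print(jenks_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3))
```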
The spatial characteristics of habitat degradation and restoration are represented by the ecological sensitivity variables, which capture natural factors, and by the distance to built-up areas, which captures human-induced impacts. Habitat degradation is measured as the change in habitat quality between two time points, as determined by the InVEST model. Based on the degree of degradation, the 2010–2015 data are classified into four levels: undegraded and degradation levels 1, 2, and 3. This allows a more detailed, quantitative distinction between habitat degradation and restoration. The 2015–2020 data are first classified into four levels using the same logic. To assess restoration dynamics, however, they are then aggregated into two categories: degraded and undegraded. This two-category classification is used mainly for qualitative analysis of the spatial relationships between degradation and restoration across the two periods. Specifically, it reveals critical transitions: previously undegraded or restored areas that have newly degraded, degraded areas that show signs of improvement, and areas that continue to degrade. This clarifies the temporal shifts in habitat quality and the patterns of ecological change. Finally, long-term high-quality stable areas (LTSAs) are defined as regions that consistently maintained medium-to-high habitat quality throughout 2010–2020. Because LTSAs are resilient to both natural and anthropogenic pressures, they are prioritized for urban biodiversity conservation.
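The combination of the two periods into transition types, together with the LTSA rule, can be sketched per grid cell. The loss thresholds that separate degradation levels 1 to 3 are not given in this section, so the values in `cuts` are hypothetical placeholders. The transition labels are paraphrases of the categories described in the text.

```python
def degradation_level(hq_start, hq_end, cuts=(0.0, 0.1, 0.3)):
    """0 = undegraded (no loss or a gain); 1-3 = increasing habitat quality loss.
    The loss thresholds in `cuts` are hypothetical placeholders."""
    loss = hq_start - hq_end
    for level, cut in enumerate(cuts):
        if loss <= cut:
            return level
    return len(cuts)

def transition(hq_2010, hq_2015, hq_2020):
    """Combine the 2010-2015 result with the binary (degraded/undegraded) 2015-2020 result."""
    degraded_1 = degradation_level(hq_2010, hq_2015) > 0
    degraded_2 = degradation_level(hq_2015, hq_2020) > 0
    if degraded_1 and degraded_2:
        return "continued degradation"
    if degraded_1:
        return "improving"
    if degraded_2:
        return "newly degraded"
    return "not degraded"

def is_ltsa(classes):
    """Long-term stable area: medium or high quality class at every time point."""
    return all(c in ("medium", "high") for c in classes)

# Hypothetical cell whose habitat quality dropped only in the second period
print(transition(0.8, 0.8, 0.7), is_ltsa(["high", "medium", "high"]))
```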