*Communication* **Comparing Global Sentinel-2 Land Cover Maps for Regional Species Distribution Modeling**

**Zander S. Venter 1,\*, Ruben E. Roos 1, Megan S. Nowell 1, Graciela M. Rusch 1, Gunnar M. Kvifte <sup>2</sup> and Markus A. K. Sydenham <sup>1</sup>**

<sup>1</sup> Norwegian Institute for Nature Research, Sognsveien 68, N-0855 Oslo, Norway

<sup>2</sup> Faculty of Biosciences and Aquaculture, Nord University, N-7729 Steinkjer, Norway

**\*** Correspondence: zander.venter@nina.no

**Abstract:** Mapping the spatial and temporal dynamics of species distributions is necessary for biodiversity conservation land-use planning decisions. Recent advances in remote sensing and machine learning have allowed for high-resolution species distribution modeling that can inform landscape-level decision-making. Here we compare the performance of three popular Sentinel-2 (10-m) land cover maps, including dynamic world (DW), European land cover (ELC10), and world cover (WC), in predicting wild bee species richness over southern Norway. The proportion of grassland habitat within 250 m (derived from the land cover maps), along with temperature and distance to sandy soils, were used as predictors in both Bayesian regularized neural network and random forest models. Models using grassland habitat from DW performed best (*RMSE* = 2.8 ± 0.03; average ± standard deviation across models), followed by ELC10 (*RMSE* = 2.85 ± 0.03) and WC (*RMSE* = 2.87 ± 0.02). All satellite-derived maps outperformed a manually mapped Norwegian land cover dataset called AR5 (*RMSE* = 3.02 ± 0.02). When validating the model predictions of bee species richness against citizen science data on solitary bee occurrences using generalized linear models, we found that ELC10 performed best (*AIC* = 2278 ± 4), followed by WC (*AIC* = 2367 ± 3), and DW (*AIC* = 2376 ± 3). While the differences in *RMSE* we observed between models were small, they may be significant when such models are used to prioritize grassland patches within a landscape for conservation subsidies or management policies. Partial dependencies in our models showed that increasing the proportion of grassland habitat is positively associated with wild bee species richness, thereby justifying bee conservation schemes that aim to enhance semi-natural grassland habitat. Our results confirm the utility of satellite-derived land cover maps in supporting high-resolution species distribution modeling and suggest there is scope to monitor changes in species distributions over time given the dense time series provided by products such as DW.

**Keywords:** pollinators; grassland; wild bees; management; conservation; spatial modeling

#### **1. Introduction**

The Anthropocene has heralded an unprecedented loss of biodiversity, primarily due to changes in land use and land cover, climate change, pollution, (over-)exploitation, and biological invasions [1]. In response, governments have established frameworks that address biodiversity loss, including the United Nations' (UN) Sustainable Development Goals (SDGs, 2030 Agenda) as well as the Aichi biodiversity targets (Strategic Plan for Biodiversity 2011–2020) and the Post-2020 Global Biodiversity Framework of the Convention on Biological Diversity. Recently, the UN established the statistical standards for ecosystem accounting (EA) under the System of Environmental–Economic Accounting (SEEA), which require countries to account for changes in ecosystems over time [2]. To achieve the SDGs, meet biodiversity conservation targets, and to account for ecosystem changes, we require monitoring and evaluation tools that are both globally available and locally relevant [3].

**Citation:** Venter, Z.S.; Roos, R.E.; Nowell, M.S.; Rusch, G.M.; Kvifte, G.M.; Sydenham, M.A.K. Comparing Global Sentinel-2 Land Cover Maps for Regional Species Distribution Modeling. *Remote Sens.* **2023**, *15*, 1749. https://doi.org/10.3390/rs15071749

Academic Editors: Matteo Convertino and Jie Li

Received: 31 January 2023 Revised: 16 March 2023 Accepted: 22 March 2023 Published: 24 March 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

One such tool is species distribution modeling, also known as ecological niche or habitat modeling, which has been widely used to inform decision-making in conservation planning [4].

By modeling and mapping the distribution of species over space and time, we are able to make data-driven decisions about which areas to prioritize for restoration or conservation [5]. The predictive and explanatory power of species distribution models also allows for identifying critical environmental variables (e.g., precipitation, habitat availability) that drive species communities [6]. Insect conservation strategies are a good case in point, whereby mapping and monitoring of insect species richness are prioritized both at the international [7,8] and national levels [9,10]. In Norway, field surveys of bee diversity have been combined with habitat and climate models to create prediction maps that can help determine where wild bee habitat enhancement schemes can be most efficient [11]. However, such priority maps and many species distribution models are currently based on static environmental predictor variables such as land cover maps, whereas management requires more flexible solutions that can detect temporal dynamics in ecosystem conditions and species distributions over time [12].

Recent advances in satellite remote sensing and earth observation have filled data gaps and improved the spatio-temporal transferability of species distribution models [13]. As such, satellite-derived products can capture the environmental processes that underlie the distribution of biodiversity, such as vegetation productivity, water availability, temperature, and perhaps most importantly, land use and land cover. The Sentinel satellites under the Copernicus Programme have been used to produce annually updatable regional and global land cover maps at 10 m resolution, including ELC10 [14], Dynamic World (DW) [15], and World Cover 2020 (WC) [16]. All three products have the capacity to be multi-temporal, but only DW is operationally delivering near real-time land cover maps as new Sentinel-2 scenes become available (every 2–5 days). Due to the novelty of freely available, mediumresolution, high-frequency land cover maps, their use in species distribution modeling is still in its infancy. It is also not clear whether global Sentinel-based land cover maps can replace or improve upon the contribution of regional land cover datasets [17], given that regional maps produced by national mapping agencies through manual methods such as photogrammetry are often more precise and tailored to local conditions.

The aim of this study was to compare Sentinel-based, 10-meter-resolution land cover maps in their ability to predict wild bee species richness distributions across gradients in temperature and habitat availability. To do so, we use a Norwegian land cover dataset called AR5 [18] as a benchmark to evaluate three satellite-based maps, including DW, WC, and ELC10, that are annually updatable. The proportion of semi-natural grassland habitat within 250 m derived from the land cover maps is used as a predictor variable in species distribution models that predict the richness of solitary bee species in southern Norway.

#### **2. Methods**

To compare the utility of Sentinel-based land cover maps in species distribution modeling, we chose wild bee species as a model system. We do this for three main reasons, which are elaborated on below: (1) bees are keystone species in grassy ecosystems globally that are good indicators for ecosystem condition; (2) they are experiencing significant local declines in population numbers; and (3) they have a limited home range and are dependent on local resources for survival, so we expect their distribution to respond strongly to landscape-scale land use gradients in the Sentinel-based land cover maps.

Insects in general and wild bees in particular are key components of many ecosystems and provide important contributions to people [19,20]. For example, the economic value of pollinating insects is considerable, estimated at EUR 153 billion globally [21]. Although pollination is often associated with domestic insects such as the honey bee (*Apis mellifera*), a large and diverse community of wild bee species contributes significantly to the pollination of wild plants and crops [22,23]. Insect abundance, biomass [24], and diversity [25] are declining in some regions due to urbanization, deforestation, climate change, pesticide use, and invasive species [26]. The same is true for bee species [27]. Although the diversity of wild bees is predominantly driven by temperature at global and regional scales, habitat and landscape characteristics are important drivers at local scales [28,29]. Bees are central place foragers that travel back and forth from a nesting site to collect resources [30], and therefore, the abundance and diversity of wild bees correlate to the diversity and abundance of floral resources (vascular plants) in the landscape [28,31]. Land cover types such as semi-natural grasslands, including hay and flower meadows, provide ample floral [32] and non-floral resources to wild bees [33]. Non-floral resources include nesting sites, and although some wild bee species nest in dead wood, the majority use sandy soil sediments as nest sites [34]. Consequently, wild bee diversity can be predicted by the availability of suitable land cover types for resources (i.e., semi-natural grasslands) and for nests (i.e., sandy soils), in combination with climatic variables [29].

#### *2.1. Solitary Bee Surveys*

We surveyed solitary bee communities in 72 traditionally managed (mowed) seminatural grasslands distributed along a climatic gradient from south-eastern Norway to mid-Norway (Figure 1A). In 2019, we sampled 32 semi-natural grasslands in south-eastern Norway [11], adding another 20 study sites in 2020 [29]. To capture potential influences of climatic conditions on solitary bee diversity at regional scales, solitary bee communities were sampled in another set of 20 semi-natural grasslands in mid-Norway in 2021. All surveys were conducted using pan traps, which are an efficient method for surveying wild bee communities [35], in particular when the aim is to survey the solitary as opposed to social bee species [36]. Each pan trap consisted of three white plastic soup bowls, coated with fluorescent yellow or blue, or left white, mounted on a fence pole at the height of the surrounding vegetation [11,29]. We deployed 3 traps per site in 2019 and 2 traps per site for sites sampled in 2020 and 2021. The number of traps per site was reduced from 3 to 2 after 2019 to allow more sites to be sampled. This resulted in 176 samples (the sum of traps across all sites) distributed across 72 study sites. Within sites, traps were always placed at least 20 m apart to avoid inter-trap competition [37].

**Figure 1.** Extent of study area with solitary bee sampling locations and individual survey extents (**A**). Location of study sites in Trøndelag County, sampled in 2021 (**B**). Location of study sites in Oslo, Viken, and Innlandet counties, sampled in 2020 (**C**). Locations of study sites in Oslo, Viken, and Innlandet, sampled in 2019 (**D**).

In all years, sites were sampled in May, June, and July at similar times of day and on days with weather conditions that are optimal for bee activity (i.e., low wind and temperatures above 15 ◦C). For each sampling period, we activated the pan traps at a site by mounting the bowls and filling them with water and a drop of detergent. Sampled bee specimens were collected after 48 h. In 2019, sampling was initiated on 13 May, 21 June, 9 July, and 23 July [11]. In 2020, sampling was initiated on 13 May, 25 May, 14 June, and 16 July [29]. In 2021, sampling began on 29 May, 22 June, and 24 July. Collected bees were stored in 96% laboratory ethanol prior to pinning and identification. Voucher specimens are stored in the entomological collections at the Norwegian Institute for Nature Research. We tallied the number of solitary bee species sampled per trap across the season.

#### *2.2. Land Cover Maps*

Data pre-processing and extraction of land cover maps took place in the Google Earth Engine [38]. The AR5 map obtained from the Norwegian Institute of Bioeconomy Research [18] is provided as a vector map at 1:5000 scale, which we rasterized at 5 m resolution. AR5 is updated every 5–8 years and therefore represents a mosaic of years across the country. WC and ELC10 are available in Google Earth Engine for the years 2020 and 2018, respectively. However, DW is provided as a collection of classified Sentinel-2 images with less than 35% cloud cover. To generate an annual land cover composite for 2020 comparable with WC and ELC10, we calculated the mode predicted land cover class in the image band named "label" across all DW images during June, July, and August. For all land cover maps, we used a focal mean function to calculate the proportion of grassland habitat within 250 m of each pixel in the study area (Figure 1) at 10 m resolution. We used the 250-m radius because solitary bee diversity has been shown to respond strongly to habitat availability at this spatial scale [39]. To isolate grassland pixels, we used the "grassland" class from WC and ELC10 and the "grass" class from DW, which are both defined as areas dominated by natural herbaceous vegetation, including grasslands, prairies, steppes, savannahs, and pastures. For AR5, we used the "innmarksbeite" and "åpent fastmark" classes, which are defined as open ecosystems dominated by herbaceous vegetation and often used for extensive grazing [18]. We used a radius of 250 m as solitary bee species richness has previously been shown to respond to habitat area at this spatial scale [39].

#### *2.3. Modeling*

Data modeling and visualization were performed in R [40]. As in [29], we included predictor variables related to climatic conditions, habitat availability, and distances to high-quality nesting substrates for below ground nesting bees, which account for the vast majority of solitary bees. In [29], the spatial variation in climatic conditions was estimated using a digital elevation model (DEM) together with latitude. Here, we used the average temperature for the warmest quarter (June, July, and August), during the current 30-year climate reference period (1990–2021), calculated from daily estimates and interpolated station measurements at a 1 km resolution from the Norwegian Meteorological Institute's database [41]. We used the average temperature during the warmest quarter instead of the annual mean temperature because high winter temperatures along the coast result in annual mean temperatures not reflecting the gradient in ambient temperature experienced by bees during their main activity periods in Norway (spring to autumn). Using modeled climate data instead of DEMs to estimate the effects of temperature on bee diversity enables one to project changes in bee diversity as a function of future climate scenarios. In [29] we used a habitat suitability model to estimate the potential habitat area surrounding each pan trap at a 60-m radius. In contrast, in the current study, we use the proportion of pixels identified as "grassland" by satellite-derived land cover maps as estimates of habitat availability within a 250-m radius surrounding each trap. A benefit of using satellitederived grassland classifications instead of estimates of habitat suitability maps that may partly rely on existing land cover products is that one reduces the number of modeling steps required to produce maps of pollinator diversity from remote sensing data. In addition, we

used the geographic distance to areas with soils mapped as having a high water infiltration capacity by the Norwegian geological survey [42], as such areas are typically located on sandy soils, which is the preferred nesting substrate for the majority of Norwegian soil nesting bees.

We ran and compared eight models of solitary bee species richness that all followed the general formula:

Solitary bee species richness (survey year + average temperature during the warmest quarter + Distance to sandy soils + proportion of grassland within a 250-m radius).

The eight models differed in terms of the data source (map) used to estimate the proportion of grassland around each pan trap and the type of model used to predict solitary bee species richness. The survey year was included as a categorical variable to account for inter-annual variation in bee species richness as well as for annual differences in climatic conditions, which could influence the number of species sampled in a trap. For each land cover map, we trained a Random Forest (RF) regression model [43] and a Bayesian Regularized Neural Network (BRNN) model [44] in Caret [45] in R. BRNN models were tuned by selecting the number of neurons (1, 2, or 3) that resulted in the lowest root mean square error (*RMSE*) following 25 bootstrap resamples of the training data. RF models were tuned by selecting the mtry (number of parameters tested at each node) and split-rule variance or extra trees that resulted in the lowest *RMSE*, following 25 bootstrap resamples. We adopted leave-one-out cross-validation (LOOCV) to assess model predictive performance due to the small number of study sites. Small training datasets can result in large variances in model performance when using traditional training-testing splits (e.g., 70% training and 30% testing) before model fitting, compared to LOOCV [46]. In LOOCV, each model was iteratively trained and tuned on data from 71 study sites and then used to predict solitary bee species richness in pan traps from the one held-out site. Predictive performance is calculated across all LOOCV iterations. Following this nested cross-validation procedure ensures independence between the data used for tuning the models and the data used for final validations.

For each land cover-map model prediction, we evaluated the predictive power in terms of the Pearson correlation coefficient (Cor), the root-mean-square deviation (*RMSE*), and the mean absolute error (MAE) between observed values of solitary bee species richness and the solitary bee species richness predicted by the model. As an alternative form of model validation, we also tested the ability of model predictions of bee species richness to predict the variance in occurrence of solidary bee records obtained from citizen science data. We first downloaded all post-2015 solitary bee species observations from the Global Information Biodiversity Facility (GBIF) that intersected our study area (xmin: 9.836, xmax: 12.719, ymin: 58.909, ymax: 63.962) and had a GPS error of less than 50 m (n = 2111). To reduce spatial bias in the data—i.e., that some areas have been surveyed more frequently or intensively than others—we only included one occurrence per square kilometer. We randomly sampled 10,000 pseudo-absences from within our study area and used binomial generalized linear models to quantify how well the predicted species richness scores explained the variance in the GBIF presence-absence data. We used the Akaike information criterion (*AIC*) to compare the explanatory capacity of the different models (*δAIC* > 2). Finally, we used partial dependency plots through the R-package pdp [47] as a means to visualize the estimated marginal effect of the proportion of grassland, using the different land cover maps, on solitary bee species richness.

In summary, our modeling workflow included 16 unique models for combinations of reference data (survey, GBIF), model type (RF, BRNN), and land cover dataset (AR5, DW, ELC10, WC). Each model was iterated 100 times in order to estimate the mean and standard deviation in model performance metrics.

#### **3. Results**

We did not find that models using the proportion of grassland estimated from the vector-based Norwegian land cover map (AR5) outperformed models where grassland had been estimated from satellite-derived land use models with 10 m resolution. Specifically, satellite-derived maps exhibited an average *RMSE* of 2.87 ± 0.03 (±standard deviation), whereas the Norwegian AR5 map produced models with a *RMSE* of 3.02 ± 0.02 (Figure 2). Similarly, for the AR5-based BRNN models, the correlation coefficient between observed and predicted solitary bee species richness (SR) was slightly lower (Figure 2A) than for the satellite-based models (Figure 2B–D). This was also the case for the Random Forest (RF) models (Figure 2E–H).

**Figure 2.** Performance of Bayesian regularized neural network (BRNN) and random forest (RF) models for predicting solitary bee species richness (SR). Data points represent model predictions against observed SR following a leave-one-out cross-validation procedure. The average and standard deviation of correlation coefficients (Cor), root-mean and mean-absolute error (*RMSE*, MAE) from 100 iterations of each leave-one-out cross validation are reported for models trained with land cover data from (**A**,**E**) a Norwegian map (AR5), (**B**,**F**) dynamic world (DW), (**C**,**G**) European land cover (ELC10), and (**D**,**H**) world cover (WC). Red and blue points show the predicted and observed solitary bee species richness for each pan trap across the 100 leave-one-out iterations, and black points show the average predicted bee species richness and the associated observed species richness for each specific trap.

Among the satellite-derived models (Figure 2B–D,F–H), there were only marginal differences in their performances as predictors of solitary bee SR. Models using grassland habitat from DW performed best (*RMSE* = 2.8 ± 0.03; averaged across RF and BRNN models), followed by ELC10 (*RMSE* = 2.85 ± 0.03) and WC (*RMSE* = 2.87 ± 0.02). Grassland habitat was ranked the third most important predictor variable in BRNN and RF models (Figure 3). The mean summer temperature and sampling year were the most important predictors in the models. All models, independent of data source or model type, produced partial dependence plots that corroborated a positive association between grassland habitat and solitary bee SR (Figure 4). The association was less distinct in the AR5 model (Figure 4A) compared to the satellite-based models (Figure 4B–D).

**Figure 3.** Variable importance plots showing the scaled relative importance of sampling year (3 levels), summer mean temperature, distance to sandy soils, and grassland proportion within 250 m for predicting solitary bee species richness. For the sampling year, the year 2019 was used as the reference year and does therefore not appear in the figures. Variable importance is derived from Bayesian regularized neural network (BRNN; **A**–**D**) and random forest (RF; **E**–**H**) models for predicting solitary bee species richness (SR). Models were trained with land cover data from (**A**,**E**) a Norwegian map (AR5), (**B**,**F**) dynamic world (DW), (**C**,**G**) European land cover (ELC10), and (**D**,**H**) world cover (WC).

When validating the model predictions of bee SR against citizen science data on solitary bee occurrences using generalized linear models (Figure 5), we found that the order of performance was changed. We found that ELC10 performed best (*AIC* = 2278 ± 4), followed by WC (*AIC* = 2367 ± 3), and DW (*AIC* = 2376 ± 3). The BRNN models performed slightly better than RF models, both in terms of leave-one-out cross-validation (Figure 2) and in terms of explaining the occurrence of solitary bees from GBIF (Figure 5). Therefore, we used the BRNN models to generate wall-to-wall prediction maps of bee SR over the study region (Figure 6). A qualitative visual comparison shows that the broad-scale spatial patterns of bee SR predictions are similar between models, with bee SR increasing with north-south temperature gradients and along populated valleys where extensive grazing practices have established semi-natural grassland patches. At the landscape scale (Figure 7), AR5 predictions show less spatial variation than the satellite-derived maps. All models appear to pick up the grassland habitat adjacent to the runways at Gardermoen International Airport; however, ELC10 appears to pick up the most habitat in the agricultural landscapes southwest of the airport (Figure 7C).

**Figure 4.** Partial dependence plots from Bayesian regularized neural network (BRNN) and random forest (RF) models of bee species richness. Partial dependencies are shown for the spatial predictors: proportion of grassland habitat within 250 m, derived from (**A**) a Norwegian map (AR5), (**B**) dynamic world (DW), (**C**) European land cover (ELC), and (**D**) world cover (WC); (**E**–**H**) distance to sandy soils; and (**I**–**L**) mean temperature during the summer months.

**Figure 5.** Explanatory power of bee species richness (SR) predictions from Bayesian regularized neural network (BRNN) and random forest (RF) models. Effects plots are shown along with *AIC* scores from binomial generalized linear models of solitary bee species occurrence data from GBIF as a function of predicted bee SR from models trained with land cover data from (**A**,**E**) a Norwegian map (AR5), (**B**,**F**) dynamic world (DW), (**C**,**G**) European land cover (ELC10), and (**D**,**H**) world cover (WC). Points in the figures show the mean occurrence of solitary bee records calculated within bins of one decimal. Individual GLMs were run for each of the 100 spatially filtered datasets of solitary bee records in order to calculate the average *AIC* and its standard deviation.

**Figure 6.** Solitary bee species richness (SR) prediction maps for Trøndelag county (**A**–**E**) and Oslo and Innlandet county (**F**–**J**) from Bayesian regularized neural network (BRNN) models trained on grassland habitat data from (**B**,**G**) a Norwegian map (AR5), (**C**,**H**) dynamic world (DW), (**D**,**I**) European land cover (ELC10), and (**E**,**J**) world cover (WC). Each map is overlaid with the bee survey sites.

**Figure 7.** Solitary bee species richness (SR) prediction maps over the landscape surrounding Gardermoen international airport (left panel) from Bayesian regularized neural network (BRNN) models trained on grassland habitat data from (**A**) a Norwegian map (AR5), (**B**) dynamic world (DW), (**C**) European land cover (ELC10), and (**D**) world cover (WC).

#### **4. Discussion**

Species distribution modeling that incorporates high-resolution satellite data is still in its infancy [13], yet the application to solitary bee species richness presented in this study confirms its potential. Our results show that the Sentinel satellite-based land cover maps outperformed a regional manually digitized land cover map over southern Norway (AR5) in predicting solitary bee species richness (Figure 2). While the use of satellite imagery to map vegetation types or individual forest species is common [48], using derived products to predict habitat suitability for animal or plant species that are not directly visible in satellite images is far less common [11,49]. Here we show that the availability of grassland habitat within 250 m, measured from satellite data, is positively associated with solitary bee species richness (Figure 4) and is therefore predictive of solitary bee occurrences at regional scales (Figure 5).

We found larger differences in the predictive capacity of grassland habitat derived from manually mapped versus satellite-derived land cover maps than differences between the satellite-based maps themselves (Figure 2). The AR5 map uses a minimum mapping unit of 2000 m2, which results in very small grassland fragments being subsumed into a broader land cover class [18]. For example, road verges or small urban parks will be classified as "built" in the AR5 maps. However, such small grassland patches can harbor significant floral resources for bees, and the fact that AR5 does not map these areas may therefore be why AR5 was less predictive of species richness than satellite-based maps. Furthermore, AR5 is not as up-to-date as the satellite-based maps and may misrepresent the conditions on the ground during 2019, 2020, and 2021, when the field surveys were conducted. In contrast, ELC10 and WC use a minimum mapping unit of 100 m2, while DW uses 2500 m2. At the landscape scale, it is evident that predictions of bee species richness with ELC10 were more spatially heterogeneous than AR5 or DW (Figure 6), probably due to its smaller minimum mapping unit and ability to detect smaller grassland patches. This may have contributed to its greater predictive capacity compared to AR5. This also explains why, when validating our models using citizen science data from GBIF, ELC10 outperformed DW. In contrast to the survey dataset, GBIF data are spatially clustered and biased toward urban landscapes, which are easily accessible but also have complex landscapes. Due to ELC10's small minimum mapping unit, it captures the landscape complexity more than DW does and is able to predict the GBIF bee SR better.

The accuracies of the solitary bee species richness models presented here are arguably high enough to make data-driven management decisions at the landscape scale. The average *RMSE* of 2.87 means that one can at least distinguish very species-rich areas (maximum species richness of 16 in our study) from species-poor areas (zero species). The differences in model accuracy between satellite-based grassland maps were marginal (*RMSE* difference of 0.04); however, when visualized at a landscape scale (Figure 6), small nuances may have important implications for management and decision-making. For instance, a ranking or prioritization of grassland patches for receiving conservation subsidies based on the maps in Figure 6 may yield different results depending on the data source used. Therefore, post-stratified accuracy assessment of species distribution models in specific landscapes may be necessary before they can be adopted in practice [50].

Based on several limitations identified in our study, we outline avenues for further research on the integration of high-resolution satellite data in species distribution modeling. Firstly, it is not necessary to use derived products such as land cover maps, as we have carried out here. Instead, one can use the spectral signatures themselves from a satellite image to calibrate distribution models [13], although this would not allow ecologists or policymakers to relate species-rich areas to specific land use types. Secondly, maps with a higher thematic resolution than those used in this study would produce more detailed species distribution maps [51]. For example, all four maps tested here contained a single broad category for grassland, without distinguishing between intensively managed grasslands and extensively managed grasslands, such as the mowed meadows from which we sampled solitary bees [11,29]. Therefore, measuring aspects of ecosystem condition or

use, such as grassland use intensity [52], might further improve the accuracy of species distribution models. Thirdly, we did not explore how accurately satellite-based prediction models can detect real changes in species distributions over time because we did not implement a field survey design that was comprehensive enough to capture changes in species ranges over time. However, we know from earlier studies that changes in land cover that are detectable from space are direct drivers of species range shifts [53]. To this end, DW is probably the most suitable Sentinel-based land cover dataset to quantify land cover and use dynamics [54] because of its continuous updates and delivery and would be well-suited to such dynamic species distribution modeling. This also strengthens the call for investment in long-term biodiversity monitoring programs so that satellite-based distribution models can be calibrated and validated with in situ data [55].

#### **5. Conclusions**

The proliferation of high-resolution earth observation data and derived land cover products provides scope for mapping biodiversity distributions with models that are both locally relevant for decision making and scalable to the globe. Here we found that globally available Sentinel-based land cover maps can improve upon manually digitized regional land cover maps for predicting the richness of solitary bee species in southern Norway. The differences in predictive performance between DW, WC, and ELC10 were marginal; however, at the landscape scale, the smaller minimum mapping units of WC and ELC10 allow them to resolve smaller habitat patches, which are reflected in the landscape variations in predicted species richness. Furthermore, the rich time series provided by maps such as DW (from 2015 to present) offer unique opportunities to model short-term changes in species distributions in response to land use changes if paired with in-situ temporal monitoring data. We conclude that the use of satellite-derived land cover maps can facilitate high-resolution species distribution models that can guide decision-making relevant to landscape ecology. To this end, future modeling efforts should be aimed at those species that perform key roles in ecosystems, are indicators of ecosystem status, and support nature's contribution to people.

**Author Contributions:** Conceptualization, M.A.K.S., Z.S.V. and R.E.R.; methodology, M.A.K.S., Z.S.V. and R.E.R.; formal analysis, M.A.K.S., Z.S.V. and R.E.R.; data curation, M.A.K.S., G.M.K. and Z.S.V.; writing—original draft preparation, Z.S.V., R.E.R. and M.A.K.S.; writing—review and editing, Z.S.V., R.E.R., M.A.K.S., G.M.K., G.M.R. and M.S.N.; project administration, M.A.K.S.; funding acquisition, M.A.K.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** Insect sampling from semi-natural grasslands in 2019 and 2020 was funded by the Norwegian Agricultural Agency (Klima-og Miljøprogrammet: POLLILAND, grant number 2018/72806). Sampling in 2021 was financed by The Research Council of Norway, project no. 160022/F40 NINA basic funding. This study was financed by the Norwegian Agricultural Agency (Klima- og Miljøprogrammet: POLLILAND-MIDT, grant number 2021/40219).

**Data Availability Statement:** The data and code to reproduce this analysis are archived on Zenodo: 10.5281/zenodo.7762665.

**Acknowledgments:** We thank Mikaela E.G.P. Olsen, Solveig Haug, Jonas Lystrup Andresen, April McKay, Stian Brønner, and Ida Elise Løvall Rastad for operating the traps installed in the semi-natural grasslands.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **Comparing Methods for Estimating Habitat Suitability**

**Khaleel Muhammed 1, Aavudai Anandhi 2,\* and Gang Chen <sup>1</sup>**


**Abstract:** Habitat suitability (HS) describes the ability of the habitat to support living organisms. There are several approaches to estimate habitat suitability. These approaches are specific to a species or habitat or estimate general HS broadly across multiple species or habitats. The objectives of the study were to compare the approaches for estimating HS and to provide guidelines for choosing an appropriate HS method for conservation. Three HS estimation methods were used. Method 1 scores the suitability based on the naturality of the habitat. Method 2 uses the average of HS values found in the literature. Method 3 uses the species richness as an indicator for HS. The methods were applied to a case study in the Choctawhatchee River Watershed. GIS applications were used to model the suitability of the watershed. The advantages and disadvantages of the HS methods were then summarized. The multiple HS maps created using the three methods display the suitability of the watershed. The highest suitability occurred in the southern parts of the region. Finally, a decision support tool was developed to help determine which approach to select based on the available data and research goals.

**Keywords:** conservation; ecology; GIS; habitat suitability; indicators; land use/cover; spatial data; watershed

#### **1. Introduction**

Habitat suitability (HS) describes a habitat's ability to support a particular fish or wildlife species [1,2]. HS relates to environmental variables such as vegetation to the probability of a species' occurrence [3,4]. A simple way to describe HS is to determine how natural a habitat is [5]. The more a habitat resembles its natural state, the more suitable it is for the species to live in it. It is important to study HS as it is used to characterize how ideal a habitat is. Anthropogenic pressures on biodiversity such as urban growth and agriculture are key factors that cause HS decline [6,7]. Efforts to limit anthropogenic impacts on species and habitats can be strengthened by using tools for biodiversity monitoring. These include the Integrated Valuation of Ecosystem Services and Tradeoffs (InVEST) model, the ecological niche model (ENM), and the habitat suitability index model (HSIM) [8,9].

Understanding of the interactions between species and their environment is needed to determine the optimum habitat conditions. Indicators are powerful tools to represent the complex interactions between multiple components of the environment in simple terms [10]. Living and nonliving components such as plant/organism growth or climate are important in categorizing suitability [11]. Resources that a species needs to survive are often used as indicators. Parameters such as vegetation density, the abundance of water, and sediment characteristics also serve as indicators [12,13]. Other parameters such as road density [14] and the shell dissolution of mollusks [15] can be used as indicators in HS. Sometimes, the presence of a species can also be used as an indicator. For example, the presence of bird species has been used as indicators of habitat structural components and complexity [16].

**Citation:** Muhammed, K.; Anandhi, A.; Chen, G. Comparing Methods for Estimating Habitat Suitability. *Land* **2022**, *11*, 1754. https://doi.org/ 10.3390/land11101754

Academic Editors: Matteo Convertino and Jie Li

Received: 25 August 2022 Accepted: 23 September 2022 Published: 9 October 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Since HS is a measure of species–habitat interactions, mapping HS is useful in conservation efforts. The consistent estimation of HS is necessary to create reliable maps [17].

In the past, approaches for estimating HS were either species-focused or habitatfocused. HS is calculated based on the needs of individual species or species group in the species-focused approach [3]. The habitat-focused approach considers the presence of habitat components that may either be biotic or abiotic [18,19]. The different approaches are chosen based on the research goals [20,21].

A habitat-focused approach is common for estimating suitability [22]. With the habitatfocused approach, the HS index is calculated by dividing the current habitat conditions by the optimum habitat conditions. This results in a value between 0 and 1. When a simulation modeling framework is used, the index is the ratio of a model's output compared to an established standard of comparison or an optimum habitat condition. The comparison standard is either (1) an assigned numerical value that corresponds with the qualitative rankings (excellent = 4, average = 2, etc.); (2) a maximum regional value for models that use defined units (productivity, population density, etc.); or (3) the maximum rank for models that classify habitats hierarchically [1]. The denominators in all of these methods are related to the optimum habitat conditions. Factors affecting the optimum habitat conditions can be biotic (i.e., vegetation density and predation [2,23,24]) or abiotic (i.e., topography, water availability, soil characteristics, and temperature for soil systems and sediment concentration, and dissolved oxygen for aquatic environments [12,25–28]). The habitat is completely unsuitable when HS is characterized with a value of 0, while a value of 1 represents the optimum conditions [29].

A species-focused approach is used when the goal is to conserve a certain species. An example is the evaluation of habitat suitability based on the ability of each landscape to provide the needs of song birds [19]. Alternatively, a habitat-focused approach is taken to conserve a specific land use or land cover. For instance, water parameters such as water presence frequency and water depth re used to estimate HS for the wetlands. Description of lad use/land cover can be obtained in [30]. However, these approaches are very specific. It is important to compare the results of different methods in any region.

#### *Objective*

The objective of this paper was to compare three methods for estimating the habitat suitability and to develop a way to choose a method for estimating HS based on the available data and research goals.

These methods were then applied in a case study in the Choctawhatchee River Watershed. The study watershed is a biodiversity hotspot that houses more species of trees than any other forests in temperate North America [31].

#### **2. Study Region, Materials, and Methods**

#### *2.1. Study Region*

Figure 1 shows a map of the study region created using ArcMap® 10.4.1. The Choctawhatchee River and Bay Watershed is an important location in the Southeast of the United States. It is a biodiversity hotspot containing an abundance of native plant and wildlife species as well as being a critical habitat for gulf sturgeon and Choctawhatchee beach mice. Over 60% of the watershed is in Alabama, where there is a significant agriculture component [32].

As of 2019, the land use in Choctawhatchee River Watershed is provided in Figure 1 [33]. The region has high species richness when compared to the rest of the United States [34].

#### *2.2. Methodology*

ArcMap 10.4.1 was used to analyze the datasets. Python programming was also used. The libraries used were the Geospatial Data Abstraction Library (GDAL) 3.2.0 developed by the Open Source Geospatial Foundation in Chicago, IL, USA; NumPy 1.19.2 created by

Travis Oliphant in Provo, UT, USA; and Pandas 1.2.1 created by Wes McKinney in New York City, NY, USA. Three methods of modeling HS were used. Spatial data were obtained for LULC, species richness, and region extents. Table 1 lists the data and their sources.

**Figure 1.** Location of the Choctawhatchee River Watershed. (**a**) The species richness index of the United States is shown as are where the watershed is in the United States, higher richness index values are shown as red and lower values in blue. (**b**) A pie chart of the proportion of each LULC. (**c**) The full extent of the watershed and the locations of the Choctawhatchee and Pea Rivers.


**Table 1.** The details of the spatial data used in the study.

Method 1—Binary Method: Step 1: Download the Multi-Resolution Land Characteristics Consortium (MRLC) LULC data [38]; the land use/land cover data from the National Land Cover Dataset (NLCD) for the year 2019 and its legend were obtained [33]. Step 2: Classify the natural and unnatural groups and assign index; agriculture and developed land use classes were deemed unnatural, and the land cover classes open water, wetland, grassland, shrubland, and forests were considered as natural. Step 3: Obtain the binary

HS index; Unnatural LULC classes were assigned an HS index value of 0 and natural LULC classes were given an HS index value of 1. Step 4: Create an index map (natural and unnatural) where the LULC values were replaced with the corresponding HS index values (0/1) to create an index map. The analysis involved using the following software: ArcMap 10.4.1 (Clip tool from Raster Processing toolbox) and Python codes (Geospatial Data Abstraction Library with Jupyter notebook).

Method 2—Literature Review: Step 1 was the same as in method 2. Step 2: A document search was performed using Google Scholar (e.g., habitat suitability), which gave 500,000 plus results, and a more focused search using words in quotes, additional words (e.g., "habitat suitability", "InVEST"), and selected time-period (2010–2020) was carried out. Priority was given to articles based on the following three criteria: (1) the full articles were accessible; (2) the articles used a similar definition for habitat suitability; and (3) the articles provided the numerical habitat suitability values for LULCs comparable to those found in the study region. A total of 21 articles were retrieved. In Scholar search, InVEST, as an additional search term, was used in the search to narrow down the results in a systematic way. Step 3: Obtaining HS index (between 0 and 1). The results of the search were summarized in a table and graphs provided in the Results section. Habitat suitability values were organized by the specific LULC. The LULC were further grouped into broad classifications according to the LULC descriptions provided by the MRLC [39], for example, rivers, lakes, reservoirs, and glaciers were grouped as simply "water", while open forests, orchards, and native forests were grouped as simply "forest". When multiple values for a group were obtained from the literature, the average values for suitability were calculated and used to create the table. The box plot was created using Python. The HS vales were placed into single column arrays for each LULC group. A box plot was then created for each LULC group and displayed in the same figure. Step 4: Create an index map (natural and unnatural). The LULC values between 0 and 1 were replaced with corresponding HS index values (0/1) to create an index map. The analysis was carried out using the software described in Method 1. A table (Table 2) lists the references for each land use/cover type along with the number of data points obtained for them. The LULC values in the map clipped to the watershed's extents were replaced with the mean values obtained in the literature review.

Method 3—Species Richness Method: Step 1: Bring the datasets to uniform scales and obtain the species richness data [34]. National Land Cover Data (NLCD) land use/land cover map's resolution (30 m × 30 m) were rescaled to species richness maps with a 10 km × 10 km resolution. ArcMap 10.4.1 software with the Resample tool using the MAJORITY technique was used [40]. Step 2: Average the richness/land use. The major LULC from ~111 pixels (30 m) now represent the LULC for the 10 km map. The richness and LULC data were merged into one raster file by using the Combine ArcMap Spatial Analysist tool to observe both the number of species and LULC for each pixel. Step 3: Clip the watershed area. Shapefiles for the Choctawhatchee watershed (HUC 031402) and the Southeast Plains and Eastern Temperate Forest ecoregions were obtained [35–37]. The combined richness raster was clipped to the extent of each shapefile. Step 4: Estimate the average richness value. The average species richness for each LULC was then calculated for each region as well as the entire contiguous United States. Habitat suitability indices were normalized using the species richness results by dividing the species richness of each LULC by the highest species richness value for each region. If a 10 km grid cell for a forest within the Southeast Plains has a richness value of 300 and the highest richness value in that region is 600, the HS index would be 0.5 (300 divided by 600). Step 5: Mapping HS. The LULC values in the original 30 m map were replaced with the corresponding habitat suitability index values.

A table (Table 3) listing the total species richness, standard deviation, and count of grid cells was created by importing the attribute tables of the species richness raster images clipped to the extents of the Choctawhatchee River Watershed, Southeast Plains Ecoregion, Eastern Temperate Forest Ecoregion, and the contiguous United States.

#### **3. Results**

#### *3.1. Method 1—Binary Method*

A hypothesis map was created using the binary method where it was assumed that developed lands such as urban and agriculture have a suitability index score of 0, and every other landscape was assumed to have an index score of 1. The number of grid cells with a value of 0 were counted and compared to the number of grid cells that had a value of 1. Approximately 27.33% of the watershed had low suitability. The areas with low suitability appeared near the middle and northern parts of the watershed. Most grid cells with zero suitability occurred on the Alabama side of the watershed. Figure 2 displays the resulting HS index map of the watershed.

**Figure 2.** The habitat suitability map of the Choctawhatchee Watershed using the binary method (Method 1).

#### *3.2. Method 2—Literature Review Method*

A total of 21 studies were analyzed. Of the 21, 15 studies originated in China [9,41–54], two studies were from Ethiopia [29,55], and only one study each originated in India [56], Indonesia [57], Spain [58], and the United States [19]. A total of 36 values were used to calculate the average suitability for water, 17 were used for bare lands, 30 were used for grasslands, ten were used for shrub lands, 36 values were used for forests, 18 were used for wetlands, 24 were used for agricultural, and 15 were used for developed lands. The habitat with the highest mean value was forest land. Next was shrubland, followed by water and wetlands that had nearly the same average suitability. Developed lands predictably had the lowest mean suitability.

Table 2 breaks down the broad land use/cover classes into specific types and lists the references that site each LULC. The number of values obtained for each LULC type is listed, along with the average suitability. Different LULC within the same class sometimes had very different suitability values. For instance, raw land and beaches were both considered bare land, but had an HS of 0.05 and 0.9, respectively. The overall average HS for each broad LULC class is also listed.


**Table 2.** The average habitat suitability for each land use/cover type obtained from the literature review.


**Table 2.** *Cont.*

The landscapes with the largest range of suitability values were water habitats, which had values ranging from 0 to 1. This was followed by forest habitats with values ranging from 0.1 to 1. The land use with the lowest variability was developed land, which ranged from 0 to 0.15, with most studies reporting the suitability to be 0. The median and average values were similar for grasslands, shrub lands, wetlands, and developed lands. Median and average values for the remaining landscapes were not as close with averages falling well below the median value, except for bare lands, where the average was higher than the median. This is displayed in the box plots of Figure 3. The bold line represents the median, the diamond marker represents the mean, and the circles represent the outliers.

**Figure 3.** The suitability box plots for each LULC class. The whiskers indicate 1.5 times the interquartile range values. The bold line represents the median and the diamond shaped markers represent the mean. The circles are the outliers.

Figure 4 displays a HS map of the watershed based on the average values derived from the literature. The Alabama side of the watershed in the North generally had a lower HS when compared to the Florida side in the South. Urban areas had the lowest HS at 0.012. Urban land uses made up 6.80% of the watershed. Bare land had the second lowest HS at 0.224 and made up 0.12% of the watershed. The habitat with the third lowest HS was agriculture, having a HS near the median at 0.354. Agriculture made up 21.26% of the watershed. Overall, about 28% of the watershed had a relatively low HS.

**Figure 4.** The habitat suitability map of the Choctawhatchee Watershed created using the results of the literature review.

#### *3.3. Method 3—Species Richness Method*

The area of interest was the Choctawhatchee River and Bay Watershed located in the Southeast United States. The watershed was within the boundaries of the Southeastern Plains ecoregion, which was in the Eastern Temperate Forest ecoregion that encompassed most of the Eastern United States. A visual representation of these regions is shown in Figure 5.

**Figure 5.** A location map with boundaries of the Choctawhatchee River Watershed, Southeastern Plains Ecoregion, and Eastern Temperate Forest Ecoregion overlaid in the United States.

Table 3 lists the average total richness and standard deviation values for each LULC. The number of pixels is also listed. Each pixel represents 100 square kilometers. The watershed had higher values than the averages at broader levels. Since the LULC raster was resampled from 30 m to 10,000 m, there were no pixels where high intensity developed land, barren land, or herbaceous wetland were the majority LULC. The values for these LULCs were estimated based on the most similar region (Southeast Plains). The trends also did not match across levels. Richness was high in the medium intensity developed land within the watershed, but richness generally decreased as the intensity (amount of impervious surface) increased. Broad generalizations might not be accurate when assessing a watershed. A visual representation of Table 3 can be seen in Figure S3 in the Supplementary Materials.


**Table 3.** The average and standard deviation of the species richness for each land use/cover type.

Figure 6 shows the resulting maps. The lowest index value when using the average species index for the Southeast Plains ecoregion was close to 1, meaning that there was very little variability in the values. The variability increased as the sample size used to calculate the average increased. The lowest HS values occurred the most in the northern parts of the watershed in the Southeastern Plains and Eastern Temperate Forest maps. The map derived from using the entire contiguous United States did not seem to have a pattern aside from the highest suitability occurring in wetlands along the streams of the watershed.

**Figure 6.** The habitat suitability maps of the Choctawhatchee Watershed based on species richness derived from averaging the values within regions of varying sizes. Maps created using the average values from grid cells in the (**a**) Southeastern Plains ecoregion; (**b**) Eastern Temperate Forest ecoregion; (**c**) contiguous USA.

#### *3.4. Comparison of the Methods*

The average HS values of each LULC is shown in Figure 7. The landscapes with consistently high HS values, regardless of method, are open water, forests, and wetlands. The most apparent differences in HS were seen with agriculture and urbanization. These two land uses were low when using the binary and literature review methods. However, HS was high for these land uses when using the species richness data.

**Figure 7.** The habitat suitability index for the various land use/cover obtained.

The binary method is the simplest method. It only requires a list of the habitats present in a region and an understanding of which ones are natural or unnatural. The literature review method requires more research than the other methods. This method is also heavily reliant on values from previous works. The obtained values are assumed to be correct. The species richness methods require species count data. The completeness of the data has a large impact on the outcome, so the values will be inaccurate if many species are not accounted for.

#### *3.5. Choosing a Method*

Assessing what data are available is vital when deciding on a method. The natural/unnatural binary method is used when minimum data are available. If the habitats are known, it is possible to determine whether the habitat is natural or unnatural. A list of habitats is derived from the literature or datasets. Mapping HS requires spatial data. Research goals may require the HS values to be more exact. Using 0 and 1 for unnatural and natural, respectively, would be too broad. Expressing variation between the HS of different habitats requires expert knowledge of the target region. The literature review method is used when there is no access to expert knowledge. A literature review is used to synthesize the results of multiple studies [59]. Existing literature is needed to perform a literature review and gathering results from similar studies is preferred [60]. HS indicators such as species richness are used in a data driven approach. Using an indicator requires available data for the study region or a similar region. Indicators that are used for this are biophysical, socio-economic, or management attributes [61]. The characteristics of each landscape were studied to determine an indicator that could be used to model the suitability across all landscapes, which included the biotic and abiotic components of deciduous forests [62], evergreen forests [63], mixed forests [64,65], wetlands [66,67], shrub lands [68], grasslands [69,70], and bare lands [71,72]. Species richness was used as an indicator to estimate the overall health of any habitat and to identify priority conservation areas [73–75].

Figure 8 summarizes when to use each method and lists the information required to perform them. The figure indicates the specificity and complexity of the methods in relation to each other. The binary method is the simplest and least specific of the methods. Next is the literature review method, followed by the species richness method, the most complex and specific method of the three.


**Figure 8.** A comparison of the three methods. Method selection depends on the available data, complexity level, and research goals.

#### **4. Discussion**

The results revealed that there were significant differences in the habitat suitability scores when using the different methods. However, the Florida side of the watershed consistently had a higher average suitability than the Alabama side.

#### *4.1. Assumptions and Limitations in the Case Study*

The three methods to estimate habitat suitability are the natural/unnatural binary method, a literature review of published works, and the indicator method using species richness. The binary method is the simplest method since calculations are not required. This method requires the knowledge of the landscapes present in the area of interest. Some land covers such as forests and wetlands could be managed and therefore considered unnatural. It was assumed that the only unnatural LULC were urban areas, cropland, and pastures. The literature review resulted in a map that was similar to the binary map. The key difference was the absence of 0 or 1 values in the map based on the literature review. Bare land also had an index value that was less than agriculture. There was also some variability between the index values of the urban and agricultural lands instead of both being assumed to be equally unsuitable. Generally, the areas that had the lowest suitability were nearly the same.

The map of species richness (Figure 1a) showed that the index values were all close to one. This is likely due to the region being a biodiversity hotspot [31,76]. During the development of the three maps using the species richness data in conjunction with LULC, the species richness values were higher in the Choctawhatchee River Watershed than in the rest of the United States due to (1) the watershed being more diverse than average or (2) there was a smaller sample size, which resulted in higher average values. Furthermore, there were not enough data points within the Choctawhatchee River Watershed to calculate the average species richness for every LULC. Using the average richness values resulted in maps dissimilar to each other, aside from the values having low variability compared to the results of the literature review. These maps also did not resemble the hypothesis binary map. Developed land was among the land uses with the highest suitability when using the average based on the contiguous United States. However, developed land is usually thought to be 0 or very close to it [29,41,42,46,48,53,54,58]. This could either mean that the species richness is a more accurate indicator of HS than averaging the results from past studies, or that the species richness was not adequate on its own to estimate HS. It could also mean that the species richness dataset is too limited.

A study in another region would use the species richness data available in that region. If no data are available, the values from a nearby region are useable. A literature review can also be used to estimate the values. The number of species in an area could also be counted manually when working in a small area. It is also possible to use the presence of one species as an indicator of the species richness of another species based on how important the indicator species is to the diversity of a habitat [77–79].

#### *4.2. Advantages and Limitations of the Methods Used*

The advantage of the binary method (Method 1) is that it can be applied in the absence of data or expert knowledge. Its limitation is that it is broad and does not account for the differences between LULCs. Agriculture (both cropland and pastures) and all other different types of developed lands are assumed to have the same suitability. It is possible that agricultural lands are more suitable than developed lands because they are not entirely unnatural. Open space also may be more suitable than high intensity developed lands.

The advantage of obtaining HS values from literature (Method 2) is that this method does not require expert knowledge. The more variable values are more descriptive than the simple binary method. The assumption is that the values used in the literature are accurate. However, the accuracy of the values changes when using an average of the values. This method is limited by the publications available, which requires other scientists and researchers to have conducted studies beforehand. These HS values come from the literature originating in various regions due to the lack of studies conducted in the Choctawhatchee River Watershed. This may be a potential advantage as HS can be estimated in a region where no previous studies are available. Studies based on regions that are unlike the area of interest cause this method to have the same disadvantage as the binary method (i.e., LULCs are given the same suitability values despite being different). This is because the values for all types of similar habitats are used to calculate an average. Every type of developed land, agricultural land, forest, and wetland is assumed to have the same suitability, which may not be accurate. There are also habitats that have a wide range of values. Both beaches and deserts are bare land. Beaches have high HS and deserts have low HS. In this case, using the average may not be adequate to account for the range in values.

Furthermore, the HS values were the average value from 23 articles. The limitation of this method is that the average value is subjective to the literature used.

The species richness method (Method 3) presents a way to estimate the general habitat suitability, whereas other methods estimate the suitability for a single species, a species group, or a specific habitat. This method does not require knowledge of the individual species, the present, or optimum habitat conditions. The binary method where natural is suitable, and unnatural is not suitable, is currently how HS is modeled in cases where there is no specific habitat data or when the goal is to estimate HS in general [5].

The main disadvantage is that the results of these methods are sensitive to the amount of data that is present. There is currently a lack of wildlife population data in most locations. The database used in this study only presented the species richness for the vertebrates (mammals, birds, reptiles, amphibians, and fish) and trees. The available data did not cover all macro-organisms. There was no species richness data for invertebrates such as arthropods and mollusks, non-tree plant species such as grasses and shrubs, or fungi. The total population of each species group was also unavailable. The details found in Table S3 in the Supplementary Materials cannot be utilized given the obtainable data.

The resolution of the spatial data also influences the accuracy. Accuracy decreases as the grid cell size increases because it becomes harder to account for the evenness of a species. For instance, a grid cell can represent a hectare. Most of the species might live in a section that is a tenth of a hectare. However, all of the species were counted to obtain a total value for the entire hectare. Smaller grid cell sizes allowed for more precise species mapping.

Using species richness by itself is not adequate when estimating the habitat suitability. When looking at suitability maps made from individual species group richness (Figure 9), tree richness had the highest range of index values. Despite the index values, the range of bird species was the highest, with the lowest being three species and the highest being 249 species. The distribution of values in the total richness index map (Figure 1a) was most influenced by the number of trees and birds. This means that areas in the Southeast United States and along the coasts had the highest biodiversity. The distribution of species did not seem to be driven by general land use types, but rather a combination of climate, terrain, and other factors.

**Figure 9.** The national HS maps based on 1. tree richness, 2. mammal richness, 3. bird richness, 4. reptile richness, and 5. amphibian richness. Higher species richness values are shown as red and lower values in blue.

The advantage of relating the species richness to specific land uses is that it gives an extra dimension to maps in comparison to the existing methods that estimate HS using LULC alone. This method also grants the ability to determine the typical suitability of a habitat based on data. As new data become available, HS can be adjusted to reflect the changes in biodiversity [80–82]. Being based on observable data can be an advantage and a disadvantage for the method.

The HS estimation methods are based entirely on the number of species, assuming that the species are distributed independently of spatial evenness. Doing so increased the possibility of the inaccuracy in the results. In addition, functional species such as predators, raptors, or primary productivity as indicators of HS could be used as relevant factors for the study. Validating theoretical concepts is a challenge because there are not observations to validate the model [83,84]. These apply to HS, also a theoretical concept.

#### *4.3. Implications for Conservation*

The spatial representation of HS is a good tool for supplementing conservation strategies. Biodiversity maps are used to protect biodiversity in many conservation programs [85]. Studies have shown a link between habitat suitability and wildlife population viability for a variety of species [86–89]. The binary method of estimating HS may not be a good tool for biodiversity conservation as it is not a function of biodiversity or habitat conditions directly, but it does show where human land uses occur [5]. The results of the literature review provide a good idea of which habitats are the most suitable. The most suitable habitats can then be studied to determine the species viability [90]. The HS maps where suitability is an index of species richness are direct estimations of biodiversity. These maps can be used to rank habitats in an area to determine which habitats are the most viable and which habitats potentially need conservation attention.

HS are linked both directly and indirectly to almost all the 17 Sustainability Development Goals (SDGs). For example, HS is important for water and land resource conservation, which are related to SDG-14 (life below water) and SDG-15 (life on land). HS is indirectly related to SDG-6 (clean water and sanitation) because it is an integral part of water integrity, which is influenced by the physical characteristics of the waterbodies (physical integrity) and impacts the life below water (biological integrity) [91,92].

#### **5. Conclusions**

The objective was to compare the three approaches for estimating habitat suitability, summarize the advantages and disadvantages of these methods, and provide guidelines for selecting a HS method for conservation. The study focuses on the Choctawhatchee River Watershed (in Alabama and Florida, USA). The three habitat suitability estimation methods were as follows. Method 1 provides a suitability score based on the naturality of the habitat. Method 2 uses the average values from the literature with similar definitions for habitat suitability. Method 3 uses species richness. HS estimation is approachable from the perspective of a single species or species group, from a habitat-focused standpoint, or with the goal of estimating the suitability for wildlife in general. These approaches can be too specific or too broad. Estimating HS using species richness data is more specific than the existing binary method while being broad enough to use when modeling large multi-habitat areas such as a watershed. If complete species richness data are available, this method is advantageous. Using a more complete dataset may reveal that natural habitats are more suitable than developed lands. It is therefore important to gather more data before using species richness as an indicator.

In choosing a method, approaches can be chosen after determining what types of data are feasibly obtainable and based on the research goals. Things to consider are the specificity of the method, the accuracy of the data, and the assumptions made. Different methods change how conservation strategies are chosen. Broad methods assist in identifying how natural each habitat is. Specific methods assist in identifying the species or resource distribution in each habitat [74]. It is important to consider conservation goals when choosing a method. Using a method that includes one or multiple HS indicators such as species diversity, the presence of invasive species, and/or water quality makes it easier to decide on which conservation measures to take [75,93].

Steps should be taken in the future to improve HS mapping. This includes using models and techniques such as machine learning to predict species richness based on inventory data for terrestrial species [94,95] and aquatic species [96]. Modeling the change in HS in real-time is also a possibility [97]. These methods are currently not being used to produce maps for habitat conservation or the general public. Using functional species or primary productivity as indicators of HS could be used as relevant factors for study in the future. Habitat suitability modeling will become accessible and more evidencebased when accurate and complete species maps become obtainable. This will make it possible to consistently identify habitats to apply conservation actions. Currently, it is best to have expert knowledge of the region to estimate the suitability of the habitats or use the literature review carried out in this study. If this is not an option, using the natural/unnatural approach is the next best method.

**Supplementary Materials:** The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/land11101754/s1, Figure S1: Venn diagram comparing deciduous forest, evergreen forest, and wetland habitats; Figure S2: Venn diagram comparing shrub land, grassland, and bare land habitats. <sup>1</sup> Steppe grasslands have very fertile soils. <sup>2</sup> Savannah grasslands have sandy/stony soil; Figure S3: Average terrestrial species richness for each LULC class.; Figure S4: Venn diagram that compares the Brillouin, Shannon–Wiener, and Hurlbert biodiversity index equations.; Table S1: Potential habitat conditions and components: A—all; B—bare land; D—deciduous forests; E—evergreen forests; G—grassland; S—shrub lands; W—wetlands; Table S2: Optimum habitat conditions; Table S3: Biodiversity Index Equations.

**Author Contributions:** Conceptualization, A.A., G.C. and K.M.; Methodology, K.M.; Software, K.M.; Validation, K.M.; Formal analysis, K.M.; Investigation, K.M.; Resources, A.A. and G.C.; Data curation, K.M.; Writing—original draft preparation, K.M.; Writing—review and editing, A.A., G.C. and K.M.; Visualization, A.A. and K.M.; Supervision, A.A.; Project administration, A.A. and G.C.; Funding acquisition, A.A. and G.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** The work was supported by the National Institute of Food and Agriculture of USDA through Grant No. 2018-68002-27920 to Florida A&M University, USDA-NIFA capacity building grant 2017-38821-26405 and 2022-38821-37522; USDA-NIFA Evans-Allen Project, Grant 11979180/2016- 01711; the National Science Foundation through grant no. 1735235 to Florida A&M University as part of the National Science Foundation Research Traineeship; and the U.S. Department of Education GAANN Program: GAANN: Meeting the Needs of the Nation's Infrastructure through Civil Engineering at Florida A&M University (Award #: P200A180074). The authors would like to acknowledge Mr. Teek for editing this manuscript.

**Data Availability Statement:** The generated data are original and specific to this study and available from the corresponding author upon request.

**Acknowledgments:** We especially thank the members of FAMU's NSF Research Traineeship Sustainable Food Energy and Water Systems (NRT-SFEWS) group.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**

