1. Introduction
Turkeybeard (
Xerophyllum asphodeloides (L.) Nutt.) is a perennial understory forest herb that grows in discrete mountain populations in the eastern United States, from Virginia to Alabama. The plant occurs naturally in forested areas of these states as well as in the Pine Barrens of southern New Jersey, and is thus found in 10 states [
1,
2,
3,
4]. Turkeybeard is native to North America and is classified as a rare and threatened species in some states and at the national level [
5,
6,
7]. Only one published study [
3] has offered a predictive habitat suitability model for turkeybeard; it was conducted in western and eastern Virginia. Bourg et al. [
3] described a set of environmental variables that might correlate well with turkeybeard occurrences: elevation, slope, forest type, and fire frequency.
To manage natural areas in a sustainable manner, habitat modeling approaches are necessary for evaluating current and desired future conditions of forests. It has been suggested that ideal management plans for public forest lands would have a conservation perspective, provide a set of sustainable goals, and employ habitat modeling approaches [
8,
9]. For instance, forest management goals might state a need to increase plant species diversity and restore the natural integrity of a forested landscape. Therefore, a critical aspect of forest management is to use habitat models to gauge the development direction of a forest when management activities are employed [
10]. Managing forest areas using sustainable approaches has become a worldwide issue [
11]. Over the past 30 years, scientists and experts have used habitat modeling approaches to make informed decisions concerning the management of forestlands for biodiversity conservation [
8,
10]. In fact, the U.S. national forests are currently managed under an approach that focuses on the sustainability of ecosystems [
12]. In this regard, habitat suitability models may be of value in assessing current and future trajectories of certain forest conditions.
Giles [
13] defines a Habitat Suitability Index (HSI) as an index of the carrying capacity of an area for a given species. The United States Fish and Wildlife Service [
14] quantified habitat suitability as an organism’s life requisites, using the structure, composition, and spatial components of habitat. Habitat models, therefore, allow quantification of habitat suitability as a proxy for the response of a species to an environment [
15]. The development of habitat suitability indices has been driven by advances in data processing capacity and analysis techniques, shifts in scientific paradigms, and increased attention to environmental heterogeneity and scale issues [
16].
Since habitat-related information is often derived from forest inventories, one can potentially produce habitat suitability models that are a function of forest character and forest management history. With the many easy-to-implement modeling approaches available today, it becomes critical to address several conceptual and methodological issues when constructing these models [
10]. A habitat suitability modeling process ideally follows five steps: conceptualization, data preparation, model calibration (fitting), model evaluation, and spatial prediction [
17]. With respect to model fitting and spatial prediction, several analytical methods have recently been employed for developing habitat suitability models, including generalized linear models (GLM) [
18], classification and regression trees (CART) [
19], random forest (RF) [
20], artificial neural networks (ANN) [
21], generalized additive models (GAM) [
22], and maximum entropy (MaxEnt) approaches [
23].
In this paper, we use this prior work [
3] to inform a new ensemble model to predict relative habitat suitability across the Talladega National Forest (TNF). This represents the first effort to model potentially suitable habitat for the rare turkeybeard in the southern part of its range. We discuss how this model can be used to inform forest management, and also address some limitations associated with modeling rare species. We hypothesize that the most highly valued habitat can be identified through its consistent prediction by four different models, and that the correlation between the presence of the plant and this highly valued habitat is positive and relatively strong.
3. Results
Sixteen environmental variables were eliminated based on Pearson’s
r > 0.7 and VIF > 10, leaving 12 variables to be used in the models (
Table 1).
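As an illustration of this screening step, the following Python sketch drops one variable from each pair whose absolute Pearson correlation exceeds 0.7 and then computes VIF values from the diagonal of the inverse correlation matrix. The toy values, the layer names, the greedy keep-the-first rule, and the use of the absolute value of r are assumptions made for illustration only; they are not taken from the study data, and the text above does not state how the retained member of each correlated pair was chosen.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def drop_correlated(layers, r_max=0.7):
    """Greedy filter (assumed rule): keep a layer only if |r| <= r_max
    against every layer already kept."""
    kept = []
    for name in layers:
        if all(abs(pearson_r(layers[name], layers[k])) <= r_max for k in kept):
            kept.append(name)
    return kept


def vif(layers):
    """VIF of each layer, read from the diagonal of the inverse correlation matrix."""
    names = list(layers)
    corr = [[pearson_r(layers[a], layers[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Hypothetical values sampled at six locations (not the study data).
toy_layers = {
    "elevation": [100, 200, 300, 400, 500, 600],
    "mean_temp": [20.0, 19.0, 18.1, 17.0, 16.2, 15.0],  # tracks elevation closely
    "ground_slope": [5, 30, 12, 25, 8, 18],
}

kept = drop_correlated(toy_layers)
kept_vif = vif({name: toy_layers[name] for name in kept})
```

In this toy case the temperature layer is removed because it is almost perfectly correlated with elevation, and the VIF values of the two retained layers fall well below the cutoff of 10.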
The average AUC score was 0.99 for the MaxEnt model (
Table 2). Ground slope had the highest contribution to the model. The second highest contributing variable was Bio4 (temperature seasonality), and the third was Bio8 (mean temperature of the wettest quarter). The least contributing variables were aspect, soil available water content, and soil pH. The average threshold across the 10 runs was 0.51, and it was applied to generate the prediction map.
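The two quantities reported for each model, an AUC score and a threshold applied to the continuous output, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: AUC is computed as the rank-based probability that a presence point scores higher than a pseudo-absence point, the function names and toy scores are hypothetical, and the sketch does not reproduce how the thresholds themselves were selected.

```python
def auc(presence_scores, background_scores):
    """Rank-based AUC: share of presence/background pairs in which the
    presence point has the higher score (ties count as one half)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def apply_threshold(suitability, threshold, nodata=None):
    """Turn a continuous suitability grid into a 1/0 (suitable/unsuitable) grid,
    passing NODATA cells through unchanged."""
    return [[nodata if v is nodata else int(v >= threshold) for v in row]
            for row in suitability]


# Hypothetical model scores (not the study data).
toy_presence = [0.9, 0.8, 0.6]
toy_background = [0.7, 0.3, 0.2, 0.1]
toy_auc = auc(toy_presence, toy_background)

# Hypothetical 2 x 2 suitability grid with one NODATA cell, thresholded at 0.51.
toy_grid = [[0.90, 0.20],
            [None, 0.51]]
toy_binary = apply_threshold(toy_grid, threshold=0.51)
```

The same thresholding function applies to each of the four models by substituting the threshold reported for that model.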
The best GLM model included seven environmental variables: ground slope, soil percent silt, Bio4 (temperature seasonality), Bio8 (mean temperature of the wettest quarter), soil available water content, soil percent clay, and elevation. However, only four variables had significant coefficients at
p < 0.05, including ground slope, Bio4 (temperature seasonality), soil percent clay, and soil available water content. The threshold score for the best model was 0.15 and it was used to generate the final suitability map. Based on the best model, the AUC score was 0.96 (
Table 2).
The best GAM model included five environmental variables: elevation, ground slope, soil percent clay, Bio4 (temperature seasonality), and Bio8 (mean temperature of the wettest quarter). Based on ANOVA parametric effects for the variables in the best model, all were significant except Bio4 (temperature seasonality). Based on ANOVA for nonparametric effects, Bio4 (temperature seasonality) and Bio8 (mean temperature of the wettest quarter) were significant, whereas elevation, ground slope, and soil percent clay were not. The threshold score for the best model was 0.42, and it was applied to generate the final prediction map. The AUC score of the best GAM model was 0.98 (
Table 2).
The 10 RF runs were averaged to generate the final prediction map, and the AUC score was used to evaluate the model’s success. The average AUC score was 0.99 (
Table 2). All 12 variables that were retained after the correlation check were used in the RF modeling process. Ground slope was the most contributive variable in the model, followed by elevation and Bio4 (temperature seasonality). The least contributive variables were soil organic material, soil pH, and aspect. The average threshold across the 10 runs was 0.39, and it was applied to generate the final prediction map.
Based on the evaluation of all modeling approaches, the highest AUC score was 0.99 for the MaxEnt and RF analyses. The lowest AUC score was 0.96 for the GLM (
Table 2), suggesting that the models were generally successful in classifying presence and pseudo-absence data for turkeybeard on the TNF with the given environmental variables. Based on the point-biserial correlation evaluation, the prediction performances for each model were found to be good. The lowest prediction performance was 0.71 for GLM, and the highest score was 0.94 for the RF (
Table 2). The relationship between AUC and point-biserial correlation is shown in
Figure 2. Given both of these metrics (AUC and point-biserial correlation), the average of 10 runs of the RF model seemed to produce slightly better results for classifying presence and pseudo-absence data for turkeybeard on the TNF. Ground slope and elevation were the most important of the 12 variables used in the RF model runs. The other three models also used these two variables, but fewer of the other 10 variables, because those variables were found to be neither statistically significant nor contributive to model results.
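For readers unfamiliar with the second metric, the point-biserial correlation between the binary presence/pseudo-absence labels and the continuous model predictions can be sketched as below. The function name and the toy values are assumptions for illustration; the formula is the standard one, which is equivalent to a Pearson correlation in which one variable is coded 0/1.

```python
import math


def point_biserial(labels, scores):
    """Point-biserial correlation between 0/1 labels and continuous scores:
    r_pb = (M1 - M0) / s_n * sqrt(n1 * n0 / n^2),
    where s_n is the population standard deviation of all scores."""
    ones = [s for lab, s in zip(labels, scores) if lab == 1]
    zeros = [s for lab, s in zip(labels, scores) if lab == 0]
    n1, n0 = len(ones), len(zeros)
    n = n1 + n0
    mean_all = sum(scores) / n
    sd_all = math.sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    mean_1 = sum(ones) / n1
    mean_0 = sum(zeros) / n0
    return (mean_1 - mean_0) / sd_all * math.sqrt(n1 * n0 / n ** 2)


# Hypothetical labels (1 = presence, 0 = pseudo-absence) and predicted suitability.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.7, 0.3, 0.1]
toy_r_pb = point_biserial(toy_labels, toy_scores)
```

Values near 1 indicate that presence points receive consistently higher predicted suitability than pseudo-absence points, which is how the scores of 0.71 to 0.94 reported above can be read.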
Based on our modeling process, 97.01% (92,428.2 ha) of the land was assigned a predicted value (presence or absence). The remaining 2.99% (2847.8 ha) contained NODATA values as a result of missing values in one or more environmental data layers. Of the four models, the largest suitable area was 2979.2 ha (3.22%) using GLM predictions. The smallest suitable area predicted was 225.9 ha (0.24%) of the TNF using RF (
Table 3). The final habitat map was generated from the composite of four methods. All four models agreed on 159.57 ha (0.17%) of suitable habitat for turkeybeard. The prediction area from the agreement of three models was 310.05 ha (0.34%), with 348.84 ha (0.38%) from two models and 2293.74 ha (2.48%) from a single model. The known presence locations comprise 2.62 ha; therefore, nearly 157 ha (159.57 ha − 2.62 ha) of new potentially suitable locations are suggested based on suitable areas agreed on by all four models. The suitability maps for MaxEnt, GLM, GAM, and RF processes can be found in
Appendix A.
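The composite step can be illustrated with a short Python sketch that counts, cell by cell, how many of the four binary maps predict suitable habitat and then converts the cell counts into hectares and percentages of the modeled area. The 2 x 2 grids, the function names, and the 30 m cell size (0.09 ha per cell) are assumptions made for this example and are not taken from the study's data layers.

```python
from collections import Counter


def agreement_map(binary_maps, nodata=None):
    """Per-cell count of models predicting suitable habitat (1).
    A cell is NODATA if it is NODATA in any input map."""
    n_rows, n_cols = len(binary_maps[0]), len(binary_maps[0][0])
    out = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            votes = [m[r][c] for m in binary_maps]
            row.append(nodata if nodata in votes else sum(votes))
        out.append(row)
    return out


def area_by_agreement(agreement, cell_area_ha, nodata=None):
    """Map each agreement level to (area in ha, percent of modeled cells)."""
    counts = Counter(v for row in agreement for v in row if v != nodata)
    total = sum(counts.values())
    return {level: (n * cell_area_ha, 100.0 * n / total)
            for level, n in sorted(counts.items())}


# Hypothetical 2 x 2 binary predictions from four models (not the study maps).
toy_maxent = [[1, 0], [1, None]]
toy_glm = [[1, 1], [1, 0]]
toy_gam = [[1, 0], [0, 0]]
toy_rf = [[1, 0], [0, 0]]

toy_agreement = agreement_map([toy_maxent, toy_glm, toy_gam, toy_rf])
toy_areas = area_by_agreement(toy_agreement, cell_area_ha=0.09)  # assumed 30 m cells
```

Cells with an agreement level of 4 correspond to the areas where all four models agree; levels 3, 2, and 1 correspond to agreement among three, two, and a single model, respectively.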
The final prediction map shows the most likely suitable areas in TNF represented by areas where all models agree (
Figure 3). Based on the final suitability prediction map, the most highly suitable areas are mostly located in the same geographical areas as the already known populations. Most of the likely suitable locations are also located in the northern portion of the national forest. Suitable areas resulting from the combination of the four model outputs include the already known locations of turkeybeard.
Across all four methods, the most highly contributing (MaxEnt, RF) and significant (GLM, GAM) variable was ground slope. Furthermore, the Bio4 (temperature seasonality) variable was found to be highly contributive for the GLM and MaxEnt models and the elevation variable was found to be highly contributive for the GAM and RF models. Otherwise, the composition of each model’s significant or highest contributing variables was different. The five highest contributive variables of all four models can be found in
Appendix A.
4. Discussion
The results indicate that MaxEnt, GLM, GAM, and RF models built from a small sample can provide adequate habitat prediction results. As expected, the proportion of the TNF predicted to be suitable was very low, ranging from just 0.24% of the area predicted by RF to 3.22% of the area predicted by GLM. The potentially suitable area on which all models agree, from the ensemble modeling effort, is only 0.17% of the TNF (159.57 ha). One of the most important limitations of this study was the limited presence data for the species (only 22 sites). Turkeybeard occurs in 10 states, but each state has very few occurrences because the species is rare. Only 22 occurrence locations were available from the TNF to generate the potentially suitable areas of the species. Based on the literature, sample size effects become less critical above 50 presences, even though some studies claim that some complex methods (e.g., MaxEnt) can function effectively with a small sample size [
17,
52,
53,
54]. Guisan et al. [
17] also advise using 20 to 50 presence locations. However, Hernandez et al. [
41] claimed that MaxEnt modeling can produce useful results with a sample size as small as five occurrences, yet the sample size can still have a profound impact on the predicted probability [
55]. These issues are important as our small sample size may have led to overfitting the models. Further, the proportion of background (pseudo-absence) points may have led to biases in the results.
Another limitation of this study was the geographical distribution of the occurrences. When building a model, it is important to incorporate as many geographically diverse samples as possible [
55]. The clustering of turkeybeard occurrence sites in a fairly limited geographic area resulted in a small area being predicted suitable. The models seemed to be able to discriminate between the environmental characteristics of the clustered locations of known turkeybeard and the random (pseudo-absence) points distributed across the more highly variable geography of the whole TNF. However, while similar, the outcomes (potential highly suitable habitat) from the four models were diverse enough to suggest caution is needed in preserving areas that were identified as potentially containing turkeybeard; proposed habitat areas will need to be examined for the presence of turkeybeard.
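Because the discrimination described above depends on how the random (pseudo-absence) points are drawn, a minimal Python sketch of one simple scheme is given here. It samples cells uniformly at random from all cells with valid data, excluding cells that contain known occurrences. The function name, the (row, column) cell identifiers, the point count, and the fixed seed are assumptions for illustration and do not describe the sampling design actually used in this study.

```python
import random


def sample_pseudo_absences(valid_cells, presence_cells, n_points, seed=0):
    """Draw n_points cells uniformly at random, without replacement, from the
    valid cells that do not contain a known presence."""
    presence = set(presence_cells)
    candidates = [cell for cell in valid_cells if cell not in presence]
    if n_points > len(candidates):
        raise ValueError("not enough candidate cells for the requested sample")
    return random.Random(seed).sample(candidates, n_points)


# Hypothetical 10 x 10 grid of (row, col) cells with two known presence cells.
toy_valid = [(r, c) for r in range(10) for c in range(10)]
toy_presence_cells = [(2, 3), (2, 4)]
toy_background_cells = sample_pseudo_absences(toy_valid, toy_presence_cells, n_points=20)
```

Varying `n_points` relative to the number of presences is one way to examine how sensitive the results are to the proportion of background points.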
The first known study to predict habitat suitability for the turkeybeard was conducted by Bourg et al. [
3]. They conducted their research in Virginia using 24 population occurrences of turkeybeard. For their analyses, they used elevation, slope, aspect, forest type, six soil variables (composition, depth, drainage, pH, fertility, and water capacity), and fire frequency variables. Unlike our models, theirs did not include climate variables but did include fire frequency variables. They used classification and regression tree (CART) modeling within a GIS, and generally, their modeling effort was successful at defining suitable habitats for turkeybeard. They found that four variables (elevation, slope, forest type, fire frequency) were major determinants for estimating turkeybeard distribution. Our study agrees with theirs on the importance of slope, and some of our models agree on elevation, but we did not consider fire frequency and forest type, which may be important to consider in future models. Elevation likely did not show up as very important in all of our models due to either a lack of sufficient variability in that gradient across the TNF, or a lack of sufficient sample locations to successfully identify a relationship over random variability. Our study shows that ground slope is the most contributive variable among all models. Elevation is the second highest contributive variable for the GAM and RF models, whereas Bio4 (temperature seasonality) is the second highest contributive variable for the MaxEnt and GLM models. This shows some initial agreement among the four models.
Several other studies found the climate and soil conditions important for estimating plant abundance and distribution [
56,
57,
58]. Based on this guidance, we included climate and soil variables in our models. Chauvier et al. [
59] found that the most important environmental factors for the distribution of plant species involved the climate. In this work, climate variables were also found effective, but soil variables produced insignificant effects. In further study, locating more geographically diverse populations of turkeybeard may help isolate the effects of soil components on the species distribution. Knowledge of understory herb distribution may have further informed this analysis. Yet, given the raster GIS databases available to this study, and the inability to access the study area due to COVID-19 access restrictions imposed by the U.S. government, the inclusion of an understory herb database, or of field data that represent understory herb distributions, was not possible. The TNF is a very diverse landscape with stands of trees of varying canopy closure. However, given our prior experience in working on this landscape, only a field-based sample could have been collected (given enough time and resources); a remotely sensed (or interpolated) database representing understory herbs would have been elusive.
As with efforts to model any rare species, the identification of new occurrence locations in new areas (not proximal to known sites) would offer the greatest improvement for predicting likely suitable areas. Turkeybeard is one of the few known fire-adapted forest understory plant species, and fire has been shown to have a significant positive effect on the reproductive performance of turkeybeard [
4]. Based on this importance, future modeling efforts for turkeybeard habitat suitability should consider including fire frequency or fire history variables. In the future, studies such as this may also benefit from including socio-cultural considerations in addition to environmental gradients for describing turkeybeard habitat suitability. For instance, in field observations, turkeybeard was located near hilltops in sheltered areas, suggesting that its location may be related to structures used by ancient Native American peoples who may have protected this species around their living areas and used it as building and fire-starting materials. While we found no readily available evidence for this at present, an investigation into the socio-cultural history of the region may provide clues.
Species habitat suitability models can be used to model rare species in support of forest management goals if appropriate caution is used when interpreting results. Managers may also use these models to identify and plan for enhancements to the distribution of important species in their landscape. It may also be useful to the decision-making process to model potential plant responses to future climate change. This preliminary modeling effort can be useful for identifying new turkeybeard population locations in the TNF and for better understanding the relationship between turkeybeard and environmental gradients that likely influence its distribution.
5. Conclusions
This study represents the first effort to model potentially suitable habitats for the rare and threatened turkeybeard in the southern part of its range. This study’s findings show that the distribution of turkeybeard is highly related to ground slope and temperature seasonality. Excluding known locations, the new areas the models indicated as potentially suitable for turkeybeard were very small in size and extent. However, based on the very low known presence of the species in the study area, this was expected. To improve models, surveys of turkeybeard populations across a larger range of soil types throughout the TNF would be useful. Prediction maps from this study can be used to guide future field surveys and improve the detection of unknown populations in the study area. For field surveys of this rare plant species, an ecologically informed meander search technique can be used. This study is also important for further forest management of the TNF and can help schedule forest management activities. There are currently no apparent protection plans for turkeybeard in the TNF management plan, but a protection plan could be developed in the future if interest exists. Knowledge gained from this research can be used to establish and implement habitat management strategies for turkeybeard across the TNF. Further, a focus might be applied to this rare species to help understand the potential impacts of climate change, by modeling suitable habitat locations using various climate change scenarios. Incorporating habitat suitability modeling into management planning efforts may therefore facilitate the selection of sustainable forest management practices.