1. Introduction
The management of biodiversity within conservation reserves requires the control of species that expand their range to the detriment of other species. In regions that are relatively pristine in terms of disturbance, this change is inherently obvious as so-called ‘weeds’ colonize available space rapidly [
1]. Often, this process is assisted by the disturbance generated by external factors such as fires, animals, and humans. Direct control of the weeds becomes a priority before the system is beyond repair and the ecosystems are required to accept the change to a novel ecosystem [
2].
Early intervention is difficult when the vast majority of the conservation area is inaccessible except by foot, and weed control requires significant physical and chemical effort to have any noticeable effect [
3]. This translates directly to high yearly expenditure on weed control and detection with significant demands to spatially prioritize efforts [
4]. The development of a strategic plan to ensure the greatest effectiveness of control efforts is essential but these plans are often constructed in the face of high data uncertainty and inadequate weed behaviour models. On-ground surveys need to be fully leveraged to expand and interpolate weed presence/absence observations to regions that were unable to be surveyed. Supporting data, such as the spatial extent of vulnerable vegetation communities, are required to provide a regional assessment. However, combining expert opinion with survey data and remotely sensed imagery for the purpose of estimating the probability of a particular weed occurring is not trivial [
3,
5].
The theories of the ecological niche and environmental gradients are the foundation of habitat suitability probability modelling [
6,
7]. In this framework, the observation of the presence of weeds is statistically correlated to a suite of environmental conditions. For many applications of this approach, the assumption is that the system is in equilibrium and the absence of a species at an observed location indicates the likelihood of unsuitable environmental conditions [
7]. However, for emerging weeds that are in the early stages of colonization, the observation of ‘species absent’ has an additional meaning that the survey space may simply have avoided colonization due mainly to stochastic events. This habitat suitability dynamic is also complicated when survey results are incomplete due to resource limitations. The statistical correlations then will be ‘weak’ for weed species that may only occur in small fractions of the available habitat. It may even be possible that a suitable habitat is incorrectly classified as ‘unsuitable’ because the correlation has not been observed. Increased sampling effort combined with systematic sampling design will assist [
6] but expert opinion on specific weed species preferences may also be required. This expertise can often be acquired from weed occurrences in adjacent regions.
With such uncertainty regarding the impact and colonization success of weeds in a conservation area, the use of an adaptive management framework is important [
8]. Routine field work such as track maintenance and visitor facility upkeep can be combined with biodiversity actions such as weed control and surveying [
9]. Ideally, the feedback mechanisms in place for conservation managers, from weed observations to modelled vulnerability, can assist with a dynamic prioritization of targeted control actions. Equipping land managers with both the tools and knowledge to capture weed observations and environmental conditions is optimal for modelling the extent of the issues in the region [
6]. Habitat suitability modelling will require a sophisticated capacity to integrate disparate data and provide rapid updates of the infestation extent and intensity, including previous measures of success in infestation control and contributing factors (e.g., soil disturbance). Adaptive management of the conservation areas requires a close linkage between monitoring, objectives, and action [
10]. Critically, conservation managers require a model of the vulnerability of weed infestations across a range of habitat types (to assist in survey strategies) combined with another model of site-level contributing factors that can be physically controlled.
Models suitable for this kind of environmental management need to be able to combine disparate data and require a common ‘currency’ to determine the relationships within the model. Simply combining the presence/absence of a weed with the coincident observation of a suite of environmental parameters ignores the complexities of multicollinearity among predictor variables such as rainfall, soil type, and disturbance [
11]. In order to restrain the model complexity to maintain predictive power while negotiating uncertainty limits and yet offer spatially valid estimations of vegetation dynamics, alternative modelling approaches will be required [
12]. One such approach is to base the probabilistic predictions on correlations between observations over space and time rather than formulate a set of precise interaction equations [
13]. Correlations in a trophodynamic system do not necessarily directly equate to metabolic, behavioural, or ecological processes but the tradeoff is the ability to predict with increased precision in a diverse and uncertain environment [
14].
Bayesian Networks (BNs) are one such modelling technique that is particularly popular in ecology due to the capacity to support both complexity and uncertainty simultaneously [
15,
16]. BNs offer the capacity to encompass complex interactions of disparate data types within a probabilistic framework with only a few limitations [
17,
18,
19,
20]. Bayes’ rule, combined with the chain rule, enables the efficient propagation of conditional probabilities throughout a network structure [
21,
22]. The network design is typically the result of expert opinion although machine learning algorithms exist to formulate a possible network structure through an analysis of correlations [
20]. A BN model is parameterized by including observational cases that fully or partly describe a system state. The more cases used to inform the conditional probabilities within the model, the more accurate the predictions [
13]. Algorithms, such as expectation maximization, can assist in adjusting for missing data [
23]. Expert opinion, equations, numerical (continuous, discrete, and censored) data, and categorical data can be included in the model, which is particularly useful for socioecological models [
15].
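As a minimal sketch of how Bayes’ rule and the chain rule combine in a network, the Python below uses a hypothetical two-node pair in which disturbance influences weed presence. The node names and every probability are illustrative assumptions and are not values from the surveys.

```python
# Hypothetical two-node network: Disturbance -> Weed.
# All probabilities are illustrative assumptions, not survey results.

# Prior P(Disturbance)
p_disturbance = {"high": 0.3, "low": 0.7}

# Conditional probability table P(Weed = present | Disturbance)
p_weed_given_disturbance = {"high": 0.6, "low": 0.1}


def joint(disturbance, weed_present):
    """Chain rule: P(D, W) = P(D) * P(W | D)."""
    p_w = p_weed_given_disturbance[disturbance]
    return p_disturbance[disturbance] * (p_w if weed_present else 1.0 - p_w)


def posterior_disturbance(weed_present):
    """Bayes' rule: P(D | W) = P(D, W) / sum over D' of P(D', W)."""
    evidence = sum(joint(d, weed_present) for d in p_disturbance)
    return {d: joint(d, weed_present) / evidence for d in p_disturbance}


# Marginal probability of weed presence before any evidence is entered.
p_weed = sum(joint(d, True) for d in p_disturbance)

# Updated belief about disturbance after a weed has been observed.
post = posterior_disturbance(True)
```

Entering the evidence "weed present" shifts the belief in high disturbance from its prior of 0.3 towards a higher posterior, which is the kind of propagation a full BN performs across many nodes.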
Limitations relevant to ecosystem models include the prohibition of feedback loops and the inability to predict outside of the observational space [
16]. Feedback loops, in particular, have severely limited the application of BN to trophic dynamics but recent advances in network analyses [
18] and time aggregation have established an acceptable compromise. Eklöf et al. [
18] demonstrated the application of BN to extinction rates in food web models via the simplification and retention of fundamental pathways between groups of species. The BN is able to predict the likelihood of a system being in a particular state given additional evidence. However, this requires that the conditional probabilities (from observed cases) have been previously included in the model parameters. Predicting how the system will respond to conditions outside of the observation space requires the inclusion of expert-derived predictions, often in the form of equations, generated from models such as IPCC climate models or experiments on metabolic thresholds. Even with such input, the propagation of predictions to unobserved biotic interactions becomes uncertain with a significant loss of accuracy.
Interestingly, the primary concepts behind BNs are familiar to the general population. For example, when assessing the appropriate clothes to wear for a walk in the forest, people will gather up information about the likely weather patterns, the seasonal influences, the past experiences (being too hot or cold), and the available selection of clothes. The walker has a priori knowledge that the weather is uncertain and that events have a range of probabilities depending on the season and daily factors. The estimation of these probabilities in our minds is a regular occurrence but few people would use a mathematical approach to carefully define the likelihoods. Bayes’ theorem permits the calculation of these probabilities so that we are not solely reliant on expert opinion and vulnerable to surprises [
23].
In this manuscript, we utilize the weed surveys conducted over multiple years in a remote section of the Australian coastline. The East Gippsland series of coastal national parks extends along an uninhabited and pristine coastline [
24,
25,
26] for 176 km (
Figure 1). The surveys encountered 84 weed species [
9] although many are not considered a threat to the ecosystems present. However, if the key weed species are permitted to flourish then endangered ecosystems such as wetlands and coastal dunes are likely to be diminished [
24,
26].
Here, we present the results of two BN models that incorporate a range of influential data sets to generate predictive maps of weed distributions. Complementary BN models at two alternative spatial scales are presented as a mechanism to assist with the adaptive management of an expansive conservation area. The two models presented are, in themselves, interesting reflections of the influences that determine the weed dynamics. The questions we address have a different focus. What ecosystems are vulnerable to weed infestations across the entire East Gippsland national park (in Victoria, Australia)? What contributing factors can be managed at the site level to control weed infestations?
2. Materials and Methods
In brief, the methods consisted of four parts: the collection of weed observations and in situ environmental data across the study area, the compilation of geospatial data for use in a regional-scale model, and the development of a causal network to inform the Bayesian Network.
2.1. The East Gippsland Study Area
The spectacular and unspoilt coastline of the East Gippsland study area includes UNESCO World Biosphere Reserves amongst a diverse suite of inlets, rocky headlands, and isolated beaches (
Figure 1). The enormous diversity of ecosystems, from heathlands, dunes, and rainforests to majestic forests, attracts visitors both nationally and internationally. The study area includes Croajingolong NP, Cape Conran NP, and Peach Tree Creek Reserve. The study area covers 100,094 hectares and has a 176 km length of coastline with no significant human habitation in the region.
2.2. The Weeds Survey
Within the study area, the following landforms and features were surveyed for weeds:
Beach Strand: The area of beach between the high tide line and dunes.
Dune Complex: Primary (first) dune and the swale beyond, above the beach strand.
Rocky Headlands: Elevated cape or point of land reaching out into the water, devoid of beach strand or dune characteristics.
Estuarine Shores: Areas of land abutting estuarine waters at the time of survey to a maximum of 250 m inland.
Human Access Nodes: Areas readily and frequently accessed by recreational users comprising the last 100 m of vehicular tracks servicing carparks and lookouts, and 20 m buffer around lookouts, carparks, and campgrounds.
Three key survey methods were applied across the study area:
Random stratified sampling (unbiased) of transects: The generation of 90 random point locations (using ET Geowizard within ArcGIS 10) within the ecological vegetation class (EVC) layer based on the area of each ecological vegetation class.
Random sampling (biased) of past infestations: Biased random transects across 110 locations within areas where weed species have previously been recorded.
Opportunistic searching: Data on weed species were recorded throughout the entire study area through meander searching. This involved crews of two people walking the entire stretch of the coastline within the study area between Point Ricardo and the NSW border.
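The stratified design above can be illustrated with a short sketch that divides a fixed number of random points among vegetation classes. It assumes the allocation is proportional to the area of each class, which is one reading of the description and not a confirmed detail of the ET Geowizard workflow. The class labels and areas are hypothetical.

```python
def allocate_points(areas_ha, total_points):
    """Allocate sample points to strata in proportion to area.

    Uses largest-remainder rounding so the allocations sum to total_points.
    """
    total_area = sum(areas_ha.values())
    exact = {k: total_points * a / total_area for k, a in areas_ha.items()}
    alloc = {k: int(v) for k, v in exact.items()}
    leftover = total_points - sum(alloc.values())
    # Hand any remaining points to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


# Hypothetical vegetation classes and areas in hectares (not from the study).
evc_areas = {"EVC_A": 500.0, "EVC_B": 300.0, "EVC_C": 200.0}
allocation = allocate_points(evc_areas, 90)
```

With these illustrative areas, the 90 points are split 45, 27, and 18 across the three classes.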
For the surveys along the dune complex, the 3-way transect method was used. This required the surveyors to start at the beach and head up to 100 m inland (perpendicular to the water’s edge) over the fore dune and into the swale (where practical). The surveyors then followed the swale or dune for 100 m and finally turned back out to the beach, recording along all three sections. The weed cover and extent were recorded by the two surveyors, who walked on either side of the centre of the transect line (covering an estimated survey width of 20 m along each transect). A GPS was used to record the start and end points of each transect line (including changes in direction) and the location of weed species and related attributes (
Table 1). Additional site-based observations were also collected (
Table 2).
For the estuary, campground, and activity nodes, a 2-way transect was completed. The transect commenced at the edge of the estuary or campground activity node and headed directly away from the approximate centre of the node for 20 m.
The weed surveys conducted in November 2015 and 2016 noted 6 key species that were significant invasive pests in the region [
9]. A total of 2522 survey sites (1486 in 2015 and 1036 in 2016) were recorded along the coastline and the presence and absence of key weed species were noted as well as a range of environmental conditions. A linear distance of approximately 176 km of coast was surveyed in 2015 and repeated in 2016. During the 2016 survey, 173 transects were completed and 27 transects were abandoned and not completed due to steep inaccessible terrain, very close proximity of a transect to another transect, or lack of time on the day of surveying to complete the transect. The combined linear distance of transects is 2.3 km. A total of 84 different weed species (of which 8 were on adjoining private land) and 1538 weed records were captured during the survey. The 10 most frequent weed species recorded during the survey were Milk Thistle (
Sonchus sp. 33), Flatweed (
Hypochaeris sp. 35), Blackberry (
Rubus fruticosus aggregate 38), Panic Veldgrass (
Ehrharta erecta 47), Dolichos Pea (
Dipogon lignosus 50), Sea Rocket (
Cakile sp. 76), Coast Gladiolus (
Gladiolus gueinzii 87), Marram Grass (
Ammophila arenaria 175), Coast Capeweed (
Arctotheca populifolia 209), and Sea Spurge (
Euphorbia paralias 521). Sea Rocket and Marram Grass are actually the most common and so the number of observations represents the intersects within the transects.
2.3. Model Development
The primary motive for this project was to develop a regional model of the vulnerability of key weed species for the entire study area. However, given the imperative to address adaptive management processes, a local-site-scale model was also developed directly from the environmental and weed observation data. While the regional-scale model utilized covariate data that were recorded or modelled across the region to develop a spatially explicit set of predictions, the local site model was not spatially explicit and captured fine-scale observations that were pertinent to field-based operatives.
2.4. Regional-Scale Weed Vulnerability BN
The critical first step to the regional model development is the construction of a causal diagram [
20] for the emergence of weeds across the region. This required many iterations based on expert opinion to successfully capture the environmental influences and their association with weed colonization. Many region-scale environmental variables could have been included but were excluded simply due to the constraint of keeping the model sufficiently simple and manageable. Complementing this process was the availability of data that were sufficiently high-resolution and temporally relevant and had regional coverage. Spatial information on the activities of feral animals, for example, was not available with sufficient accuracy to include. Finally, the network diagram showing the various parameters and the cross linkages was agreed on by the authors. The site-scale model, in contrast, used a machine learning tree-augmented naïve (TAN) algorithm based on the survey data alone to generate a BN model [
27].
The environmental variables at the scale of the model output were gathered or created using GIS modelling techniques. The various data sources and complementary metadata are listed in
Table 3. The GIS analysis was conducted in QGIS Version 2.18.2 (QGIS Development Team, 2009). The resolution of the output was set at 30 m by 30 m in order to capture some fine-scale features (precision) but remain sufficiently robust (accuracy) for the regional approach.
For every weed species, the spatial points showing the observed occurrence and the observations without any weeds were placed in separate shapefiles. The values for the raster environmental and GIS model data were extracted to every survey point. The attributes were exported, examined, and consolidated in R (Version 3.3.2) (R Core Team 2017). The scripts in R created a text file (referred to here as a ‘case’ file) where every spatial point was a data frame row with column information pertaining to the various lists of model parameters. Three case files were created for each weed. The first was the full survey case file with the associated environmental data. The second and third case files were random subsets of the same file: a 20% sample and the complementary 80%.
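The split into a 20% sample and its complementary 80% can be sketched as follows. The study performed this step in R; the Python below is only an illustration, and the function name, seed, and example rows are assumptions.

```python
import random


def split_cases(rows, test_fraction=0.2, seed=42):
    """Return (test_rows, train_rows): a random sample and its complement."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx = set(indices[:n_test])
    test_rows = [r for i, r in enumerate(rows) if i in test_idx]
    train_rows = [r for i, r in enumerate(rows) if i not in test_idx]
    return test_rows, train_rows


# Hypothetical case rows: (site id, weed present?)
cases = [(i, i % 4 == 0) for i in range(50)]
test_cases, train_cases = split_cases(cases)
```

Because the two subsets are built from disjoint index sets, no survey point used for testing also contributes to training.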
The causal network formed the basis of a naive Bayesian Network (BN) within the Netica V6.04 software environment (Norsys Software Corp 2016). The conditional probability tables (CPTs) were updated by importing the 80% survey case file for the single weed using an expectation maximization procedure. This algorithm is particularly suited to data that contain significant levels of missing data [
23]. The BN model was compiled and contained the marginal probabilities for each parameter. Essentially, this was a reflection of the observed likelihood of any parameter occurring in the survey data set, similar to a histogram but with bin sizes reflecting the frequency of data.
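To illustrate what updating a CPT from a case file means, the sketch below estimates a conditional probability table by relative frequency. This is a deliberate simplification: it discards cases with missing values, whereas the expectation maximization procedure used in Netica can draw on partially observed cases. The variable names and example cases are hypothetical.

```python
from collections import Counter, defaultdict


def learn_cpt(cases, parent, child):
    """Estimate P(child | parent) by relative frequency from complete cases."""
    counts = defaultdict(Counter)
    for case in cases:
        p, c = case.get(parent), case.get(child)
        if p is None or c is None:
            # EM would still use this partially observed case; this sketch drops it.
            continue
        counts[p][c] += 1
    return {
        p: {c: n / sum(cs.values()) for c, n in cs.items()}
        for p, cs in counts.items()
    }


# Hypothetical cases; None marks a missing observation.
cases = [
    {"landform": "dune", "weed": "present"},
    {"landform": "dune", "weed": "present"},
    {"landform": "dune", "weed": "absent"},
    {"landform": "headland", "weed": "absent"},
    {"landform": "headland", "weed": None},
]
cpt = learn_cpt(cases, "landform", "weed")
```

Each row of the resulting table sums to one, and rows informed by few cases (such as the single complete headland case here) are correspondingly less reliable.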
The BN was then tested for predictive accuracy for each weed species using the associated 20% data set. The testing compared the observations of species occurrence with the BN predictions given the environmental data. This generated a number of indices (confusion matrix error, Gini coefficient, and area under the ROC curve) that provide a measure of accuracy of the model structure and parameterization [
28]. The full survey case file was then used to fully update the CPTs.
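The accuracy indices can be sketched in a few lines, assuming the matrix-based error is a misclassification rate taken from a confusion matrix and that the Gini coefficient follows the usual relationship Gini = 2 × AUC − 1. The observations and predicted probabilities are invented for illustration, and the 0.5 threshold is an assumption.

```python
def confusion_matrix(observed, predicted_prob, threshold=0.5):
    """Count true/false positives and negatives at a probability threshold."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for obs, prob in zip(observed, predicted_prob):
        pred = prob >= threshold
        if obs and pred:
            counts["tp"] += 1
        elif pred:
            counts["fp"] += 1
        elif obs:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def auc(observed, predicted_prob):
    """Area under the ROC curve: chance that a presence outranks an absence."""
    pos = [p for o, p in zip(observed, predicted_prob) if o]
    neg = [p for o, p in zip(observed, predicted_prob) if not o]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out observations and model predictions.
observed = [True, True, False, False, True, False]
predicted = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

cm = confusion_matrix(observed, predicted)
error_rate = (cm["fp"] + cm["fn"]) / len(observed)
area = auc(observed, predicted)
gini = 2.0 * area - 1.0
```

An AUC of 0.5 (Gini of 0) corresponds to a model that ranks presences and absences no better than chance.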
The study region case file was compiled from the centroids of all 30 m × 30 m raster cells in the study polygon and attributed with the regional data sets listed in
Table 3. This was used to predict the likelihood of a selected weed occurring within the entire study area. A new file that recorded the probability of a particular weed occurring, given the conditional probability of the environmental and social parameters, was generated. This file was subsequently joined to the spatial points’ file and used to map the distributions in the GIS.
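A minimal sketch of generating the cell centroids for a regular 30 m grid is given below. The origin coordinates and grid dimensions are hypothetical, and the study itself derived the centroids within the GIS; clipping to the study polygon is omitted here.

```python
def cell_centroids(x_min, y_min, n_cols, n_rows, cell_size=30.0):
    """Yield (x, y) centroids of a regular grid, row by row from the lower-left corner."""
    for row in range(n_rows):
        for col in range(n_cols):
            yield (
                x_min + (col + 0.5) * cell_size,
                y_min + (row + 0.5) * cell_size,
            )


# Hypothetical grid origin in projected metres (not the study extent).
centroids = list(cell_centroids(x_min=0.0, y_min=0.0, n_cols=3, n_rows=2))
```

Each centroid would then be attributed with the regional data sets and passed to the BN as one case for prediction.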
The process of CPT updating was repeated for every key weed species so that the BN model structure (based on the causal diagram) remained consistent while the marginal probabilities were adjusted accordingly.
2.5. Local-Site-Scale BN
A second model was also developed from the information contained in the survey data alone. This model was not spatially explicit due to the fine-scale nature of the field-based observations and was used to describe the mechanisms that determine the local-scale processes promoting the occurrence and spread of the weeds. The selection of parameters to collect was based on the expert opinion of field staff with particular focus on Victorian national park operational management. Expert opinion generated the structure of the field survey data associations to develop the BN with the ‘common weed names’ as the target variable. The survey parameters observed during the field trip are detailed in
Table 1 and
Table 2. This model, due to the key factors observable only at a site level (e.g., soil disturbance and drainage), cannot be extrapolated to a regional scale but still serves to provide insights into the influences affecting weed spread. Critically, this model can inform park managers about the actions required to control weed infestations at a site level. This approach of generating two models at different scales supports the adaptive management framework by providing synthesized information about weed behaviour. Following systematic repeated surveys, the data can also reveal the effectiveness of control measures, vulnerability of habitat types, and influential socioecological factors in weed colonization.