*Article* **Incorporating Industrial and Climatic Covariates into Analyses of Fish Health Indicators Measured in a Stream in Canada's Oil Sands Region**

**Tim J. Arciszewski 1,\*, Erin J. Ussery <sup>2</sup> and Mark E. McMaster <sup>2</sup>**


**Abstract:** Industrial and other human activities in Canada's oil sands region (OSR) influence the environment. However, these impacts can be challenging to separate from natural stresses in flowing waters by comparing upstream reference sites to downstream exposure locations. For example, health indicators of lake chub (*Couesius plumbeus*) compared between locations in the Ells River (Upper and Lower) in 2013 to 2015 and 2018 demonstrated statistical differences. To further examine the potential sources of variation in fish, we also analyzed data at sites over time. When fish captured in 2018 were compared to pooled reference years (2013–2015), results indicated multiple differences in fish, but most of the differences disappeared when environmental covariates were included in the Elastic Net (EN) regularized regression models. However, when industrial covariates were included separately in the EN, the large differences in 2018 also disappeared, also suggesting the potential influence of these covariables on the health of fish. Further ENs incorporating both environmental and industrial covariates along with other variables which may describe industrial and natural influences, such as spring or summer precipitation and summer wind speeds and distance-based penalty factors, also support some of the suspected and potential mechanisms of impact. Further exploratory analyses simulating changes from zero and the mean (industrial) activity levels using the regression equations respectively suggest effects exceeding established critical effect sizes (CES) for fish measurements may already be present or effects may occur with small future changes in some industrial activities. Additional simulations also suggest that changing regional hydrological and thermal regimes in the future may also cause changes in fish measurements exceeding the CESs. The results of this study suggest the wide applicability of the approach for monitoring the health of fish in the OSR and beyond. The results also suggest follow-up work required to further evaluate the veracity of the suggested relationships identified in this analysis.

**Keywords:** oil sands; fish; Ells River; elastic net; industry; climate; environmental health

#### **1. Introduction**

Documenting and assessing the status of biological populations and communities and identifying sources of variation is a priority for ecological monitoring and management programs [1–4], including those in Canada's Oil Sands Region (OSR) [5]. Among the studies performed in the OSR to understand the effects of industrial developments, analyses of fish health indicators have been undertaken in streams in the minable sub-region since the late 1990s [6–8]. Using designs inspired by the Federal Environmental Effects Monitoring program (EEM; [9,10]), these fish studies typically collected sentinel species from an exposed site and statistically evaluate anatomical measurements relative to an upstream or local reference site not exposed to the stressor of interest e.g., [11]. This design has also been used for other indicators and has routinely identified differences at the downstream exposure locations, suggesting the potential influence of industrial activities [12–18].

**Citation:** Arciszewski, T.J.; Ussery, E.J.; McMaster, M.E. Incorporating Industrial and Climatic Covariates into Analyses of Fish Health Indicators Measured in a Stream in Canada's Oil Sands Region. *Environments* **2022**, *9*, 73. https://doi.org/10.3390/ environments9060073

Academic Editor: Matteo Convertino

Received: 27 April 2022 Accepted: 7 June 2022 Published: 17 June 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

**<sup>\*</sup>** Correspondence: tima@unb.ca

Although differences in environmental indicators measured at downstream sites compared to upstream references are commonly observed in the OSR, isolating the potential effects of industrial development is often a substantial challenge for monitoring in flowing waters in the region [19,20]. More specifically, while industrial developments have known physical impacts on the landscape and hydrological processes [6,21,22], oil sands mines are preferentially constructed where the bitumen deposits are thickest and the overburden is thinnest [23]. Streams in the areas targeted for development are often influenced by natural exposure to bituminous compounds eroded from the McMurray Formation and other substances derived from other geological strata, including other hydrocarbon-bearing formations present between the study locations [12,15,16,24,25]. Substances originating from deeper rock layers may also be transported to the surface in groundwater in the OSR [26–31]. Although some of the observed changes in environmental indicators in these areas may be associated with facilities, other phenomena may also affect the efficacy of spatial comparisons in the OSR. Contaminants of concern (CoCs) are emitted to the atmosphere from stacks and fugitive sources and many are deposited to the landscape [20,32–36]. While the most intense deposition of materials often occurs within 10–20 km of the sources, e.g., [35], the deposition varies annually [20,32,37,38] and may also occur in areas used as references for studies in streams [39–41]. Additionally, in situ oil sand facilities may also be present upstream of reference sites [16,18], further altering the sensitivity of spatial designs to industrial influences.

The potential overlap of stressors from multiple sources and the confounding of spatial designs often constrains the unequivocal identification of industrial influence in the OSR [19,20]. These challenges of spatial designs in streams from the OSR coupled with the desire to quantify any industrial influence on organisms suggests alternative approaches are required. Although many options are available [18,42], analyses of data collected at individual sites over time may be used to identify industrial influences. These temporal and site-specific analyses can account for natural differences between locations, have been routinely used in the OSR, and have identified likely effects of industrial developments, including some potential influences in streams [13–16,19,25,36,41,43–51].

Although a site-specific approach has been useful in many studies, the design also has challenges. For example, the chosen ecological indicators often vary naturally over time and this may interfere with the identification of any industrial influences. Additionally, preindustrial baselines may be absent. Whereas some measurement types, such as chemical indicators in lake sediments and peat cores [47,52], may be less sensitive to these challenges, studies in streams require deliberate solutions. To overcome these challenges in flowing waters, generic covariates available from meteorological and hydrological stations can be incorporated to remove their influence on environmental indicators [32,39,53]. Although any residual variation could be qualitatively associated with extraneous sources, descriptors of industrial performance for individual facilities can also be included to identify any potential associations between the indicators and the local development. Although not definitive, this approach may improve the utility of monitoring in the OSR, including identifying industrial influence without a pre-industrial baseline [20,32,39,54–56]

The purpose of this work is to evaluate the status of fish (lake chub; *Couesius plumbeus*) residing in the Ells River relative to industrial activity. To examine the potential effects of industrial influence, several approaches were used and were compared. First, the typical EEM approach was used to identify statistical differences both spatially and temporally using ordinary least squares (OLS) and generalized linear models (GLM) and were compared to the use of regression diagnostics. Second, the effect of including various sets of covariates on the temporal analyses was examined and the elastic net regularized regression (EN). This study suggests the analysis of the ecological data in exposure environments such as the Athabasca tributaries in the OSR may need to integrate environmental and industrial covariates to better account for variability in the data set and to enhance the sensitivity of the analysis to local human activities.

#### **2. Materials and Methods**

#### *2.1. Study Area*

This study was performed in the Ells River, a tributary of the Athabasca River in Canada's OSR. Two sites, Upper and Lower, along the Ells River were sampled consecutively between 2013 and 2015 and again in 2018 (Figure 1A). The Lower site is in the midst of oil sands development, and the Upper site is located approximately 35 river-km upstream of the Lower site. The Upper site is underlain by the Clearwater Formation, while the Lower site is underlain by the McMurray Formation (Figure 1B). During the initiation of this study (2013), site preparation and construction of the (Total) Joslyn North mine was underway, and the sites were selected to straddle this development (Figure 1C). However, the development of the Joslyn North mine was suspended in May 2014 [57,58]. Most of the development and site preparation in the Ells basin occurred upstream of the Lower location and downstream of the Upper site before the initiation of fish collections (Figure 1C; Supplemental Figure S1).

in 2017 and its data were not included in the current analysis), facilities paused in 2018 (red dashed outlines), 25 km distances from Ells locations (white dashing); orange outline = Ells basin; black outlines show upslope areas for fishing locations; dark red dot shows Mildred Lake meteorological station (3064528); ESRI World Imagery basemap ~ 2019; (**B**): bedrock formations of the Mannville group (Grand Rapids, Clearwater, McMurray) and the Devonian basement (Waterways); all other geological strata not shown; pink diamonds = stream gauges (S45 and S14A); purple star = stream gauge for Upper location in 2018 (07DA039); (**C**): land disturbances in and around the Ells basin by year.

Lake chub were selected as the sentinel species for work in the Ells River. Following the Canadian EEM protocols, ~20 adult female and 20 adult male lake chub were targeted using backpack electrofishers (Smith-Root Type LR-24; Alberta Environment and Parks Fisheries Research Licenses: 13-0445, 14-0456, 15-0456, and 18-0408). September is the typical period of recrudescence for the spring spawning lake chub [59], and fish were collected during this period (22–24 September 2013; 24–26 September 2014; 6–8 October 2015; 19–21 September 2018). The collected fish were euthanized by spinal severance following the Animal Use Protocols approved by Environment and Climate Change Canada's Animal Care Committee (1315, 1415, 1515 and 1815). Sex, body length (BL; ±1 mm), body weight (BW; ±0.001 g), gonad weight (GW; ±0.001 g) and liver weight (LW; ±0.001 g) were recorded. Actual sample sizes range from 12 to 21.

#### *2.2. Analysis of Data*

#### 2.2.1. Conventional Statistics and Regression Diagnostics

Conventional analyses of fish health (downstream exposure relative to upstream references) were performed in this study. Site differences in fish health endpoints (GW, LW, BW) were analyzed using the 'lm()' and 'glm()' functions in R to assess differences in slope and intercepts. Ordinary least squares (OLS) analyses used log10-transformed data. Generalized linear models (GLM) were applied using a gamma distribution and a log-link function; visual examination of residuals suggested no substantial deviation from normality, and this technique was used for the remainder of the analyses. All other fish health endpoints were assessed using 1-factor ANOVA followed by Tukey's post hoc test (normality of all data was confirmed using a Shapiro–Wilk test before subsequent statistical analysis; if data were not normally distributed, the data were log10-transformed). An α level of 0.05 was used to determine statistical significance for all tests. These analyses were performed both spatially (within years) and temporally (within sites). To correspond with the regression diagnostics approach, GLMs testing the statistical differences at sites in 2018 compared to a pooled group of fish from 2013 to 2015 were also applied at each location. The results of these analyses are described in detail in the Supplementary Information and Appendix A but are also summarized in the main text.

Additional analyses based on regression diagnostics [60] were also performed in this study. First, GLMs were used to calculate regression equations for GW and LW relative to BW and BW relative to BL at sites over time. From these models, the fish collected from 2013 to 2015 were used to estimate the expected ranges of residuals for fish captured in 2018. Additional regression diagnostics were conducted spatially using the fish captured at the Upper site to predict the GW, LW, and BW of fish for the Lower site.

Bootstrapping was used to calculate both the observed and expected ranges of residuals per year for the fish measurements evaluated here. For the observed residuals, a single bootstrap was used to calculate the central 95% confidence intervals of means. In contrast, a double bootstrap was used to estimate the expected ranges of fish in 2018 compared to the previous 3 years. Details of the single and double bootstrap techniques are found elsewhere [55,61].

#### 2.2.2. Elastic Net Regularized Regressions

The temporal analyses described immediately above examining the health metrics of fish over time identified potentially large differences in several measurements of fish health in 2018 compared to previous years (e.g., Figure 2). In addition, variability within the baseline period of 2013 to 2015 was also apparent among the fish health metrics, including model errors (e.g., Figure 2). Using environmental (ENV) and industrial (IND) covariates, the observed variability and potentially relevant differences prompted additional analyses using the elastic net regularized regression (EN; [62]) to identify potentially relevant drivers of the variability in the fish health metrics. The ENs were applied using the glmnet package in R [63] using penalty ratio (α) of 0.5 and a Gamma distribution with a log-link function. The minimum λ penalty was selected using leave-one-out-cross validation using the 'cv.glmnet()' function in the glmnet package.

**Figure 2.** Mean residuals of female endpoints measured at the Upper Ells (**A**) and Lower Ells (**B**) locations; 2013:2015 (gray symbols) are reference years; 2018 is the test year; red horizontal dashed and solid lines show expected ranges of mean residuals in 2018 given the reference data; open symbols suggest no difference in mean residuals; closed symbols suggest a difference in 2018; 'Fish' panes include no environmental or industrial covariates; 'ENV' includes environmental covariates only; 'IND' includes industrial covariates only (with no limits and no penalty factors); ENV+IND with no limits and no penalty factors ('NL-NPF'); ENV+IND with no limits and penalty factors ('NL-PF'); ENV+IND with upper limits and no penalty factors ('UL-NPF'); ENV+IND with upper limits and penalty factors ('UL-PF'); blue shading of panes indicates the most parsimonious models selected by AIC; yellow shading indicates marginally relevant models selected by AIC.

First, ENs were conducted using environmental covariates only. To capture potential natural contributions to indicators of fish health, three environmental covariates were used in this analysis: water temperature (WT), air temperature (AT), and stream discharge (SD; Supplemental Table S1; Supplemental Figure S2). Among these covariates, we included four periods: August 5 until the start of fish sampling, 60 days before fish sampling, summer (June, July, August), and June, July, August, and September until the date of fish capture. Each of these periods corresponds with the numerical descriptors 1–4, with greater values indicating longer periods. Additionally, we also used both means (*x*) and medians (<sup>∼</sup> *x*) for each of these periods. For example, 'AT3-*x*' represents the mean air temperature from 1 June to 31 August for each respective year of study (Supplemental Table S1; Supplemental Figure S2). Hobo Tidbit temperature recorders were deployed at each Ells River station in each sampling year. In 2015, deployment of the probes did not occur until August 4, and only two estimates of WT were used in this analysis. To compensate for the limited availability of WT, AT data were also obtained and used to calculate possible physical descriptors of environmental conditions in the Ells River. These AT data were obtained from the Mildred Lake meteorological station (Climate ID: 3064528). Stream discharge data were also obtained for the Ells River (2013–2015: S45 and 2018: 07DA039). Given the potential influence of industrial activities at the Joslyn North project, only discharge data from the upstream gauging locations (S45 and 07DA039) were used to approximate natural and unimpeded flows in the Ells River (e.g., [6,64]) for the initial analyses using only the environmental covariates (ENV models); as described below, discharge data from the station near the Lower Ells, S14A, were used in the ENV+IND EN models for the fish captured at that site. Although not an environmental covariate, Julian Day (JD) was also included as a selectable covariate to account for slight differences in the collection of fish among sampling years.

Next, an EN using only industrial covariates was also used. Industrial covariates from eleven facilities (mines: Horizon Mine (HM), Jackpine Mine (JPM), Kearl Mine (KM), Muskeg River Mine (MRM), Syncrude Aurora North (SAN), Suncor Basemine (SBM), and Syncrude Mildred Lake (SML); in situ: Suncor Firebag (SFB), Suncor MacKay River (SMR), Husky Sunrise (HS), and West Ells Sunshine (WES); Figure 1) were obtained from the Alberta Energy Regulator (Supplemental Figure S3; [65,66]). Although they varied by facility, the covariates included fuels combusted (e.g., process (PG-F) and natural gases (NG-F) and petroleum coke (PC-F)), products produced (e.g., crude bitumen (CB-P) and synthetic crude (SC-P)), the mass of mined bitumen (OS-M), stockpiling rate of petroleum coke (PC-SP), and materials flared/wasted (e.g., sulphur (S-FW), diluent naphtha (DN-FW), and crude bitumen (CB-FW)). Steam injection rates (ST) and bitumen production rates (BP) were also obtained for the in situ facilities (HS, SMR, SFB, WES). Data were obtained from mines and in situ facilities with at least two years of operation by 2018 (Figure 1). Summer sums for the industrial facilities were calculated using the reported values from June, July, August, and September. The full list of initial industrial variables is available in the Supplemental Information (Supplemental Table S2).

Land disturbance data was also used. Land disturbance data were obtained from the Alberta Biodiversity Monitoring Institute's 2018 Human footprint data ([67]; Figure 1; Supplemental Figure S1). Not all features in this data layer are dated, but the data serve as a proxy for the proportional increase in land disturbance per year in the watershed areas draining to each sampling location. The sub-watersheds for each stream location were calculated using the 'Upslope Area' function in Quantum GIS (QGIS version 3.16).

Finally, analyses using both environmental and industrial covariates were performed. These analyses used the covariates described above, but also included additional 'mixture' variables to account for the potential cumulative industrial influence, the physical transport processes linking facilities with streams, and other seasonal effects (Supplemental Figure S4; [39,55]). These mixture variables included estimates of mean spring discharge (SP-SD) and air temperature (SP-MT; from Mildred Lake). Spring was estimated as April, May, and June to capture the freshet period. Spring precipitation

estimates, mean precipitation per day (SP-MP), number of rain days (SP-RD), and the cumulative total precipitation (SP-TP) were also included. Additional metrics in the summer were also obtained, including mean (e.g., WS4-*x*) and upper percentile wind speeds (e.g., WS4-99) from the Bertha Ganter station in the Wood Buffalo Environmental Association air monitoring network and precipitation from the Mildred Lake meteorological station (e.g., P1-*x*) for the same periods outlined above. In total, the number of potential environmental, industrial, and mixture covariates included in the ENV+IND models was 147 (Supplemental Tables S1 and S2; Supplemental Figures S1–S3).

In addition to the expanded list of covariates, the ENs were further configured using arguments in the glmnet() function in R [63]. More specifically, an 'upper.limit = 0 argument was used for industrial variables to return only negative coefficients. Secondly, the 'penalty.factor' argument was used to account for the proximity of projects to the sampling locations (Supplemental Table S3). For example, the project boundaries of three operational mines, Horizon, Muskeg River, and Aurora North, are within 10 km of the Lower Ells location, but are physically outside the Ells basin (Figure 1; Supplemental Table S3). A penalty factor (PF) of ((distance (km)/100) + 1) was explored here. The scaled PF was selected during preliminary analyses in which km/10 was likely too punitive, whereas km/1000 had no effect. An exploratory analysis of the effect of varying PFs was also undertaken but showed little effect on the mean squared error (MSE) of the final models (Supplemental Figure S5). However, this preliminary analysis is further considered below (Section 3.3)

Based on the inclusion or exclusion of upper limits and PFs, four ENs were performed when environmental, industrial, and mixture variables were included. These analyses included no limits and no penalty factors (NL-NPF), upper limits of zero and no penalty factors (UL-NPF), no limits and (distance-based) penalty factors (NL-PF), and upper limits of zero and the distance-based penalty factors (UL-PF). To compare the performance of various EN models, the deviance ratio (DR), MSE, and Akaike's Information Criterion (AIC) were calculated [53]. Criteria to evaluate AICs followed generic guidance from the literature [68].

#### 2.2.3. Retrospective and Prospective Model Predictions

Most ENs selected industrial variables, and the regression formulae were also used to estimate the potential current or future effects of these industrial influences and projected climatic and hydrological changes. Among the industrial variables, two sets of predictive models were used. The mean of each industrial variable selected in the best-fit ENs from 2013 to 2015 and 2018 was used to estimate the magnitude of differences causing a change equivalent to the fish Critical Effect Size (CES) used for the morphological ratios of gonado-somatic, liver-somatic, and condition indices and applied to the GW, LW, and BW measurements examined here (±25% for GW and LW and ±10% for BW; [69]). These CESs are used in the EEM program and indicate a potentially high risk to the environment [69]. The same approach was also used to estimate the magnitude of change in the absence of an industrial variable (by setting initial conditions to zero) eliciting a response in a fish endpoint exceeding the established CESs. In addition, potential (and predicted) changes in air temperature (1 to 4 ◦C) and stream discharge (up to 20% decline; [70,71]) were used to estimate potential future effects of these changes on GW, LW, and BW of lake chub captured in the Ells basin predicted by the various EN models selected by AIC.

#### **3. Results and Discussion**

#### *3.1. Analyses without Environmental or Industrial Covariates*

As described in detail in the Appendix A, statistical comparisons of fish captured at the Lower Ells compared to fish from the Upper Ells using OLS and GLMs often showed significant differences between sites, and many were larger than the stipulated CESs (Supplemental Tables S3–S6; Supplemental Figures S6–S13). Although the statistical differences between sites were not always present, and the directions or magnitudes of

observed differences are not always consistent, two large consecutive differences were observed in GW of females in 2015 and 2018 (Supplemental Table S6) and the GW of males at the Lower Ells was consistently below the value precited from the Upper location (Supplemental Figure S7). These differences parallel other studies in the Ells River showing some differences between upstream and downstream locations [8,14,16,24], but as in some of the previous studies, the role of industrial influences above any differences driven by habitat e.g., [56] are not clear. However, the results would prompt follow-up work in a standard EEM study, including site-specific analyses e.g., [9,72].

As described in the Appendix A, site-specific analyses were performed using traditional EEM statistics (OLS), GLMs, and regression diagnostics. Results from both the traditional EEM OLS analyses and the GLMs generally suggest no consistent differences within sites over time (Supplemental Tables S7 and S8), but some distinctions were also apparent. Among the largest was the common detection of significantly different slopes in GW of female lake chub captured at both sites in GLMs, but not in the OLS (Supplemental Tables S7 and S8). Although the detection of differences in the slope using the GLM, but not the OLS, may be related to the transformation of data before the OLS, overall, there are few temporally persistent patterns suggesting a potential influence of annually variable factors, such as temperature or industrial activity. However, the GW of females captured at the Lower Ells site was statistically greater and larger than the CES of 25% in 2013 compared to all other years (Supplemental Tables S7 and S8).

In this study, we used site-specific analyses over time to determine the health of fish in the Ells River to address the known interpretative limitations of comparing downstream exposure locations to upstream reference sites in the OSR. Corresponding with the remainder of the results described below (Section 3.2), site-specific analyses were also performed using GLMs and regression diagnostics using a pooled reference group of years (2013 to 2015) to determine the potential differences in 2018. The results from the regression (residual) diagnostics mirrored those from the Gamma GLMs performed using the pooled 2013–2015 group compared to 2018 (Figures 2 and 3; Supplemental Table S9). Differences in 2018 compared to 2013–2015 were observed in multiple endpoints, including LW of females from the Upper site and in GW, LW, and BW of females collected at the Lower site (Figure 2). Only one difference identified by the Gamma GLM, the slope of GW of males at the Lower site, was not also identified by regression diagnostics (Figure 3; Supplemental Table S9). However, as described in the Appendix A, tests of intercept may be robust to statistical differences in slope between 0.01 and 0.05 (*p* = 0.043). Some of the temporal differences at sites may be attributable to industrial development, including large declines in GW of females at the Upper site and declines in LW of females at the Upper site in 2018 compared to 2013–2015 (Supplemental Table S9), but the analytical approach used in this portion of the analysis also incorporates temporal variability which may be associated with environmental and/or industrial drivers, suggesting (along with the results of the spatial analyses described above) additional data, analyses, and interpretation are required.

#### *3.2. Site-Specific Analyses with Covariates*

#### 3.2.1. Which EN Models Performed the Best?

Analyses to include covariates in this study used the EN and included models with only environmental covariates (ENV), only industrial covariates (IND), and those which used both ENV and IND covariates along with various constraints, including ULs of zero for coefficients and PFs for industrial parameters based on the distance of a sampling site to a facility. Among the 12 fish measurements among sites and sexes, the fish-only models for LW and GW of males and females at both sites suggested these models performed the best and as expected that most of the variation in these measurements is influenced by BW of fish ([54]; Supplemental Figures S14 and S15). However, the MSE of the GLM 'fish' models (those including no ENV or IND covariates) were also occasionally the highest, especially in females (Supplemental Figures S14 and S15). In contrast, the BW (vs. BL) models suggested other ENs including ENV and/or IND covariates were preferred

(Supplemental Figures S14 and S15). Irrespective of the AICs suggesting the relative importance of the GLMs including only fish data, the occurrence of large differences in residual variation in 2018 compared to the baseline years (2013 to 2015; Figures 2 and 3) suggested the potential influence of drivers other than BW and BL on lake chub in the Ells River and the potential need for more complex explanatory models [55].

**Figure 3.** Mean residuals of male endpoints measured at the Upper Ells (**A**) and Lower Ells (**B**) locations; 2013–2015 (gray symbols) are reference years; 2018 is test year; red horizontal dashed and solid lines show expected ranges of mean residuals in 2018 given the reference data; open symbols suggest no difference in mean residuals; closed symbols suggest a difference in 2018; 'Fish' panes include models with no environmental or industrial covariates; 'ENV' includes environmental covariates only; IND includes industrial covariates only (with no limits and no penalty factors); ENV+IND with no limits and no penalty factors (NL-NPF); ENV+IND with no limits and penalty factors (NL-PF); ENV+IND with upper limits and no penalty factors (UL-NPF); ENV+IND with upper limits and penalty factors (UL-PF); blue shading of panes shows the most parsimonious models identified by AIC; yellow shading indicates marginally relevant models selected by AIC.

When only the more complex models with larger sets of initial predictors (i.e., those including environmental and/or industrial covariates) are considered, among the 12 fish measurements among sites and sexes, the ENV model was either the best or among the best explanatory models in 10 and was marginal in an 11th (LW of males at the Upper site; Figures 2 and 3; Supplemental Figures S16 and S17). In the 12th endpoint, BW of females at

the Upper site, the AIC of the ENV model was the highest suggesting this model performed the worst (Supplemental Figure S16). Although not always the minimum AIC, the IND models were also commonly selected. For example, the IND models were identified by AIC for GW, LW, and BW of males at the Upper site, BW of females at the Lower site, and GW of females at the Upper site (Supplemental Figures S16 and S17). Other ENs combining ENV and IND variables were also selected by AIC. For example, the UL-PF, NL-PF, and UL-NPF ENV+IND ENs (and the IND model) were among the best for the LW of males at the Upper site (Supplemental Figure S17). Similarly, the UL-PF and the NL-PF ENV+IND models for BW of females at the Upper site had the lowest AIC value (Supplemental Figure S16). The UL-PF and NL-PF models were also selected at the Lower site for the LW of males (Supplemental Figure S17) and GW and LW of females (Supplemental Figure S16). In contrast, the AICs for GW of males and females at the Upper site suggested all models performed similarly (Figures 2 and 3; Supplemental Figures S16 and S17). Finally, although not included in the best set of models, some were deemed marginally relevant by the AIC criteria [68], including several at the Lower site: IND for BW of males, NL-NPF for LW of females and GW of males, and UL-NPF for GW of females and LW of males (Supplemental Figures S16 and S17).

These results suggest some of these more complex models may include variables affecting fish health and may contribute to the declines in GW of females at the Lower site exceeding the CES compared to the Upper site (Supplemental Table S6). The selected models suggest the importance of including environmental covariates, but also the potential for industrial variables to explain the residual variation in the health indicators of lake chub to better focus on stressors of interest [54,55]. However, the analyses also suggested penalty factors may also be useful and are deserving of additional attention.

Although the remainder of this study focuses on the models selected by AIC, the selection of multiple models explaining the reducible error also suggests other criteria may be necessary for implementing the approach used here. For example, while the UL-PF models may represent the most realistic scenario examined in this study, there may be advantages to using the IND models by accounting for the greater likelihood of false positives, including defining testable hypotheses and follow-up studies [72]. Including covariates within the initial analyses and using multiple model forms may also accelerate the development of focused studies to quickly hone the monitoring approaches by further establishing testable and specific hypotheses [73,74].

3.2.2. Are Large Differences Apparent When Environmental and Industrial Covariates Are Included?

All estimates of fish health measured in lake chub captured in the Ells River in 2018 were within the ranges expected from baseline fish collected at each site from 2013 to 2015 when environmental and industrial covariates were included in the models, except LW of males from the Upper site (e.g., Figures 2 and 3). Although the tails of the estimated 95% CIs of the mean annual residuals were occasionally beyond the estimated outer tolerance limits, such as GW of females at the Upper site (Figure 2), these results typically suggest the environmental and industrial factors may be driving some of the variability in the measurements of fish health. In contrast, the mean residual difference in the LW of males from the Upper site was never inside the range of mean residuals irrespective of the covariates and the constraints used (e.g., Figure 3), suggesting that a correlated or causal factor was not included in the set of selectable variables. Despite this result, the correspondence of the GLM and regression diagnostic approach for evaluating sites over time and the common detection of spatial differences (e.g., Figure 3) suggests the utility of this approach. The correspondence between the GLMs and the regression diagnostics further supports the utility of the latter to identify large differences and prompting followup analyses or future field studies [72]. Additionally, the approach also highlights the potential sensitivity of these techniques to small influences and the likely utility of the temporal analyses irrespective of the results of spatial comparisons [39]. As discussed

below, the site-specific approach also identified the potential influence of industrial activity at both locations, further supporting site-specific analyses.

3.2.3. Variables Selected by Elastic Net Regularized Regression Influence of Environmental Variation

The results presented already suggest the potential influence of environmental and industrial covariates on the health indicators of lake chub captured in the Ells River, but which variables were selected also provides additional information on the potential drivers of ecological conditions. Among the ENs selected by AIC, all except the GW of males from the Upper site selected additional covariates compared to the independent variables included in the OLS and GLMs (Tables 1 and 2). This result suggests the GW of males at the Upper site is the least sensitive endpoint measured in this study. In contrast, the GW of males at the Lower site suggested the influence of recent AT (Table 2).

**Table 1.** Environmental and industrial covariates selected by EN for gonad weight (GW), liver wight (LW), and body weight (BW) of female fish captured at the Upper and Lower Ells locations; NL = no limits; UL = upper limits; NPF = no penalty factors; PF = penalty factors; bolded white text with saturated fill color in cells comprise best-fit judged by AIC values; red = positive coefficients; blue = negative coefficients; marginal models indicated as unsaturated fill and black text.



**Table 1.** *Cont.*

Among the ENV+IND ENs, the environmental or mixture variables tended to be selected more often than the industrial variables (Tables 1 and 2). Across sites, the parameters selected for males and females at the Lower site tended to be more consistent compared to the Upper site (Tables 1 and 2). For example, AT was commonly selected in models for both males and females at the Lower site (Tables 1 and 2), but AT was rarely identified by EN at the Upper site (Table 2).

Among the models of GW and LW for males at the Lower site, only median ATs from the shorter periods were selected: August 4 to the start of fishing (AT1-<sup>∼</sup> *x*) and 60 days before the collection of fish (AT2-<sup>∼</sup> *x*; Table 2). Although not definitive, there are also some data from female lake chub, suggesting that shorter periods of air temperature are more closely associated with GW and LW and BW with longer periods (Table 2) matching with the environmental biology of fishes [59].

Other spatial patterns among the selected parameters for the metrics of fish health were also identified. For example, estimates of stream discharge (SD) were more commonly selected at the Upper location compared to the Lower site (Tables 1 and 2). When metrics of SD were selected, greater SD tended to reduce fish health measurements, but the single SD measurement selected at the Lower site (female GW; SD3-<sup>∼</sup> *x*; Tables 1 and 2), had a positive effect. Although constrained by the late deployment of probes in 2015, the potential influence of water temperature (WT1-*x*) was identified by EN in GW and LW of females captured at the Lower site, and, similar to AT and consistent with expectations, higher WT was associated with increases in fish metrics (e.g., Table 1). Although not strictly an 'environmental' covariate, the later collection of fish estimated by the Julian Day (JD) was also associated with increases in some metrics of fish health, including GW and LW of females and BW of males at the Upper site (Tables 1 and 2).

**Table 2.** Environmental and industrial covariates selected by EN for gonad weight (GW), liver wight (LW), and body weight (BW) of male fish captured at the Upper and Lower Ells locations; NL = no limits; UL = upper limits; NPF = no penalty factors; PF = penalty factors; bolded white text with saturated fill color in cells comprise best-fit judged by AIC values; red = positive coefficients; blue = negative coefficients; marginal models indicated as unsaturated fill and black text.


This study identified potential drivers of annual variability often reflecting known or expected relationships with environmental predictors. For example, sampling of fish later in the year, estimated by Julian Day, tended to increase fish measurements, e.g., [59] as did warmer air and water temperatures [54,55]. In contrast, greater SD tended to reduce the fish health metrics, especially at the Upper site, and may be associated with greater energetic costs of higher flows [54,55]. Although similar results are also reported elsewhere [24,39,56], relationships with natural drivers suggest that effects in fish may occur in the future via expected changes in thermal and hydrological regimes in northeastern Alberta [70,71] and are explored below (See Section 3.2.4). Overall, the analyses reinforce the importance of accounting for background drivers of variability in temporal analyses of fish health (e.g., [53,54]) and how identifying these potential relationships improves on previous approaches [20]. There may, however, also be a role of temperature in greater emissions from some sources, such as tailings ponds and mine faces [75,76], suggesting that other studies may be needed to further improve the monitoring results.

#### Influence of Industrial Activity

The potential influence of industrial activity on the health indicators of lake chub residing in the Ells River were also identified in this study. The industrial influences were suggested in the IND models, including factors such as HM-DN-FW, HM-OS-M, MRM-DN-F, and SBM-PG-FW (Table 1) and others in the ENV+IND models (Tables 1 and 2). For example, the best fit ENV+IND models for GW of females at the Upper site suggested a negative influence of MRM-CB-FW and MRM-NG-F (Table 1). In contrast, female GW at the Lower site may be negatively influenced by SAN-SC-F (Table 1). Many of these potential relationships occurred only in the NPF ENs, but the potential effects of CB-P at MRM and SC-F at SAN were, respectively, identified in PF models for LW and GW of females at the Lower location (Table 2). Some of the results from the marginal AIC models, such as a negative influence of SBM-PC-F on LW of males (Table 2; UL-NPF) and females (Table 1; NL-NPF, UL-NPF, and IND) at the Lower site and increases in LW of females with greater SBM-NG-F at the Lower site (Table 2; NL-NPF) were also identified.

Although no estimates of relative land disturbance were selected as a relevant factor in any of the ENs (Tables 1 and 2), the selected models did identify the potential roles of multiple types of industrial practices, including flaring and wasting of materials (FW), the combustion of fuels (F), and the production of substances, such as crude bitumen (CB-P) and synthetic crude (SC-P), on indicators of fish health at both locations (Tables 1 and 2). The results also highlight the potential influence of mechanically generated dust, combustion products, and other normal operations along with operational inefficiencies or upsets on ecosystem conditions either suggested, supported, or shown in other studies, e.g., [35,77,78].

While there could be effects of local stressors on fish in the Ells, such as road traffic related to production at HM, all of the industrial facilities identified in the best-fit ENs are physically outside the Ells basin. The potential influence of extrabasin facilities on both sites in the Ells watershed, coupled with no effects of land disturbance, suggests that the primary exposure mechanism is via atmospheric emissions and subsequent deposition of CoCs [20,32,34–38,47,78–80]. These results reinforce findings elsewhere, e.g., [32], including additional analyses suggesting the potential influence of the MRM on benthic invertebrates in the Ells basin [39] and data suggest CoCs are depositing throughout the watershed [37,41,81–83]. Contaminants of concern may be accumulating in headwater lakes in the Ells basin and may originate from oil sands industrial activity [20,41], but additional work on sediments in lakes throughout the region, such as lake SE22 in the Steepbank basin [32,41,47] also suggests that some reference sites may be influenced by industrial activity, albeit to a lesser degree than the downstream locations. Whereas some studies may still detect differences over time despite the potential influence of atmospheric emissions and deposition from industrial development in reference areas [51], accounting for industrial activity at all locations may enhance the sensitivity of monitoring to the stressors of interest. However, industrial emissions may also be associated with enrichment effects [39,84,85], and more work is likely required to determine the veracity of these potentially meaningful associations. The implications of associations between aquatic ecosystems and industrial emissions for future monitoring in Canada's OSR e.g., [32], such as the need to incorporate the results of deposition models [82,86], more detailed records of industrial

performance, and/or investigations of more specific effect pathways into an integrated design [87] also likely requires additional work.

The analyses also highlight the likely importance of proximity of facilities to exposure locations. Among facilities identified by AIC in the IND and ENV+IND models, the influence of SML (14 km), MRM (16 km), and SBM (30 km) on fish captured at the Upper site was suggested in the AIC-selected models (Tables 1 and 2). At the Lower site, the influence of HM (5 km), MRM (6 km), SAN (7 km), KM (28 km), and SFB (36 km) were identified in the AIC-selected models (Tables 1 and 2; Supplemental Table S3). The identification of potential relationships between industrial features and indicators of fish health suggests study sites adjacent to industrial facilities may also be influenced by activity from multiple, and potentially remote, operations. However, while the amount of active land disturbance during the study period was low and was not associated with annual variability of fish health metrics, study sites adjacent to facilities may also influenced by activities at those local operations. The influence of land disturbance was apparent in other studies on benthic invertebrates [39] and may be partially responsible for slight temperature differences between sites in 2013 compared to the other study years (Supplemental Figure S18). Greater local land disturbance in watersheds may also lead to a greater influence of atmospherically deposited materials [64], suggesting the potential importance of activity levels or types of land disturbances and the potential need for additional analyses.

Additionally, some results also suggest that background environmental effects may mask industrial influences. The positive effects of AT and negative effects of MRM-CB-FW were concurrently identified in the LW of females from the Lower site (Table 1). This pattern also occurs with the additional 'mixture' variables, and its implications are discussed in the next section.

#### Influence of 'Mixture' Variables

Assessing the impacts of atmospheric deposition on the health of biota living in streams is also affected by the transport mechanisms linking industrial facilities and exposure environments. For example, the emission and deposition rates of materials from some sources may also be affected by climatic conditions, including wind speeds and precipitation [32,75,76], while these same processes may also affect their post-deposition mobilization and transport to waterbodies [88]. To account for this possibility, additional 'mixture' variables (along with an expanded list of environmental covariates), such as spring and summer precipitation, was also included in this analysis and the ENV+IND models. Among the ENs, these additional environmental covariates were commonly selected (Tables 1 and 2). Among the spring precipitation metrics, SP-MP, SP-TP, and SP-RD were only identified in EN models for lake chub captured at the Upper location (Tables 1 and 2). More specifically, increases in SP-MP and SP-TP were associated with reduced BW in females at the Upper site (Table 1), while greater rain days in the spring (SP-RD) may have reduced the LW of males from the same site (Table 2). In contrast, neither mean spring air temperature (SP-MT) nor mean spring discharge (SP-SD) were selected by any of the ENV+IND ENs (Tables 1 and 2).

Summer wind metrics and summer precipitation were also included as selectable variables in the ENV+IND ENs. The potential effects of summer wind, such as upper percentiles (e.g., WS3-99) and measures of central tendency over various periods, were not included in the best-fit models (Tables 1 and 2). However, both positive and negative effects of greater summer rain were identified in the metrics of fish health. Among females, increases in the P1-99 and P2-99 variables were associated with greater GW of females from the Lower site (Table 1). Increases in the number of rain days from June 1 to August 31 (P3- RD) was associated with increases in GW and BW of females at the Upper site (Table 1), but greater P2-75 was associated with smaller GW of females from the same location (Table 1).

These results suggest the potential influence of summer and spring precipitation on lake chub in the Ells basin supporting earlier hypotheses [64,88]. Although a plausible mechanism linking emissions from industrial facilities and effects on stream biota, the parsing of metrics such as precipitation to 'natural' or 'anthropogenic' causes is not clear. For example, while CoCs deposited on the landscape may migrate to surface water bodies, greater flows in rivers may also increase the natural scouring of the substrate [14,88–92], and, similar to potential interactions of some anthropogenic activities and AT mentioned already, the co-occurrence of drivers may obscure, mask, or counteract some industrial influences [14,88,89,91]. However, the results in this study also suggest that inputs of industrial CoCs in the spring, while they may be difficult to detect, may be more apparent at sites higher in watersheds.

The results also have further implications for how industrial activity may affect the environment. Along with pulses of CoCs during expected freshet, which are often difficult to disentangle from the accompanying in-stream scouring effects [64,89,93], additional complexity may also be present. The results of this study further suggest effects on fish metrics, such as GW of females at the Upper site (Table 1), may be associated with the magnitude (e.g., P2-75) and frequency (e.g., P3-RD) of rain events over slightly differing periods. Furthermore, the selection of precipitation metrics without industrial metrics (e.g., female BW at the Upper site; Table 1) and with industrial predictors showing opposite effects (GW of Lower females; Table 1), such as P1-99, P2-99, and SAN-SC-F suggest the potential for antagonistic effects of multiple factors, including mediating effects of additional transport mechanics, such as landscape retention [88]. However, there is also some evidence in GW of females from the Upper site suggesting synergistic associations between large precipitation events (P2-75) and metrics of industrial activity (MRM-CB-FW; Table 1). Although the effects of precipitation on stream chemistry are well-established [64,94], the data here and elsewhere [55] further suggest potential associations with the status of biological indicators [39]. However, as mentioned already, similar interactions between industrial activity and environmental variables may also extend to air temperature [75,76].

#### 3.2.4. Predicting Exceedances of Critical Effect Sizes

Potential Effects of Industrial Activity

The final portion of this study used the effect sizes of industrial predictors across the ENs to estimate the degree of change in those industrial predictors which may already be associated with effects on fish equal to or exceeding the standard CES used in studies of fish health indicators [69]. When predicting the potential existing effects of industrial activity by setting initial conditions to zero, increases in the mass of mined oil sands at MRM (MRM-OS-M) may currently be associated with declines in BW of males at the Upper site ranging from ~12% to 17%. Similarly, increases in the volume of crude bitumen produced at MRM (MRM-CB-P) may be associated with declines in LW in females from the Lower site ranging from 26% to 38% (Figure 4). In contrast, increases in BW of males at the Upper site ranging from ~41 to 73% may also be associated with greater SML-PC-F (Figure 4). Other effects exceeding the CES may also be occasionally or commonly observed in response to other industrial features, such as a potential influence of SFB-ST on BW of females from the Lower site, declines in GW of females at the Upper site with greater MRM-CB-FW, and increases in LW of males at the Upper site with greater SML-PC-P (Figure 4). In contrast, other effects on fish among the models estimating the onset of industrial activity are typically below the CES, including HM-DN-FW and SBM-PG-FW (Supplemental Figure S19). Although challenging to confirm with existing data, these results suggest the possibility of the existing influence of industrial development on the physiology of sentinel fish beyond the relevant CESs, but also highlight potential counteracting effects of activities and/or climatic conditions.

**Figure 4.** Effect size plots from zero industrial activity (horizontal blue lines) with greater or lesser industrial activities; purple lines = females; dark gray = males; solid lines = gonad weight; dashed lines = liver weight; dotted lines = body weight; L = Lower site; U = Upper site; D = decrease in fish measurement; I = increase in fish measurement; black lines = mean daily industrial metrics from 2010-2020; blue dotes = June, July, August, September; Red lines = mean summer values for industrial endpoints.

Change from the 2013 to 2015 and 2018 mean activity levels of various industrial facilities can also be estimated (Figure 5). When estimates from the mean values of industrial parameters from the summers of 2013–2015 and 2018 are calculated, many models suggested no differences would be expected in fish endpoints with known changes in industrial factors, even when those features are part of the best performing models selected by AIC, such as increases in LW in males from the Upper site in response to changes in SML-SC-P and decreases in GW of females at the Upper site with greater MRM-NG-F (Supplemental Figure S20) or with decreases in female LW at the Lower site with greater MRM-CB-P (Figure 5). Although activity levels at some facilities would need to increase by large degrees to affect fish relative to their status in 2013–2018, such as 53 to 373 times greater oil sand production at HM (HM-OS-M; Supplemental Figure S20), other current activity levels may require smaller increases in activity to potentially affect fish, such as 1.01 times the level of SML-PC-F in 2010 and 6.7 times the level of SML-PC-P in 2018 to increase BW of males at the Upper site (Figure 5) and may occur in the future. However, other predictions also suggest some exceedances may have already occurred. For example, the models predicted changes in BW and LW of males captured at the Upper site in 2019 (and BW in 2020) may have been larger than the CES compared to fish captured earlier in response to the estimated effect of SML-PC-F calculated using the AIC-selected IND model (Figure 5).

**Figure 5.** Effect size plots from mean (horizontal blue lines) with greater or lesser industrial activities; purple lines = females; dark gray = males; solid lines = gonad weight; dashed lines = liver weight; dotted lines = body weight; L = Lower site; U = Upper site; D = decrease in fish measurement; I = increase in fish measurement; black lines = monthly industrial values from 2010–2020; blue dotes = June, July, August, September; Red lines = mean summer values for industrial endpoints.

Predicting the impacts of industrial facilities is a goal of the Oil Sands Monitoring program [5]. Although the initial and ongoing influence of industrial development has been demonstrated in some chemical indicators [35,41,47], research in flowing waters has not been performed using undisturbed baseline data, although some potential impacts of industrial development have been identified despite this [50,51]. Although additional data suggest catch rates of lake chub in the Ells basin are increasing over time (Supplemental Figure S21), the analyses here suggest that some impacts of industrial development may be present and may be identified without measured pre-disturbance baselines. These analyses may also be used elsewhere to further simulate pre-disturbance baselines and expected ranges of fish health indicators in the absence of some or all industrial facilities.

Potential Effects of Future Changes in Air Temperature and Stream Discharge

Although industrial impacts may already be present or may occur in the future with changes in activity at the local facilities, physical conditions in water bodies also affects the health status of fish populations, e.g., [95]. Future changes in both air temperature and stream discharge are projected in northern Alberta and these drivers may affect fish [70,71]. When potential changes in summer stream discharge and summer air temperature are used to project future changes in fish health, some effects approaching or exceeding the CESs for GW, LW, and BW may occur in lake chub residing in the Ells River. For example, differences exceeding the CESs may occur in LW and BW of females captured at the Lower site with increases in mean air temperature exceeding ~2 to 3 ◦C when estimated using the ENV models (Figure 6). Increases approaching 25% are also predicted for GW of female lake chub captured at the Lower site with mean increases of 4 ◦C in AT, but could be reduced with concurrent and progressively larger declines in SD (Figure 6). Potential

effects exceeding the 25% CES for LW of females at the Lower site are also predicted in the UL-PF and NL-PF models if the mean AT increases by ~4 ◦C (Figure 6). Increases in LW of females at the Upper site and BW of males at the Lower site are also suggested with increases in temperature and/or declines in SD if the ENV models are correct (Figure 6). Similar to the current observations and the results of other studies showing potential responses to warming occurring over decades [41,96], these results suggest background environmental variables are necessary to account for the effects climate change on fish health measurements, especially in long-term monitoring programs [72]. However, as mentioned already, there may also be consequential relationships between greater air temperature and industrial emissions [75,76] or indirect effects of industry embedded in stream discharge measurements, including less mobilization of CoCs from the landscape.

**Figure 6.** Potential effects of projected future changes in mean air temperature (MT; 1 to 4 ◦C; vertical scale) and stream discharge (SD; −5% to −20%) in Alberta on male and female lake chub captured in the Ells River basin, 2013–2015 and 2018 only among the best-fit models selected by AIC.

#### *3.3. Challenges with the Analytical Approach*

Although these analyses suggest some compelling relationships between natural and anthropogenic factors and echo other results [20,32,39,97], many uncertainties remain. First, the magnitude and occurrence of an effect depended on the EN configuration, such as the inclusion of upper limits and/or PFs and the utility of using an upper limit of zero is not clear. Although negative impacts of oil sands industrial activity are often expected and emphasized in reporting [90], some enriching effects of industrial processes are also possible [39,85]. Additionally, zones of exposure from one facility to another may be especially challenging to approximate. Whereas a single PF was used for the main analyses here, additional exploratory analyses used a range (Supplemental Figure S5). The exploratory simulations of varying the PFs suggest different influences of different PFs on the MSE of selected models (Supplemental Figure S5). For example, some suggest PFs which remove all industrial features have the lowest MSE (e.g., female Lower LW), whereas some suggest the opposite (e.g., female Lower GW), and others suggest some other mid-point is optimal (e.g., male Lower GW).

Other uncertainties also remain. Although additional covariates account for some variability in fish endpoints, some residual variation remains suggesting additional factors may be influencing the health measurements of fish. Additionally, some proxies, such as air temperatures from Mildred Lake, may not precisely or accurately describe the conditions affecting study sites in the Ells basin, e.g., [31]. Although identifying natural influences may be used to isolate the impacts of industrial or other anthropogenic drivers [54,55], these analyses are also retrospective and limited to data recorded for parallel purposes. Consequently, any of the established relationships may be causal, and while correlations are relevant for an adaptive monitoring program [72], their explicitness and directness cannot be established with the existing retrospective and exploratory analyses; the results of this study cannot be used to prove the occurrence of industrial impacts on fish in the OSR but may be used to determine the necessity of future monitoring and importantly may direct its form.

#### *3.4. Future Work*

The results of this study suggest several patterns that may be relevant to managers, scientists, and or others. A potentially relevant observation in this study was the sensitivity of fish BW to both environmental and industrial covariates (e.g., Figures 2 and 3), echoing other work [39,55]. Among the fish health endpoints examined here, BW can be measured non-lethally [98] and can be used to identify the effects of human activity [99]. These results suggest studies could couple any lethal collections done every 3–4 years with more frequent non-lethal collections of adults to rapidly augment a dataset. Additional data from other species or lake chub from other sites sampled in the Ells River may also be available [8,14,56], but would require techniques suitable for random effects.

In addition to the predictions made using the data from 2010 to 2020, future industrial scheduling may be used to design programs. The Joslyn North project was acquired by Canadian Natural in 2018 and has been incorporated into the mine planning for Horizon [58]. Other changes in operations are also either planned or underway, such as the Mildred Lake expansion in the MacKay basin [100] and the replacement of the coke-fueled boilers at Suncor [101].

Finally, the analyses suggest some potential impacts on lake chub in the Ells River may already be present, diminishing the relevance of identifying differences over time. The potential for existing impacts at the Ells locations, the challenges of comparisons to a spatial reference (although there is some evidence suggesting the GW of females are commonly smaller at the Lower location relative to the upstream site), and the known issues of observational studies [102] suggest manipulative experiments, such as stream-side mesocosms [103] or other analytical approaches, such as structural equation modeling [104] or the more specific use of test and training data may be required to more clearly establish the influence of industrial activity.

#### **4. Conclusions**

Oil sands industrial development influences the environment. This study examined health indicators of lake chub captured in the Ells River between 2013 and 2018 and found differences at downstream sites compared to upstream reference locations. The study also identified the potential influence of environmental and industrial variables at both of the Upper and Lower locations. Overall, the analyses suggest environmental and industrial covariates may explain some of the residual variation in fish health metrics. These results suggest these environmental and industrial covariates may be required in future monitoring studies in the OSR to account for background climate signals and to identify potential industrial influence in the absence of pre-disturbance baselines.

**Supplementary Materials:** The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/environments9060073/s1, Supplemental Table S1: Environmental covariates; WT = water temperature; AT = air temperature; SD = stream discharge; P = precipitation; WS = wind speed; WG = wind gusts; *x* = mean; <sup>∼</sup> *x* = median; values of environmental covariates by site shown in Supplemental Figures S1–S3; Supplemental Table S2 Industrial variable names and codes; Supplemental Table S3 Distances from oil sands project boundaries to fish sampling locations; arranged by distance from Lower Ells; Supplemental Table S4 Female Lake Chub metrics (mean ± SD(n)); \* denote significant difference at Lower site compared to

Upper site (*p* < 0.05).; Supplemental Table S5 Male Lake Chub metrics (mean ± SD(n)); \* denote significant difference at Lower site compared to Upper site (*p* < 0.05).; Supplemental Table S6 *p*-values for OLS and GLM spatial comparisons (Lower Ells vs. Upper Ells) of male and female lake chub collected in the Ells River in 2013–2015, and 2018; Int = intercept; yellow highlighting = slope with *p*-value < 0.05; red highlighting = intercept with *p*-value < 0.05; Supplemental Table S7 *p*-values for OLS temporal comparisons of male and female lake chub collected in the Ells River in 2013–2015, and 2018; Int = intercept; yellow highlighting = slope with *p*-value < 0.05; red highlighting = intercept with *p*-value < 0.05; Supplemental Table S8 *p*-values for GLM temporal comparisons of male and female lake chub collected in the Ells River in 2013–2015, and 2018; Int = intercept; yellow highlighting = slope with *p*-value < 0.05; red highlighting = intercept with *p*-value < 0.05; Supplemental Table S9 GLM *p*-values for grouped Reference years (2013–2015) compared to 2018; yellow highlighting = slope with *p*-value < 0.05; red highlighting = intercept with *p*-value < 0.05; Supplemental Figure S1 Relative land disturbance (%) in watershed above Upper site (grey), above the Lower site (red) and area of lower watershed below the Upper location (blue) over time.; Supplemental Figure S2 Plots of environmental covariates for the Upper and Lower Ells locations; SD values are from the Upper Location; WT = water temperature; AT = air temperature; SD = stream discharge; Supplemental Figure S3 Industrial covariates; values within each pane are the 2013–2015 + 2018 (monthly) means; values standardized to 0–1; Supplemental Figure S4 Extra covariates included in ENV+IND models (and land disturbance), which may capture some industrial influence; Supplemental Figure S5 Mean squared error for 1000 Elastic net model runs using penalty factors based on distance in km divided by 1 to 1000 of Upper and Lower sites to industrial facilities; Supplemental Figure S6 Mean residual gonad weight (GW), liver weight (LW), and body weight (BW) of females (with central 95% confidence interval) at the Upper and Lower Ells locations; residual GW, LW, and BW at the Lower site calculated using models estimated using fish from the Upper location; red lines showing expected range of mean residuals at the Lower site given the sample size for the Lower site; Supplemental Figure S7 Mean residual gonad weight (GW), liver weight (LW), and body weight (BW) of males (with central 95% confidence interval) at the Upper and Lower Ells locations; residual GW, LW, and BW at the Lower site calculated using models estimated using fish from the Upper location; red lines showing expected range of mean residuals at the Lower site given the sample size for the Lower site; Supplemental Figure S8 Gonad weight, liver weight, and body weight of female lake chub relative to anatomical covariates at the Lower and Upper Ells locations in 2013–2015, 2018; Supplemental Figure S9 Gonad weight, liver weight, and body weight of male lake chub relative to anatomical covariates at the Lower and Upper Ells locations by year 2013:2015, 2018; Supplemental Figure S10 Histograms of female lake chub gonad weight, liver weight, and body weight; Supplemental Figure S11 Histograms of male lake chub gonad weight, liver weight, and body weight; Supplemental Figure S12 Gonad weight, liver weight, and body weight of female lake chub relative to anatomical covariates in 2013, 2014, 2015, and 2018 by location (Lower and Upper Ells) Supplemental Figure S13 Gonad weight, liver weight, and body weight of male lake chub relative to anatomical covariates in 2013, 2014, 2015, and 2018 by location (Lower and Upper Ells); Supplemental Figure S14 Akaike's Information Criterion (AIC)s, Mean Squared Errors (MSE), and Deviance Ratios (DR) for female lake chub including fish GLM; Supplemental Figure S15 Akaike's Information Criterion (AIC)s, Mean Squared Errors (MSE), and Deviance Ratios (DR) for male lake chub including fish GLM; Supplemental Figure S16 AICs, MSEs, and deviance ratios for model scenarios among female lake chub at the Upper and Lower locations; Supplemental Figure S17 Akaike's Information Criterion (AIC)s, Mean Squared Errors (MSE), and Deviance Ratios (DR) for model scenarios among male lake chub at the Upper and Lower locations; Supplemental Figure S18 Mean water temperature per day in 2013–2015 and 2018 at the Upper and Lower Ells fishing locations; diagonal black line is 1:1 line; Supplemental Figure S19 Effect Size plots from zero (horizontal blue lines) with greater or lesser industrial activities; purple lines = females; dark gray = males; solid = gonad weight; dashed = liver weight; dotted = body weight; L = Lower site; U = Upper site; D = decrease in fish measurement; I = increase in fish measurement; black lines = monthly industrial values from 2010 to 2020; blue dotes = June, July, August, September; red lines = mean summer values for industrial endpoints; Supplemental Figure S20 Effect Size plots from mean 2013–2015 and 2018 activity levels (horizontal blue lines) with greater or lesser industrial activities; purple lines = females; dark gray = males; solid = gonad weight; dashed = liver weight; dotted = body weight; L = Lower site; U = Upper site; D = decrease in fish measurement; I = increase in fish measurement; black lines = monthly industrial values from 2010–2020; blue dotes = June, July, August, September; red lines =

mean summer values for industrial endpoints; Supplemental Figure S21 Catch-per-unit-effort (CPUE; fish per 100 s of electrofishing) for sites in the Ells Basin (data obtained from the Alberta Fish and Wildlife Management Information System; FWMIS; https://www.alberta.ca/access-fwmis-data.aspx, accessed on 7 March 2022).

**Author Contributions:** Conceptualization, M.E.M., E.J.U. and T.J.A.; methodology, M.E.M., E.J.U. and T.J.A.; formal analysis, E.J.U. and T.J.A.; data curation, M.E.M., E.J.U. and T.J.A.; writing—original draft preparation, T.J.A.; writing—review and editing, T.J.A., M.E.M. and E.J.U.; visualization, T.J.A. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Oil Sands Monitoring Program (W-LTM-S-2-2122).

**Institutional Review Board Statement:** The study was conducted according to the guidelines of the Canadian Council on Animal Care (CCAC) approved Animal Use protocols (1315, 1415, 1515 and 1815) issued by the National Water Research Institute Animal Care Committee.

**Data Availability Statement:** Most data used in this study are publicly available. Lake chub data: https://www.canada.ca/en/environment-climate-change/services/oil-sands-monitoring.html (accessed on 1 March 2021); stream discharge: https://wateroffice.ec.gc.ca/ (accessed on 15 March 2021); air temperature and precipitation: https://climate.weather.gc.ca/ (accessed on 5 March 2021); Industry data: https://www.aer.ca/providing-information/data-and-reports/activity-and-data (accessed on 10 January 2022); land disturbance data: https://abmi.ca/home/data-analytics/da-top/ da-product-overview/Human-Footprint-Products.html (accessed on 5 February 2021); wind data: https://wbea.org/air/wbea-air-monitoring/ (accessed on 14 January 2022); additional data, such as water temperature, is available upon request.

**Acknowledgments:** This work was funded under the Oil Sands Monitoring program but does not necessarily reflect the position of The Program or its participants. The authors thank Gerald Tetreault, Richard Frank, Sheena Campbell, Jason Miller, Thomas Clark, Abby Wynia, Jessie Cunningham, and Ross Neureuther from Environment and Climate Change Canada (ECCC), Keegan Hicks, Fred Noddin, Kristin Hynes, and Doug Rennie from Alberta Environment and Parks (AEP), Wood Buffalo and Aurora Helicopters (Fabian Moreau), and Jasmin Gee (Hatfield Consultants). The authors also thank the anonymous reviewers for their helpful comments.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

Additional results and discussion describing the results of traditional Environmental Effects Monitoring study analyses, including ANOVAs and ANCOVAs for exposed fish (Lower Ells) relative to the reference (Upper Ells) fish.

#### **References**


**Matthew D. Petrie 1,\*, Neil P. Savage <sup>1</sup> and Haroon Stephen <sup>2</sup>**


**\*** Correspondence: matthew.petrie@unlv.edu

**Abstract:** The Sierra Nevada region has experienced substantial wildfire impacts. Uncertainty pertaining to fire risk may be reduced by better understanding how air temperature (Ta: ◦C) influences wildfire ignitions independently of other factors. We linked lightning-ignited wildfires to Ta patterns across the region from 1992 to 2015 and compared monthly high- and low-air-temperature patterns between ignition and non-ignition locations at local scales (4 km). Regionally, more ignitions occurred in springs with a greater number of high-Ta months and fewer cool Ta months (analyzed separately) and in summers with fewer cool Ta months. Locally, summer ignition locations experienced warmer summer months on a normalized scale than non-ignition locations. The probability of a wildfire ignition was positively associated with a greater number of high-Ta months during and prior to fire seasons. Regionally, springs with a greater number of high-Ta months had more wildfire ignitions. Locally, as individual locations in the region experienced a greater number of high-Ta months preceding and including the fire season, they exhibited substantial increases in spring (+1446%), summer (+365%), and fall (+248%) ignitions. Thus, the frequent occurrence of high-Ta months is positively associated with lightning-ignited wildfires in the Sierra Nevada region.

**Keywords:** fire risk; forest; climate; California; Nevada

#### **1. Introduction**

Measures of wildfire activity, including the number of large wildfires, burned area extent, and wildfire severity, have been increasing in many locations across the western United States since the 1980s [1–4]. These increases have largely been attributed to the interactive effects of climate warming and the legacy of human wildfire and forest management [1,3]. The average annual burned area in the western US increased at a rate of 355 km2 per year from 1984 to 2011 [5], and wildfires have caused considerable damage to ecosystems, human health, and human communities in the wildland–urban interface [6,7]. As conditions supporting wildfire—and especially very large wildfires—increase across the western US as a result of climate change, wildfire affected areas are expected to also increase considerably within this region [8–11]. The Sierra Nevada region of California and Nevada is one of the largest wildfire-affected areas of the western US and has experienced a greater than six-fold increase in average annual burned area since 1972 [2,12]. Yet, while wildfire increases are anticipated, fire risk assessments remain difficult to quantify across the diverse causative agents—human actions, climates, landscape and topographic conditions, and over- and understory characteristics of ecosystems—that comprise the Sierra Nevada region [13–16].

Many wildfire ignitions in the western US—including the Sierra Nevada region—are initiated by lightning strikes (e.g., natural wildfire ignitions; Short [17]). The fire risk associated with lightning-ignited wildfires is influenced by multiple factors. Short-term weather patterns influence the frequency and timing of lightning strikes, and lightning strikes vary

**Citation:** Petrie, M.D.; Savage, N.P.; Stephen, H. High and Low Air Temperatures and Natural Wildfire Ignitions in the Sierra Nevada Region. *Environments* **2022**, *9*, 96. https://doi.org/10.3390/ environments9080096

Academic Editors: Matteo Convertino and Jie Li

Received: 9 June 2022 Accepted: 18 July 2022 Published: 28 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

geographically and topographically [18–21]. At seasonal to interannual timescales, precipitation and temperature influence the ignition probability component of fire risk through their interactive effects on understory vegetation and fuel moisture [19,22], and wildfire has been found to coincide with dry periods and lightning strikes [23]. In the Sierra Nevada region, vegetation composition and structure vary at fine spatial scales, and over interannual to decadal time periods [24,25]. Thus, the components of natural fire risk—lightning occurrence and ignition probability—may often exhibit considerable spatial and temporal variation, and their interaction may in some instances enhance risk and in other instances lessen it [19]. Considerable research suggests that climate change will increase lightning occurrence and ignition probability [8,26,27], although Finney et al. [28] suggest that global lightning density may actually decline in coming decades. Increasing temperatures and earlier spring snowmelt are already lengthening and intensifying ecosystem drying and wildfire activity periods [29,30], and the associated occurrence of extreme climate events, including drought and heat waves, have been linked to regional patterns of wildfire extent and severity [31,32]. These studies point to a future where enhanced lightning occurrence and ignition probability could lead to enhanced fire risk across both space and time.

Wildfire characteristics—ignition, extent, and severity, for example—are enhanced by hot and dry conditions [20] but through different pathways. Once hot and dry conditions have been realized, wildfire ignition is strongly influenced by weather patterns and the favorability of ignition agents [19,27], wildfire severity is strongly influenced by topography, vegetation composition, and vegetation structural characteristics [33,34], and wildfire extent is strongly influenced by wind speed, topography, and the occurrence of precipitation [22,33,35]. Complex and interactive processes such as these may be simplified—if not fully resolved—by compartmentalizing their components independently. To this end, we propose that air temperature (Ta: ◦C) may have utility for quantifying fire risk. Ta is positively associated with lightning strikes, heat waves, and meteorological drought, such that warmer temperatures may capture variation in both lightning occurrence and ignition probability [20,26]. Recent research suggests that properties of extreme events may capture conditions exceeding normal environmental variation and help to identify changes in fire risk [32]. We therefore postulate that Ta may indirectly capture ignition probability through its strong negative association with fuel moisture (indicating enhanced ignition probability; Abatzoglou and Kolden [36], Flannigan et al. [37]) and through its positive association with dry atmospheric events and meteorological conditions supporting lightning strikes [20,26]. That is, Ta variation—and especially the occurrence of high-Ta periods—is associated with the meteorological and environmental conditions supporting natural wildfire ignition and may be a useful way to compartmentalize and simplify one component of fire risk.

Although considerable research has linked Ta to lightning strikes and wildfire ignitions, it is not clear if analyses of Ta can be used to quantify change to fire risk and at what spatial and temporal scales these changes may be anticipated. If Ta patterns are related to the probability of wildfire ignition, this determination can help to identify how ignition probabilities will change in a warming regional climate and indicate locations where management interventions should be prioritized. In this study, we explored relationships between Ta change, the properties of high (warm) and low (cool) Ta months, and the occurrence of lightning-ignited wildfires in the Sierra Nevada region from 1992 to 2015. During a similar timeframe (1992–2012), this region experienced the most lightning-ignited wildfires in the contiguous US [38]. The objectives of this research were to: (1) determine if regional temperature patterns from 1992 to 2015 in the Sierra Nevada region were related to the number of regional wildfire ignitions in spring, summer, and fall fire seasons; (2) contrast the frequency of occurrence and magnitude of high- and low-Ta months in spring, summer, and fall fire seasons between locations that experienced a wildfire ignition and locations that did not (fire ignition vs. non-ignition locations); and (3) explore to what degree the frequency of high- and low-Ta months prior to spring, summer, and fall fire seasons are related to ignition probability. Our analyses explore Ta–ignition relationships averaged across the region and for individual locations through time. Our analyses also

evaluate the magnitude and occurrence of high- and low-Ta months within individual wildfire seasons and also evaluate high- and low-Ta months in preceding seasons, which may indicate sustained conditions (or lack thereof) that amplify wildfire ignition probability. Because wildfires have been strongly tied to hot and dry conditions in this region [14,32], we hypothesized that regional analyses focusing on high Ta would show a greater association with wildfire ignition compared to analyses of low Ta. Similarly, we hypothesized that the magnitude of high-Ta months would have a greater association with wildfire ignitions (due to their general indication of drier conditions and favorable weather for lightning strikes) compared to the frequency of high-Ta months.

#### **2. Site Description**

The Sierra Nevada region encompasses an area of 64,544 km<sup>2</sup> ([39]; Figure 1a). This ecologically diverse region is comprised of forest, woodland, and shrubland vegetation, with a plant species composition that is influenced by the surrounding ecoregions: the Cascade–Sierra Mountains to the north, the Great Basin to the east, the Mojave Desert to the south, and the Central Valley to the west [24]. The Sierra Nevada has a Mediterranean climate, with the majority of annual precipitation (∼500–2030 mm total) falling between fall and spring, and with dry summers [24]. Average daily temperatures vary across seasons, topography, and elevation (∼100–3900 m; United States Forest Service [24], PRISM Climate Group [40]). Vegetation composition also varies across topography and elevation, often at relatively fine spatial scales [14,41]. Frequent, small fires that maintained ecosystem health and structure in this region were reduced due to human fire suppression in the 20th century, which resulted in increasing fuels and vegetation density and contributed to the occurrence of large wildfires over the past 40 years [42]. Vegetation mortality and die-offs are also increasing across the region due to the interplay of and altered fire regime, drought, and pathogen outbreaks, which portend to an uncertain future for the region's ecosystems and the services they provide [21,32].

**Figure 1.** Study area map of the Sierra Nevada region, including the number of fire ignitions in each 4 km grid cell from 1992 to 2015 (Panel **a**), and a barplot illustrating the timeseries of wildfire ignitions in spring, summer, and fall from 1992 to 2015 (Panel **b**). The black line in Panel (**b**) illustrates total wildfire ignitions in each year.

#### **3. Methods**

#### *3.1. Weather Estimates and Spatial Wildfire Data*

We obtained daily maximum (Ta*max*) and minimum (Ta*min*) air temperature estimates at 4 km resolution from 1990 to 2016 from the Parameter-elevation Regressions on Independent Slopes Model (PRISM; PRISM Climate Group [40]), which has been used in other, recent wildfire-focused research [43,44]. We clipped PRISM raster files to the Sierra Nevada region (4034 total 4 km resolution cells). Weather estimates, including PRISM, may experience autocorrelation due to both naturally occurring patterns as well as interpolation procedures, and we found a moderate degree of autocorrelation of Ta*max* and Ta*min* for our study area (Figure S1). Fortin and Dale [45] recommend the use of small blocks when autocorrelation is present, and we minimized potential autocorrelation effects by analyzing Ta in each grid cell independently of surrounding cells. We conducted spatial data harmonization using QGIS [46], and we compiled and analyzed all data using R [47].

We obtained point-based wildfire ignition data for the Sierra Nevada region from 1992 to 2015 from the Fire Program Analysis Fire-Occurrence Database [17]. Our analyses included only lightning-ignited wildfires and included fire size classes from >0.10 to ≥2023.43 ha (Class B to Class G), the characteristics of which are provided in the attribute data of the Fire Program Analysis Fire-Occurrence Database [17,48]. We did not include fires ≤ 0.10 ha (Class A; 7897 of 10,788 total fires; 73%), which may occur following lightning strikes even when ignition agents are unfavorable for wildfire spread. These small fires may include ignitions with greater wildfire potential that were quickly contained and suppressed, and there is therefore some uncertainty in whether the fires removed from our analysis were of lower or higher risk.

We assigned point-based wildfire occurrence data to the centroid of the nearest 4 km cell. In our analyses, we compared ignition and non-ignition cells, evaluated seasonally at discrete time steps. For example, a single 4 km cell was designated to be an ignition location in any season when it experienced a wildfire ignition (summer 2010, for example) and was designated a non-ignition location in seasons when it did not. We evaluated 3 fire seasons (spring: March–May; summer: June–August; fall: September–November). Only 10 ignitions during winter met the Class B to Class G criteria of this study, and we therefore did not analyze winter ignitions.

#### *3.2. High- and Low-Temperature Months*

We used a peak-over-threshold approach from Coelho et al. [49] to calculate the occurrence and properties of high- and low-Ta months. This is a nonstationary approach that is used to identify anomalous and extreme values at the monthly time step from the distribution of daily values, across a moving time window. This technique is especially useful for identifying unique features in climatic data that have a positive or negative trend in their mean or variance through time (due to climate change, for example). Using this technique, analysis can focus on the properties of daily values occurring above a time-varying threshold, or on variation in the threshold values (see Petrie et al. [50] for an example using temperature simulations and a focus on statistically extreme events). For each 4 km cell in each month, our analyses focused on determining the temperature threshold value designating a high- or low-Ta month. On average, 20–30% of all months in our analyses were high- or low-Ta months. Our reported high- and low-Ta months are therefore best understood as the occurrence and magnitude of the top 20–30% of warm or cool months, instead of extreme events in the top 5% of all values.

To determine the occurrence and threshold magnitude of high-Ta months, we first calculated a 3-year floating mean of daily Ta*max* values in each month. For example, the 3-year floating mean for a single cell in August 1995 was the mean of 93 daily Ta*max* values from August, 1994, 1995, and 1996. In this analysis, we found that window length had minimal influence on monthly floating mean values and chose a 3-year window length to maximize the number of observations. After calculating the 3-year floating mean, we then determined the daily Ta*max* values in the analysis month (August, 1995, in this example) that were greater than the 3-year floating mean value and ranked these daily values from the highest daily value to the lowest. A month with ≥20 daily Ta*max* values in this category therefore experienced at least 1 day with a Ta*max* value that exceeded the top 5% of days and was therefore a statistical extreme event for daily values. The observed daily Ta*max* value ranked sequentially below the top 5% of values (e.g., 1–2 values below the highest daily value) was the threshold value, which designated the boundary air temperature between extreme and non-extreme daily values. Although this threshold value indicates a month with an extreme Ta day, we clarify that this is best viewed as a month located in the top 27% of observed daily temperature values. We calculated high- and low-Ta months separately. To determine the occurrence of low-Ta months, we used the same procedure but evaluated daily values of Ta*min* below the 3-year floating mean.

Because the distribution of average Ta differed among the cells in our study region, we normalized the magnitude of high- and low-Ta months as an anomaly ((observed–longterm mean)/standard deviation of long-term mean). To allow for analysis of these anomaly values across cells with significant and insignificant trends in Ta magnitude, we detrended the magnitude of high and low Ta for cells that experienced a statistically significant linear trend in the threshold value over the 1991–2015 study period (at least 7 high- or low-Ta months observed; *p* < 0.05).

#### *3.3. Analysis*

Our regional analyses explored relationships between temperatures and wildfire ignitions from 1991 to 2015 and relationships between the frequency of occurrence and the magnitude of high- and low-Ta months and the occurrence of wildfire ignitions. Our analyses of individual locations (e.g., 4 km cells) focused on contrasting the frequency of occurrence and the magnitude of high- and low-Ta months between ignition locations and non-ignition locations in spring, summer, and fall fire seasons. We determined significant relationships between the number of high- and low-Ta months and wildfire ignitions using linear correlations (R<sup>2</sup> coefficient of determination; *p* < 0.05), and we determined significant differences in the average magnitude of high- and low-Ta months between ignition and non-ignition locations using ANOVA and Tukey's honest significant differences (*p* < 0.05).

We determined to what degree the occurrence and magnitude of high- and low-Ta months in seasons preceding the fire season were related to differences between ignition and non-ignition locations. To determine if ignition locations experienced a differing magnitude of high- or low-Ta months compared to non-ignition locations, we contrasted the magnitude of high- and low-Ta months between ignition and non-ignition locations in the seasons preceding each fire season, as well as in spring, summer, and fall fire seasons. To determine if ignition locations experienced a greater frequency of high- and low-Ta months compared to non-ignition locations, we counted the total number of high- or low-Ta months (separate analyses for high versus low) in seasons preceding each fire season. Our counts included the 3 seasons prior to each fire season (0 months minimum, 9 months maximum) as well as the 3 prior seasons up to and including spring, summer, and fall fire seasons (0 months minimum, 12 months maximum). We interpreted a positive relationship between the proportion of locations experiencing a wildfire ignition and a higher number of Ta months (either high or low) as evidence that Ta and wildfire ignitions were positively associated. To evaluate these relationships, we grouped all locations experiencing the same number of high- or low-Ta months into discrete categories (0 months, 1 month, 2 months, etc.) and calculated the ignition proportion of each category independently from the number of locations experiencing a wildfire ignition divided by the number not experiencing an ignition. We required at least 8 locations in each category, and we combined observations when necessary (for example, combining 9 and 10 month locations to reach the required number of 8 observations, resulting in a 9–10 category).

#### **4. Results**

#### *4.1. Regional Change from 1992 to 2015*

The number of wildfire ignitions was lower in the later third of our 1992–2015 study period (2008–2015) compared to the first two-thirds (1992–2007), and summer and fall accounted for 87.8% of the total ignitions (Figure 1b, Table 1b). The magnitude of the high-Ta months was generally lower in the first 8 years of our 24-year study period (1992–1999) compared to the last 16 years, but significant differences between study period intervals were inconsistent and were strongly influenced by a large number of observations (Table 1a, Figure 2). The magnitude of high-Ta months was generally greater in the last 8 years of our study period (2008–2015) compared to the first 16 years (ANOVA and Tukey's honest significant differences, *p* < 0.05; Table 1b). We observed a slight decline in the frequency of high-Ta months in spring and summer in the later third of our 1992–2015 study period (2008–2015) compared to the first two-thirds (1992–2007; Table 1a).

**Table 1.** Seasonal air temperature (Ta: ◦C) differences between 8-year time periods from 1992 to 2015 in the Sierra Nevada region. Ta differences were analyzed for high-Ta months (Section **a**) and locations with high-Ta months that experienced a concurrent wildfire ignition (Section **b**). The number of observations in each category is indicated in the table, and significance is indicated by differing letters (determined at *p* < 0.05).


**Figure 2.** Timeseries of the magnitude of high-air-temperature months (Ta: ◦C) across the Sierra Nevada region from 1992 to 2015, and the magnitude of high-Ta months for fire ignition locations in the month when ignition occurred. Lines were loess smoothed for illustration.

#### *4.2. The Magnitude of High and Low Temperatures and Wildfire Ignitions*

Across the Sierra Nevada region, there was no relationship between the normalized magnitude of high- or low-Ta months and the percentage of locations that experienced a wildfire ignition in the spring, summer, or fall fire seasons (Figures 3a and S2a). At local scales, the ignition locations in summer experienced (on a normalized scale) warmer high-Ta summer months (Figure 4b), and the ignition locations in fall experienced less cool low-Ta fall months (Figure S3c). The differences in the magnitude of high and low Ta in months preceding the fire seasons were inconclusive (Figures 3 and S2).

#### *4.3. The Frequency of High- and Low-Temperature Months and Wildfire Ignitions*

Regionally, a greater number of ignitions occurred in springs that experienced a greater proportion of locations experiencing one or more high-Ta months (0.003% ignition percentage at a proportion of 0.0% of locations; 0.4% ignition percentage at 100% of locations; Figure 3b), a lower proportion of the locations experiencing one or more low-Ta months (0.0% ignition percentage at a proportion of 100% of locations; 0.4% ignition percentage at 0.0% of locations; Figure S2b), and in summers with a lower proportion of locations experiencing one or more low-Ta months (1.3% ignition percentage at a proportion of 100% of locations; 3.0% ignition percentage at 0.0% of locations; Figure S2b). At local scales, the ignition locations in spring, summer, and fall experienced a greater number of high-Ta months, both in the fire season and in the previous seasons (Figure 5). In spring and summer, the ignition locations generally experienced fewer low-Ta months (Figure S4).

**Figure 3.** Relationships between the average anomaly of high-air-temperature months (anomaly = observed − long-term mean/standard deviation of long-term mean) and the percentage of locations in the Sierra Nevada region experiencing a wildfire ignition in spring, summer, and fall seasons (Panel **a**), and relationships between the percentage of locations in the Sierra Nevada region experiencing a high-air-temperature month and the percentage of locations in the Sierra Nevada region experiencing a wildfire ignition in spring, summer, and fall seasons (Panel **b**). Significant relationships are shown in each panel (R2 coefficient of determination; *p* < 0.05).

**Figure 4.** Boxplots illustrating differences in the anomaly of high-air-temperature months (anomaly = observed − long-term mean/standard deviation of long-term mean) between ignition (red) and non-ignition (gray) locations in the 3 seasons previous to and including the spring (Panel **a**), summer (Panel **b**), and fall (Panel **c**) fire seasons. Significant differences and the direction of difference are indicated for each boxplot pair (one-tailed *t*-tests; *p* < 0.05).

The number of high-Ta months occurring prior to and during fire seasons was positively correlated to the percentage of locations experiencing a wildfire ignition (Figure 6). In spring, wildfire ignitions occurred in ∼0.08% of the locations that did not experience a high-Ta month, and increased to ∼0.5% for the locations experiencing 5–6 of these months in the prior three seasons (of nine possible; Figure 6c), and to ∼0.9% of the locations experiencing 7–8 of these months in the prior 3 seasons plus the spring fire season (of 12 possible; Figure 6b). In summer, wildfire ignitions occurred in ∼1.2% of the locations that did not experience a high-Ta month, and increased to ∼4.2% for locations experiencing 7–8 of these months in the prior three seasons (of nine possible; Figure 6c), and to ∼4.4% for locations experiencing 9–10 of these months in the prior 3 seasons plus the summer fire season (of 12 possible; Figure 6d). In fall, wildfire ignitions occurred in ∼0.7% of the locations that did not experience a high-Ta month, and increased to ∼1.5% for the locations experiencing seven of these months in the prior three seasons (of nine possible), and to

∼1.7% for locations experiencing 8 of these months in the prior 3 seasons plus the fall fire season (of 12 possible; Figure 6e,f). We were not able to assess ignition increases for the three preceding seasons in spring due to too few observations (Figure 6a). We did not observe significant relationships between the number of low-Ta months and the percentage of locations experiencing a wildfire ignition (not shown).

**Figure 5.** Pie charts illustrating differences in the proportion of ignition and non-ignition locations experiencing a high-air-temperature month in the 3 seasons previous to and including the spring (Panel **a**), summer (Panel **b**), and fall (Panel **c**) fire seasons. The average proportion of non-ignition locations experiencing a high-Ta month in each season is illustrated in gray, and the proportional increase in ignition locations experiencing a high-Ta month in each season is shown in red.

**Figure 6.** Fire ignition probability for locations experiencing differing numbers of high-air-temperature months during and prior to the fire season, summarized across 1992–2015. Panels (**a**,**c**,**e**) illustrate fire ignition probability in the spring (Panel **a**), summer (Panel **c**), and fall (Panel **e**) fire season in response to high-Ta month occurrence during the 3 prior seasons (3 seasons, 9 months maximum). Panels (**b**,**d**,**f**) illustrate fire ignition probability in the spring (Panel **b**), summer (Panel **d**), and fall (Panel **f**) fire season in response to high-Ta month occurrence during the 3 prior seasons and also the fire season (4 seasons, 12 months maximum). In cases where too few events were observed in a single category, these events were merged with the next lowest category. Because multiple observations were combined in some categories, linear correlations were developed for category number instead of observation number (Panel (**a**) example: 0 observations = category 0; 5–6 observations = category 4).

#### **5. Discussion**

The slightly increasing magnitude of Ta across the Sierra Nevada region did not correspond to increased wildfire ignitions, which have been declining across the western US in recent decades [51]. In refutation of our first hypothesis, we did not find associations between the magnitude of high- and/or low-Ta months with wildfire ignitions at the regional scale. That is, the Sierra Nevada region as a whole did not experience variation in wildfire ignitions that can be tied to regionally warmer or cooler months. At the 4 km local

scale, however, ignition locations in summer had relatively warmer high-Ta months compared to non-ignition locations, and ignition locations in fall had less cool low-Ta months. This corroborates other work linking high air temperatures to the dry environmental and high-pressure systems that support lightning strikes and wildfire ignition [20,52].

In refutation of our second hypothesis, we found that the frequency of high- and low-Ta months was closely associated with wildfire ignitions. Regionally, more ignitions occurred in springs with a greater number of high and fewer cool Ta months, and in summers with fewer cool Ta months. Locally, ignition locations consistently experienced a greater number of high-Ta months in fire seasons and in the three preceding seasons. As the number of high-Ta months increased from 0 to a maximum of ∼7–10, we found a 1446% increase in spring ignitions, a 365% increase in summer ignitions, and a 248% increase in fall ignitions. This considerable increase in wildfire ignition probability suggests that fire risk in the Sierra Nevada region is enhanced by the more frequent occurrence of warm temperatures over multiple months. In summer and fall, an observation of high-Ta months prior to the fire season suggested ignition percentages that were similar to percentages observed in analyses that included observations during the fire season (summer: 4.2% prior, 4.4% prior + fire season; fall: 1.5%, 1.7%). This suggests that the variation in the probability of fire ignitions in summer and fall may in many cases be strongly influenced by the legacy effects of prior seasons.

Spring ignitions were notably emergent under a greater frequency of warm conditions and were enhanced by additional high-Ta months in spring (0.5% prior, 0.9% prior + fire season). Spring ignition locations experienced a greater number of high-Ta months and fewer low-Ta months in the spring and preceding winter, which corroborates previous research linking winter weather to spring temperatures, moisture patterns, and fire activity [29,30]. A higher Ta also decreases snowpack, resulting in more lightning strikes reaching surface fuels [53]. Although our analyses of spring ignitions were limited by relatively low ignition occurrence, our results suggest that consistent movement toward warmer and less cool winter conditions—and not change to episodic periods of the very coldest or warmest winter temperatures—may support the emergence of spring wildfire ignitions in this region.

#### *5.1. How Broadly Is Air Temperature Associated with Wildfire?*

Ta patterns have been previously linked to wildfire ignition [54,55], and our results show that observation of the frequency of high-Ta months has utility for the assessment of fire risk. It remains unclear, however, if Ta variation can help to understand wildfire extent and severity, which are underrepresented components of wildfire that differ from more commonly focused on components, such as fire risk and fire hazard [19]. Ta variation may capture conditions shaping wildfire severity and extent due to the influence of weather on wildfire activity [56] and climate on vegetation characteristics [27,57]. Understory fuel moisture influences both wildfire ignition and wildfire spread, and periods of low humidity associated with higher temperatures can increase dry fuel loads [37]. Thus, wildfire extent and severity may in some cases be associated with Ta, but these components are influenced more directly by over- and understory vegetation characteristics, such as tree stand health and density [34,36]. Additionally, Ta variation may have indirect effects on wildfire through other physical processes, including the potential for relatively cool and wet periods to increase wildfire severity by supporting the growth of understory vegetation [41]. Thus, although our results point to one pathway by which high temperatures may be associated with an increase in wildfire ignitions, there are many factors and mechanisms shaping ignitions and other wildfire components, especially over multiyear and decadal timescales.

We note a few caveats associated with the datasets we used in our study that should be considered when evaluating our reported Ta–ignition relationships. First, the accuracy of wildfire ignition coordinates may differ from the actual ignition location [58]. Second, spatial datasets such as PRISM provide an estimate of Ta patterns and do not fully resolve Ta in heterogeneous mountain landscapes [59]. Third, our analyses did not account for

multiple ignitions in the same month or season in a single location, for the effect of previous fires on future ignitions [60], and our study area includes locations with limited causative agents (such as low vegetation cover) that may limit ignition.

#### *5.2. Managing for Wildfire*

Climate change is increasing the occurrence and extent of large wildfires, and over 4 million hectares of burned area in the western US has been linked to Ta-associated increases in vapor-pressure deficit in recent decades [43]. The frequency and magnitude of high-Ta months will likely increase throughout the western US in coming decades [50]. When vapor-pressure deficit is high, sustained high Ta enables the drying of fuels and may be associated with weeks to months of severe fire weather, and these conditions may often be clustered spatially [23,37]. In our analyses, we used a nonstationary method to calculate high- and low-Ta months, which minimizes the effect of changing mean temperatures through time [49], and we detrended their magnitude when appropriate. It follows that the methodology we employed could provide consistent indication of wildfire ignition probability in a warmer future climate, but we caution that climate change may alter some or many of the physical processes that influence wildfire in ways that do not have a contemporary analog. It is not yet fully clear if the associations between Ta and wildfire ignition that we observed over a 24-year period in the Sierra Nevada region will be consistent in a changing climate and to what degree they may be extended to other semiarid regions that differ in regional climate, topography, and ecosystem characteristics.

The use of Ta in wildfire research is bolstered by the variable's accessibility and predictability, and relationships between Ta and wildfire ignitions can provide practical utility for fire season planning. Specifically, consistent observations of high monthly temperatures preceding spring, summer, and fall fire seasons indicate elevated wildfire ignition probability for locations in the Sierra Nevada region. Future work could improve these results by determining to what degree spatial clustering of high Ta may indicate higher ignition probabilities (see Podschwit and Cullen [23] for a similar focus over very broad spatial areas), although the determination of natural versus artificial spatial autocorrelation across topographically complex areas may complicate this determination. We recommend that the findings of the present study can help land managers efficiently deploy actions that minimize the potential for wildfire ignitions and spread, including vegetation thinning and using natural fires to reduce fuels [61,62]. Specifically, locations that are regularly experiencing high-Ta months and also have high vegetation density and/or high understory fuels should be prioritized for thinning and fuel reduction treatments, especially when these locations are experiencing abnormal high summer temperatures. Locations experiencing an increase in high-Ta months and/or a decline in low-Ta months in spring may be more likely to experience an emergence of spring wildfire ignitions. That is, better anticipation of ignitions can be achieved through analyses of Ta and can be a useful tool to direct resources that help to protect regional ecosystems.

#### **6. Conclusions**

In this work, we sought to improve the understanding and anticipation of fire risk in the Sierra Nevada region, USA, by linking wildfire ignitions at regional and local scales to high and low air temperatures (Ta: ◦C) and the properties of high and low Ta evaluated at the monthly time step. Regionally, more ignitions occurred in springs with a greater number of high-Ta months and fewer low-Ta months, and in summers with fewer low-Ta months. We found that wildfire ignitions were most strongly associated with the frequency of high-Ta months, and wildfire ignition probability increased substantially when a greater number of these months occurred both prior to and during the fire season. Summer ignitions experienced the highest percentage increases (up to 4.4% of the locations) and were enhanced by both the frequency and the magnitude of high-Ta months in summer. Spring ignitions became emergent when they experienced an increase in the occurrence of high-Ta months or a decline in low-Ta months. Thus, the more frequent occurrence of highTa months is positively associated with lightning-ignited wildfires in the Sierra Nevada region. Although there remains some uncertainty pertaining to the physical mechanisms and pathways that underlie these findings, Ta increases have the potential to enhance fire risk by increasing ignition probability and possibly by increasing lightning occurrence. The identification of change to the properties of Ta can help to identify locations and time periods experiencing enhanced fire risk and prioritize management interventions.

**Supplementary Materials:** The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/environments9080096/s1, Figure S1: Spatial autocorrelation results; Figure S2: Regional relationships between ignitions and low Ta months; Figure S3: Low Ta month anomaly between ignition and non-ignition locations; Figure S4: Differences in the proportion of ignition and non-ignition locations experiencing a low Ta month.

**Author Contributions:** Conceptualization, M.D.P., H.S.; methodology, M.D.P., N.P.S., H.S.; software, M.D.P., N.P.S.; validation, M.D.P., N.P.S.; formal analysis, M.D.P., N.P.S.; investigation, M.D.P., N.P.S.; resources, M.D.P., N.P.S.; data curation, M.D.P., N.P.S.; writing—original draft preparation, M.D.P., N.P.S.; writing—review and editing, M.D.P., H.S.; visualization, M.D.P., N.P.S.; supervision, M.D.P., H.S.; project administration, M.D.P.; All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** All data and derived data products were obtained from online and publicly available sources, as described in the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

