#### 5.1.2. Statistical Analysis

The environmental factors considered as potential drivers of congener composition were meteorological conditions (monthly averages of total precipitation (mm), wind direction (tens of degrees), wind speed (km/h), and minimum, maximum and mean air temperature (°C) from nearby weather stations, Canadian Daily Climate Database) and nutrient concentrations (total phosphorus (TP) and total nitrogen (TN) from the raw water samples collected for MC analysis). Due to the proximity of Lake St. Clair and the Detroit River raw water intake sites (Figure 1), we used the same meteorological station for both water sources. We also considered the depth and distance from shore of the intake sites as these variables may track differences in water currents, light, and nutrient availability. Finally, we tested for any detectable effect of raw water pre-chlorination (used for dreissenid mussel control) as this may affect the concentration of total MC and MC congener composition (both via the direct effect of chlorine on the cyclic peptide structure and indirectly via secondary effects on dreissenid mussels [46,47]). Although the total chlorine concentration in raw water samples was measured by the OMECP, concentrations were often below detection (only detected in sites along Lake Ontario, Lake Erie, and the Detroit River (Figure S5a)). To provide chlorine treatment data for all sample dates and sites, we thus used a categorical classification to indicate whether chlorination treatment was confirmed by the water treatment plants (Yes, No or turned Off prior to sample collection (Figure S5b)). This classification was an accurate indication of chlorine concentrations when detected (i.e., in Lake Ontario, Lake Erie, and the Detroit River (Figure S5c)). To create a variable that more broadly indicated chlorination treatment as well as the presence of dreissenid mussels, we grouped the one site where chlorination was turned off prior to sample collection (Union Water Treatment Plant along Lake Erie) with sites having continued chlorination.

To examine how the concentrations of all four congeners (MC-LR, -RR, -LA, and -YR) varied with changes in the environmental factors over time and space, we conducted a multivariate canonical ordination (redundancy analysis (RDA)) using the *rda* function from the {vegan} package in R [75]. To test for changes in the relative concentrations of MC congeners, we transformed the response matrix into relative abundances using the argument "total" of the *decostand* function. Although RDA examines the relationship between the multivariate response matrix and the suite of environmental factors, relationships are restricted to linear regressions. In addition, the method does not allow for missing data. We thus removed observations with missing environmental data prior to running the RDA.
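As a rough illustration of the data preparation described above, the following Python sketch drops observations with missing environmental values and rescales each row of congener concentrations to relative abundances (a row-total standardization, as the "total" method of *decostand* performs). The analysis itself was run in R; the function names, the record layout, the handling of all-zero rows, and the example values below are assumptions made for illustration only.

```python
from typing import List, Optional, Sequence, Tuple


def to_relative_abundance(row: Sequence[float]) -> List[float]:
    """Divide each congener concentration by the row total."""
    total = sum(row)
    if total == 0:
        # Assumption of this sketch: a sample with no detected congener
        # is returned as zeros instead of dividing by zero.
        return [0.0 for _ in row]
    return [value / total for value in row]


def complete_cases(
    responses: Sequence[Sequence[float]],
    environment: Sequence[Sequence[Optional[float]]],
) -> Tuple[List[Sequence[float]], List[Sequence[Optional[float]]]]:
    """Keep only observations whose environmental record has no missing value."""
    kept_responses, kept_environment = [], []
    for response_row, env_row in zip(responses, environment):
        if all(value is not None for value in env_row):
            kept_responses.append(response_row)
            kept_environment.append(env_row)
    return kept_responses, kept_environment


# Hypothetical concentrations, columns ordered MC-LR, -RR, -LA, -YR.
concentrations = [
    [2.0, 1.0, 1.0, 0.0],
    [0.5, 0.0, 1.5, 0.0],
    [1.0, 1.0, 1.0, 1.0],
]
# Hypothetical environmental records (e.g., TP and TN); None marks a missing value.
environment = [
    [0.02, 0.4],
    [None, 0.5],
    [0.01, 0.3],
]

kept_concentrations, kept_environment = complete_cases(concentrations, environment)
relative = [to_relative_abundance(row) for row in kept_concentrations]
print(relative)
```

In this toy input the second observation is dropped because one environmental value is missing, and each remaining row sums to one after standardization.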

Given that the RDA only tests for linear relationships between the multivariate response matrix and environmental drivers, we coupled this analysis with a multivariate regression tree (MRT, function *mvpart* in R [76,77]), which allows for non-linear and threshold responses [78]. In addition, MRTs allow for missing data. We used the cross-validation relative error (CVRE), which is the ratio of the variation unexplained by the tree to the total variation in the response, as the criterion for selecting the most parsimonious tree (i.e., the tree with the fewest splits whose CVRE value is within one standard error of the tree with the lowest CVRE [79]).
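The one-standard-error selection rule described above can be sketched as follows. This is a minimal Python illustration, not the *mvpart* implementation: the `TreeCandidate` structure, the function name, and the CVRE values are hypothetical, and the sketch assumes that each candidate tree size already comes with a cross-validated CVRE and its standard error.

```python
from typing import List, NamedTuple


class TreeCandidate(NamedTuple):
    n_splits: int
    cvre: float     # cross-validation relative error of this tree size
    cvre_se: float  # standard error of that CVRE estimate


def select_tree_one_se(candidates: List[TreeCandidate]) -> TreeCandidate:
    """Return the smallest tree whose CVRE is within one SE of the minimum CVRE."""
    best = min(candidates, key=lambda tree: tree.cvre)
    threshold = best.cvre + best.cvre_se
    eligible = [tree for tree in candidates if tree.cvre <= threshold]
    return min(eligible, key=lambda tree: tree.n_splits)


# Hypothetical cross-validation results for trees of increasing size.
candidates = [
    TreeCandidate(n_splits=0, cvre=1.00, cvre_se=0.05),
    TreeCandidate(n_splits=1, cvre=0.80, cvre_se=0.04),
    TreeCandidate(n_splits=2, cvre=0.70, cvre_se=0.04),
    TreeCandidate(n_splits=3, cvre=0.68, cvre_se=0.04),
    TreeCandidate(n_splits=4, cvre=0.69, cvre_se=0.05),
]

chosen = select_tree_one_se(candidates)
print(chosen.n_splits)
```

With these made-up numbers the lowest CVRE belongs to the three-split tree, but the two-split tree falls within one standard error of it and is therefore preferred as the more parsimonious option.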

To further test whether similarity in meteorological and nutrient parameters among sites of closer proximity (Figure 1) could lead to the dominance of any particular MC congener, we ran a linear mixed-effects regression tree (using the {glmertree} package in R [80]) with random effects for year and site (main water body). By teasing apart the effect of co-location, this additional analysis evaluated whether the relationships observed with the multivariate regression tree on all sites and years were biased by the sampling design.

Lastly, to identify which congener was most often associated with which water body, we conducted an indicator species analysis using the indicator value index (function *indval* of the {labdsv} package [81]). Briefly, *indval* measures the fidelity and specificity of a "species" (congener) to a group (water body), where specificity is defined as the mean abundance of the species within the targeted group compared to its mean abundance across all groups, and fidelity is the proportion of sites of the targeted group where the species is present [82]. The index is thus maximized when a species (congener) is observed at all sites belonging to a single group (water body), and not elsewhere [76].
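To make the two components of the index concrete, the sketch below computes an indicator value for one congener as specificity multiplied by fidelity. It is an illustration in Python rather than the {labdsv} code: the function name and the example data are hypothetical, specificity is assumed to be normalized by the sum of the group mean abundances, and the permutation test that usually accompanies the index is not reproduced.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def indicator_values(
    abundances: Sequence[float], groups: Sequence[str]
) -> Dict[str, float]:
    """Indicator value of one species (congener) for each group (water body)."""
    by_group: Dict[str, List[float]] = defaultdict(list)
    for value, group in zip(abundances, groups):
        by_group[group].append(value)

    mean_abundance = {g: sum(v) / len(v) for g, v in by_group.items()}
    sum_of_means = sum(mean_abundance.values())

    result = {}
    for group, values in by_group.items():
        # Specificity: mean abundance in this group relative to all group means
        # (assumed normalization; zero if the species is absent everywhere).
        specificity = mean_abundance[group] / sum_of_means if sum_of_means > 0 else 0.0
        # Fidelity: proportion of the group's sites where the species is present.
        fidelity = sum(1 for v in values if v > 0) / len(values)
        result[group] = specificity * fidelity
    return result


# Hypothetical relative abundances of one congener at six sites in two water bodies.
groups = ["lake_a", "lake_a", "lake_a", "lake_b", "lake_b", "lake_b"]
everywhere_in_a = indicator_values([0.6, 0.4, 0.5, 0.0, 0.0, 0.0], groups)
partly_in_a = indicator_values([0.5, 0.0, 0.0, 0.0, 0.0, 0.0], groups)
print(everywhere_in_a, partly_in_a)
```

In the first toy case the congener occurs at every site of one water body and nowhere else, so the index reaches its maximum of 1 for that group; in the second it occurs at only one of three sites, so fidelity drops to one third and the index with it.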

### *5.2. Global Analysis*

To assess the patterns in MC congener composition across many regions, we conducted a synthesis of published literature that provided MC congener concentration data from freshwater lakes and reservoirs. Specifically, we conducted an ISI Web of Science and Google Scholar search for studies published between 2000 and 2017 using the keywords "microcystin", "congener", "cyanotoxin", "cyanobacteria", "lake", "eutrophication", "microcystin-LA", "MC-LA", "microcystin-LR" and/or "MC-LR". We screened the studies using the criteria that: (1) concentrations of MC congeners were measured (i.e., not just total MCs), (2) MC-LR, -LA, -RR and -YR were quantified (i.e., the use of the relevant standards was mentioned), and (3) data tables or graphs from which the raw data could be digitized were provided in each publication. A total of 146 sites, which covered all continents with the exception of Antarctica, met these search criteria (Table S1). Most studies presented their data as concentrations in the water column (μg/L), while some presented their data as seston content (g/dry weight). To analyze both types of concentration data, we calculated the composition as percentages of total MCs reported.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2072-6651/11/11/620/s1, Figure S1: Bar plot of average MC-congener concentrations across each main water body, Figure S2: Bar plots of yearly averaged (2006-2016) MC congener concentrations across each main water body, Figure S3: Boxplots of concentrations of MC–LA, MC–LR, total phosphorus (TP) and total nitrogen (TN) from the most frequently sampled water bodies, Figure S4: Boxplots of weather conditions from weather stations nearest to the most frequently sampled water bodies, Figure S5: Summary of chlorination treatment and chlorine concentration of the OMECP raw water intake sites, Figure S6: Multivariate regression tree (MRT) of congener relative abundance across the most frequently sampled water bodies of the Laurentian Great Lakes basin (excluding Total Phosphorus), Figure S7: Summary of microcystin data (boxplots and scatterplots) from the global dataset, Figure S8: Relationship between congener composition and raw water intake location and water treatment plant chlorination status, Figure S9: Relationship between total MC concentration and location of raw water intake sites, Table S1: Summary of microcystin literature review.

**Author Contributions:** Conceptualization, Z.E.T. and F.R.P.; statistical analysis, Z.E.T.; writing—Z.E.T., F.R.P., I.F.C., S.B.W., and A.Z.; supervision, F.R.P. and I.F.C.; project administration, F.R.P.; funding acquisition, Z.E.T., F.R.P. and I.F.C.

**Funding:** This research was funded by an NSERC postdoctoral grant, NSERC CREATE Algal Bloom Assessment through Technology & Education (ABATE) grant and a Best in Science grant from the Ontario Ministry of the Environment, Conservation and Parks (OMECP).

**Acknowledgments:** We thank Gillian Kingston, Patrick Cheung, Michelle Palmer, Claire Holeton and Jenny Kwong from the Environmental Monitoring and Reporting Branch of the Ontario Ministry of the Environment, Conservation and Parks for providing the raw data for analysis, as well as Xavier Ortiz for details on the OMECP's analytical methods for microcystin congeners. We are also most grateful for the help and information provided on water treatment facilities by Kyle Davis, Carolyn de Groot, Ainslie Timmons, Lindsay Ariss, John Armour, Bill Anderson, Susan Andrews, Sangeeta Chopra, Warren Higgins, Karen Burgess, Ryan Peterson, Amy Russell, Dean Walker, Sarah Clarke, Todd Harvey, Dale Dillen, John Hemingway, Paul Dyrda, Mike Purcell, John Hoos, Kayla Beach, Monica Reid, and Christa Paquette.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


coupled to liquid chromatography-quadrupole time-of-flight high resolution mass spectrometry. *Anal. Bioanal. Chem.* **2017**, *409*, 4959–4969. [CrossRef]


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
