1. Introduction
Chironomids are amongst the most speciose families of aquatic insects, with about 15,000 described species, contributing to a large portion of the insect diversity in the benthos. Their ecological tolerance varies among species, with some being tolerant to extreme environmental conditions [
1]. Various species have been shown to respond to a broad suite of different environmental (=abiotic) factors, such as water temperature, salinity and sediment composition [
2], although biotic factors (competition, predation) are also important [
3]. Furthermore, interactions between different combinations of abiotic and biotic factors as well as spatial and temporal variables are complex, making the species distributions difficult to interpret. Such complexities are further exacerbated in environmentally unstable aquatic habitats such as those in the Mediterranean [
4], which are known to show broad hydro-morphological variations among years. The hydrological regime of those habitats will likely become even more unstable due to global climate change [
5].
Historical biogeographic factors are known to influence chironomid species composition in aquatic habitats at large (continental) spatial scales [
6], but in more specific habitats such as headwater streams, it can be supposed that environmental factors may act at a smaller spatial scale [
7]. Despite this described complexity, often only a few key factors may account for the largest portion of observed variation [
8]. The scarcity of samples and data available for study coupled with different sampling methods and tools and differing taxonomic levels used (i.e., family, genus, species), produces further uncertainty in interpreting results. Many studies considered responses at different spatial scales [
9], but few studies evaluated the responses of chironomids at different spatial scales [
10], and they were primarily focused on small spatial scales [
11,
12].
The aim of the present study is to confirm the relevance of environmental factors acting at small spatial scales and to assess the potential importance of factors acting at larger spatial scales by using traditional multivariate methods, such as partial canonical correspondence analysis (pCCA) and multiple discriminant analysis (DISCR), and more recent tools used to analyze spatial data based on Moran’s eigenvector maps (MEMs) [
13].
2. Materials and Methods
The input data matrix included 788 study sites in 11 different habitat types (
Table 1) encompassing lotic and lentic water bodies; the former were classified according to Illies and Botosaneanu [
14], and the latter according to Buraschi et al. [
15] and Tartari et al. [
16].
The running water types were separated into kryal, crenal, rhithral and potamal zones, adding the kryal zone (glacial streams) to the Illies classification [
14]. The lake classification following [
15,
16] separates lakes with a prevalent profundal zone (AL03), from littoral (AL04), littoral–sublittoral (AL05) and prevalent sublittoral (AL06) lakes.
The samples were collected with different sampling techniques according to habitat type. Kick sampling was used in kryal, crenal and rhithral habitats, drift nets were used in potamal stretches of rivers, PONAR grabs were used in different lake types, Ekman grabs were used in Alpine lakes. Species abundances were standardized to 1 m
2 bottom surface to allow comparison between samples collected in different habitat. Samples were collected mostly in Italy [
2,
8], and secondarily in other countries: Austria, Germany [
17], Switzerland [
18,
19], Montenegro [
20], Greece [
21], Algeria [
22,
23] and France (Garonna River). The maps with sampling sites were prepared using QGIS version 3.34.1 Prizren (2009) (
https://www.qgis.org/it/site/, accessed on 1 February 2024) [
24].
Raw data comprised 19,531 samples collected in 788 study sites with 255 variables, i.e., 22 environmental factors/variables and 233 chironomid species. Only samples including values of at least ten variables and variables present in at least 100 samples were considered for analysis. This selection left 782 sites with 101 species and nine environmental variables: altitude (altit) (m a.s.l.), distance from the source (dist) (km), conductivity (cond) (µS cm
−1), dissolved oxygen (O
2) (mg L
−1), oxygen saturation (O
2%) (% saturation), water temperature (temp) (°C), pH, total phosphorous (TP) (µg L
−1) and ammonium hydroxide (NH
4-N) (mg L
−1). Latitude and longitude expressed in the coordinate reference system WGS 84/UTM zone 32, three spatial scales (regional, large and small water basin), habitat type, year and month were included as factors in the database. Mean values of the variables calculated for each of the 39 large water basins are provided in
Supplementary Table S1.
To analyze factors responsible for species distribution in the presence of potential influence of spatial factors, data were analyzed with a pCCA after log(x + 1) transformation of the species matrix and standardization of environmental variables. The pCCA was carried out with the species matrix as dependent variables, the nine environmental variables as constraining variables (independent variables) and spatial coordinates as conditioning variables (covariates). Forward and backward selection of environmental variables was performed to select the variables accounting for the largest source of variation.
To confirm the results of pCCA, an inverse pCCA (pCCAi) was calculated, with spatial variables as constraining factors and environmental variables as conditioning factors; in other words, constraining and conditioning factors were exchanged.
To confirm the importance of habitat type in driving species composition, a DISCR analysis was performed with species as dependent variables and habitat type as a discriminant factor. Data were analyzed using the R-package ‘vegan’ [
25].
To analyze the influence of spatial factors, the sites having the same spatial coordinates, but sampled on more than one date, were pooled into a single record, leaving a matrix with 264 sites, with 54 species present in at least 50 sites.
The analysis of spatial factors is needed because both environmental variables and species abundances can be influenced by values of the same variable measured at the surrounding sites. A correlogram measures this influence at increasing spatial scale. Univariate Moran’s I spatial correlograms were calculated to analyze the response of each environmental variable and of each species to increasing spatial scale. Multivariate Mantel’s correlograms were also calculated to analyze the spatial influence on the full set of environmental variables and the full set of species [
25]. Correlograms were analyzed using the R-packages ‘vegan’ and ‘spdep’ [
26].
To further examine the species response at different spatial scales, MEMs were calculated using the R function
mem, which transforms the distance matrix into an eigenvector matrix, with sites as rows and eigenvectors as columns; the eigenvectors represent the spatial structure at decreasing spatial scales [
26]. The species matrix was submitted to a multiscale pattern analysis (MSPA) [
27] using the R function
mspa, to recalculate the spatial structure of the data and to identify the scales of spatial variation more correlated with species. Calculations were performed using the R-packages ‘adespatial’, ‘spdep’ and ‘vegan’. As a further step, a spatial weighting matrix (SWM) was calculated using the R functions
mem.select,
listw.candidates and
listw.select [
28] to further select the subset of MEM eigenvectors which yields the highest correlation (i.e., the highest adjusted R
2) with species matrix. As a last step, a canonical analysis (=redundancy analysis using the function
pcaiv in the package ‘ade4’) was used to explain the structure of the species matrix constrained by MEM spatial variables selected starting from the best SWM (
sel.lw$best). See Jombart et al. [
27] for a detailed explanation of all of these steps.
The last step in data analysis was the calculation of diversity using the R-package ‘vegan’. To estimate diversity, three spatial scales (i.e., region, large and small water basin) were considered. The mean species number present in each area (=alpha diversity), calculated as the mean of values for each site within the area, the total number of species present in the area (=gamma diversity) and the ratio between gamma and alpha diversity (=beta diversity) were then calculated.
3. Results
The high number of species found (233) was expected, because of the diverse habitat analyzed, including both running waters and lakes from different countries. The list of species included in the data analysis is provided in
Table 2, with the number of samples for each species.
The response of species to environmental factors, removing the potential confounding effect of spatial variables, was analyzed using pCCA. The nine environmental variables included as constraining variables accounted for 11% of total inertia, while spatial variables included as covariates explained less than 5% (
Table 3a).
The inverse pCCAi, with spatial variables as constraining variables and environmental ones as conditioning variables was carried out to establish if spatial variables can have a direct influence on species response, having removed the possible influence of environmental variables. In pCCAi, spatial variables accounted for an even lower proportion of inertia, less than 4% (
Table 3b).
The first eigenvalue accounted for 5.5%, and the second eigenvalue for 2.6% of variation in the pCCA (
Table 4a) but only 1.1% and <1%, respectively, in the pCCAi (
Table 4b); the other eigenvalues accounted for negligible proportions in both analyses.
Results of direct pCCA show a clear separation of high-altitude sites with low water temperature, from lowland sites (
Figure 1a). The separation was evident along the first axis and was explained by an altitudinal temperature gradient. The second axis emphasized a clear trophic gradient, with sites rich in total phosphorus and poor in oxygen concentration separated from sites with low total phosphorous and high oxygen concentration.
The plot of sites evidenced a clear habitat separation (
Figure 1b), with kryal sites grouped to the left and potamal sites at the bottom right. Large and small lakes (AL03–AL06) prevailed on the right. The alpine lakes (ALAlps) at high altitude were plotted near the kryal sites. The separation of lowland lakes (ME04 and ME07) from rhithral sites was less evident, even if rhithral sites were grouped in the center of the graph, while lakes surrounded them and potamal sites were well separated.
Species distribution in the plot (
Figure 1c) mirrored that of sites (
Figure 1b), with taxa characteristic of kryal and alpine lakes plotted on the left, species characterizing potamal and rhithral plotted on the right. The separation of species preferring crenal habitats or lakes was less evident here.
The pCCAi emphasized that the second axis separated sites and species according to latitude (Y) and longitude (X) (
Figure 1d). The axes describing smaller spatial scales are all plotted in the left part of the graph. No clear separation of species according to their preferences is visible; only Tanytarsini are separated better than the species belonging to other tribes, but the species ordination did not separate groups with different ecological needs.
DISCR analysis using habitat types as factors showed a very good agreement between expected and observed classification (
Figure 2,
Table 5 and
Table S2). Some habitats such as AL03 and rhithral had more predicted sites than initially assigned, while others such as lakes (AL05–AL06) and potamal had a lower number of predicted sites than assigned, but in general the agreement was very good. These results confirmed direct pCCA results, i.e., that habitat type was a good predictor of chironomid assemblages.
The response of each environmental variable and of each species to increasing spatial scale, analyzed with univariate spatial correlograms, showed that spatial autocorrelation decreased rapidly with increasing distance class for all variables. The autocorrelations were generally low, below 1. The highest autocorrelation of environmental variables was observed for conductivity (
Figure 3a), the highest autocorrelation of species was observed for
T. gregarius, which appeared to be one of a few species with autocorrelations observable also at larger distances, but it was an exception (
Figure 3b). Low values of autocorrelation mean that the value of a variable at a given site is only slightly related to the values in nearby sites and a still lower influence is observed at increasing spatial scales.
This result was confirmed examining the Mantel correlograms, which showed that Mantel’s autocorrelation was very low and vanished just after the first distance class; this was evident both for environmental variables (
Figure 3c), and for chironomid assemblages (
Figure 3d), confirming the absence of spatial autocorrelations except at the lowest distance.
The MEMs generated from SWM were calculated and numbered from 1 to 264, with 264 being the number of sites; MEM1 is the eigenvector map associated with the highest eigenvalue and maps the largest distances, MEM264 is the one associated with the lowest eigenvalue and maps the shortest distances. The full matrix of MEMs is given in
Table S3. The most significant MEMs were plotted in European maps; the eigenvectors were divided into five classes and are represented by different colors (
Figure 4).
The MEM of the first axis was related to latitude, the MEM of the second and third to longitude, while the other axes could not be associated to a well-defined factor. Few significant correlations between environmental variables and MEM eigenvectors were observed (see
Table S4). The R
2 correlations between the species matrix and MEM spatial predictors, calculated from the best SWM (see Data analysis), were filed in decreasing order (
Table 6). In
Table S5, the completed list of R
2 correlations calculated with two different methods (mem.gab.sel from
mem.select and sel.lw
$best from
listw.select, see
Section 2) are given [
27].
Finally, redundancy analysis (RDA) was carried out to relate the sites x species matrix with the MEM eigenvector matrix. Detailed results of the RDA can be examined in
Table S6, where the scores of all species, sites and MEM variables are given. A summary of results is shown in
Figure 5 and
Figure 6. The separation of sites is still evident according to habitat type (
Figure 5); in this case, the lakes are separated and plotted on the left, even if large, small and volcanic lakes are not well separated, kryal and alpine lakes are in the upper part of figure, while rhithral sites are on the right in the center, and potamal sites are at the bottom right. The separation according to habitat is in agreement with pCCA and DISCR analysis, even if the position of each habitat is changed. The species separation is consequent, with the species with highest spatial autocorrelation as
T. gregarius, and
P. H. choreus have the highest scores in RDA analysis (
Figure 6a,
Table S6). Species preferring lentic habitat are plotted on the right, cold stenothermal species at the top and species characteristic of lotic habitats at the bottom right. The MEM variables are also plotted (
Figure 6b); those with the highest R
2 values are the ones most distant from the center, as is clearly seen when comparing
Table 6 with
Figure 6b.
Diversity for different habitats and for different spatial scales was then calculated to support the previous conclusions. The highest gamma diversity was observed in rhithral (95 species) and potamal (87) sites, and the lowest in volcanic lakes (28), while the highest alpha diversity was observed in lakes AL06 (16) and the lowest in alpine lakes (ALAlps) (13). The highest beta diversity was observed in rhithral (6) habitats and the lowest in volcanic lakes (2). A classification based on water basins shows that the highest gamma diversity (93) was observed in the Sarca basin, and the highest beta diversity in Adda (7); at a lower spatial scale, the highest gamma diversity (77) was observed in Lambro stream, and the highest beta diversity (5) in Bormida stream (see
Table S7 for detailed results). The gamma and beta diversity decreased from the largest to the smallest spatial scale, respectively measuring 67 and 4.6 at the largest scale (regional), 47 and 3.2 at the intermediate scale (large water basin) and 26 and 1.8 at the smallest scale (small water basin).
4. Discussion
Our study showed that environmental variables such as substrate, salinity, water temperature, current velocity, conductivity, acidity, dissolved oxygen, nutrient content, etc., can be summarized in a single factor, i.e., habitat type, classified as kryal, crenal, rhithral, potamal zones of running waters, littoral, sublittoral, and profundal zones of lakes, summarizing preferences for different factors interacting with each other. Habitat type is able to explain the chironomid species composition, while geographic (i.e., latitude, longitude, altitude) and spatial factors (i.e., source distance, water depth) have only an indirect influence, corroborating results of previous studies [
2,
8]. In fact, pCCA emphasized that the highest proportion of inertia was accounted for by environmental variables summarized in the habitat type (
Figure 1b), while spatial factors (i.e., latitude, longitude and their polynomial expansions) accounted for a much smaller proportion of inertia. DISCR analysis of the species matrix carried out using habitat as the discriminant factor confirmed that there was very good agreement between the ‘a priori’ and ‘a posteriori’ classification based on habitat type.
The hypothesis that spatial factors at different scales could significantly influence chironomid response was tested with spatial correlograms and with Moran’s maps. At small scales, spatial autocorrelation was evident only at the shortest distances, with the possible exception of few species such as
T. gregarius, which also showed a moderate autocorrelation at larger distances, but for most species it was negligible also at the shortest distances. Considering large spatial scales, the Moran’s maps evidenced a spatial separation of species according to latitude and longitude, but this was interpreted as an artifact, because the sites at the highest altitudes with lower temperature are more frequent in the western part of the sampled area (Western Alps), suggesting a misleading influence of longitude, and the sites at lower latitude suffer an indirect effect of temperature, having higher temperatures [
30,
31] than the northern sites.
As regards diversity [
32], in previous studies, the influence of spatial factors on chironomid diversity [
11,
12] was analyzed considering the effect of river order. The highest diversity of chironomids was found for streams of intermediate order, suggesting that factors acting at intermediate spatial scales are more relevant than factors at large and small spatial scales, in agreement with the old intermediate disturbance hypothesis [
33]. We did not test influence of river order on diversity; our data emphasized that beta and gamma diversity decreased at the three different spatial scales (regional, large and small water basin), but this was expected given the decreasing extent of the geographic areas. We also observed the highest gamma and beta diversity in rhithral and potamal sites, and the lowest in kryal habitats, again confirming the importance of habitat in explaining chironomid distribution.
Taxonomic composition was considered too cumbersome in ecological studies in recent years by many authors, who tried to substitute or augment taxonomic composition with functional composition descriptors [
34,
35] or with species trait analysis [
36,
37]. The choice between taxonomic, functional and trait composition influenced the interpretation of results at different spatial scales. For example, it was emphasized [
5] that species trait composition could be less affected by regional scale than taxonomic composition, while ecoregion and season accounted for 20.5% of the variance in functional composition, but only 10.9% of the variance in taxonomic structure [
10]. In the present case, we considered only taxonomic composition, because in our opinion, trait analysis and functional descriptors are not sufficiently developed to be applied to chironomid species [
37].
There are few or no publications that can be compared with our study in considering response of chironomids to different spatial scales. Feld & Hering [
10] investigated benthic macroinvertebrates at different spatial scales (ecoregion, catchment, reach, site) in four different countries (Sweden, the Netherlands, Germany and Poland). Mykrä et al. [
38] focused their study on the Fennoscandia bioregion with pCCA, and concluded that local factors are prevalent in explaining macroinvertebrate distribution, although regional factors are also relevant at a larger spatial scale. In the present analysis, the area sampled included Italy, France, Switzerland, Austria, Germany, Greece, Montenegro and Algeria, although samples from Italy were largely prevalent and included Western, Central, Eastern Alps, Prealps, lowland Po River, different rivers in Central–South Italy, large and small prealpine lakes and volcanic lakes (
Table S1), and the whole area was analyzed at three different spatial scales (regional, large and small water basin). Similar analyses (i.e., pCCA) were used [
10,
38] as in the current study; however, for various reasons, a direct comparison with our results is complicated. For instance, in those studies, only running water samples were analyzed, while we included both lotic and lentic samples. Furthermore, although both the current and previous studies included species-level data, in the current study, we focused only on chironomid assemblages, while in the studies of Feld & Hering [
10] and Mykrä et al. [
38], the whole benthic macrofauna was analyzed. Finally, fewer environmental and spatial variables were included in the current study (nine environmental and two spatial variables) compared to the previous studies: in [
10], 31 and in [
38], 15 environmental variables at four [
10] (i.e., mega, macro, meso, micro), or three [
38] (i.e., bioregion, ecoregion, drainage) hierarchical extents. Feld & Hering [
10] concluded that interactions at different spatial scales confounded the interpretation of the results, but the meso-scale variables accounted for the largest source of variation; Mykrä et al. [
38] considered local factors more important.
We can conclude that spatial scales are not relevant in modeling chironomid taxa assemblages. It is well known that at a very large spatial scale, considering wide global areas (Palaearctic, Nearctic, Oriental, Neotropic, Australian), chironomid species composition is quite different [
6], but within the restricted area considered (i.e., the Mediterranean–Alpine area), the spatial factors have no influence, or they are mediated by other factors (e.g., altitude, water temperature etc.). Anthropogenic stress was not a focus of the present research [
39], but even if many altered sites were included in the dataset, they did not substantially alter our conclusions about the prominent effect of habitat, although the well-known response of chironomids to oxygen shortage and eutrophication (measured as TP) [
2,
8,
18] was confirmed here (
Figure 1a).
5. Conclusions
Considering the response of benthic macroinvertebrates, and of chironomids in particular, to environmental factors, one concern is that important predictors may be missing in the model. This issue can be addressed by considering a more general factor such as habitat type, classified as littoral, sublittoral and profundal zone for lakes, and kryal, crenal, rhithral and potamal zone for running waters, as the best predictor of benthic macroinvertebrates in general and of chironomids in particular. Habitat type summarizes a suite of different environmental variables, such as substrate, salinity, water temperature, current velocity, conductivity, acidity, dissolved oxygen, nutrient content etc., avoiding the risk that some important predictors may be overlooked. The present database was analyzed with multivariate methods (pCCA, DISCR) able to combine different variables into few descriptors, i.e., eigenvectors with high inertia; we found that the eigenvectors with the highest inertia could be easily associated with habitat types. This was also confirmed by the analysis of spatial autocorrelation and Moran’s I eigenvector maps, which supported the conclusion that spatial factors act only as indirect drivers determining species composition, at least within an area with an extension comparable to the Mediterranean region.