1. Introduction
The Earth’s climate systems have important implications for the ecology and physiology of key marine species. One of these systems is the El Niño–Southern Oscillation (ENSO) phenomenon, whose 1997–98 event impacted the small pelagic ecosystem in northern Chile [1]. Ulloa et al. [2] stressed the importance of studying species-dependent attributes when evaluating the biological impacts of environmental perturbations produced by the ENSO phenomenon. Specifically, a long-term climate variability estimate was produced for northern Chile [1], showing anomalies associated with El Niño events, which affected the abundance, recruitment, reproduction, adult biomass and environment of coastal pelagic fish such as the anchovy (Engraulis ringens) on different temporal and spatial scales [3].
In addition, large- and local-scale phenomena such as regime shifts, the ENSO cycle (El Niño–La Niña), seasonality, coastal-trapped waves and upwelling events have been postulated as driving biological processes in northern Chile [3]. Examples include: (1) the distribution of anchovy changes in space and depth during warm conditions such as El Niño; (2) after 2002, female length classes showed a positive trend in the gonadosomatic index (GSI), indicating favorable development of anchovy gonads that coincided with a period of prevailing cool conditions without strong warm events (such as El Niño 1997–98); and (3) cool conditions favor the presence of cold coastal water, a shallower thermocline, stronger upwelling and higher productivity [4]. Differences could be related to how anchovies synchronize reproductive dynamics with local environmental conditions to foster survival of offspring [5]. These dynamics are ultimately influenced by other long-term climatic processes, such as mild temperature changes [6]. This is due to warm/cool phase changes produced by environmental changes associated with sea surface temperature (SST), which affects upwelling habitat conditions. Thus, regional factors could reduce the horizontal upwelling habitat, leading to greater anchovy abundance near the coast [4,5].
More recently, Contreras-Reyes et al. [7] characterized how the anchovy synchronizes its reproductive dynamics. They investigated how local environmental variables such as the multivariate ENSO index (MEI) and the Humboldt Current index (HCI) [8,9] couple with the reproductive timing of anchovy, measured by GSI. Contreras-Reyes et al. [7] concluded that the beginning and end of the anchovy spawning period fluctuate mainly because of environmental factors, that the environment affects the timing of gonad development of anchovies off northern Chile, that the strength of the relationship varies according to female body size and that this leads to at most two-monthly spawning events per year. However, the authors presented these relationships from an autocorrelation function perspective, which ignores the possibility that an environmental variable could cause the biological one. Distinguishing causality from correlation is a crucial problem in connected dynamic systems, especially when system variables appear positively coupled at certain times but negatively at others, depending on system state [10,11]. Additionally, studies are lacking on important aspects such as short- and long-term trends of local and regional environmental conditions, which influence the local and regional oceanographic conditions linked to climate change.
To test the statistical significance of causality between two time series and determine the direction of causality, the Granger-causality method [12] was employed in [13,14,15]. This approach motivated the present study, as a need has arisen to characterize the short- and long-term causal effect of the environment on the anchovy’s biological processes. We applied the Granger-causality method to test whether statistically significant feedback exists between environmental indexes and the GSI and condition factor (CF) [16] time series. In this case, HCI and regional environmental variability drivers such as the Pacific decadal oscillation (PDO) and Antarctic decadal oscillation (AAO), plus local environmental variability drivers such as Ekman transport (ET), the sea turbulence index (TI) and SST, can provide additional predictive information from the past for the anchovy’s reproductive and feeding activities. The Granger-causality method focuses on the type of dependence among the variables; the length classes whose reproductive and feeding activities depend most on environmental variables; and the expected variance produced in GSI monthly means and intra-annual behavior [17].
This paper is organized as follows: Granger-causality based on cross-spectrum methods is described in Section 2. Section 3 presents the data collected off northern Chile to illustrate the evidence of causality between local and regional environmental drivers and biological ones. Section 4 concludes the paper with a discussion.
2. Frequency-Domain Granger-Causality Test
To analyze the cause–effect interaction between the variables, i.e., to identify the stimuli and responses for the variables along the time axis, the Granger-causality test [12] was carried out. Granger-causality is a feedback mechanism method based on cross-spectral methods. Let $\{x_t\}$ be a stationary process with no periodic components, so the spectral representation of $x_t$ is
$$x_t = \int_{-\pi}^{\pi} e^{i\omega t}\, dZ_x(\omega), \tag{1}$$
where $\omega \in [-\pi, \pi]$ are the Fourier frequencies. $f_x(\omega)$ is the spectral density of $x_t$ given by
$$f_x(\omega) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \gamma_x(k)\, e^{-i\omega k}, \tag{2}$$
where $\gamma_x(k)$, $k \in \mathbb{Z}$, is the autocovariance function (ACF) of $x_t$ and $i = \sqrt{-1}$. Given that for some processes the exact ACF is difficult to obtain, it is approximated by the sample ACF, $\hat{\gamma}_x(k)$, to then obtain the periodogram as an estimator of the spectral density [18].
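As a rough illustration of this last step, the sketch below computes the sample ACF and the raw periodogram of a series using only the Python standard library. The function names `sample_acf` and `periodogram`, the $1/(2\pi n)$ scaling and the synthetic 12-month cosine series are assumptions made for this example; they are not taken from the study, which relied on R.

```python
import cmath
import math


def sample_acf(x, max_lag):
    """Sample autocovariance gamma_hat(k) for k = 0..max_lag (divisor n)."""
    n = len(x)
    mean = sum(x) / n
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n
        for k in range(max_lag + 1)
    ]


def periodogram(x):
    """Raw periodogram |sum_t (x_t - mean) e^{-i w t}|^2 / (2 pi n)
    at the Fourier frequencies w_j = 2 pi j / n, j = 1..n//2."""
    n = len(x)
    mean = sum(x) / n
    freqs, values = [], []
    for j in range(1, n // 2 + 1):
        w = 2.0 * math.pi * j / n
        dft = sum((x[t] - mean) * cmath.exp(-1j * w * t) for t in range(n))
        freqs.append(w)
        values.append(abs(dft) ** 2 / (2.0 * math.pi * n))
    return freqs, values


# Hypothetical monthly series: a pure annual cycle (period 12) over 10 years.
series = [math.cos(2.0 * math.pi * t / 12.0) for t in range(120)]
freqs, values = periodogram(series)
peak = max(range(len(values)), key=values.__getitem__)
peak_period = 2.0 * math.pi / freqs[peak]  # period in months at the largest ordinate
```

For this synthetic series the largest periodogram ordinate falls at the frequency whose period is 12 months, which is the kind of annual peak discussed later for the local environmental and biological series.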
The main idea of Granger-causality is measuring the causality between two time series based on the spectral representation of Equation (1). To this end, the cross-spectral density and the coherence function are defined next.
Definition 1. Let $x_t$ and $y_t$ be two stationary time series, where $y_t$ has a spectral representation as in Equation (1) but with spectral density $f_y(\omega)$, $\omega \in [-\pi, \pi]$, given in Equation (2) with autocovariance function $\gamma_y(k)$. Thus, the coherence function between $x_t$ and $y_t$ is defined as
$$C_{y \to x}(\omega) = \frac{f_{yx}(\omega)}{\sqrt{f_x(\omega)\, f_y(\omega)}} \quad \text{and} \quad C_{x \to y}(\omega) = \frac{\overline{f_{yx}(\omega)}}{\sqrt{f_x(\omega)\, f_y(\omega)}},$$
respectively, where $f_{yx}(\omega)$ is the cross-spectral density between $x_t$ and $y_t$, $\overline{f_{yx}(\omega)}$ is the conjugate of the complex function $f_{yx}(\omega)$ and “$y \to x$” denotes that $y_t$ Granger-causes $x_t$.
The cross-spectral density is the Fourier transform of the cross-covariance of two time series, which gives the degree of relationship between the two time series at different frequencies. The cross-spectral density of Definition 1 is a complex function of $\omega$, and the absolute value of the coherence function is a real function defined in $[0, 1]$ that measures the degree of relationship between $x_t$ and $y_t$, given by the squared correlation coefficient between the frequency components of the time series. If $|C_{y \to x}(\omega)| = 0$, then $f_{yx}(\omega) = 0$, i.e., independence exists between the spectral densities of both time series. Conversely, if $|C_{y \to x}(\omega)| = 1$, strong dependence exists between the spectral densities of both time series.
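To make the coherence idea concrete, the following sketch estimates the squared coherence between two series by smoothing the periodograms and the cross-periodogram over neighbouring Fourier frequencies; without smoothing, the raw estimate would equal 1 at every frequency. The Daniell-type smoother, its half-width `m`, the function names and the simulated series are all assumptions for this illustration and do not reproduce the estimator used in the study.

```python
import cmath
import math
import random


def dft(x):
    """DFT of the mean-centred series at w_j = 2 pi j / n, j = 1..n//2."""
    n = len(x)
    mean = sum(x) / n
    return [
        sum((x[t] - mean) * cmath.exp(-2j * math.pi * j * t / n) for t in range(n))
        for j in range(1, n // 2 + 1)
    ]


def smooth(values, m):
    """Moving average over 2m+1 neighbouring frequencies (truncated at the edges)."""
    out = []
    for j in range(len(values)):
        window = values[max(0, j - m): j + m + 1]
        out.append(sum(window) / len(window))
    return out


def squared_coherence(x, y, m=4):
    """Smoothed estimate of |f_xy(w)|^2 / (f_x(w) f_y(w)) at the Fourier frequencies."""
    n = len(x)
    dx, dy = dft(x), dft(y)
    scale = 2.0 * math.pi * n
    f_x = smooth([abs(a) ** 2 / scale for a in dx], m)
    f_y = smooth([abs(b) ** 2 / scale for b in dy], m)
    f_xy = smooth([a * b.conjugate() / scale for a, b in zip(dx, dy)], m)
    return [abs(c) ** 2 / (fx * fy) for c, fx, fy in zip(f_xy, f_x, f_y)]


# Simulated example: y follows x with a one-step delay plus small noise; z is unrelated.
random.seed(1)
n = 200
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [random.gauss(0.0, 0.1)] + [x[t - 1] + random.gauss(0.0, 0.1) for t in range(1, n)]
z = [random.gauss(0.0, 1.0) for _ in range(n)]
coh_xy = squared_coherence(x, y)
coh_xz = squared_coherence(x, z)
mean_xy = sum(coh_xy) / len(coh_xy)
mean_xz = sum(coh_xz) / len(coh_xz)
```

In this simulation the coherence between `x` and `y` stays close to 1 across frequencies, whereas the coherence between `x` and the unrelated series `z` remains low; note that a high coherence by itself does not indicate the direction of the dependence.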
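Considering the univariate time series $x_t$ and $y_t$ of Definition 1, and following [19], let $z_t = (x_t, y_t)^\top$ be the two-dimensional vector of time series observed at time $t = 1, \ldots, T$, where $z_t$ is a finite $p$-order vector autoregressive (VAR) process (see, e.g., [20]) of the form
$$\Theta(L)\, z_t = \varepsilon_t, \tag{3}$$
where $\Theta(L) = I_2 - \Theta_1 L - \cdots - \Theta_p L^p$ is a $2 \times 2$ lag polynomial with $L^k x_t = x_{t-k}$ and $L^k y_t = y_{t-k}$, and the error vector $\varepsilon_t$ is a white noise process with $E[\varepsilon_t] = 0$ and $E[\varepsilon_t \varepsilon_t^\top] = \Sigma$, where $\Sigma$ is a $2 \times 2$ positive definite variance-covariance matrix. The VAR process may include a constant, a trend or dummy variables.
Considering a Cholesky decomposition using a lower triangular matrix $G$, the $\Sigma^{-1}$ matrix can be decomposed as $G^\top G = \Sigma^{-1}$ such that $E[\eta_t \eta_t^\top] = I_2$ and $\eta_t = G \varepsilon_t$. With the assumption that $z_t$ is a stationary process, the moving average (MA) representation of the process is
$$z_t = \Phi(L)\, \varepsilon_t = \begin{pmatrix} \Phi_{11}(L) & \Phi_{12}(L) \\ \Phi_{21}(L) & \Phi_{22}(L) \end{pmatrix} \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix} = \Psi(L)\, \eta_t = \begin{pmatrix} \Psi_{11}(L) & \Psi_{12}(L) \\ \Psi_{21}(L) & \Psi_{22}(L) \end{pmatrix} \begin{pmatrix} \eta_{1t} \\ \eta_{2t} \end{pmatrix},$$
where $\Phi(L) = \Theta(L)^{-1}$ and $\Psi(L) = \Phi(L)\, G^{-1}$. Then, the spectral density of $x_t$ is
$$f_x(\omega) = \frac{1}{2\pi} \left\{ |\Psi_{11}(e^{-i\omega})|^2 + |\Psi_{12}(e^{-i\omega})|^2 \right\}. \tag{4}$$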
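Definition 2. [21] Considering the spectral density of Equation (4), the measure of causality of $y_t$ over $x_t$ is
$$M_{y \to x}(\omega) = \log\left[ \frac{2\pi f_x(\omega)}{|\Psi_{11}(e^{-i\omega})|^2} \right] = \log\left[ 1 + \frac{|\Psi_{12}(e^{-i\omega})|^2}{|\Psi_{11}(e^{-i\omega})|^2} \right]. \tag{5}$$
The null hypothesis that $y_t$ does not Granger-cause $x_t$ and the alternative hypothesis that $y_t$ Granger-causes $x_t$ at frequency $\omega$ are then given by
$$H_0: M_{y \to x}(\omega) = 0 \quad \text{versus} \quad H_1: M_{y \to x}(\omega) > 0. \tag{6}$$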
To obtain a suitable Wald statistic, $\hat{M}_{y \to x}(\omega)$ is obtained by replacing $|\Psi_{11}(e^{-i\omega})|$ and $|\Psi_{12}(e^{-i\omega})|$ in Equation (5) by the estimated values obtained from the fitted VAR [20]. However, as in [19], the disadvantage of Wald’s statistic is the exact computation of $\hat{M}_{y \to x}(\omega)$. To solve this, the following facts should be considered:
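- (a) From Equation (5), if $|\Psi_{12}(e^{-i\omega})| = 0$, then $M_{y \to x}(\omega) = 0$, which implies that $y_t$ does not Granger-cause $x_t$ at frequency $\omega$;
- (b) From the Cholesky decomposition, we have $\Psi(L) = \Theta(L)^{-1} G^{-1}$ and
$$\Psi_{12}(L) = -\frac{g^{22}\, \Theta_{12}(L)}{|\Theta(L)|},$$
where $g^{22}$ is the lower-diagonal element of $G^{-1}$, $|\Theta(L)|$ is the determinant of $\Theta(L)$ and $g^{22}$ is positive given that $\Sigma$ is a positive definite variance-covariance matrix.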
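From (a) and (b) and the Euler representation, we have that $|\Psi_{12}(e^{-i\omega})| = 0$ for a frequency $\omega$ if
$$|\Theta_{12}(e^{-i\omega})| = \left| \sum_{k=1}^{p} \theta_{12,k} \cos(k\omega) - i \sum_{k=1}^{p} \theta_{12,k} \sin(k\omega) \right| = 0, \tag{7}$$
with $\theta_{12,k}$ the $(1,2)$th-element of $\Theta_k$ defined in Equation (3), $k = 1, \ldots, p$. Equation (7) holds if and only if the real and imaginary parts both vanish, i.e.,
$$\sum_{k=1}^{p} \theta_{12,k} \cos(k\omega) = 0 \quad \text{and} \quad \sum_{k=1}^{p} \theta_{12,k} \sin(k\omega) = 0. \tag{8}$$
Given that $\sin(k\omega) = 0$ if $\omega = 0$ or $\omega = \pi$, from Equation (7) we have that only the first restriction of Equation (8) is needed at these two frequencies.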
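Let $\alpha_j = \theta_{11,j}$ and $\beta_j = \theta_{12,j}$, $j = 1, \ldots, p$; the process $x_t$ can be modeled by the harmonic regression
$$x_t = \sum_{j=1}^{p} \alpha_j x_{t-j} + \sum_{j=1}^{p} \beta_j y_{t-j} + \varepsilon_{1t}. \tag{9}$$
If an index $i$ exists such that $\beta_i \neq 0$, then $y_t$ Granger-causes $x_t$; if $\beta_i = 0$ for all $i$, $y_t$ does not Granger-cause $x_t$. In the former case, the mean square error (MSE) of the prediction of $x_t$ based on the past of both $x_t$ and $y_t$ is smaller than that of the prediction based on the past of $x_t$ alone. Then, the null hypothesis of Equation (6) is equivalent to $H_0: R(\omega)\beta = 0$, where $\beta = (\beta_1, \ldots, \beta_p)^\top$ and $R(\omega)$ is a $2 \times p$ matrix defined by
$$R(\omega) = \begin{pmatrix} \cos(\omega) & \cos(2\omega) & \cdots & \cos(p\omega) \\ \sin(\omega) & \sin(2\omega) & \cdots & \sin(p\omega) \end{pmatrix} \tag{10}$$
and $\omega \in (0, \pi)$.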
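Under the null hypothesis, $y_t$ does not Granger-cause $x_t$ at frequency $\omega$, i.e., $R(\omega)\beta = 0$ (which holds in particular if $\beta_i = 0$ for all $i$), whereas under the alternative hypothesis, $y_t$ Granger-causes $x_t$, which requires $\beta_i \neq 0$ for at least one $i$. Therefore, the null hypothesis is tested with a joint Fisher $F$-test of the linear restrictions on the harmonic regression model given in Equation (9). The $F$ statistic is approximately Fisher-distributed as $F(2, T - 2p)$ for a frequency $\omega \in (0, \pi)$. The null hypothesis is rejected at a 95% confidence level if the p-value of the test is smaller than 0.05.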
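The testing procedure described above can be sketched in a few lines: fit the equation for $x_t$ by ordinary least squares, impose the two linear restrictions given by the rows $\cos(k\omega)$ and $\sin(k\omega)$, $k = 1, \ldots, p$, on the coefficients of the lagged $y_t$, and compute the joint $F$ statistic. The code below is a minimal standard-library illustration under stated assumptions, not the implementation used in the study: the names `ols` and `bc_test` are hypothetical, an intercept is included, the denominator degrees of freedom are taken as the number of usable observations minus the number of fitted coefficients (which differs slightly from $T - 2p$), $p \geq 2$ is required so that the two restrictions are not redundant, and the data are simulated.

```python
import math
import random


def ols(X, y):
    """Least squares via the normal equations; returns (coefficients, RSS, (X'X)^{-1})."""
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    # Gauss-Jordan elimination on [X'X | I] with partial pivoting.
    aug = [xtx[a] + [1.0 if a == b else 0.0 for b in range(k)] for a in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    inv = [row[k:] for row in aug]
    coef = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    rss = sum((yi - sum(c * v for c, v in zip(coef, row))) ** 2 for row, yi in zip(X, y))
    return coef, rss, inv


def bc_test(x, y, p, omega):
    """F statistic and p-value for H0: y does not Granger-cause x at frequency omega
    (0 < omega < pi, p >= 2)."""
    n = len(x)
    # Regressors: intercept, x_{t-1..t-p}, y_{t-1..t-p}.
    X = [[1.0] + [x[t - k] for k in range(1, p + 1)] + [y[t - k] for k in range(1, p + 1)]
         for t in range(p, n)]
    target = [x[t] for t in range(p, n)]
    coef, rss, inv = ols(X, target)
    dof = len(target) - len(coef)
    s2 = rss / dof
    beta = coef[p + 1:]
    R = [[math.cos(k * omega) for k in range(1, p + 1)],
         [math.sin(k * omega) for k in range(1, p + 1)]]
    rb = [sum(r * b for r, b in zip(row, beta)) for row in R]
    # 2x2 covariance of R beta_hat: s2 * R V R', with V the beta block of (X'X)^{-1}.
    V = [row[p + 1:] for row in inv[p + 1:]]
    RV = [[sum(R[a][i] * V[i][j] for i in range(p)) for j in range(p)] for a in range(2)]
    C = [[s2 * sum(RV[a][j] * R[b][j] for j in range(p)) for b in range(2)] for a in range(2)]
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    quad = (C[1][1] * rb[0] ** 2 - (C[0][1] + C[1][0]) * rb[0] * rb[1]
            + C[0][0] * rb[1] ** 2) / det
    f_stat = quad / 2.0
    # The survival function of F(2, dof) has the closed form (1 + 2F/dof)^(-dof/2).
    p_value = (1.0 + 2.0 * f_stat / dof) ** (-dof / 2.0)
    return f_stat, p_value


# Simulated example: y drives x with a one-step delay; `unrelated` does not.
random.seed(7)
n = 300
y = [random.gauss(0.0, 1.0) for _ in range(n)]
x = [0.0]
for t in range(1, n):
    x.append(0.5 * x[t - 1] + 0.8 * y[t - 1] + random.gauss(0.0, 1.0))
unrelated = [random.gauss(0.0, 1.0) for _ in range(n)]

omega = 2.0 * math.pi / 12.0  # annual cycle for monthly data
f_causal, p_causal = bc_test(x, y, p=2, omega=omega)
f_null, p_null = bc_test(x, unrelated, p=2, omega=omega)
```

In the simulated example the null hypothesis is rejected for the series that actually drives `x`, while the p-value for the unrelated series is an ordinary draw from the null distribution. The closed-form p-value is specific to two numerator degrees of freedom, which is why no statistical library is needed here.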
The test developed by [19] postulates that a variable $y_t$ affects another variable $x_t$ at a finite time horizon. Typically, some environmental and biological systems are modeled as cointegrated (non-stationary) ones [13]; thus, the definition of causality at frequency zero is equivalent to the concept of long-run causality. For stationary systems, it is assumed that $x_t$ is predicted using only the past of the series. If the spectral density of the resulting forecast error at low frequencies can be explained by the additional past information of $y_t$, then $y_t$ is said to be a long-run cause for $x_t$. Both cases must be considered in marine ecosystem modeling.
3. Application to Southern Humboldt Current Ecosystem
The anchovy is the main small pelagic fishery species in the upwelling southern Humboldt Current ecosystem (SHCE), with stock in southern Peru and northern Chile (
–
S). In this system, the PDO has been described as the main driver of pelagic species alternation in several upwelling systems on eastern coasts, such as the California and Humboldt Current systems [1,9]. The PDO oscillates between warm and cool phases associated with sardine/anchovy dominance. Specifically, the PDO presented a warm phase from 1989 to 2000, after which a cool phase started that is predicted to last until 2025. The presence of anchovy off northern Chile’s coast is related to local environmental conditions, characterized by high-intensity coastal upwelling processes in summer due to the intensity of southerly winds [22]. The coastal upwelling processes generate strong temperature gradients coupled with the phytoplankton community, with the coolest and warmest habitats inside and outside the coastal zone, respectively, and abundant anchovy biomass in hot zones.
In this section, we analyze the influence and Granger-causality of regional (HCI, PDO and AAO) and local (TI, ET and SST) factors on anchovy reproductive and body condition indicators (GSI and CF, respectively) in the upwelling Humboldt Current system in northern Chile.
3.1. Data and Software
The study area was restricted to anchovy landings in northern Chile (
21
–
00
S), along the Peru–Chile maritime border at the port of Antofagasta (
26
S; see Figure 1).
3.1.1. Environmental Data
On a local scale, the 1989–2017 monthly means of meteorological and oceanographical station records of Antofagasta port (
26
S) were analyzed for SST, ET and TI. SST data correspond to level 4 global SST images produced daily on a regular grid by NOAA’s National Climatic Data Center, with a spatial resolution of 25 km, and obtained from the Group for High Resolution SST. The satellite SST information was analyzed in [4] by constructing 1896 weekly images from the daily images, using data interpolating empirical orthogonal functions (DINEOF) to complete the missing data between 1981 and 2017. For each interpolated weekly image, the average SST was extracted for the first 10 nautical miles from the coast. SST, air temperature, chlorophyll, mean sea level and wind magnitude and direction were the variables used to estimate ET and TI, according to the methods proposed by [23,24], respectively.
3.1.2. Biological Data
The study period was from 1989 to 2017 and the studied small pelagic species were part of the Chile–Peru shared stock, located between
S and
S. The biological information analyzed came from the monitoring program of the Fisheries Development Institute (IFOP), financed by the Chilean Undersecretariat of Fisheries and Aquaculture (SUBPESCA). Data were generated from biological sampling at landings or aboard fishing vessels. On average, 25,000 samplings are available per year, and 50 specimens are selected for each sampling. Sex, total weight and gonad weight (0.01 g) were determined (visual inspection according to [25]).
Reproductive condition was determined by calculating the GSI [4,7] at month $t$ as the percentage of gonad monthly mean weight ($GW_t$) in relation to monthly mean body weight ($BW_t$) for all fish sampled as
$$GSI_t = \frac{GW_t}{BW_t} \times 100. \tag{11}$$
Fulton’s condition factor (CF) [26] of all fish sampled was calculated at month $t$ as follows:
$$CF_t = \frac{W_t}{L_t^3} \times 100, \tag{12}$$
where $W_t$ is the monthly mean weight and $L_t$ is the corresponding monthly mean length. In the denominator of Equation (12), the exponent 3 is imposed as an approximation of the estimated exponent in the regression model given in [16].
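A small numerical illustration of the two indicators may help. The sketch below assumes the percentage form of the GSI described above and Fulton’s usual form of the condition factor (weight over cubed length, scaled by 100); the scale factor, the function names and the input values are hypothetical and are not data from the monitoring program.

```python
def gsi(gonad_weight_mean, body_weight_mean):
    """Gonadosomatic index: gonad weight as a percentage of body weight."""
    return 100.0 * gonad_weight_mean / body_weight_mean


def fulton_cf(weight_mean, length_mean, scale=100.0):
    """Fulton-type condition factor with the exponent fixed at 3 (assumed scale of 100)."""
    return scale * weight_mean / length_mean ** 3


# Hypothetical monthly means: 1.2 g gonad weight, 24 g body weight, 15 cm length.
gsi_value = gsi(1.2, 24.0)
cf_value = fulton_cf(24.0, 15.0)
```

With these made-up values the GSI is 5% and the condition factor is about 0.71; in practice both functions would be applied month by month to obtain the GSI and CF time series.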
3.1.3. Software and Computational Implementation
All estimations and computational implementations were carried out with R software [27].
To evaluate the presence of unit roots in the time series, the augmented Dickey–Fuller (ADF) test [28] was used. For the presence of seasonal unit roots, the Hylleberg–Engle–Granger–Yoo (HEGY) test [29] was used. The ADF test is implemented in the adf.test function of the tseries package and the HEGY test in the hegy.test function of the uroot package. To detect significant frequencies in the time series, the robust G-test based on robust regression [30] was used, as implemented in the robust.spectrum function of the GeneCycle package.
The estimation of the cross-spectral density and coherence function is implemented in the crossSpectrum function of the IRISSeismic package. The Granger-causality test described in Section 2 is implemented in the grangertest function of the lmtest package. The VAR parameter estimation and the frequency-domain-based Granger-causality test are implemented in the VAR and causality functions of the vars package, respectively.
3.2. Results
Figure 2 plots the environmental and biological variables described. The regional environmental drivers appear non-stationary, though they could well be trend-stationary. The local environmental and biological indicators look stationary with an annual cyclical pattern. Some relationships can be highlighted between indicators, mainly produced by the 1996–1998 El Niño phenomenon, which can be crucial for the analysis of causality (see Section 2). The time series considered include the most relevant events (significant trend breaks identified in [7]): the first in 1995:5, the second and highest in 1998:6 and the last one in 2002:3 [31].
We started with unit root tests to verify if the time series are stationary
(and/or
).
Table 1 shows, based on the ADF unit root tests, that the null hypothesis of a unit root (non-stationarity) was rejected for all variables except HCI and TI (at a 95% confidence level) and ET (at a 98% confidence level). Next, the ADF test was applied to the detrended processes, for which the null hypothesis of a unit root was rejected. As mentioned in Section 3.1.2, the local environmental and biological time series presented significant frequencies at 6 and 12 months, as confirmed by the p-values of the robust G-test. The HEGY test did not detect seasonal unit roots and confirmed the results of the ADF test for HCI and ET. The p-values smaller than 0.01 are not shown in Table 1, where the p-values of GSI and CF also suggest seasonal stationarity. Therefore, differencing was imposed only for the HCI, ET and TI time series in the subsequent analysis.
Figure 3 illustrates the coherence functions by frequency based on cross-spectral density between environmental and biological time series. We can probably consider a linear relationship between two tested time series’ frequencies if the coherence function is large at specific frequencies. As can be seen in
Figure 3a–c, the coherence function was close to 0 for frequencies in
and the highest coherence was obtained at
(12 months or annual cycle), and the second one was obtained at
(6 months or intra-annual cycle), driven by the seasonal components of GSI and CF. However, in
Figure 3d–f, the coherence function is close to 1 for frequencies in
and close to 0 for frequencies in
, where the highest coherence was obtained at
. This means that evidence exists of causality at different lags (but related to a low number of predictors) based on the cross-spectrum analysis when the local environmental and biological time series are compared, mainly produced by the significant frequencies detected by the robust G-test.
Subsequently, VAR models were evaluated for bivariate time series composed of the local/regional environmental and biological time series. Figure 4 illustrates Akaike’s information criterion (AIC) with respect to the number of predictors/regressors (p). The AIC was used to determine the number of predictors/regressors, where the smallest AIC values indicate the “best” models. Given that the Granger-causality test is highly dependent on the selection of predictors/regressors, an appropriate p for the objective variable $x_t$ in Equation (3) should be determined by AIC. No environmental indicator had a minimum AIC value when the GSI was considered as the biological indicator; thus, in the first instance, a cut-off point (or marked inflection point, vertical dotted line) was considered. However, when the CF was considered as the biological indicator, a minimum AIC value emerged among
and
predictors in panels (a)–(d), but for panels (e) and (f), the same situation as in the GSI case occurred.
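The lag-selection step can be illustrated with a simplified sketch. Instead of the AIC of the full bivariate VAR used in the study, the code below computes a single-equation Gaussian AIC, $m \log(\mathrm{RSS}/m) + 2k$, for the regression of $x_t$ on $p$ lags of both series, fitting every candidate $p$ on the same sample so that the values are comparable. This simplification, the function names `ols_rss` and `aic_by_lag`, and the simulated data are assumptions made for the example.

```python
import math
import random


def ols_rss(X, y):
    """Residual sum of squares of a least-squares fit (normal equations, Gauss-Jordan)."""
    k = len(X[0])
    a = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(X, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        d = a[c][c]
        a[c] = [v / d for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    coef = [row[k] for row in a]
    return sum((v - sum(b * u for b, u in zip(coef, r))) ** 2 for r, v in zip(X, y))


def aic_by_lag(x, y, p_max):
    """AIC of the x-equation for p = 1..p_max, all fitted on the same sample t >= p_max."""
    n = len(x)
    target = [x[t] for t in range(p_max, n)]
    m = len(target)
    out = {}
    for p in range(1, p_max + 1):
        X = [[1.0] + [x[t - k] for k in range(1, p + 1)]
             + [y[t - k] for k in range(1, p + 1)] for t in range(p_max, n)]
        rss = ols_rss(X, target)
        out[p] = m * math.log(rss / m) + 2 * (2 * p + 1)
    return out


# Simulated example in which x depends on its own first two lags and on y at lag 2.
random.seed(11)
n = 400
y = [random.gauss(0.0, 1.0) for _ in range(n)]
x = [0.0, 0.0]
for t in range(2, n):
    x.append(0.4 * x[t - 1] - 0.5 * x[t - 2] + 0.6 * y[t - 2] + random.gauss(0.0, 1.0))

aic = aic_by_lag(x, y, p_max=6)
best_p = min(aic, key=aic.get)
```

Plotting `aic` against p for such data gives a curve that drops sharply up to the true lag order and then flattens, which is the kind of cut-off point referred to above when no clear minimum is visible.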
According to the AIC results of Figure 4, a VAR model with a low number of predictors was considered for the frequency-domain Granger-causality test in the first instance, and a high number of predictors (
) in the second instance, except for the bivariate time series composed of CF and detrended HCI, PDO, AAO and SST. The results of these frequency-domain Granger-causality tests are presented in Table 2. For the low p and GSI as X case, the null hypothesis that Y does not Granger-cause X was rejected at a 95% confidence level only for PDO, SST and detrended TI as Y, but the null hypothesis could not be rejected for a high p. This means that evidence exists of PDO, SST and detrended TI Granger-causing GSI. For the CF case, the null hypothesis was rejected at a 95% confidence level only for SST and detrended ET for a low p but, again, could not be rejected for a high p, suggesting that SST and detrended ET Granger-cause CF. This corroborates the evidence of Granger-causality at different lags found in the cross-spectrum analysis of the local environmental and biological time series.
Contreras-Reyes et al. [7] detected cross-correlation between detrended HCI and GSI; however, Table 2 shows that the evidence of Granger-causality running from detrended HCI to the biological processes is weak, indicating no coupling. Moreover, the weakest causal effect holds for AAO according to the tests for both biological processes. This occurs because the Antarctic dynamic system is not synchronized with the biological processes of small pelagics from the Chile–Peru shared stock. Therefore, AAO could be discarded as a strong signal in biological indicator estimates and cannot be used to detect the presence of external drivers that might be unknown in the modeling.
4. Conclusions and Discussion
The SHCE is an important topic among researchers working on the drivers of pelagic species’ biological indicators. However, the selection of “correct” drivers for identifying causality in the SHCE can be difficult. Sometimes the variables are positively coupled, but at other times they appear unrelated or even negatively coupled depending on the local/regional environmental indicator [4]. The Chile–Peru shared stock exhibits radically different dynamic control regimes driven by large- and local-scale phenomena, such as regime shifts, the ENSO cycle, seasonality, coastal-trapped waves and upwelling events, which cause the correlations between small pelagic species and phytoplankton to change sign [5].
Given the importance of the issue of climate change, this study revisited the question of whether environmental factors influence reproductive and body conditions of the anchovy by using a frequency-domain Granger-causality test. This technique can capture nonlinearities potentially intrinsic to the data-generating processes of local/regional environmental and biological variables, for instance, those due to the structural breaks explained in [4,7]. These structural breaks in SST, GSI and CF were determined by the 1996–1998 El Niño event, and this study presented evidence that a regional indicator such as PDO could be an important factor in anchovy development. Moreover, the Granger-causality test is highly influenced by the number of predictors/regressors p in the VAR model: a high p (∼100) implies that a variable $y_t$ affects another variable $x_t$ at an infinite time horizon, a concept that is not addressed by the frequency-domain Granger-causality test, as postulated in the conclusions of [19]. Therefore, our study presented the evidence of Granger-causality for a low p (
, see Table 2) based on a cut-off point criterion. In addition, the study highlighted that PDO, SST, TI and ET are important for predicting reproductive and body condition activity, so that researchers working on links between environmental conditions and pelagic species can anticipate shifts in the maximum and minimum peaks of biological indicators. This study could also be useful for predicting anchovy abundance in the SHCE [5,11].
While SST is believed to be a major cause of the GSI and CF time series [3,4], it has been suggested that regional drivers such as PDO anomalies also drive these biological indicators. However, the evidence for the latter line of reasoning is mixed. Hernández-Santoro et al. [4] highlighted that the seasonal change of SST explained and caused GSI and CF, determining a delay in the start and maximum of GSI, and a negative relationship with CF. In addition, Hernández-Santoro et al. [4] showed a gradual SST increase, mainly during the austral winter, starting in 2006, due to a phase change in the PDO [32]. Therefore, this study corroborates that a warm condition could trigger a rise in anchovy gonad development, so the GSI could be explained by SST as a local environmental indicator. Additionally, anchovy synchronize their body condition dynamics with the local environmental conditions given by TI and ET.
This study builds on the previous cross-correlation analyses of [4,7], where the question of causality in the dynamic ocean SHCE was addressed with a different methodology. To answer that question, the Granger-causality concept relies on predictability rather than on the correlation used in these studies, giving more evidence of causation between time-series variables [11] and filling the gap of determining Granger-causation over correlation. Although correlation is neither necessary nor sufficient to establish causation, it remains deeply ingrained in our heuristic thinking [10]. For example, detrended HCI does not Granger-cause GSI and/or CF, but they are correlated. On the other hand, lack of correlation does not imply lack of Granger-causation. However, Granger-causality takes prediction rather than correlation as the criterion for causation in time series and assumes that causes can be separated from effects. This is possible in purely stochastic systems; however, it is not defined for all systems, such as deterministic dynamic systems. Additionally, while we only analyzed Granger-causality in addition to the correlation analysis by [4,7], we must highlight that GSI and CF were used as partial proxies for the reproductive and body condition factors of anchovy, respectively. Thus, our evidence of causality between the processes should not be associated with a real correlation between two variables. To confront this issue, a detrended cross-correlation analysis [33] will be required, which was beyond the scope of this study.
Finally, further work must consider a spatial-temporal approach for causality [34]. Neglecting these issues could also lead to spurious research outcomes, ignoring more significant influences that local/regional environmental drivers have on biological ones. However, our objective was to obtain the first evidence of causality at a space-point scale but over a long period.