Please note that, as of 4 December 2024, Environmental Sciences Proceedings has been renamed to Environmental and Earth Sciences Proceedings and is now published here.
Proceeding Paper

An Improved Indicator for Causal Interaction in Non-Linear Systems †

1 Department of Physics, University of Patras, 26504 Patras, Greece
2 Ecodevelopment S.A., 57010 Filyro, Greece
* Author to whom correspondence should be addressed.
Presented at the 16th International Conference on Meteorology, Climatology and Atmospheric Physics—COMECAP 2023, Athens, Greece, 25–29 September 2023.
Environ. Sci. Proc. 2023, 26(1), 92; https://doi.org/10.3390/environsciproc2023026092
Published: 28 August 2023

Abstract

Utilizing an extension of Pearson’s correlation to random vectors, we improve the empirical dynamic modeling causal analysis of non-linear systems. To demonstrate the effectiveness of this extension, we analyze two real-world examples: the paramecium-didinium protozoan system and the influence of environmental variables on mosquito abundance in northern Greece. In both examples, the causal analysis based on the extended metric outperforms the usual method of measuring the correlation between observed and predicted values of a single vector component.

1. Introduction

Empirical dynamic modeling (EDM) is a useful technique for the study of causal interactions in non-linear systems. The basic idea involves creating time-lagged histories of an observed time series from which (due to a theorem by Takens [1]) it is possible to construct an embedding of the attractor manifold that describes the evolution of the underlying dynamical system. By cross-mapping the time-lagged manifolds of two time series belonging to the same dynamical system, forwards and backwards in time, one can infer whether a causal relationship exists between the two [2,3,4]. This method has often been employed in the study of causal interactions in various systems [5,6,7,8].
Although the reconstructed points are vectors in an $E$-dimensional state space, the quality of cross mapping in the literature is determined by the fidelity between observed and predicted values of a single component. Since two embeddings of the same manifold are, by definition, homeomorphic to each other and share the same topology, it makes more sense to ask to what degree the full vector of observations (consisting of the current value of the time series along with its $E-1$ previous values) is correlated with its prediction under cross mapping. To achieve this, it is necessary to extend the notion of correlation to vectors [9,10].
As an example, we apply the technique to the causal analysis of a protozoan predator-prey system and to the study of environmental effects on mosquito abundance. In the first case, the vector correlation is shown to be more robust, resulting in a consistent causal network under changes of the embedding dimension, while in the second case it is able to detect more causal factors than its scalar counterpart.

2. Methods

EDM is concerned with the reconstruction of the state space of a system from observations of a single time series $X$. As was already mentioned, Takens’ theorem [1] ensures that for an appropriate embedding dimension $E$ and time lag $\tau$, the manifold of time-lagged vectors
$$X_t = \left(x_t,\, x_{t-\tau},\, \ldots,\, x_{t-(E-1)\tau}\right) \tag{1}$$
is an embedding of the attractor of the system. For a fixed $\tau$, the best choice for the value of the embedding dimension can be determined by requiring the correlation between observed and predicted values (first component of Equation (1)) of the forward trajectory of the manifold, one time step into the future, to be maximized.
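The construction of the lagged vectors in Equation (1) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `lagged_embedding` and the toy series are assumptions made here, not part of the original analysis.

```python
import numpy as np

def lagged_embedding(x, E, tau=1):
    """Rows are X_t = (x_t, x_{t-tau}, ..., x_{t-(E-1)tau}) for every t with a full history."""
    x = np.asarray(x, dtype=float)
    start = (E - 1) * tau                      # first index with E lagged values available
    if start >= len(x):
        raise ValueError("time series too short for this E and tau")
    # column j holds x_{t - j*tau} for t = start, ..., len(x) - 1
    cols = [x[start - j * tau: len(x) - j * tau] for j in range(E)]
    return np.column_stack(cols)

series = np.arange(10.0)                       # toy data: 0, 1, ..., 9
emb = lagged_embedding(series, E=3, tau=2)
print(emb.shape)                               # (6, 3)
print(emb[0])                                  # [4. 2. 0.]
```

A series of length $L$ yields $L-(E-1)\tau$ lagged vectors, which is why short series limit the embedding dimensions that can be explored.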
Predictions are made by use of Simplex projection [11]. This is a k-nearest neighbors regression algorithm which uses information about the forward trajectory of dynamics similar to that of the predictee. For any point $X_t$ in the reconstructed state space, the prediction $T$ time steps into the future is given by a weighted average of its $E+1$ closest neighbors, forward in time
$$\hat{X}_{t+T} = \sum_{i=1}^{E+1} w_i X_{t_i+T} \tag{2}$$
with exponential weights equal to
$$w_i = \frac{\exp\left(-d_i/\bar{d}\right)}{\sum_{j=1}^{E+1}\exp\left(-d_j/\bar{d}\right)}, \quad i = 1, \ldots, E+1 \tag{3}$$
where $d_i$ is the Euclidean distance of $X_t$ from its $i$th closest neighbor $X_{t_i}$ and $\bar{d}$ is the mean distance.
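A minimal Python sketch of a single Simplex prediction follows. It assumes exponentially decaying weights $\exp(-d_i/\bar{d})$, that $\bar{d}$ is the mean over the $E+1$ selected neighbors, and that the point being predicted is excluded from its own neighbor set. The function name `simplex_predict` and the sine-wave test data are illustrative choices, not taken from the paper.

```python
import numpy as np

def simplex_predict(emb, t, T=1):
    """Predict the embedded point T steps after row t from its E+1 nearest neighbours."""
    n, E = emb.shape
    k = E + 1
    # candidate neighbours: every other row whose image T steps away exists
    idx = np.array([i for i in range(n) if i != t and 0 <= i + T < n])
    if len(idx) < k:
        raise ValueError("not enough candidate neighbours")
    dist = np.linalg.norm(emb[idx] - emb[t], axis=1)
    order = np.argsort(dist)[:k]
    nn, d = idx[order], dist[order]
    d_bar = d.mean()                            # assumed: mean over the selected neighbours
    # exponential weights; fall back to equal weights if all distances are zero
    w = np.exp(-d / d_bar) if d_bar > 0 else np.ones(k)
    w /= w.sum()
    return w @ emb[nn + T]                      # weighted average of the neighbours' futures

# toy data: a sine wave embedded with E = 3 and tau = 1
x = np.sin(0.3 * np.arange(200))
E = 3
emb = np.column_stack([x[E - 1 - j: len(x) - j] for j in range(E)])
t = 100
pred = simplex_predict(emb, t, T=1)
err = np.abs(pred - emb[t + 1]).max()
print(err)                                      # small for this smooth periodic signal
```

Because the neighbors are found in the embedded space rather than in time, points from different cycles of the signal contribute to the forecast.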
Causality between two time series $X$ and $Y$ can be deduced by cross mapping their lagged embeddings. Once again one employs Simplex projection to predict points of one manifold, e.g., $Y_t$, but in this case the appropriate weighting is determined by the structure of the other manifold (the distances of $X_t$ from its closest neighbors, as before). Note that in this case $T$ can take both positive and negative values. The reasoning is that if the two variables are part of the same dynamical system, then their embeddings are homeomorphic to each other, so in principle it should be possible to use information about the neighborhoods of one variable to make predictions about the other. Since both the observed and predicted points are part of an $(E+1)$-dimensional manifold, the quality of predictions in this case is given by the vector correlation [10]
$$\rho\left(Y, \hat{Y}\right) = \frac{\operatorname{tr}\left(\Sigma_{Y\hat{Y}}\right)}{\operatorname{tr}\left[\left(\Sigma_{YY}\,\Sigma_{\hat{Y}\hat{Y}}\right)^{1/2}\right]} \tag{4}$$
where $\Sigma_{AB}$ denotes the cross-correlation matrix of random vectors $A$ and $B$. If the maximum value of the vector correlation for $T > 0$ is greater than the one for $T < 0$, then past information about the structure of the manifold of $X$ can be used to predict $Y$ forward in time. This suggests that $X$ is causally connected to $Y$. Conversely, if the future topological structure of $X$ results in a higher vector correlation for the predictions of $Y$, i.e., the maximum cross mapping for predicting $Y$, measured with respect to the vector correlation, is better for $T < 0$ than for $T > 0$, then the reverse is true, which suggests that $Y$ is causing $X$.
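The vector correlation can be sketched numerically as follows. The sketch rests on two assumptions that should be checked against [10]: the matrices are taken to be sample (cross-)covariance matrices, and the denominator is read as the trace of the matrix square root of the product of the two covariance matrices, evaluated here through the eigenvalues of that product. The function name `vector_correlation` is illustrative.

```python
import numpy as np

def vector_correlation(Y, Y_hat):
    """tr(S_yh) / tr((S_yy S_hh)^(1/2)) for samples stored as rows of Y and Y_hat."""
    Y = np.asarray(Y, dtype=float)
    Y_hat = np.asarray(Y_hat, dtype=float)
    Yc = Y - Y.mean(axis=0)
    Hc = Y_hat - Y_hat.mean(axis=0)
    n = len(Y)
    S_yy = Yc.T @ Yc / (n - 1)                  # covariance of the observations
    S_hh = Hc.T @ Hc / (n - 1)                  # covariance of the predictions
    S_yh = Yc.T @ Hc / (n - 1)                  # cross-covariance
    # the product of two positive semi-definite matrices has real, non-negative
    # eigenvalues, so the trace of its square root is the sum of their roots
    eig = np.linalg.eigvals(S_yy @ S_hh)
    denom = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    return np.trace(S_yh) / denom

rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 3))
noisy = Y + 0.5 * rng.normal(size=(50, 3))
print(vector_correlation(Y, Y))                 # perfect prediction gives 1 (up to rounding)
print(vector_correlation(Y, noisy))             # degraded prediction gives a smaller value
```

For one-dimensional inputs the expression reduces to the usual Pearson coefficient, which is the sense in which it extends the scalar metric.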

3. Results

We will now analyze two real-world examples and compare their causal structures when both the scalar and vector correlation metrics are employed.

3.1. The Paramecium-Didinium Protozoan System

The paramecium-didinium protozoan system by Veilleux [12] consists of two single-celled organisms acting as predator (Didinium nasutum) and prey (Paramecium aurelia). Every twelve hours, cultures of the two species in petri dishes were sampled non-destructively and their counts recorded. The evolution of the system depends on the concentration of Cerophyl medium (CC) on which the initial population of paramecium was grown [12,13]. Here we analyze two instances which allow for coexistence between the two species, one for which CC = 0.375 g/L and one for which CC = 0.5 g/L. It is known that for this system embedding dimensions $E \ge 3$ can be used for the analysis [2,3].
In Table 1 and Table 2 we present the optimum time-lag $T$ that maximizes cross mapping for embedding dimensions $3 \le E \le 10$. As was mentioned above in the Methods Section, information about the topology of the lagged manifold of one species is used to make predictions about the other. Due to the limited number of available data, a leave-one-out cross validation was performed in which the model is trained on every point except for the one in question. The maximum time-lag chosen for the analysis was equal to $T_{\max} = \pm 72$ h (extending the maximum had no effect on the analysis). In the tables, $X \to Y$ stands for “variable $X$ is used to cross map (or predict) variable $Y$, $T$ time steps forwards or backwards”.
We observe that the causal analysis based on the vector correlation metric performs better than the scalar case. When CC = 0.375 g/L, vector correlation consistently implies a top-down control by the predator, in which didinium causally affects paramecium, while the scalar case switches between a top-down and a two-way causation structure where both species affect each other. For CC = 0.5 g/L, both metrics imply the same causal structure for every embedding dimension (a bottom-up control by the prey in the case of vector correlation and two-way causality in the case of scalar correlation). However, the optimum time-lag decreases with increasing dimension when scalar correlation is employed, while it remains relatively constant in the case of vector correlation.

3.2. Environmental Effects on Mosquito Abundance

We will now study the causal effect of environmental variables on daily mosquito abundance (of the culex genus). The data were collected every fortnight between the 21st and the 39th week of 2012 in the regional district of Thessaloniki (see Figure 1). Mosquito abundances were sampled by trap placement every two weeks and the following environmental variables on the day of capture were recorded: vegetation density and distribution (NDVI), changes in water content (NDWI), vegetation water content (NDMI), daily mean of land surface temperature (LST), accumulated precipitation in the two weeks before the date of placement (RAIN) and mean hourly magnitude of wind (WIND).
Due to the very short length of each individual time series (only 10 available values per station), we performed a spatial extension of EDM on the data, where lagged vectors from multiple spatial replicates are combined into a reconstruction of the state space [4,14]. To ensure a high density of points in embedded space and a good fit for the model, we consider only those replicates with a high degree of correlation between abundances (Table 3) and perform a min-max normalization on the time series. By predicting the forward trajectory of the lagged manifold two weeks into the future with the help of Simplex projection, as described in the Methods Section, we find that the best embedding dimension in this case, presented in Figure 1, is equal to $E = 6$. In order to increase the number of available points and the maximum time-lag for the causal analysis, we instead choose $E = 5$. This allows us to check for time-lags up to $T_{\max} = \pm 4$ weeks. (To avoid underfitting the data, we require the total number of available points for training the model with $N$ replicates of length $L$, equal to $N(L - E - T + 1)$, to be greater than four times the number of required neighbors.)
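The sample-size requirement can be checked with a short calculation. The sketch below assumes that $T$ is counted in sampling intervals (one step per fortnight), that $|T|$ is used so negative lags are handled, and that the number of required neighbors is $E+1$. The value $N = 7$ is a hypothetical choice taken from the number of stations listed in Table 3, since the text does not state how many replicates were retained.

```python
def usable_points(N, L, E, T):
    """Training points available from N replicates of length L: N * (L - E - |T| + 1)."""
    return N * (L - E - abs(T) + 1)

def max_lag(N, L, E, factor=4):
    """Largest |T| (in sampling steps) with more than factor * (E + 1) points, else None."""
    best = None
    for T in range(0, L):
        if usable_points(N, L, E, T) > factor * (E + 1):
            best = T
    return best

N, L = 7, 10            # N = 7 is an assumption; L = 10 values per station as in the text
for E in (5, 6):
    print(E, max_lag(N, L, E))
# prints:
# 5 2
# 6 0
```

Under these assumptions $E = 5$ admits lags of up to two sampling steps, i.e., four weeks, while $E = 6$ leaves no room for a non-zero lag, which is consistent with the choice made in the text.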
In Table 4 we present the results of the optimum time-lag (in weeks) for cross-mapping between each environmental variable and culex abundance for both correlation metrics.
With the vector correlation as the metric, all the environmental variables except for NDMI are found to be causal factors of culex abundance, with both cross mappings displaying approximately the same correlation (we only consider cases for which both cross mappings have a positive value). In contrast, the scalar correlation reveals only three environmental variables as causal factors (NDMI, LST, RAIN) (see Figure 2), with the differences in correlation for the mean land surface temperature being more pronounced.

4. Conclusions

It was demonstrated that, in the study of causal interactions with EDM techniques, performing cross-mapping with a vector-based measure of fidelity is preferable to measuring the fidelity of only a single component, as is usually done in the literature. Specifically, the analysis with the extended metric was shown to be robust under changes of the embedding dimension. This is important in situations where the dimension is hard to determine due to either noise or a limited number of available data. It also agrees well with the idea behind false-nearest-neighbor techniques that, for embedding dimensions above the correct value, the manifold is essentially ‘unfolded’, so no qualitative changes should be expected [15].
Vector correlation also enhanced the study of the effects of the environment on the number of culex mosquitoes in northern Greece and was able to detect a larger number of causal factors. Since mosquitoes are known to be vectors of diseases such as West Nile virus, these applications could be of particular interest especially in guiding vector control strategies and health policy assessments.

Author Contributions

Conceptualization, N.K. and I.K.; methodology, N.K.; data curation, S.G. and S.M.; writing—original draft preparation, N.K.; writing—review and editing, N.K.; supervision, I.K.; funding acquisition, I.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been co-financed by the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH–CREATE–INNOVATE (project code: Τ2ΕΔΚ-02070). This work was also partially supported by the EIC Horizon Prize “Early Warning for Epidemics”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of these data. Culex data were obtained from Ecodevelopment S.A. and are available from the authors with the permission of Ecodevelopment S.A.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Takens, F. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence, Warwick 1980; Rand, D., Young, L.-S., Eds.; Springer: Berlin/Heidelberg, Germany, 1981; pp. 366–381.
  2. Sugihara, G.; May, R.; Ye, H.; Hsieh, C.H.; Deyle, E.; Fogarty, M.; Munch, S. Detecting causality in complex ecosystems. Science 2012, 338, 496–500.
  3. Ye, H.; Deyle, E.R.; Gilarranz, L.J.; Sugihara, G. Distinguishing time-delayed causal interactions using convergent cross mapping. Sci. Rep. 2015, 5, 14750.
  4. Clark, A.T.; Ye, H.; Isbell, F.; Deyle, E.R.; Cowles, J.; Tilman, G.D.; Sugihara, G. Spatial convergent cross mapping to detect causal relationships from short time series. Ecology 2015, 96, 1174–1181.
  5. Tsonis, A.A.; Deyle, E.R.; May, R.M.; Sugihara, G.; Swanson, K.; Verbeten, J.D.; Wang, G. Dynamical evidence for causality between galactic cosmic rays and interannual variation in global temperature. Proc. Natl. Acad. Sci. USA 2015, 112, 3253–3256.
  6. McBride, J.C.; Xiaopeng, Z.; Munro, N.B.; Jicha, A.G.; Schmitt, F.A.; Kryscio, R.J.; Smith, D.H.; Jiang, Y. Sugihara causality analysis of scalp EEG for detection of early Alzheimer’s disease. NeuroImage Clin. 2015, 7, 258–265.
  7. Stathopoulos, S.; Tsonis, A.A.; Kourtidis, K. On the cause-and-effect relations between aerosols, water vapor, and clouds over East Asia. Theor. Appl. Climatol. 2021, 144, 711–722.
  8. Emiliano, D.; Adsuara, J.E.; Martínez, M.Á.; Piles, M.; Camps-Valls, G. Inferring causal relations from observational long-term carbon and water fluxes records. Sci. Rep. 2022, 12, 2045–2322.
  9. Hanson, B.; Klink, K.; Matsuura, K.; Robeson, S.M.; Willmott, C.J. Vector correlation: Review, exposition, and geographic application. Ann. Assoc. Am. Geogr. 1992, 82, 103–116.
  10. Puccetti, G. Measuring linear correlation between random vectors. Inf. Sci. 2022, 607, 1328–1347.
  11. Sugihara, G.; May, R.M. Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series. Nature 1990, 344, 734–741.
  12. Veilleux, B.G. An analysis of the predatory interaction between Paramecium and Didinium. J. Anim. Ecol. 1979, 48, 787–803.
  13. Jost, C.; Ellner, S.P. Testing for predator dependence in predator-prey dynamics: A non-parametric approach. Proc. R. Soc. London. Ser. B Biol. Sci. 2000, 267, 1611–1620.
  14. Hsieh, C.; Anderson, C.; Sugihara, G. Extending nonlinear analysis to short ecological time series. Am. Nat. 2008, 171, 71–80.
  15. Krakovská, A.; Mezeiová, K.; Budáčová, H. Use of false nearest neighbours for selecting variables and embedding parameters for state space reconstruction. J. Complex Syst. 2015, 2015, 932750.
Figure 1. (a) Location of sampling stations; (b) Leave one out cross-correlation between observations and predictions of the forward trajectory of the manifold two weeks into the future.
Figure 2. Cross-mapping causal network for mosquito abundance with (a) a scalar correlation metric; (b) a vector correlation metric.
Table 1. Optimum cross mapping time-lag T (in hours) between paramecium and didinium for an initial Cerophyl concentration CC = 0.375 g/L (correlation values in parentheses).

| E | Scalar correlation: par → did | Scalar correlation: did → par | Vector correlation: par → did | Vector correlation: did → par |
|---|---|---|---|---|
| 3 | −36 h (0.80) | −60 h (0.77) | −36 h (0.79) | 36 h (0.71) |
| 4 | −48 h (0.83) | 12 h (0.80) | −36 h (0.82) | 36 h (0.76) |
| 5 | −60 h (0.85) | 12 h (0.86) | −24 h (0.85) | 36 h (0.79) |
| 6 | −60 h (0.85) | 0 h (0.88) | −24 h (0.86) | 48 h (0.83) |
| 7 | −60 h (0.86) | 12 h (0.88) | −36 h (0.88) | 48 h (0.85) |
| 8 | −48 h (0.86) | −24 h (0.88) | −24 h (0.88) | 24 h (0.87) |
| 9 | −60 h (0.88) | −24 h (0.88) | −24 h (0.89) | 48 h (0.89) |
| 10 | −60 h (0.88) | 12 h (0.87) | −24 h (0.90) | 36 h (0.90) |
Table 2. Optimum cross mapping time-lag T (in hours) between paramecium and didinium for an initial Cerophyl concentration CC = 0.5 g/L (correlation values in parentheses).

| E | Scalar correlation: par → did | Scalar correlation: did → par | Vector correlation: par → did | Vector correlation: did → par |
|---|---|---|---|---|
| 3 | −12 h (0.88) | −48 h (0.86) | 0 h (0.86) | −24 h (0.85) |
| 4 | −12 h (0.89) | −36 h (0.88) | 12 h (0.87) | −24 h (0.87) |
| 5 | −24 h (0.90) | −36 h (0.91) | 12 h (0.88) | −24 h (0.89) |
| 6 | −24 h (0.90) | −36 h (0.91) | 12 h (0.89) | −12 h (0.90) |
| 7 | −24 h (0.92) | −36 h (0.92) | 24 h (0.90) | −24 h (0.91) |
| 8 | −36 h (0.91) | −36 h (0.94) | 12 h (0.90) | −24 h (0.92) |
| 9 | −36 h (0.92) | −48 h (0.94) | 24 h (0.91) | −24 h (0.93) |
| 10 | −60 h (0.92) | −48 h (0.94) | 24 h (0.91) | −24 h (0.93) |
Table 3. Correlation matrix of mosquito abundance between sampling stations.

| | ADETR | ANTTR | CHATR | KLCTR | SINTR | VRATR |
|---|---|---|---|---|---|---|
| AATTR | 0.79 | 0.83 | 0.60 | 0.36 | 0.12 | 0.40 |
| ADETR | - | 0.86 | 0.42 | 0.54 | 0.10 | 0.20 |
| ANTTR | - | - | 0.39 | 0.41 | 0.38 | 0.42 |
| CHATR | - | - | - | 0.49 | 0.34 | 0.60 |
| KLCTR | - | - | - | - | 0.39 | 0.56 |
| SINTR | - | - | - | - | - | 0.71 |
Table 4. Optimum cross-mapping time-lag (in weeks) between culex abundance and environmental variables in the regional district of Thessaloniki (correlation values in parentheses).

| var | Scalar correlation: culex → var | Scalar correlation: var → culex | Vector correlation: culex → var | Vector correlation: var → culex |
|---|---|---|---|---|
| NDVI | −4 w (−0.09) | 2 w (0.55) | −4 w (0.23) | 2 w (0.23) |
| NDMI | −4 w (0.08) | 2 w (0.16) | −4 w (0.10) | 4 w (−0.07) |
| NDWI | 0 w (−0.13) | 2 w (0.61) | −4 w (0.27) | 4 w (0.28) |
| LST | −2 w (0.17) | 2 w (0.53) | −4 w (0.20) | 0 w (0.15) |
| RAIN | 0 w (0.62) | 2 w (0.69) | −4 w (0.66) | 0 w (0.45) |
| WIND | 2 w (0.38) | 2 w (0.78) | −2 w (0.40) | 4 w (0.55) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Kollas, N.; Gewehr, S.; Mourelatos, S.; Kioutsioukis, I. An Improved Indicator for Causal Interaction in Non-Linear Systems. Environ. Sci. Proc. 2023, 26, 92. https://doi.org/10.3390/environsciproc2023026092
