*Proceeding Paper* **Using Forecasting Methods on Crime Data: The SKALA Approach of the State Office for Criminal Investigation of North Rhine-Westphalia †**

**Kai Seidensticker \* and Katharina Schwarz**

State Office for Criminal Investigation of North Rhine-Westphalia, 40221 Düsseldorf, Germany; katharina01.schwarz@polizei.nrw.de


**Abstract:** In this article, we introduce the topic of crime forecasting as performed in North Rhine-Westphalia, Germany. We give a brief overview of three forecasting methods used in theory and practice: predictive policing, risk terrain modeling, and time series analysis. We find that spatio-temporally based statistical techniques offer high potential for optimizing operational and strategic planning in policing.

**Keywords:** crime forecasting; predictive policing; risk terrain modeling; time series analysis

**Citation:** Seidensticker, K.; Schwarz, K. Using Forecasting Methods on Crime Data: The SKALA Approach of the State Office for Criminal Investigation of North Rhine-Westphalia. *Eng. Proc.* **2022**, *18*, 39. https://doi.org/10.3390/engproc2022018039

Academic Editors: Ignacio Rojas, Hector Pomares, Olga Valenzuela, Fernando Rojas and Luis Javier Herrera

Published: 11 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **1. Introduction**

Crime is a complex social phenomenon that has a lasting impact on the population's sense of security and can burden society. The complexity of this phenomenon makes it difficult to predict. Every day, new crimes of different types occur because of different decisions and the routine activities of perpetrators, victims, and capable guardians, resulting in many new trends in crime. Investigating these crime characteristics, crime patterns, and criminal behavior is a major objective of criminology. In recent decades, many new options in criminal investigation have arisen as a result of rapid technological development and newly designed technologies. Because of the difficulty of dealing with the massive amount of data produced every day, data mining techniques have also become increasingly important in criminal investigation. The digitalization of society has also led to the development of different policing strategies and philosophies, which are often based on extensive data systems (big data). In addition, the importance of system-supported decision-making processes in policing is increasing [1]. Nowadays, crime analysis and the use of big data play a significant role in the investigation, prosecution, and prevention of crime.

Attempting to anticipate the future is not a new way of thinking in the context of policing, but rather a standard approach. For example, prevention efforts, which are a foremost task of the police, aim to avert potential future threats in a pre-emptive manner. Two insights are essential for the idea of crime forecasting: first, human actions can vary due to different values and preferences and also due to the influence of spatial and temporal changes [2]; and second, crimes do not occur uniformly or randomly in space and time [3] (p. 5). What is new is the systematic use of technical solutions, big data, and the growing complexity of data analytics in policing. By trying to explore what they might already know based on their existing data, the police open the door to using big data and linking data to intelligence. In this spirit of optimism, a wide variety of data analysis methods have been implemented in German police forces since 2014, including predictive policing. The State Office for Criminal Investigation of North Rhine-Westphalia (LKA NRW) tested and implemented its own predictive policing model as part of a holistic crime analysis and forecasting approach called SKALA (System for Crime Analysis and Anticipation). The SKALA approach uses crime analysis and forecasting methods, such as predictive policing, risk terrain modeling, and time series analysis, to investigate crime patterns at different spatial and temporal scales and thereby support crime prevention. Currently, the focus is on residential burglaries, commercial burglaries, and motor vehicle offenses, because numerous theoretical and empirical scientific works are already available both on the phenomenology and explanation of these particular offenses and on the distribution of these crime phenomena in space and time. These works formed an essential cornerstone for the conception and evaluation of the models and of the forecast development.

This paper gives an overview of the different methods used to forecast crime in the context of SKALA. The first part presents the crime-forecasting project in North Rhine-Westphalia. The second part presents the three forecasting procedures: crime forecasting, risk terrain modeling, and time series analysis. We present the (classical) forecasting approach of the LKA NRW, which has been in operational use since 2015. Afterward, we discuss the spatial risk assessment of crime for NRW using risk terrain modeling. The temporal component is then considered, taking into account the time series decomposition and ARIMA modeling of crime data. Finally, conclusions on the three procedures are drawn, and the potential and limitations of each are discussed.

#### **2. Crime Forecasting in North Rhine-Westphalia**

In Germany, the number of residential burglaries increased rapidly between 2009 (113,800 cases) and 2015 (167,136 cases) [4] (p. 70). This increase and the high level of public and political attention to the issue of residential burglary led to the need for innovative approaches to tackling this crime phenomenon. Because predictive policing was reported to be a successful method for reducing crime in other countries, this approach was also tested in Germany. Predictive policing is still an umbrella term that describes methodological processes utilized by law enforcement agencies to predict crime and aid in planning operational responses [5] (p. 47). Available definitions range from calculating the probability of future crimes to specific methods for investigating crimes already committed [6] (pp. 8–9). Pearsall defines predictive policing as " ... taking data from disparate sources, analyzing them, and then using results to anticipate, prevent and respond more effectively to future crime" [7] (p. 16). In Germany, predictive policing is commonly defined as a computer-assisted method for spatially based probability calculations of crime [8]. This method aims to identify risk areas so that the police can plan appropriate policing measures and optimally allocate their limited forces.

In North Rhine-Westphalia, predictive policing was tested between 2015 and 2018 within the SKALA project to study the spatio-temporal dimension of crime phenomena. In detail, the objectives of the project were (1) to examine the possibilities and limits of predicting crime hot spots, and (2) to examine the efficiency and effectiveness of police interventions based on them [9]. For this purpose, an attempt was made to calculate crime risks using spatial data for each residential district of the participating cities. The method used consisted initially of simple decision tree models and later of random forest models. After the project was successfully completed in 2018, SKALA was expanded into an independent research area within the Criminological Research Department of the LKA NRW, and the crime forecasts were extended to the entire state of NRW in 2021. Building on these initial experiences with the predictive policing method, SKALA tested several other forecasting methods to support police work in NRW. These methods include a variant of the classic predictive policing approach for operational planning, risk terrain modeling, and trend analysis for long-term predictions. Although there is a difference between prediction and forecasting, for the purpose of this study, we use the terms interchangeably. Perry et al. [6] (p. xiii) argued that the most common distinction is that forecasting is objective, scientific, and reproducible, whereas prediction is subjective, mostly intuitive, and non-reproducible.

#### **3. Material and Methods**

#### *3.1. Crime Forecasting*

The classical crime forecasting approach in the SKALA project focuses on using predictive policing as a method to compute spatial crime risks. In practice, predictive policing involves several steps and processes that build on each other, starting with the analysis of the specific offense and the collection and processing of the data required for crime prediction. The illustrated methodical process (Figure 1) gives an insight into the individual steps for implementing predictive policing from the police department's point of view, as it also took place in SKALA. Deviations are conceivable, but at least similar designs are likely to be present whenever machine learning technologies are used.

**Figure 1.** The predictive policing process. Adapted with permission from Ref. [10]. 2017, Bode/Stoffel/Keim.

All process steps depend on the available data, the data acquisition, and the preparation of the data for further processing. For predictive policing, data quality is therefore of crucial importance. In this context, data acquisition problems are conceivable, for example, when recording a residential burglary's suspected and not precisely determinable time. Moreover, data uncertainties can arise on the side of the police forces if crimes are legally misjudged or reported late by the victims, which cannot be ruled out for the crime of residential burglary, for example. In addition, the fundamental problem is that crime phenomena cannot usually be fully described with the available data, especially when unobservable or unquantifiable effects are important. Thus, when making crime forecasts, the question must be asked whether the anticipated crime event occurred independently of the criminological and mathematical models used.

A deliberate decision was made not to follow an exclusively data-driven approach to producing crime forecasts; instead, a theory-driven approach was adopted. Due to the increasing digital availability of data and further developments in the field of big data processing, data mining approaches for the prediction of feature characteristics or data points are increasingly represented in the methodological discussion on predictive policing [11] (p. 8). Data mining covers (partially) automated methods for analyzing data sets, which can find statistical correlations and provable patterns [12]. Data mining approaches go beyond assumption-driven multivariate data analysis. They can also find non-linear or only partially existing correlations that might have remained undetected during the theoretical derivation of hypotheses from tentatively proven theories. An exclusively data-driven approach must be viewed critically, especially in the context of predictive policing, since plausible assumptions about possible causal relationships already exist for the vast majority of criminal phenomena and cannot simply be ignored. Likewise, data-driven methods rest on the assumption that the input data describe the phenomenon to be analyzed well enough that automatic pattern and group identification does not occur on the basis of spurious correlations. Especially with regard to data protection, principles such as data economy, and the difficulty of analyzing highly complex phenomena such as crime, it becomes clear that purely data-driven methods should not be used alone in the area of predictive policing. This is because crime cannot always be clearly objectified and thus represented within a data set [13].

The theory-driven approach ensured that the model used was based on robust scientific theories and research findings. This distinguishes the approach from many other predictive policing methods, which are often based only on the near-repeat approach, i.e., the empirically proven observation that crime recurs in the same or adjacent locations within a given period [14] (p. 414) [15] (pp. 368 ff.). In practice, the probabilities of residential burglaries, burglaries of commercial properties, and motor vehicle offenses were calculated based on spatial data for residential quarters in selected police districts of NRW.

A three-stage procedure was chosen for this purpose. First, spatio-temporal clusters of crime data were calculated based on the near-repeat approach, using a period of 14 days and a radius of 500 m for residential burglary. The collected crime data mainly included the time and location of the offense, the modus operandi, and the proceeds of the crimes (property stolen). The calculated spatio-temporal clusters were transferred to the residential district level. The residential districts are areas comprising about 400 households each (see Figure 2); for NRW, this resulted in 18,875 residential districts. A uniform size at the household level was chosen because grid cells can lead to over- or underestimation of offenses.

**Figure 2.** Example of a residential district. Reprinted with permission from Ref. [9]. 2018, LKA NRW.
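As an illustration of the first step, the following minimal sketch flags near-repeat pairs of offenses that lie within 14 days and 500 m of each other. The column names, the toy coordinates (assumed to be in a projected, metric reference system), and the brute-force pairwise loop are assumptions chosen for readability, not the SKALA implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical input: one row per offense with projected coordinates (metres)
# and the (suspected) offense date. Column names are assumptions.
offenses = pd.DataFrame({
    "case_id": [1, 2, 3, 4],
    "x": [341200.0, 341450.0, 349000.0, 341300.0],
    "y": [5675800.0, 5675900.0, 5680000.0, 5675700.0],
    "date": pd.to_datetime(["2021-11-01", "2021-11-05", "2021-11-06", "2021-12-20"]),
})

MAX_DAYS = 14      # near-repeat time window used for residential burglary
MAX_DIST = 500.0   # near-repeat radius in metres

def near_repeat_pairs(df, max_days=MAX_DAYS, max_dist=MAX_DIST):
    """Return case-id pairs of offenses that are close in both space and time."""
    xy = df[["x", "y"]].to_numpy()
    t = df["date"].to_numpy()
    pairs = []
    for i in range(len(df)):
        for j in range(i + 1, len(df)):
            dt = abs((t[j] - t[i]) / np.timedelta64(1, "D"))
            dist = np.linalg.norm(xy[j] - xy[i])
            if dt <= max_days and dist <= max_dist:
                pairs.append((df["case_id"].iloc[i], df["case_id"].iloc[j]))
    return pairs

print(near_repeat_pairs(offenses))   # [(1, 2)] for the toy data above
```

In practice, the resulting pairs (or clusters) would then be aggregated to the residential district level as described above.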

Second, random forest models were used to select a subset of influencing variables from the socio-economic data. These data were chosen because of their statistical impact on the occurrence of crime. They included information on the residential location, such as population structure, building construction, income, infrastructure connections, and mobility indicators. Random forest models were used in this step to avoid overfitting [16,17]. The advantage of this approach is its relatively easy operability and the possibility of performing initial data analysis tasks, including model and forecast generation, in a short time. The decision tree models showed comparatively good performance; in addition, they are transparent and comprehensible, which is why they were favored within this framework. Third, the modeling and forecasting were performed based on the previous results and the selected data within a linear regression model.
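The second and third steps could look roughly as follows: a random forest ranks the socio-economic variables by importance, and the top-ranked subset then feeds a linear regression that produces the district-level forecast. The variable names, the importance cut-off, and the synthetic data are illustrative assumptions rather than the operational SKALA configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical district-level feature matrix (population structure, building
# construction, income, infrastructure, mobility, ...) and burglary counts.
feature_names = ["pop_density", "share_multi_family", "median_income",
                 "dist_motorway", "transit_stops", "turnover_rate"]
X = rng.normal(size=(500, len(feature_names)))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=500)

# Step 2: rank variables with a random forest and keep the most important ones.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
order = np.argsort(rf.feature_importances_)[::-1]
top = order[:3]                                   # illustrative cut-off
print("selected:", [feature_names[i] for i in top])

# Step 3: fit the forecast model on the selected variables only.
lin = LinearRegression().fit(X[:, top], y)
forecast = lin.predict(X[:5, top])                # risk scores for five districts
print(forecast)
```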

The areas with the highest calculated crime probabilities compared to the rest of the entire forecast area are defined as forecast areas. Their share was limited to about 1.5 percent of the total number of quarters in each police district for practical reasons, such as planning policing measures and allocating and distributing limited police resources.

The calculation of crime probabilities was based on the total area of each police authority. This procedure ensured that the individual risk of residential burglary could be determined for every residential quarter in the forecast week, whereas many other predictive policing methods only refer to sub-areas of cities or regions.

The methodological implementation of the model and forecast generation described above focuses primarily on long-term statistical considerations. In the course of SKALA, the pilot authorities repeatedly criticized that crime series, such as those with a particular modus operandi, were not reflected in the forecasts. Analyses based on the available series of crimes from the police authorities showed that differentiation from other series or individual crimes was not possible based on the available data material, even though the homogeneity of the characteristics within the respective series was always high. Nevertheless, as a supplement to the statistical and decision tree-based approach of the model and forecast generation, a second forecast model was developed, which focused on possible series of crimes. This so-called analytical model was independent of the more comprehensive statistical model and supplemented it, depending on the data situation.

#### *3.2. Risk Terrain Modelling*

The Risk Terrain Modelling (RTM) approach is a method used to compute the future risk of crime in specific places. Unlike other methods such as hotspot mapping, the calculations in RTM are not, or rather not only, based on crime data but also include geographical characteristics of places [6] (p. 50). RTM is a classification approach that characterizes an area's risk of crime based on its environmental characteristics [6] (p. 51). New hotspots are therefore predicted based on their similarities to other 'known' hotspots. The result might be a map of places that have not yet seen any crime but should be considered risky because their spatial risk factors resemble those of crime hotspots. The generated spatial risk profiles are derived from the respective geographic characteristics of the space, which can be identified as risk factors for crime. Studies suggest that the prediction accuracy of RTM models is better than that of classic hotspot models [18]. However, the correlations found in such models do not necessarily imply a meaningful and substantively justifiable causality. They are usually the results of "unsupervised learning" methods, which are intended to identify previously unknown correlations or patterns without categorizing the data [19].

In SKALA, the RTM instrument is primarily intended for police forces with a low number of cases in relation to their area (rural regions), since this makes it difficult to use the crime forecasting model described above, which is primarily based on patterns of crime events that are close in time and space. Therefore, alternative model approaches were sought that rely more strongly on the influence and distribution of socio-economic, building-specific, and infrastructural attributes in space.

In classical RTM, risk factors that significantly influence the risk burden (the number of events per unit of space and time) for a selected offense are determined from a large number of space-specific attributes using statistical methods. The mathematical model underlying the relationship between the significant characteristics and the risk burden enables a forecast calculation of the expected risk of crime. Following this, a risk map can be generated for the entire area of a selected authority. The temporal validity of such a map is strictly linked to changes in the area-specific characteristics and the number of cases, which in rural regions are subject to relatively minor changes on short time scales.

The existing socio-economic and other freely available data, such as point data of bus stops, banks, and supermarkets, were processed, calculated, and aggregated for a uniform grid of 100 × 100 m to test the RTM approach. Figure 3 shows example density maps of different variables for the same city. In the classic RTM approach, the individual input variables are processed in binary form for each raster cell. This means that for each variable and raster cell, it is determined whether the variable is present in the raster cell or not. In the course of development, it quickly became apparent that this approach was insufficient to provide satisfactory results. Therefore, a modified approach was used to convert the existing variables to density-based units, allowing better differentiation of the individual variables' spatial distribution. Another deviation from the classical RTM is the inclusion of variables calculated from the characteristics of the historical processes, such as the average distance of residential burglary to the three nearest residential burglaries (relative to the center of each grid cell).

**Figure 3.** Input variables using the example of Düsseldorf with the (**a**) variability of parking lots, (**b**) street light density, (**c**) car density, and (**d**) minimum distance to train stops taken from Open Street Map (OSM).

The result for a risk map depends significantly on the assumed relationship between the target variable and the input variables. The different model algorithms realize the mathematical formulation of this problem. The classical RTM formulation is based on Poisson regression. Figure 4 shows an example of a risk map based on Poisson regression. Alternatively, the risk determination can be conceived as a classification problem using common methods such as RF (random forest) or SVM (support vector machine). The main difference between the two formulations in a risk map is the interpretation of the risk load. While a Poisson regression gives a non-scalable risk number for a grid cell that is only valid for the respective authority (for example, a grid cell with a risk number of 5.7 in one city would have a different meaning than a grid cell with the same risk number in another city), the classification gives a probability for the occurrence of an event for each grid cell similar to the previous forecast model used in SKALA (see Figure 5).

**Figure 4.** RTM (Poisson model) for 100 × 100 m grid cells in risk classes over six main percentiles for the police authority Düsseldorf.

**Figure 5.** RTM (classification model) of the police authority Mettmann.
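A compact sketch of the two formulations described above is given below: a Poisson regression of grid-cell counts (the classical RTM, yielding a non-scalable risk number per cell) versus a random forest classifier (yielding an occurrence probability per cell). The grid-cell features and counts are synthetic, and statsmodels/scikit-learn are used here merely for convenience, not as the SKALA implementation.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Hypothetical 100 x 100 m grid cells with density-based input variables
# (e.g., street lights, cars, distance to train stops, ...).
n_cells = 2000
X = rng.normal(size=(n_cells, 4))
rate = np.exp(0.3 * X[:, 0] - 0.2 * X[:, 1] - 1.0)
counts = rng.poisson(rate)               # offense counts per cell (target of the Poisson RTM)

# Classical formulation: Poisson regression -> non-scalable risk number per cell.
poisson = sm.GLM(counts, sm.add_constant(X), family=sm.families.Poisson()).fit()
risk_number = poisson.predict(sm.add_constant(X))

# Alternative formulation: classification -> probability of at least one event per cell.
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, (counts > 0).astype(int))
risk_probability = clf.predict_proba(X)[:, 1]

print(risk_number[:5], risk_probability[:5])
```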

As quality criteria for the comparison of the different models, deviance was chosen for the Poisson regression. Deviance indicates how well the significant variables and model parameters determined in the model can describe or reproduce the original statistical distribution of the target variable. For the RF and SVM classification models, the quality measures AUC and PrecRec were chosen. The closer the AUC value is to 1, the better the model prediction: an AUC value of 0.5 means that the model does not predict better than chance, while the maximum AUC value of 1 means that all areas were predicted exactly correctly. PrecRec stands for the sum of the quality measures precision and recall.
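These quality measures can be computed as sketched below, reusing the hypothetical `poisson` model, classifier `clf`, features `X`, and counts from the previous sketch; for brevity the metrics are evaluated in-sample, whereas in practice a held-out period or area would be used.

```python
from sklearn.metrics import roc_auc_score, precision_score, recall_score

# Deviance of the fitted Poisson model (lower = better reproduction of the counts).
print("deviance:", poisson.deviance)

# AUC and PrecRec (precision + recall) for the classification model.
y_true = (counts > 0).astype(int)
y_prob = clf.predict_proba(X)[:, 1]
y_pred = clf.predict(X)

auc = roc_auc_score(y_true, y_prob)            # 0.5 = chance level, 1.0 = perfect
prec_rec = precision_score(y_true, y_pred) + recall_score(y_true, y_pred)
print("AUC:", auc, "PrecRec:", prec_rec)
```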

#### *3.3. Time Series Analysis*

Time series analysis offers the possibility of detecting patterns in the temporal change of crime data, which play a major role, especially in crime prediction and prevention. Methods such as time series decomposition and autoregressive integrated moving average (ARIMA) modeling can be used for this purpose. A decomposed crime time series consists of the original time series and three components: the estimated trend component, the seasonal component, and the remainder component. The calculation of trend and seasonality is used for the long-term prediction of crime events to facilitate strategic planning in police agencies [20,21]. Several studies, e.g., Malik et al. [22] and Borges et al. [23], use STL as an initial time series analysis for later modeling a predictive policing approach. Furthermore, this approach allows statements about recurring and changing temporal components of crime without considering socio-economic or demographic information.

ARIMA modeling is a powerful tool for time series analysis and short-term forecasting. Since ARIMA has been successfully applied for predictions in economics, marketing, industrial production, and social problems, it has also been applied to forecasting property crime [20]. In that study, the ARIMA model was used to predict property crime one week in advance from past observations, but only for the whole city and not for individual districts. ARIMA has the potential to enable crime prediction using only offense data, as well as to analyze temporal crime patterns on different spatial scales via spatial data aggregation [24]. In crime research and in the strategic planning of police work, information on possible future developments in crime data enables short- and long-term orientation and the recognition of changes and recurring patterns.

For time series analysis, a raw time-series signal was constructed out of the criminal records for each analyzed offense, e.g., residential burglary, over a period of five years from 1 January 2017 to 31 December 2021. The number of cases per offense per time unit was aggregated for a sub-region. Thus, only the time and location information of the offense were used. Weekly aggregated offense data on residential district and city level were considered for the time series analysis. Analyses based on daily data were considered problematic due to the small offense numbers.

In the first step, a time series decomposition was performed on daily aggregated offense data. Seasonal-trend decomposition based on Loess (STL) is a filtering method for separating three components from a seasonal time series, namely the trend (T), the season (S), and the remainder (R) [25]. The first component, the trend, indicates the series' long-term increase or decrease. The second component, the seasonality, indicates whether the time series is modified by seasonal influences, referring to one cycle per year in this case. The third component, the remainder, represents any residual noise in the data. Since STL is an additive model, the time series (Y) at time (t) can be described as follows [23]:

$$Y_t = T_t + S_t + R_t \tag{1}$$

The output of this process is the separation of the original time series of criminal record entries into two distinct derivatives: the trend and the season.

Daily aggregated offense data on residential districts were used for time series decomposition.
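A minimal sketch of such a decomposition with the STL implementation in statsmodels is shown below; the synthetic weekly series, the yearly period of 52 weeks, and the robust setting are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

rng = np.random.default_rng(2)

# Hypothetical weekly counts of residential burglaries in one residential district,
# five years, with a yearly cycle (period = 52 weeks) and a slight downward trend.
weeks = pd.date_range("2017-01-01", periods=260, freq="W")
y = pd.Series(
    10 - 0.01 * np.arange(260)                       # trend
    + 3 * np.cos(2 * np.pi * np.arange(260) / 52)    # winter peak, summer dip
    + rng.normal(scale=1.0, size=260),               # remainder
    index=weeks,
)

result = STL(y, period=52, robust=True).fit()        # Y_t = T_t + S_t + R_t
trend, season, remainder = result.trend, result.seasonal, result.resid
print(trend.tail(3), season.tail(3), remainder.tail(3), sep="\n")
```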

Time series analysis can also be utilized in the process of prediction. In a second step, the autoregressive integrated moving average (ARIMA) model introduced by Box and Jenkins [26] was used to forecast the number of crime events for periods of one week and one month. This model can provide accurate forecasts over relatively short periods [20].

Autoregressive (AR) models attribute current observations only to past observations. In moving average (MA) processes, however, current observations are attributed not only to past observations but also to the non-observed errors of past periods, which also influence future observations. Thus, ARIMA models take advantage not only of the observed past observations but also of information that is not described directly in the time series but is defined as the error of the prediction. ARIMA models combine AR models and MA processes as follows:

$$y_{d,t} = a + \phi_1 y_{d,t-1} + \dots + \phi_p y_{d,t-p} + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} + \varepsilon_t \tag{2}$$

Here, *yd* denotes the *d*-fold differenced observations, which can follow an AR process of order *p* and an MA process of order *q*. The task is therefore to specify the parameter *d*, i.e., the order of the necessary integration or differencing, as well as *p* and *q* in the ARIMA(*p*, *d*, *q*) model [20,27].

The ARIMA model was applied for weekly aggregated offense data at the residential district and city levels.
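A minimal sketch of fitting an ARIMA model to a weekly series and forecasting the coming weeks with confidence intervals is given below, reusing the hypothetical series `y` from the STL sketch; the order (1, 1, 1) is purely illustrative and would in practice be selected, e.g., via information criteria.

```python
from statsmodels.tsa.arima.model import ARIMA

# Fit an illustrative ARIMA(1, 1, 1) model to the weekly series from the STL sketch.
model = ARIMA(y, order=(1, 1, 1)).fit()

# 16-week-ahead forecast with confidence intervals; the first row is the next week.
forecast = model.get_forecast(steps=16)
print(forecast.predicted_mean.head(1))        # point forecast for the next week
print(forecast.conf_int().head(1))            # lower/upper bound for the next week
```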

#### **4. Results and Discussion**

#### *4.1. Crime Forecasting*

By testing predictive policing as a theory-based approach, the primary purpose of this study was to support strategic and target-oriented police work by identifying potential hotspots at an early stage based on known crime-relevant factors. The aim was to achieve a resource-efficient deployment of police forces and, ideally, to reduce the frequency of crime. The results show that it is often possible to calculate higher crime probabilities in the chosen spatial reference for residential burglaries, commercial burglaries, and motor vehicle offenses compared with the basic statistical probabilities. Depending on the modeling, the probabilities of residential burglaries in selected residential districts were, on average, about ten times higher than those of a random area selection. Nevertheless, it was found that the compilation of weekly crime forecasts in more rural districts was ineffective due to the relatively low number of cases. However, analyses focusing on the structural differences between urban and rural regions identified approaches that make it possible to determine the probability of burglary in more rural regions as well, so that crime forecasts could be computed for the entire state.

In addition, the influence of the socio-structural data on residential burglaries was also examined. In summary, the results show that the influencing strengths of the variables varied greatly depending on the season and district. Accordingly, the results are not automatically transferable to other districts or periods. Furthermore, it was shown in this context that the model quality crucially depended on the quality and temporal availability of the data.

#### *4.2. Risk Terrain Modelling*

Modifying the classical RTM approach led to significantly better model performance for both the classification and the Poisson models. For example, a Poisson model showed a model improvement of about 35 to 40 percent. The deviance with dynamic variables (calculated based on historical police data) was significantly lower than the deviance with purely static variables such as socio-structural data. Conversely, this finding means that an RTM map created with only static variables is less reliable in its informative value than an RTM map with both dynamic and static variables. The quality of the dynamic variables, on the other hand, decreases with decreasing density of the number of cases, i.e., with progression from urban to rural regions.

In addition, the analyses showed that the weighting of the variables relative to each other is an important aspect of variable selection. The rough distribution of weights between the two different model approaches is largely similar; for example, the dynamic variables are often in the upper weight range. The analyses have also shown that both the weighting of the variables and the selection of the variables themselves can be subject to strong local variations. Both model approaches enable a statement for the same temporal scope as the selected training period; for example, a model that has been trained with historical data from the last six months enables a forecast for the following six months. Due to the very slowly changing socio-structural data and the strongly fluctuating local dynamic variables, there is a lower limit to the temporal horizon for which an RTM map can be generated reliably. However, no universally valid lower boundary can be defined due to the strong local fluctuations.

In summary, it can be shown that a spatial risk assessment is also possible for the more rural police departments in NRW. Due to the high computation times of the model for large areas and the desire for a fine-scale grid, a seasonal risk assessment is necessary to support strategic decision-making processes in the police authorities.

#### *4.3. Time Series Analysis*

The results of the time series decomposition of the weekly aggregated offense data at the residential district level for NRW are presented in Figure 6, using residential burglary as an example. The estimated trend component revealed a decrease in residential burglaries at the residential district level from 2017 until December 2021. Regarding the seasonal component, an increase in the winter months and a decrease in the summer months were detected throughout the entire analysis period. These temporal and spatial patterns can be applied to metropolitan cities. The procedure cannot be transferred to more rural regions due to the small number of cases.

**Figure 6.** Time series decomposition of residential burglary in one exemplary residential district in Dusseldorf from 2017 to 2021 with (**a**) raw time series, (**b**) trend, (**c**) season, and (**d**) remainder.

The results can likewise be transferred to all residential districts. In general, areas close to the city center, which are densely populated, show similar temporal patterns, as shown in Figure 6. The procedure cannot be transferred to peri-urban areas, which tend to be characterized by a lower population density and thus have lower offense numbers. In these areas, approaches such as integrating the spatio-temporal clusters or residential district aggregation can help make valid calculations.

The experimental results produced for the cities and residential districts in NRW using an ARIMA model of the weekly aggregated offense time series support the value of crime time series analysis. The results for residential burglary are shown for a selected residential district in Dusseldorf in Figure 7. The fitted ARIMA model shows that both the seasonal components and the initially increasing and subsequently decreasing trend could be captured. The higher case numbers in the winter months could also be mapped. The predictions of offense numbers for the following 16 weeks and for one year show that both the decreasing trend and the seasonal component are included. The offense numbers recorded by the police fall within the confidence intervals of the predicted values. Thus, this method is promising, for example, for the prediction of residential burglaries and for the long-term planning and strategic orientation of the police authorities. The ARIMA model is a way to look at both long-term changes and seasonal components of crime. Using the model is advantageous for crime data, as studies have shown that ARIMA works better than complex artificial neural networks, for example, in the case of small datasets [28]. The fitted ARIMA model can forecast future points in a series [20].

**Figure 7.** Time series of residential burglary events in a residential district in Dusseldorf with the ARIMA-modelled prediction for the first 16 weeks (dashed dark-green line) and 52 weeks (dashed light-green line) in 2023.

In general, time series analysis is a helpful approach for analyzing seasonal and long-term patterns in offense numbers. From these results, conclusions can be drawn as to what extent offense numbers are likely to increase, decrease, or remain constant in the following weeks or months. Therefore, long-term strategic decision-making processes in the police authorities can be supported.

#### **5. Potential and Limitations**

Spatio-temporally-based statistical techniques offer police agencies the potential to optimize their operational and strategic planning based on predictions and to take necessary action before a crime occurs. Targeted control of the forces or the alignment of operational focal points in specific crime areas is possible. Changes in crime can be detected earlier based on such calculations.

In evaluating crime forecasting methods, data quality is crucial with respect to data uncertainty. Data uncertainty refers to the problem that it is usually unknown to what extent errors are contained in the collected and utilized data. In this context, data collection problems such as measurement uncertainties are conceivable, e.g., when recording the suspected time of a burglary. For data collection in the context of police forces, a source of uncertainty is the fact that criminal offenses are either legally misjudged or reported late by the victims, which is not uncommon for the criminal charge of burglary [8]. In addition, the quality of the data collected in the field of policing is often low, as periods and localities are not (or cannot) always be precisely defined. Similarly, it is challenging to assess predictions for offenses that are meant to be prevented by increased police presence: there is often a lack of knowledge about whether police presence has prevented offenses or whether the predictive models are outputting inaccurate risk estimates.

Overall, crime forecasting methods offer a wide range of possible applications that enable criminal expertise to be enriched by scientific findings.

**Author Contributions:** Conceptualization, K.S. (Kai Seidensticker) and K.S. (Katharina Schwarz); methodology, K.S. (Kai Seidensticker) and K.S. (Katharina Schwarz); software, K.S. (Katharina Schwarz); validation, K.S. (Katharina Schwarz); formal analysis, K.S. (Kai Seidensticker) and K.S. (Katharina Schwarz); investigation, K.S. (Kai Seidensticker) and K.S. (Katharina Schwarz); resources, K.S. (Kai Seidensticker) and K.S. (Katharina Schwarz); data curation, K.S. (Kai Seidensticker) and K.S. (Katharina Schwarz); writing—original draft preparation, K.S. (Katharina Schwarz); writing—review and editing, K.S. (Kai Seidensticker); visualization, K.S. (Kai Seidensticker) and K.S. (Katharina Schwarz); supervision, K.S. (Kai Seidensticker); project administration, K.S. (Kai Seidensticker); funding acquisition, K.S. (Kai Seidensticker). All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Proceeding Paper* **Reconstructed Phase Spaces and LSTM Neural Network Ensemble Predictions †**

**Sebastian Raubitzek \* and Thomas Neubauer**

Information and Software Engineering Group, Institute of Information Systems Engineering, Faculty of Informatics, TU Wien, Favoritenstrasse 9-11/194, 1040 Vienna, Austria; thomas.neubauer@tuwien.ac.at

**\*** Correspondence: sebastian.raubitzek@tuwien.ac.at

† Presented at the 8th International Conference on Time Series and Forecasting, Gran Canaria, Spain, 27–30 June 2022.

**Abstract:** We present a novel approach that combines the concept of reconstructed phase spaces with neural network time-series predictions. The presented methodology aims to reduce the parametrization problem of neural networks and improve autoregressive neural network time-series predictions. First, the idea is to interpolate a dataset based on its reconstructed phase space properties and then filter an ensemble prediction based on its phase space properties. The corresponding ensemble predictions are made using randomly parameterized LSTM (Long Short-Term Memory) neural networks. These neural networks then produce a multitude of auto-regressive predictions, which are then filtered to achieve a smooth reconstructed phase space trajectory. Thus, we can circumvent the problem of parameterizing the neural network for each dataset individually. Here, the interpolation and the ensemble prediction aim to produce a smooth trajectory in a reconstructed phase space. The best results are compared to a single hidden layer LSTM neural network and benchmark results from the literature. The results show that the baseline predictions are outperformed for all three discussed datasets, and one of the benchmark results from the literature is bested by the presented approach.

**Keywords:** phase space reconstruction; LSTM; neural networks; ensemble prediction; stochastic interpolation

#### **1. Introduction**

The rise of artificial intelligence, i.e., machine learning and deep learning, motivates many researchers to perform predictions and analyses based on historical data using these methods, rather than employing mechanistic expert models. The reason for making predictions in the first place is to answer important questions, e.g., future population estimates, predicting epileptic seizures, and estimating future stock market prices. The outcomes of these predictions are encouraging, e.g., solar radiation can be predicted using machine learning methods [1].

One reason for machine learning's poor performance is an overall lack of data. A means of overcoming the lack of data for time-series analysis approaches is to employ an interpolation technique to increase the amount of data. Here, one can choose from many different techniques, such as polynomial, fractal [2], or stochastic interpolation methods [3]. In this article, we use an improved version of the Brownian multi-point bridges developed by [3], which is discussed and validated in detail in ref. [4]. For simplicity, we refer to this method as PhaSpaSto interpolation, which is an abbreviation for **pha**se **spa**ce trajectory smoothing **sto**chastic interpolation.

Next, when it comes to the autoregressive prediction of time-series data, we want to consider the properties of the reconstructed phase space trajectories of a given set of time-series data, i.e., we want the reconstructed phase space trajectory of our prediction to be as smooth as possible.

**Citation:** Raubitzek, S.; Neubauer, T. Reconstructed Phase Spaces and LSTM Neural Network Ensemble Predictions. *Eng. Proc.* **2022**, *18*, 40. https://doi.org/10.3390/ engproc2022018040

Academic Editors: Ignacio Rojas, Hector Pomares, Olga Valenzuela, Fernando Rojas and Luis Javier Herrera

Published: 25 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Thus, we want to find out to what extent the idea of the reconstructed phase spaces of time-series data can be used to improve neural network time-series predictions. For this reason, we present the following scheme, depicted in Figure 1. We first interpolate a given time series using the discussed PhaSpaSto interpolation (Section 4); next, we employ the randomly parameterized neural networks developed in ref. [5] (Section 6), thus generating a multitude of different autoregressive predictions for each set of time-series data. Finally, we filter these predictions based on the smoothness of their reconstructed phase space trajectories, i.e., we want to keep only the smoothest phase space trajectories (Section 6.2).

**Figure 1.** Schematic depiction of the filtering process. The whole pipeline is applied, first, to the original non-interpolated data, and second, to the stoch.-interpolated data set.

This article is structured as follows. The current section, i.e., Section 1, provides a brief introduction and explains the developed scheme. Section 2 lists related work and briefly describes the connections to this article. Section 3 discusses the idea of reconstructed phase spaces and introduces the used terminology and notations. Next, Section 4 introduces the employed stochastic interpolation method, whereas Section 6 describes the employed neural network approach for autoregressive time-series prediction and the prediction filter. All datasets are described and plotted with their interpolation and corresponding phase space reconstructions in Section 5. Section 7 gives all prediction results. Section 8 concludes this article and gives ideas for future research.

#### **2. Related Work**

The presented research is mainly motivated by the findings of [2,4–6]. This section will briefly describe the mentioned publications, list them chronologically, i.e., by their publication date, and put them into context.


Ref. [4] presents the PhaSpaSto interpolation. The basic idea is to filter/improve a multitude of stochastic interpolations of the same time-series data using a genetic algorithm and the variance of second derivatives along a reconstructed phase space trajectory to generate smooth phase space embeddings.

The interpolation technique developed there is briefly described in Section 4 and used to improve the presented predictions.

#### **3. Phase Space Reconstruction**

First, we need to introduce the concept of reconstructed phase spaces [8,9].

We estimate a phase space embedding for all data under study. To find a suitable phase space embedding, one has to determine two parameters, the embedding dimension *dE* and the time delay *τ*.

To estimate the time delay *τ*, i.e., the delay between two consecutive time steps, we use the method based on the average mutual information between two signals [10].

To estimate the embedding dimension *dE*, we use the algorithm of false nearest neighbors [11].

The phase space embedding for a given signal [*x*1, *x*2,..., *xn*], thus, is

$$\vec{y}(i) = \left[ x_i, x_{i+\tau}, \ldots, x_{i+(d_E-1)\tau} \right] \tag{1}$$

and a corresponding three-dimensional phase space embedding, thus, is

$$\vec{y}(i) = \left[ x_i, x_{i+\tau}, x_{i+2\tau} \right]. \tag{2}$$
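A minimal numpy sketch of the delay embedding in Equations (1) and (2) is shown below; it assumes that the delay τ and the embedding dimension d_E have already been estimated with the methods cited above.

```python
import numpy as np

def delay_embedding(x, d_e, tau):
    """Return the trajectory y(i) = [x_i, x_{i+tau}, ..., x_{i+(d_e-1)tau}], Eq. (1)."""
    x = np.asarray(x)
    n_vectors = len(x) - (d_e - 1) * tau
    return np.column_stack([x[j * tau : j * tau + n_vectors] for j in range(d_e)])

signal = np.sin(np.linspace(0, 20 * np.pi, 500))
trajectory = delay_embedding(signal, d_e=3, tau=1)   # three-dimensional embedding, Eq. (2)
print(trajectory.shape)                              # (498, 3)
```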

#### **4. PhaSpaSto Interpolation**

The used interpolation technique consists of two parts: first, the multi-point fractional Brownian Bridges from [3], and second, a corresponding genetic algorithm choosing the best parts of the so-created fractional Brownian bridges.

#### *4.1. Multi-Point Fractional Brownian Bridges*

The employed genetic algorithm is fueled by a population of stochastically interpolated time-series data—in our case, multi-point fractional Brownian bridges. To generate these stochastically interpolated time-series data, multi-point fractional Brownian bridges [3] were used. Thus, we briefly summarize this approach.

We consider a Gaussian random process $X(t)$ whose covariance is defined as $C(t, t') = \langle X(t)X(t')\rangle$. In the following, we focus on fractional Brownian motion (fBm), where the covariance is given by $\langle X(t)X(t')\rangle = \frac{1}{2}\left(t^{2H} + t'^{2H} - |t - t'|^{2H}\right)$, where $H$ is the Hurst exponent. To elucidate our interpolation scheme, we first define a so-called fractional Brownian bridge [12,13], which is a construction of fBm starting from 0 at $t = 0$ and ending at $X_1$ at $t = t_1$, i.e.,

$$X^{B}(t) = X(t) - (X(t_1) - X_1)\,\frac{\langle X(t)X(t_1)\rangle}{\langle X(t_1)^2\rangle}. \tag{3}$$

This construction ensures that $X^{B}(t_1) = X_1$, which is also depicted in Figure 1. This single bridge can now be generalized to an arbitrary number of (non-equidistant) prescribed points $X_i$ at $t_i$ by virtue of a multi-point fractional Brownian bridge [3]

$$X^{B}(t) = X(t) - (X(t_i) - X_i)\,\sigma_{ij}^{-1}\,\langle X(t)X(t_j)\rangle, \tag{4}$$

where $\sigma_{ij} = \langle X(t_i)X(t_j)\rangle$ denotes the covariance matrix. Furthermore, we imply summation over identical indices. The latter linear operation on the Gaussian random process $X(t)$ ensures that the bridge takes on exactly the values $X_k$ at $t_k$, which can be seen from $X^{B}(t_k) = X(t_k) - (X(t_i) - X_i)\,\sigma_{ij}^{-1}\sigma_{kj} = X(t_k) - (X(t_i) - X_i)\,\delta_{ik} = X_k$, where $\delta_{ik}$ denotes the Kronecker delta. Hence, this method allows for the reconstruction of a sparse signal, where small-scale correlations are determined by the choice of the Hurst exponent $H$.
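The following sketch illustrates Equation (4) under simplifying assumptions: the raw fBm sample is drawn exactly from its covariance via a Cholesky factorization (with a small diagonal jitter for numerical stability), and the bridge is then conditioned on the prescribed points. It follows the formulas above rather than the reference implementation of [3], and all point values are hypothetical.

```python
import numpy as np

def fbm_covariance(s, t, hurst):
    """<X(s_i) X(t_j)> = 0.5 * (s_i^2H + t_j^2H - |s_i - t_j|^2H) for fractional Brownian motion."""
    s, t = np.asarray(s)[:, None], np.asarray(t)[None, :]
    return 0.5 * (s ** (2 * hurst) + t ** (2 * hurst) - np.abs(s - t) ** (2 * hurst))

def multipoint_bridge(t, t_known, x_known, hurst, rng):
    """Multi-point fractional Brownian bridge through (t_known, x_known), following Eq. (4)."""
    t_all = np.concatenate([t, t_known])
    cov = fbm_covariance(t_all, t_all, hurst) + 1e-10 * np.eye(len(t_all))
    x_all = np.linalg.cholesky(cov) @ rng.standard_normal(len(t_all))   # raw fBm sample X(t)
    x, x_at_known = x_all[: len(t)], x_all[len(t):]
    sigma = fbm_covariance(t_known, t_known, hurst) + 1e-10 * np.eye(len(t_known))  # sigma_ij
    cross = fbm_covariance(t, t_known, hurst)                           # <X(t) X(t_j)>
    # X^B(t) = X(t) - (X(t_i) - X_i) * sigma_ij^{-1} * <X(t) X(t_j)>
    return x - cross @ np.linalg.solve(sigma, x_at_known - x_known)

rng = np.random.default_rng(3)
t_known = np.array([0.2, 0.5, 0.9])          # prescribed (sparse) data points
x_known = np.array([0.1, 0.6, 0.3])
t_fine = np.linspace(0.01, 0.99, 99)         # times at which the bridge is evaluated
bridge = multipoint_bridge(t_fine, t_known, x_known, hurst=0.7, rng=rng)
print(bridge.shape)                          # (99,)
```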

#### *4.2. Genetic Algorithm*

We build a simple genetic algorithm to find the best possible interpolation given the data's phase space reconstruction using Takens' theorem. We want our reconstructed phase space curve to be as smooth as possible and thus define the trajectory's fitness as follows.

The basic idea is to use a concept from image processing, i.e., the blurriness of a picture, and apply it to phase space trajectories. We want our trajectory to be as blurry, i.e., as smooth, as possible. In image processing, blurriness is determined via second-order derivatives of grey-scale images at each pixel [7]. We employ this concept, but instead of using it at each pixel, we calculate the variance of second-order derivatives along our phase space trajectories. Similar to the idea from image processing, where a low variance of second-order derivatives implies more blurriness, curves with a low variance of second-order derivatives exhibit comparatively smooth trajectories. The reason is intuitively apparent: whereas curves with a high variance of second-order derivatives have a range of straight and pointy sections, curves with a low variance of second-order derivatives have a similar curvature along the trajectory and thus are smoother. Hence, in order to guarantee smoothness along the trajectory, we want this variance to be as low as possible, which is thus our loss *L*. In conclusion, our fitness is maximal when our loss *L* is minimal.

Again, we start with the phase space vector and the corresponding embedding dimension *dE* and time delay *τ* (see Section 3) of each signal as

$$\vec{y}(i) = \left[ x_i, x_{i+\tau}, \ldots, x_{i+(d_E-1)\tau} \right]. \tag{5}$$

Thus, we have one component for each dimension of the phase space. Consequently, we can write the individual components as

$$y_j(i) = x_{i+(j-1)\tau}, \tag{6}$$

where *j* = 1, 2, ... , *dE*. We then take the second-order finite difference central derivative of a discrete function [14]

$$u_j''(i) = x_{i+(j-1)\tau+1} - 2\,x_{i+(j-1)\tau} + x_{i+(j-1)\tau-1} \tag{7}$$

at each point, and for each component. Next, we add up all the components as

$$u''(i) = \sqrt{\sum_{j=1}^{d_E} u_j''(i)^2}\,. \tag{8}$$

Then, finally, we use the variance of the absolute values of second derivatives along the phase space curve as our loss *L* of a phase space trajectory:

$$L = \operatorname{Var}_i\left[u''(i)\right]. \tag{9}$$
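A numpy sketch of Equations (5)–(9) is given below: second-order central differences per embedding component, their Euclidean norm along the trajectory, and the variance of that norm as the loss *L*; the test signals are illustrative.

```python
import numpy as np

def phase_space_smoothness_loss(x, d_e=3, tau=1):
    """L = Var_i[u''(i)]: variance of the norm of second-order differences along the trajectory."""
    x = np.asarray(x)
    n = len(x) - (d_e - 1) * tau
    traj = np.column_stack([x[j * tau: j * tau + n] for j in range(d_e)])   # y_j(i), Eqs. (5)-(6)
    second = traj[2:] - 2 * traj[1:-1] + traj[:-2]                          # u''_j(i), Eq. (7)
    norm = np.sqrt((second ** 2).sum(axis=1))                               # u''(i), Eq. (8)
    return np.var(norm)                                                     # Eq. (9)

smooth = np.sin(np.linspace(0, 8 * np.pi, 400))
rough = smooth + np.random.default_rng(4).normal(scale=0.2, size=400)
print(phase_space_smoothness_loss(smooth) < phase_space_smoothness_loss(rough))   # True
```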

The employed genetic algorithm consists of the following building blocks.

A candidate solution is an interpolated time series using a random Hurst exponent *H* ∈ (0, 1). The corresponding population consists of, e.g., 1000 of these stochastically interpolated time series with random Hurst exponents. A population of interpolated time series is generated using the multi-point Brownian bridges such that, for each member of the population, a random Hurst exponent *H* ∈ (0, 1) is chosen, which then defines the interpolation of that member. After generating the population, all members are sorted with respect to their fitness, i.e., the lower the loss *L*, the better an interpolation is. Mating is implemented such that only the best 50%, with respect to fitness, can mate to produce new offspring. The mating is done such that, for every gene, i.e., each interpolation between two data points, there is a 50:50 chance of inheriting it from either one of the parents. Mutation is implemented such that, in each generation, there is a 20% chance that a randomly chosen interpolated time series is replaced with a new interpolated time series with a randomly chosen new Hurst exponent. Moreover, we implemented a criterion for aborting the program, which is fulfilled if the population's mean fitness does not change for ten generations. The described procedure is run for at most 1000 generations; in practice, the 1000 generations were never reached, as the abortion criterion was always triggered first, yielding the best interpolation with respect to the fitness of the phase space trajectories before the 1000th generation.
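A compressed sketch of this loop is given below. It reuses the hypothetical `multipoint_bridge` and `phase_space_smoothness_loss` helpers from the previous sketches, scales the population down for readability, and simplifies the genes to individual sample points rather than whole segments between data points; it is meant to show the control flow, not to reproduce the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(5)

def random_candidate():
    """One candidate: an interpolation generated with a randomly drawn Hurst exponent."""
    hurst = rng.uniform(0.01, 0.99)
    return multipoint_bridge(t_fine, t_known, x_known, hurst, rng)

def fitness(candidate):
    return -phase_space_smoothness_loss(candidate)      # lower loss = higher fitness

population = [random_candidate() for _ in range(50)]    # 1000 candidates in the paper
mean_fitness = []
for generation in range(1000):                          # upper limit of 1000 generations
    population.sort(key=fitness, reverse=True)
    parents = population[: len(population) // 2]        # only the best 50% may mate
    children = []
    while len(parents) + len(children) < len(population):
        a, b = rng.choice(len(parents), size=2, replace=False)
        mask = rng.random(len(parents[a])) < 0.5        # 50:50 inheritance per sample point
        children.append(np.where(mask, parents[a], parents[b]))
    population = parents + children
    if rng.random() < 0.2:                              # 20% mutation chance per generation
        population[rng.integers(len(population))] = random_candidate()
    mean_fitness.append(np.mean([fitness(c) for c in population]))
    if len(mean_fitness) >= 10 and np.allclose(mean_fitness[-10:], mean_fitness[-1]):
        break                                           # mean fitness stagnated for ten generations

best_interpolation = max(population, key=fitness)
```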

#### **5. Datasets**

We chose three datasets to test and demonstrate our approach. Two of the featured datasets are from the Time Series Data Library [15] and thus are known test datasets and provide us with benchmark results, which are discussed in Section 7.4.

The third dataset is the annual maize yields in Austria, which can be obtained from http://www.fao.org/faostat/, accessed on 21 July 2022. This third dataset is considered the most challenging of the three for two reasons. First, it is an agricultural dataset, i.e., it is affected by the weather, genetic improvements of the plants, varying fertilization strategies, etc., meaning that we will most likely not discover any reasonable seasonalities and/or trends, despite the apparent increase in maize yields due to various improvements in agriculture. Second, this dataset is collected annually for all of Austria, i.e., a lot of the information contained in the data is lost due to the annual and regional averaging. Thus, we conclude that it will be challenging or impossible to effectively predict annual maize yields several years ahead.

#### *5.1. Car Sales in Quebec Dataset*

This is a dataset from the Time Series Data Library [15]. It depicts monthly car sales in Quebec from January 1960 to December 1968, with an overall 108 data points.

For the phase space embedding, a time delay of *τ* = 1 was used; the data were detrended by subtracting a linear fit and normalized such that all values lie within [0, 1]. The interpolated time series and the corresponding reconstructed phase space are depicted in Figure 2.

**Figure 2.** Time-series and attractor plots for the monthly car sales in Quebec dataset. (**a**) Stoch.-interpolated, 13 interpolation points, time-series plot; (**b**) non-interpolated, reconstructed attractor plot, detrended, normalized; (**c**) stoch.-interpolated, 13 interpolation points, reconstructed attractor plot, detrended, normalized. The rainbow colors in the phase space plots correspond to different steps in time. The spectrum starts with blue (early) and ends with red (later).

#### *5.2. Monthly International Airline Passengers Dataset*

This is a dataset from the Time Series Data Library [15]. It depicts monthly international airline passengers from January 1949 to December 1960, with an overall 144 data points, given in units of 1000.

For the phase space embedding, a time delay of *τ* = 1 was used; the data were detrended by subtracting a linear fit and normalized such that all values lie within [0, 1]. The interpolated time series and the corresponding reconstructed phase space are depicted in Figure 3.

**Figure 3.** Time-series and attractor plots for the monthly international airline passengers dataset. (**a**) Stoch.-interpolated, 13 interpolation points, time-series plot; (**b**) non-interpolated, reconstructed attractor plot, detrended, normalized; (**c**) stoch.-interpolated, 13 interpolation points, reconstructed attractor plot, detrended, normalized. The rainbow colors in the phase space plots correspond to different steps in time. The spectrum starts with blue (early) and ends with red (later).

#### *5.3. Annual Maize Yields in Austria*

This is a dataset of the annual yields of maize in Austria ranging from 1961 to 2017, with an overall 57 data points. It can be downloaded at http://www.fao.org/faostat/, accessed on 21 July 2022. For the phase space embedding, a time delay of *τ* = 1 and an embedding dimension of *dE* = 3 were used; the data were detrended by subtracting a linear fit and normalized such that all values lie within [0, 1]. The interpolated time series and the corresponding reconstructed phase space are depicted in Figure 4.

**Figure 4.** Time-series and attractor plots for the annual maize yields in Austria dataset. (**a**) Stoch.-interpolated, 13 interpolation points, time-series plot; (**b**) non-interpolated, reconstructed attractor plot, detrended, normalized; (**c**) stoch.-interpolated, 13 interpolation points, reconstructed attractor plot, detrended, normalized. The rainbow colors in the phase space plots correspond to different steps in time. The spectrum starts with blue (early) and ends with red (later).

#### *5.4. Data Preprocessing*

Two steps of preprocessing were performed before forecasting the featured datasets. First, each dataset was made stationary by subtracting a linear fit. Second, each dataset was scaled to [0.1, 0.9].

Finally, each dataset was split into a train and test dataset with an 80%/20% ratio.
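A small sketch of these preprocessing steps and the chronological 80%/20% split is shown below; the raw series is illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
series = 100 + 0.5 * np.arange(120) + rng.normal(scale=5.0, size=120)   # illustrative raw series

# Step 1: remove the linear trend by subtracting a least-squares line fit.
t = np.arange(len(series))
slope, intercept = np.polyfit(t, series, deg=1)
detrended = series - (slope * t + intercept)

# Step 2: scale to the interval [0.1, 0.9].
lo, hi = detrended.min(), detrended.max()
scaled = 0.1 + 0.8 * (detrended - lo) / (hi - lo)

# 80%/20% train/test split, keeping the temporal order.
split = int(0.8 * len(scaled))
train, test = scaled[:split], scaled[split:]
print(train.shape, test.shape)        # (96,) (24,)
```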

#### **6. LSTM Neural Network Time-Series Prediction**

LSTMs are a category of recurrent neural networks (*RNNs*). RNNs are capable of using feedback or *recurrent* connections to cope with time-series data.

LSTMs [16] feature a component called a *memory block* to enhance their capability to model long-term dependencies. This memory block is a recurrently connected subnet containing two functional modules, i.e., the memory cell and the corresponding gates. The task of the memory cell is to remember the temporal state of the neural network. The gates, on the other hand, are responsible for controlling the information flow and consist of multiplicative units.

#### *6.1. Randomly Parameterized Neural Networks*

In this article, we use an approach developed in [5]. The idea is to generate many randomly parameterized neural networks and to build ensemble predictions based on the phase space properties of the autoregressively produced predictions. An autoregressive prediction is a one-step-at-a-time prediction, where old outputs are used as inputs for the next step.

These randomly parameterized neural networks feature one to five hidden LSTM layers, a *hard sigmoid* activation function in the hidden and input layers, and a rectified linear unit as the output activation function. No dropout criterion or regularization was used.

We used the following ranges for the parameters for our randomly parameterized neural network implementation.


Although we used LSTM architectures for this research, any type of neural network cell can be used with this approach.
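A minimal Keras sketch of such a randomly parameterized network is given below. The range of one to five hidden LSTM layers, the hard sigmoid activation, and the ReLU output follow the description above; the unit counts, the window length, and the function name are hypothetical, since the original parameter ranges are not reproduced here.

```python
import random
from tensorflow import keras
from tensorflow.keras import layers

def build_random_lstm(window=4, n_features=1):
    """Build one randomly parameterized LSTM network (a sketch, not the original code)."""
    n_hidden = random.randint(1, 5)                # one to five hidden LSTM layers
    units = random.choice([4, 8, 16, 32])          # hypothetical range of units per layer
    model = keras.Sequential()
    model.add(keras.Input(shape=(window, n_features)))
    for i in range(n_hidden):
        model.add(layers.LSTM(units,
                              activation="hard_sigmoid",
                              return_sequences=(i < n_hidden - 1)))
    model.add(layers.Dense(1, activation="relu"))  # rectified linear unit as output activation
    model.compile(optimizer="rmsprop", loss="mean_squared_error")
    return model
```

Each call yields a differently parameterized model, which then produces one autoregressive prediction of the test horizon.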

#### *6.2. Prediction Filter*

The so-generated autoregressive predictions are then filtered using the criterion for smooth phase space trajectories from Section 4.2, i.e., we want the variance of second derivatives along a reconstructed phase space trajectory to be as low as possible. We thus randomly chose 1 to 10 predictions from the whole set of predictions. Next, these predictions are averaged to form an ensemble prediction. This ensemble prediction is merged with the training data. Then, the variance of second derivatives along the phase space trajectory is analyzed. This process is repeated 1 million times. The set of averaged predictions with the lowest variance of second derivatives is kept. On all plots, this procedure is referred to as loss\_rand.
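A sketch of this filtering procedure is shown below (our illustration; the delay embedding with dimension 3, the use of second differences as a stand-in for second derivatives, and the function names are assumptions):

```python
import numpy as np

def phase_space_embed(x, dim=3, tau=1):
    """Delay embedding of a 1-D series into `dim` dimensions with delay `tau`."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])

def smoothness_loss(train, prediction, dim=3, tau=1):
    """Variance of second differences along the reconstructed trajectory of train + prediction."""
    traj = phase_space_embed(np.concatenate([train, prediction]), dim, tau)
    return np.diff(traj, n=2, axis=0).var()

def filter_predictions(train, predictions, n_trials=1_000_000, rng=None):
    """Average 1-10 randomly chosen predictions and keep the ensemble with the lowest loss_rand."""
    if rng is None:
        rng = np.random.default_rng()
    best_loss, best_ensemble = np.inf, None
    for _ in range(n_trials):
        k = int(rng.integers(1, 11))                              # 1 to 10 predictions
        idx = rng.choice(len(predictions), size=k, replace=False)
        ensemble = np.mean([predictions[i] for i in idx], axis=0)
        loss = smoothness_loss(train, ensemble)
        if loss < best_loss:
            best_loss, best_ensemble = loss, ensemble
    return best_ensemble, best_loss
```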

#### **7. Experiments and Results**

In this Section, we provide the experimental setup and the corresponding results.

First, each dataset was interpolated using PhaSpaSto interpolation (Section 4) with varying interpolation points. For the monthly international airline passengers and the car sales in Quebec datasets, interpolations with the following numbers of interpolation points were performed: *NI* = {1, 3, 5, 7, 9, 11, 13}. For the annual maize yields in Austria, this range was changed to *NI* = {9, 11, 13, 15} to save computational resources. Further, we produced 500 randomly parameterized neural network predictions for the monthly international airline passengers and car sales in Quebec datasets. In contrast, for the annual maize yields in Austria dataset, 1000 of these predictions were produced. These multitudes of predictions were created for the non-interpolated and the interpolated datasets. As initially mentioned, the whole scheme is depicted in Figure 1.

All of these predictions were analyzed using the root mean squared error (RMSE). Here, we used the RMSE on both a normalized dataset, i.e., the dataset and the prediction are scaled to be ∈ [0, 1] (denoted as **RMSE [0, 1]**, Equation (10)), and the regular dataset and prediction (denoted as **RMSE**, Equation (10)).

All errors for all datasets are collected in Table 1. The corresponding plots for the car sales in Quebec dataset are collected in Figure 5, the results for the monthly international airline passengers are plotted in Figure 6, and finally, the results for the annual maize yields in Austria are depicted in Figure 7. Further, we discuss each dataset separately in the following.

$$RMSE = \left(\frac{1}{n} \sum_{i=1}^{n} \left[\hat{x}_i - x_i\right]^2\right)^{\frac{1}{2}} \tag{10}$$

Here, $\hat{x}_i$ are the predicted values, $x_i$ is the ground truth, and *n* is the number of samples.
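For completeness, Equation (10) and its normalized variant could be computed as follows (our sketch; whether the prediction and ground truth are scaled jointly to [0, 1] is our reading and not stated explicitly above):

```python
import numpy as np

def rmse(y_pred, y_true):
    """Root mean squared error, Equation (10)."""
    y_pred, y_true = np.asarray(y_pred, float), np.asarray(y_true, float)
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def rmse_01(y_pred, y_true):
    """RMSE [0, 1]: both series are scaled to [0, 1] before the error is computed."""
    y_pred, y_true = np.asarray(y_pred, float), np.asarray(y_true, float)
    lo = min(y_pred.min(), y_true.min())
    hi = max(y_pred.max(), y_true.max())
    return rmse((y_pred - lo) / (hi - lo), (y_true - lo) / (hi - lo))
```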

**Table 1.** Results for all datasets and experiments, i.e., interpolated, non-interpolated, filtered, and unfiltered.


#### *7.1. Car Sales in Quebec Dataset*

All errors for the car sales in Quebec dataset are collected in Table 1, and the corresponding plots are collected in Figure 5.

When comparing the errors for the results with and without interpolation, we see that the errors for the interpolated results are reduced, i.e., the unfiltered, interpolated results have a lower error than the unfiltered, non-interpolated ones. The same is true for the filtered results.

Next, the filtered results consistently and drastically outperformed the unfiltered ones. Exactly this behavior is depicted in Figure 5. The overall best result is the interpolated and filtered prediction approach, which can be seen in Figure 5d.

**Figure 5.** Autoregressive prediction results for the car sales in Quebec dataset. (**a**) Non-interpolated, non-filtered; (**b**) non-interpolated, filtered; (**c**) stoch. interpolated, 13 interpolation points, non-filtered; (**d**) stoch. interpolated, 1 interpolation point, filtered.

#### *7.2. Monthly International Airline Passengers*

All errors for the monthly international airline passengers dataset are collected in Table 1, and the corresponding plots are collected in Figure 6.

When comparing the errors for the results with and without interpolation, we see that the errors for the interpolated results are reduced, i.e., the unfiltered interpolated results have a lower error than the unfiltered, not interpolated ones. The same is true for the filtered results.

Next, the filtered results always drastically outperformed the unfiltered ones. Exactly this behavior is depicted in Figure 6. The overall best result is the interpolated and filtered prediction approach, which can be seen in Figure 6d.

**Figure 6.** Autoregressive prediction results for the monthly international airline passengers dataset. (**a**) Non-interpolated, non-filtered; (**b**) non-interpolated, filtered; (**c**) stoch. interpolated, 13 interpolation points, non-filtered; (**d**) stoch. interpolated, 9 interpolation points, filtered.

#### *7.3. Annual Maize Yields in Austria Dataset*

All errors for the annual maize yields in Austria dataset are collected in Table 1, and the corresponding plots are collected in Figure 7.

When comparing the errors for the results with and without interpolation, we see that the errors for the interpolated results are reduced, i.e., the unfiltered, interpolated results have a lower error than the unfiltered, non-interpolated ones. The same is true for the filtered results.

Next, the filtered results consistently and drastically outperformed the unfiltered ones. Exactly this behavior is depicted in Figure 7. The overall best result is the interpolated and filtered prediction approach, which can be seen in Figure 7d.

This dataset is considered to be the most difficult of the three featured datasets, and although our predictions are still off, as can be seen in Figure 7d, the performed procedures, i.e., PhaSpaSto interpolation and the prediction filter, do improve the accuracy of the forecast. Further, the result depicted in Figure 7d suggests that the employed neural networks learned some inherent behavior, especially when taking into account the initial variations after the train/test cut.

**Figure 7.** Autoregressive prediction results for the annual maize yields in Austria dataset. (**a**) Non-interpolated, non-filtered; (**b**) non-interpolated, filtered; (**c**) stoch. interpolated, 15 interpolation points, non-filtered; (**d**) stoch. interpolated, 15 interpolation points, filtered.

#### *7.4. Benchmark and Baseline Predictions*

We finally provide some baseline and benchmark results for the conducted experiments. We used an LSTM neural network with one hidden layer as a baseline prediction. Each neural network was trained with a batch size of 2 and varying epochs. Further, verbose was set to 2. For the activation of the neural network, hard\_sigmoid was chosen, and the activation function of the output layer was relu. For the initialization, glorot\_uniform was used for the LSTM layer, orthogonal was used as the recurrent initializer, and glorot\_uniform for the Dense layer. For the LSTM layer, the bias was set to use\_bias=True, with a corresponding bias\_initializer="zeros". Further, no constraints, regularizers, or dropout criteria were used for the recurrent and the Dense layers. As an optimizer, rmsprop was used and the loss was calculated using mean\_squared\_error. The output node returned only one result, i.e., the next time step. The varying architectures are collected in Table 2 and the corresponding predictions are depicted in Figure 8.
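A minimal Keras sketch of this baseline configuration and of the autoregressive step-by-step forecast is given below; the number of units, the window length, and the helper names are our assumptions, while the stated settings (one hidden LSTM layer, hard sigmoid, ReLU output, the listed initializers, rmsprop, MSE loss, batch size 2, verbose 2) follow the text:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_baseline(window, units=32):
    """Single-hidden-layer LSTM baseline with the settings described in the text."""
    model = keras.Sequential([
        keras.Input(shape=(window, 1)),
        layers.LSTM(units,
                    activation="hard_sigmoid",
                    kernel_initializer="glorot_uniform",
                    recurrent_initializer="orthogonal",
                    use_bias=True,
                    bias_initializer="zeros"),
        layers.Dense(1, activation="relu", kernel_initializer="glorot_uniform"),
    ])
    model.compile(optimizer="rmsprop", loss="mean_squared_error")
    return model

# Training, as stated in the text (epochs vary per dataset, see Table 2):
# model.fit(X_train, y_train, batch_size=2, epochs=epochs, verbose=2)

def autoregressive_forecast(model, history, n_steps, window):
    """Feed each one-step prediction back in as input for the next step."""
    buf = list(history[-window:])
    preds = []
    for _ in range(n_steps):
        x = np.asarray(buf[-window:], dtype=float).reshape(1, window, 1)
        y = float(model.predict(x, verbose=0)[0, 0])
        preds.append(y)
        buf.append(y)
    return np.asarray(preds)
```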

#### **Table 2.** Errors for the baseline predictions for each dataset.


**Figure 8.** LSTM baseline predictions. The red line denotes the autoregressive single step-by-step prediction, which is featured in Table 2. (**a**) Car sales in Quebec dataset; (**b**) monthly international airline passengers dataset; (**c**) annual maize yields in Austria dataset.

The featured baseline predictions, though reasonable, are consistently outperformed by the interpolated, filtered results from our main experiments (see Table 1) in terms of RMSE. Here, we want to highlight that the baseline neural network did not capture the characteristics of the annual maize yields dataset. This may be due to poor parameterization or the fact that a single hidden layer is not sufficient to capture the dynamics of this dataset. Still, we tuned the employed neural networks by hand, and the results for the other datasets show that neural networks of these sizes are sufficient for univariate time-series data of this length. The best ensemble result, in comparison, provides a drastically improved result compared to the baseline prediction.

When it comes to benchmark results from the literature, we found the best result for the monthly international airline passengers dataset in ref. [17], with an RMSE of 13.0 for a hybrid MLP-ARIMA approach, which is superior to our baseline of 30.6 and our best ensemble result of 21.2. Thus, we conclude that our ensemble approach with the presented specifications cannot outperform state-of-the-art methods for this dataset.

For the monthly car sales in Quebec dataset, we found a comparable result in ref. [18], with an RMSE [0, 1] of 0.08143 for the additive Holt–Winters method. Our baseline LSTM result for this dataset has an RMSE [0, 1] of 0.08593, and our best ensemble result is 0.07958. We conclude that our ensemble prediction is able to outperform the best result from ref. [18] for this dataset.

As far as the authors know, there is no benchmark result for the annual maize yields in Austria dataset. Thus, we stick to the previously presented baseline prediction, which is outperformed by the presented ensemble prediction.

#### *7.5. Summary*

We briefly summarize our findings and point out the main results below.


2. Though the interpolated and filtered ensemble approach did outperform a given baseline prediction for the monthly international airline passengers dataset, the featured benchmark prediction from the literature still outperformed our approach on this dataset.

3. We provide a baseline prediction for the annual maize yields in Austria dataset, which was outperformed using our interpolated and filtered ensemble approach, discussed in Section 7.4. We cannot provide a benchmark result from the literature for this dataset.

4. The employed neural network ensembles were not individually parameterized for each dataset. Instead, we filtered the predictions according to the phase space properties of each dataset. Thus, we could circumvent the problem of parameterizing neural networks.

#### **8. Conclusions**

This article presents an experiment to test the applicability of a novel interpolation method—for simplicity, abbreviated as PhaSpaSto interpolation—combined with randomly parameterized neural network autoregressive predictions. These predictions are then filtered using the variance of second derivatives along a reconstructed phase space trajectory to only keep forecasts that ensure a smooth phase space trajectory.

First, a given time series is interpolated using the featured interpolation method. It is forecast by generating multiple differently parameterized neural networks, each providing an autoregressive prediction of the data under study. Finally, this multitude of predictions is filtered such that the result guarantees a smooth reconstructed phase space trajectory.

The results show that this novel approach outperforms the provided baseline predictions. Further, we were able to best a given benchmark result from the literature for one of the three discussed datasets.

The concept of reconstructed phase spaces can be applied to interpolate time series to guarantee a smooth phase space trajectory, which, in turn, improves the accuracy of our neural network predictions. Further, filtering ensemble predictions based on their phase space properties, i.e., the smoothness of their phase space trajectories, improves the presented ensemble predictions. Moreover, we can circumvent the problem of parameterizing neural networks by generating many predictions and filtering them based on their phase space properties.

Ideas for future research in this field are, e.g., to test the presented filter on other state-of-the-art ensemble approaches or to test the robustness of neural network predictions using the presented PhaSpaSto and related interpolation techniques. We further want to highlight that the proposed methodology improves the accuracy on the featured annual maize yields dataset, which the authors expected to be a challenging dataset. Thus, another idea for future research might be to specifically target challenging time-series prediction problems, such as forecasting agricultural or financial time-series data.

**Author Contributions:** Conceptualization, S.R.; Data curation, S.R.; Funding acquisition, T.N.; Investigation, S.R.; Methodology, S.R.; Project administration, T.N.; Resources, T.N.; Software, S.R.; Supervision, T.N.; Validation, S.R.; Visualization, S.R.; Writing—original draft, S.R.; Writing—review and editing, S.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** The authors acknowledge the funding of the project "DiLaAg—Digitalization and Innovation Laboratory in Agricultural Sciences", by the private foundation "Forum Morgen", the Federal State of Lower Austria, and by the FFG; Project AI4Cropr, No. 877158.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Proceeding Paper* **Dynamic Asymmetric Causality Tests with an Application †**

**Abdulnasser Hatemi-J**

College of Business and Economics, UAE University, Al Ain P.O. Box 1551, United Arab Emirates; ahatemi@uaeu.ac.ae

† Presented at the 8th International Conference on Time Series and Forecasting, Gran Canaria, Spain, 27–30 June 2022.

**Abstract:** Testing for causation—defined as the preceding impact of the past value(s) of one variable on the current value of another when all other pertinent information is accounted for—is increasingly utilized in empirical research using time-series data in different scientific disciplines. A relatively recent extension of this approach has been allowing for potential asymmetric impacts, since this is, in many cases, harmonious with the way reality operates. The current paper maintains that it is also important to account for the potential change in the parameters when asymmetric causation tests are conducted, as several reasons exist for changing the potential causal connection between the variables across time. Therefore, the current paper extends the static asymmetric causality tests by making them dynamic via the use of subsamples. An application is also provided, consistent with measurable definitions of economic or financial bad news as well as good news and their potential causal interactions across time.

**Keywords:** dynamic causality; asymmetric causality; positive changes; negative changes; oil; stock market; the US

**JEL Classification:** C32; C51; D82; G15; E17

**Citation:** Hatemi-J, A. Dynamic Asymmetric Causality Tests with an Application. *Eng. Proc.* **2022**, *18*, 41. https://doi.org/10.3390/engproc2022018041

Academic Editors: Ignacio Rojas, Hector Pomares, Olga Valenzuela, Fernando Rojas and Luis Javier Herrera

Published: 2 August 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **1. Introduction**

From cradle to grave, one of the most prevalent and persistent questions in life is figuring out what is the cause and what is the effect when certain pertinent events are observed. This subject must have been one of the most inspirational and important issues since the dawn of humankind. Throughout history, many philosophers have devoted their pondering to causality as an abstraction. Yet there is no common definition of causality, and above all, there is no common or universally accepted approach for detecting or measuring causality. Since the pioneering notion of [1] and the seminal contribution of [2], testing for the predictability impact of one variable on another has increasingly gained popularity and practical usefulness in different fields when the variables are quantified across time. This approach is known as Granger causality in the literature, and it describes a situation in which the past values of one variable (i.e., the cause variable) are statistically significant in an autoregressive regression model of another variable (i.e., the effect variable) when all other relevant information is also accounted for. The null hypothesis is defined as zero restrictions imposed on the parameters of the cause variable in the autoregressive model when the dependent variable is the potential effect variable. If the null hypothesis is not accepted empirically, it is taken as empirical evidence for causality, according to the Wiener–Granger method (alternative designs exist, such as those detailed by [3,4]). There have been several extensions of this method, especially since the discovery of unit roots and stochastic trends, which are common properties of many time-series variables that quantify economic or financial processes across time. Refs. [5–7] suggested testing for causality via an error correction model if the variables are integrated. Ref. [8] proposed a modified [9] test statistic in order to take into account the impact of unit roots when causality tests are conducted within the vector autoregressive (VAR) model by adding additional unrestricted lags. Refs. [10,11] suggested bootstrapping with leverage adjustments in order to generate accurate critical values for the modified Wald test, since the asymptotical values are not precise when the desirable statistical assumptions for a good model are not satisfied, according to the Monte Carlo simulations conducted by the authors. The bootstrap-corrected tests appear to have better size and power properties compared to the asymptotic tests, especially in small sample sizes.

However, there are numerous reasons for the potential causal connection between the variables to have an asymmetric structure. It is commonly agreed in the literature that markets with asymmetric information prevail (based on the seminal contributions of [12–14]). People frequently react more strongly to a negative change than to a comparable positive one (according to [15–18], among others, there is indeed asymmetric behavior by investors in the financial markets, since they have a tendency to respond more strongly to negative compared to positive news). There are also natural restrictions that can lead to the asymmetric causation phenomenon. For instance, there is a limit on the potential price decrease of any normal asset or commodity, since the price cannot drop below zero. However, there is no restriction on the amount of the potential price increase. In fact, if the price decreases by a given percentage and then increases again by the same percentage, it will not end up at the initial price level, but at a lower level. This is true even if the process occurs in the reverse order. There are also moral and/or legal limitations that can lead to asymmetric behavior. For example, if a company manages to increase its profit by *P*% in a given period, it is feasible and easy to expand the business by that percentage. However, if the mentioned company experiences a loss of *P*%, it is not that easy to implement an immediate contraction of the business operation by *P*%. The contraction is usually less than *P*%, and it can take a longer time to realize this contraction compared to the corresponding expansion. It is naturally easier for a company to hire people than to fire them. In the markets for many commodities, it can also be clearly observed that there is an inertia effect for price decreases compared to price increases. Among others, the fuel market can be mentioned. When the oil price increases, there seems to be an almost immediate increase in fuel prices, and by the same proportion, if not more. However, when the oil price decreases, there is a lag in the decrease in fuel prices, and the adjustment might not be fully implemented. This indicates that fuel price adjustments are asymmetric with regard to oil price changes under the ceteris paribus condition. In order to account for this kind of potential asymmetric causation in empirical research based on time-series data, ref. [19] suggests implementing asymmetric causality tests, which the author introduces. However, these asymmetric causality tests are static by nature.
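As a minimal numeric illustration of this price asymmetry (our example, not from the original text): starting from a price of 100, a 20% decrease followed by a 20% increase yields

$$100 \times (1 - 0.20) \times (1 + 0.20) = 80 \times 1.20 = 96 < 100,$$

and the same end value of 96 is obtained if the increase comes first.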

The objective of the current paper is to extend these asymmetric causality tests to a dynamic context by allowing the underlying causal parameters to vary across time, which can be achieved by using subsamples. There are several advantages to using this dynamic parameter approach. One of the advantages of the time-varying parameter approach is that it takes into account the well-known [20] critique, which is an essential issue from the policy-making point of view. People's preferences can change across time, resulting in a change in their behavior, thereby changing the economic or financial process. There are ground-breaking technological innovations and progress that happen with time. Major organizational restructuring can take place across time. Unexpected major events, such as the current COVID-19 pandemic, can occur. All these events can result in a change in the potential causal connection between the underlying variables in a model. Thus, a dynamic parameter model can be more informative, and it can better present the way things operate in reality. Moreover, from a correct model specification perspective, the dynamic parameter approach can be preferred to the constant parameter approach, as it is also more informative. Since the dynamic testing of the potential causation connection is more informative than the static approach, it can shed light on the extent of the pertinent phenomenon known, in the financial literature, as the correlation risk. According to [21], correlation risk is defined as the potential risk that the strength of the relationship between financial assets varies unfavorably across time. This issue can have crucial ramifications for investors, institutions, and policy makers.

The rest of the paper is organized as follows. Section 2 introduces the methodology of dynamic asymmetric causality testing. Section 3 provides an application of the potential causal impact of oil prices on the world's largest stock market, accounting for rising and falling prices, using both the static and the dynamic asymmetric causality tests. Conclusions are offered in the final section.

#### **2. Dynamic Asymmetric Causality Testing**

The subsequent definitions are utilized in this paper.

**Definition 1.** *An n-dimensional stochastic process* $(x_t)_{t=1,\cdots,T}$ *measured across time is integrated of degree 1, signified as I(1), if it must be differenced once to become a stationary process. That is, $x_t \sim I(1)$ if $\Delta x_t \sim I(0)$, where the denotation* Δ *is the first difference operator.*

**Definition 2.** *Define* $(\varepsilon_t)_{t=1,\cdots,T}$ *as an n-dimensional stochastic process. Thus, during any time period t* ∈ {1, ··· , *T*}*, the positive and negative shocks of this random variable $\varepsilon_t$ (i.e., $\varepsilon_t^+$ and $\varepsilon_t^-$) are identified as the following:*

$$\varepsilon_t^+ := \max(\varepsilon_t, 0) := (\max(\varepsilon_{1t}, 0), \dots, \max(\varepsilon_{nt}, 0))\tag{1}$$

*and*

$$\varepsilon\_t^- := \min(\varepsilon\_t, 0) := (\min(\varepsilon\_{1t}, 0), \dots, \min(\varepsilon\_{nt}, 0))\tag{2}$$

The definition of the positive and negative shocks was suggested by [22] for testing for hidden cointegration.
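In code, Equations (1) and (2) amount to an element-wise clipping of the shocks; a minimal sketch (our illustration, with a hypothetical function name) is:

```python
import numpy as np

def split_shocks(eps):
    """Positive and negative shocks of an innovation series, Equations (1)-(2)."""
    eps = np.asarray(eps, dtype=float)
    return np.maximum(eps, 0.0), np.minimum(eps, 0.0)
```

By construction, the two components sum back to the original shocks.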

The implementation of the causality tests in the sense of the Wiener–Granger method is operational within the vector autoregressive (VAR) model of [23]. The asymmetric version of this test method is introduced by [19] (Ref. [24] extends the test to the frequency domain). Consider the following two I(1) variables with deterministic trend parts (For the simplicity of expression, we assume that *n* = 2. However, it is straightforward to generalize the results):

$$x_{1t} = a + bt + x_{1t-1} + \varepsilon_{1t} \tag{3}$$

and

$$x_{2t} = c + dt + x_{2t-1} + \varepsilon_{2t} \tag{4}$$

where *a*, *b*, *c*, and *d* are parametric constants and *t* is the deterministic trend term. The positive and negative partial sums of the two variables can be recursively defined as the following, based on the definitions of shocks presented in Equations (1) and (2):

$$x_{1t}^{+} := \frac{at + \left[\frac{t(t+1)}{2}\right]b + x_{10}}{2} + \sum_{i=1}^{t} \varepsilon_{1i}^{+} \tag{5}$$

$$x_{1t}^{-} := \frac{at + \left[\frac{t(t+1)}{2}\right]b + x_{10}}{2} + \sum_{i=1}^{t} \varepsilon_{1i}^{-} \tag{6}$$

$$x_{2t}^{+} := \frac{ct + \left[\frac{t(t+1)}{2}\right]d + x_{20}}{2} + \sum_{i=1}^{t} \varepsilon_{2i}^{+} \tag{7}$$

$$x_{2t}^{-} := \frac{ct + \left[\frac{t(t+1)}{2}\right]d + x_{20}}{2} + \sum_{i=1}^{t} \varepsilon_{2i}^{-} \tag{8}$$

where $x_{10}$ and $x_{20}$ are the initial values. Note that the required conditions $x_{1t} = x_{1t}^{+} + x_{1t}^{-}$ and $x_{2t} = x_{2t}^{+} + x_{2t}^{-}$ are fulfilled. (For the proof of these results and for the transformation of I(2) and I(3) variables into the cumulative partial sums of negative and positive components, see [25].) Interestingly, the values expressed in Equations (5)–(8) also have economic or financial implications in terms of measuring good or bad news that can affect the markets. It should be mentioned that the issue of whether to include the deterministic trends in the data generating process for a given variable is an empirical issue. In some cases, there might be the need for both a drift and a trend; in other cases, it might be sufficient to include a drift without any trend. It is also possible to have no drift and no trend. For the selection of the deterministic trend components, the procedure suggested by [26] can be useful.

The asymmetric causality tests can be implemented via the vector autoregressive model of order *p*, as originally introduced by [22], i.e., the VAR(*p*). Let us consider testing for the potential causality between the positive components of these two variables. Then, the vector consisting of the dependent variables is defined as $x_t^{+} = \left(x_{1t}^{+}, x_{2t}^{+}\right)'$, and the following VAR(*p*) can be estimated based on this vector:

$$\mathbf{x}\_{t}^{+} = B\_{0}^{+} + B\_{1}^{+} \mathbf{x}\_{t-1}^{+} + \dots + B\_{p}^{+} \mathbf{x}\_{t-p}^{+} + B\_{p+1}^{+} \mathbf{x}\_{t-p-1}^{+} + v\_{t}^{+} \tag{9}$$

where $B_0^{+}$ is the 2 × 1 vector of intercepts, $B_r^{+}$ is a 2 × 2 matrix of parameters to be estimated for lag length *r* (*r* = 1, ..., *p*), and $v_t^{+}$ is a 2 × 1 vector of error terms. An important consideration before using the VAR(*p*) for drawing inference is to determine the optimal lag order *p*. This can be achieved, among other methods, by minimizing the information criterion suggested by [27], which is expressed as the following:

$$HJC = \ln\left(\left|\hat{\Pi}_p^{+}\right|\right) + p\left(\frac{n^2 \ln T + 2n^2 \ln(\ln T)}{2T}\right), \quad p = 1, \dots, p_{\max}. \tag{10}$$

where $|\hat{\Pi}_p^{+}|$ is the determinant of the variance–covariance matrix of the error terms in the VAR model estimated with lag length *p*, ln is the natural logarithm, *n* is the number of time series included in the VAR model, and *T* is the full sample size used for estimating the parameters of that model. (The Monte Carlo simulations conducted by [28] demonstrate clearly that the information criterion expressed in Equation (10) is successful in selecting the optimal lag order when the VAR model is used for forecasting purposes. In addition, the simulations show that this information criterion is robust to ARCH effects and performs well when the variables in the VAR model are integrated. See also [29] for more information on this criterion.) The lag order that results in the minimum value of the information criterion is selected as the optimal lag order. It is also important that the off-diagonal elements of the variance–covariance matrix are zero; therefore, tests for multivariate autocorrelation need to be performed in order to verify this. The null hypothesis that the *j*th element of $x_t^{+}$ does not cause the *k*th element of $x_t^{+}$ can be tested via a [9] test statistic. (It should be mentioned that an additional unrestricted lag has been added to the VAR model to take into account the impact of one unit root, consistent with the results of [8]. Multivariate tests for autocorrelation also need to be implemented to make sure that the off-diagonal elements of the variance–covariance matrix are zero.) The null hypothesis of non-causality can be formulated as the following:

$$H_0: \text{The row } k\text{, column } j \text{ element in } B_r^{+} \text{ equals zero for } r = 1, \dots, p. \tag{11}$$
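For illustration, the lag-order selection via the criterion in Equation (10) could be sketched as follows (our illustration; `residuals_for_lag` is a hypothetical helper that estimates the VAR for a given lag order and returns its *T* × *n* residual matrix):

```python
import numpy as np

def hjc(residuals, p, T):
    """Information criterion of Equation (10) for a VAR(p) with residual matrix of shape (T, n)."""
    n = residuals.shape[1]
    cov = (residuals.T @ residuals) / T     # variance-covariance matrix of the error terms
    penalty = p * (n ** 2 * np.log(T) + 2 * n ** 2 * np.log(np.log(T))) / (2 * T)
    return np.log(np.linalg.det(cov)) + penalty

# The optimal lag order minimizes HJC over p = 1, ..., p_max:
# best_p = min(range(1, p_max + 1), key=lambda p: hjc(residuals_for_lag(p), p, T))
```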

For a dense representation of the Wald test statistic, we need to make use of the following denotations (It should be pointed out that this formulation requires that the *p* initial values for each variable in the VAR model are accessible. For the particulars on this requirement, see [30]):

$X^{+} := \left(x_1^{+}, \dots, x_T^{+}\right)$ as an (*n* × *T*) matrix, $D^{+} := \left(B_0^{+}, B_1^{+}, \dots, B_{p+1}^{+}\right)$ as an (*n* × (1 + *n* × (*p* + 1))) matrix, $Z_t^{+} := \left(1, x_t^{+}, x_{t-1}^{+}, \dots, x_{t-p}^{+}\right)'$ as a ((1 + *n* × (*p* + 1)) × 1) vector, $Z^{+} := \left(Z_0^{+}, \dots, Z_{T-1}^{+}\right)$ as a ((1 + *n* × (*p* + 1)) × *T*) matrix, and $V^{+} := \left(v_1^{+}, \dots, v_T^{+}\right)$ as an (*n* × *T*) matrix. Via these denotations, we can express the VAR model and the Wald test statistic compactly as the following:

$$X^{+} = D^{+}Z^{+} + V^{+} \tag{12}$$

$$\text{Wald}^{+} = \left(C\hat{\beta}^{+}\right)'\left[C\left(\left(Z^{+}Z^{+\prime}\right)^{-1} \otimes \hat{\Pi}_u^{+}\right)C'\right]^{-1}\left(C\hat{\beta}^{+}\right) \tag{13}$$

The parameter matrix *D*<sup>+</sup> is estimated via the multivariate least squares as the following:

$$
\hat{D}^{+} = X^{+} Z^{+\prime} \left( Z^{+} Z^{+\prime} \right)^{-1} \tag{14}
$$

Note that $\hat{\beta}^{+} = vec(\hat{D}^{+})$ and *vec* is the column-stacking operator. That is,

$$\hat{\beta}^{+} = vec(\hat{D}^{+}) = \left(\left(Z^{+}Z^{+\prime}\right)^{-1}Z^{+} \otimes I_n\right)vec\left(X^{+}\right) \tag{15}$$

The denotation ⊗ is the Kronecker product operator, and *C* is a ((*p* + 1) × *n*) × (1 + *n* × (*p* + 1)) indicator matrix that includes one and zero elements. The restricted parameters are defined by elements of ones, and the elements of zeros are used for defining the unrestricted parameters under the null hypothesis. $I_n$ is an (*n* × *n*) identity matrix. $\hat{\Pi}_u^{+}$ represents the variance–covariance matrix of the unrestricted VAR model as expressed by Equation (12), which can be estimated as the following:

$$\hat{\Pi}_u^{+} = \frac{\hat{V}_u^{+}\hat{V}_u^{+\prime}}{T - q} \tag{16}$$

Note that the constant *q* represents the number of parameters that are estimated in each equation of the VAR model. By using the presented denotations, the null hypothesis of no causation might also be formulated as the following expression:

$$H_0: C\beta^{+} = 0 \tag{17}$$

The Wald test statistic expressed in (13) that is used for testing the null hypothesis of non-causality, as defined in (11) based on the estimated VAR model in Equation (12), has the following distribution, asymptotically:

$$\text{Wald}^{+} \ \xrightarrow{d} \ \chi_p^2 \tag{18}$$

This is the case if the assumption of normality is fulfilled. Thus, the Wald test statistic for testing for potential asymmetric causal impacts has a *χ*<sup>2</sup> distribution, with the number of degrees of freedom equal to the number of restrictions under the null hypothesis of non-causality, which is equal to *p* in this particular case. This result also holds for a corresponding VAR model for negative components, or any other combinations. For the proof, see Proposition 1 in [25].

However, if the assumption of the normal distribution of the underlying dataset is not fulfilled, the asymptotical critical values are not accurate, and bootstrap simulations need to be performed in order to obtain accurate critical values. If the variance is not constant, or if ARCH effects prevail, then the bootstrap simulations need to be conducted with leverage adjustments. The size and power properties of the test statistics, based on the bootstrap simulation approach with leverage corrections, have been investigated by [10,11] via Monte Carlo simulations. The simulation results provided by the previously mentioned authors show that the causality test statistic based on the leveraged bootstrapping exhibits correct size and higher power, compared to a causality test based on the asymptotic distributions, especially when the sample size is small or when the assumptions of normal distribution and constant variance of the error terms are not fulfilled.

The bootstrap simulations can be conducted as follows. First, estimate the restricted model based on regression Equation (12). The restricted model imposes the restrictions under the null hypothesis of non-causality. Second, generate the bootstrap data, i.e., $X^{+*}$, via the estimated parameters from the regression, the original data, and the bootstrapped residuals. This means generating $X^{+*} = \hat{D}^{+}Z^{+} + V^{+*}$. Note that the bootstrapped residuals (i.e., $V^{+*}$) are created by *T* random draws, with replacement, from the modified residuals of the regression. Each of these random draws with replacement has the same likelihood, which is equal to a probability of 1/*T*. The bootstrapped residuals need to be mean-adjusted in order to ensure that the residuals have zero expected value in each bootstrap sample. This is accomplished by subtracting the mean value of the bootstrap sample from each residual in that sample. Note that the residuals also need to be adjusted by leverages in order to make sure that the variance is constant in each bootstrap sample. Next, repeat the bootstrap simulations 10,000 times and estimate the Wald test each time (for more information on the leverage adjustments in univariate cases, see [31], and in multivariate cases, see [10]). Use these test values in order to generate the bootstrap distribution of the test. The critical value at the *α* significance level via bootstrapping (denoted by $c_\alpha^*$) can be acquired by taking the (*α*)th upper quantile of the distribution of the Wald test that is generated via the bootstrapping. The final step is to estimate the Wald test value based on the original data and compare it to the bootstrap critical value at the *α* level of significance. The null hypothesis of non-causation is rejected at the *α* significance level if the estimated Wald test value is higher than $c_\alpha^*$ (i.e., the bootstrap critical value at the *α* level).
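A simplified sketch of this resampling loop is given below (our illustration; the leverage adjustment itself is omitted, and `wald_fn`, which re-estimates the unrestricted VAR on the simulated data and returns the Wald statistic, is a hypothetical callable):

```python
import numpy as np

def bootstrap_critical_value(Z, D_restricted, residuals, wald_fn,
                             n_boot=10_000, alpha=0.05, rng=None):
    """Residual bootstrap of the Wald statistic under the null of non-causality (sketch).

    Z: (k x T) regressor matrix; D_restricted: (n x k) parameters estimated under H0;
    residuals: (n x T) restricted-model residuals.
    """
    if rng is None:
        rng = np.random.default_rng()
    T = residuals.shape[1]
    stats = np.empty(n_boot)
    for b in range(n_boot):
        draw = residuals[:, rng.integers(0, T, size=T)]   # resample columns with replacement
        draw = draw - draw.mean(axis=1, keepdims=True)    # mean-adjust each bootstrap sample
        X_star = D_restricted @ Z + draw                  # generate bootstrap data under H0
        stats[b] = wald_fn(X_star)                        # Wald value for the simulated data
    return np.quantile(stats, 1.0 - alpha)                # upper alpha-quantile = critical value
```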

In order to account for the possibility of the potential change in the asymmetric causal connection between the variables, these tests can be conducted using subsamples. A crucial issue within this context is to determine the minimum subsample size that is required for testing for the dynamic asymmetric causality. The following formula, developed by [32], can be used for determining the smallest subsample size (*S*):

$$S = \left[ T \left( 0.01 + \frac{1.8}{\sqrt{T}} \right) \right] \tag{19}$$

where *T* is the original full sample size. Note that *S* needs to be rounded up.

Two different approaches regarding the subsamples can be implemented for this purpose. The first one is the fixed rolling window approach, which is based on repeated estimation of the model, with a subsample size of *S* each time, and the window is moved forward by one observation each time. That is, we need to estimate the time varying causality for the following subsamples, where each number represents a point in time:

$$\begin{array}{l} 1, \ 2, \ 3, \ \cdots, \ S \\ 2, \ 3, \ 4, \ \cdots, \ (S+1) \\ 3, \ 4, \ 5, \ \cdots, \ (S+1), \ (S+2) \\ 4, \ 5, \ 6, \ \cdots, \ (S+1), \ (S+2), \ (S+3) \\ \vdots \\ (T-S+1), \ (T-S+2), \ \cdots, \ T \end{array}$$

This means that the first subsample consists of the range covering the first observation to the *S* observation. The next subsample removes the first observation from *S* and adds the one after *S*. The process is continued until the full range is covered. For example, assume that *T* = 10 and then we have *S* = 6 based on Equation (19) when *S* is rounded up (Obviously, the sample size normally needs to be bigger than 10 observations in the empirical analysis. Here, a very small sample size is assumed for the sake of the simplicity of the expression). Thus, we have the following subsamples (where each number represents the corresponding time):

$$1, \ 2, \ 3, \ \cdots, \ S = 1, \ 2, \ 3, \ 4, \ 5, \ 6$$

$$2, \ 3, \ 4, \ \cdots, \ (S+1) = 2, \ 3, \ 4, \ 5, \ 6, \ 7$$

$$3, \ 4, \ 5, \ \cdots, \ (S+1), \ (S+2) = 3, \ 4, \ 5, \ 6, \ 7, \ 8$$

$$4, \ 5, \ 6, \ \cdots, \ (S+1), \ (S+2), \ (S+3) = 4, \ 5, \ 6, \ 7, \ 8, \ 9$$

$$(T-S+1), \ (T-S+2), \ (T-S+3), \ \cdots, \ (T-S+S) = (10-6+1), \ (10-6+2), \ \cdots, \ (10-6+6) = 5, \ 6, \ 7, \ 8, \ 9, \ 10$$

The graphical illustration of this approach for the current example is depicted in Figure 1.

**Figure 1.** The illustration of the subsamples compared to the entire sample based on the fixed rolling window approach. Notes: *T* represents the full sample size, and *Si* represents subsample *i* (for *i* = 1, . . . , 5.), in this case.

The second method for determining multiple subsamples is to start with *S* and recursively add an observation to *S* each time for obtaining the next subsample without removing any observation from the beginning. In this approach, the sample size increases by one observation in each subsample until it covers the full range. That is, the size of the first subsample is equal to *S* and the size of the last one is equal to *T*. This is a recursive rolling window approach that is anchored at the start. The graphical drawing of this method for the mentioned example is shown in Figure 2.
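Both subsample schemes, with *S* computed from Equation (19), can be sketched as follows (our illustration; the function names are hypothetical):

```python
import math

def min_subsample_size(T):
    """Smallest subsample size S, Equation (19); S is rounded up."""
    return math.ceil(T * (0.01 + 1.8 / math.sqrt(T)))

def fixed_rolling_windows(T):
    """Windows of constant size S, moved forward by one observation each time."""
    S = min_subsample_size(T)
    return [list(range(start, start + S)) for start in range(1, T - S + 2)]

def recursive_windows(T):
    """Windows anchored at the first observation, growing by one observation each time."""
    S = min_subsample_size(T)
    return [list(range(1, end + 1)) for end in range(S, T + 1)]

# For T = 10, min_subsample_size(10) = 6, so the first fixed window is [1, 2, 3, 4, 5, 6],
# matching the example above.
```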

The next step in order to implement the dynamic asymmetric causality tests is to calculate the Wald test statistic for each subsample and produce its bootstrap critical value at a given significance level. Then, the following ratio can be calculated for each subsample:

$$TVpCV = \frac{\text{Wald test value based on the given subsample}}{\text{Bootstrap critical value at the given significance level and subsample}} \tag{20}$$

where *TVpCV* signifies the test value per the critical value at a given significance level using a particular subsample. If this ratio is higher than one, it implies that the null hypothesis of no causality is rejected at the given significance level for that subsample. The 5% and the 10% significance levels can be considered. A graphical illustration of (20) for different subsamples can be informative to the investigator in order to detect the potential change of the asymmetric causal connection between the underlying variables in the model.

**Figure 2.** The graphical presentation of the subsamples based on the recursive rolling window approach. Notes: *Si* represents subsample *i* (for *i* = 1, ..., 5), in this example. Note that $S_5 = T$, in this case.

An alternative method for estimating and testing the time-varying asymmetric causality tests is to make use of the [33] filter within a multivariate setting. However, this method might not be operational if the dimension of the model is rather high and/or the lag order is large.

#### **3. An Application**

An application is provided for detecting the change in the potential causal impact of oil prices on the world's largest financial market. Two indexes are used for this purpose. The first one is the total share price index for all shares in the US. The second index is the global price of Brent Crude in USD per barrel. The frequency of the data is yearly, and it covers the period of 1990–2020. The source of the data is the FRED database, which is provided by the Federal Reserve Bank of St. Louis. Let us denote the stock price index by *S* and the oil price index by *O*. The aim of this application is to investigate whether the potential impact of rising or falling oil prices on the performance of the world's largest stock market is time dependent or not. Interestingly, the US not only has the largest and most valuable financial market in the world, but it is also the biggest oil producer in the world. These combinations might make the empirical results of this pragmatic application more useful and general.

The variables are expressed in the natural logarithm format. The unit root test results of the conducted Phillips–Perron test confirm that each variable is integrated of the first order (the unit root test results are not reported, but they are available on request). First, the following linear regression is estimated via the OLS technique for the stock price index (i.e., $x_{1t} = lnS_t$):

$$(lnS\_t - lnS\_{t-1}) = a + bt + \varepsilon\_{1t} \tag{21}$$

Next, the residuals of the above regression are estimated. That is

$$\hat{\varepsilon}\_{1t} = \left(\ln S\_t - \ln S\_{t-1}\right) - \hat{a} - \hat{b}t.$$

Note that the circumflex implies the estimated value. The positive and negative shocks are measured as the following, based on the definitions presented in Equations (1) and (2):

$$\hat{\varepsilon}_{1t}^{+} := \max(\hat{\varepsilon}_{1t}, 0) \qquad \text{and} \qquad \hat{\varepsilon}_{1t}^{-} := \min(\hat{\varepsilon}_{1t}, 0) \tag{22}$$

The positive and negative partial sums for this variable are defined as the following, based on the results presented in Equations (5) and (6):

$$\left(lnS_t\right)^{+} := \frac{\hat{a}t + \left[\frac{t(t+1)}{2}\right]\hat{b} + lnS_0}{2} + \sum_{i=1}^{t} \hat{\varepsilon}_{1i}^{+} \tag{23}$$

$$\left(lnS_t\right)^{-} := \frac{\hat{a}t + \left[\frac{t(t+1)}{2}\right]\hat{b} + lnS_0}{2} + \sum_{i=1}^{t} \hat{\varepsilon}_{1i}^{-} \tag{24}$$

where $lnS_0$ signifies the initial value of the stock price index in the logarithmic format, which is assumed to be zero in this case. Note that the equivalency condition $lnS_t = (lnS_t)^{+} + (lnS_t)^{-}$ is fulfilled. It should be mentioned that the value expressed by Equation (23) represents the good news with regard to the stock market, while the value expressed by Equation (24) signifies the bad news pertaining to the same market. The oil price index can also be transformed into cumulative partial sums of positive and negative components in an analogous way. Note that a drift and a trend were included in the equation of each variable, since this seems to be needed based on the graphs presented in Figure 3.
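A minimal sketch of this transformation in Python (our illustration, not the Gauss code used for the application; the function name is hypothetical and the first observation is treated as the initial value):

```python
import numpy as np

def partial_sums(lnS):
    """Cumulative positive/negative components of a series, following Equations (21)-(24)."""
    lnS = np.asarray(lnS, dtype=float)
    d = np.diff(lnS)                                    # lnS_t - lnS_{t-1}
    t = np.arange(1, len(d) + 1)
    b_hat, a_hat = np.polyfit(t, d, 1)                  # trend and drift of Equation (21)
    eps = d - a_hat - b_hat * t                         # estimated shocks
    det = (a_hat * t + b_hat * t * (t + 1) / 2 + lnS[0]) / 2   # shared deterministic part
    lnS_pos = det + np.cumsum(np.maximum(eps, 0.0))     # "good news" component
    lnS_neg = det + np.cumsum(np.minimum(eps, 0.0))     # "bad news" component
    return lnS_pos, lnS_neg
```

By construction, the two components add up to the original series again, which reproduces the equivalency condition noted above.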

The dataset can be transformed by a number of user-friendly statistical software components. Prior to implementing causality tests, diagnostic tests were also implemented. The results of these tests are presented in Table A1, which indicate that the assumption of normality is not fulfilled, and the conditional variance is not constant, in most cases. Thus, making use of the bootstrap simulations with leverage adjustments is necessary in order to produce reliable critical values. This is particularly the case for subsamples, since the degrees of freedom are lower.

Both the symmetric and the asymmetric causality tests are implemented in a dynamic setting using the statistical software component created by [34] in Gauss. The results for the symmetric causality tests are presented in Tables A2 and A3, based on the 5% and the 10% significance levels. Based on these results, it can be inferred that the oil price does not cause the stock market price index, not even at the 10% significance level.

The results are also robust regarding the choice of the subsamples because the same results are obtained with all subsamples. An implication of this empirical finding is that the market is informationally efficient in the semi-strong form with regard to oil prices, as defined by [35]. However, when the tests for dynamic asymmetric causality are implemented, the results show that an oil price decrease does not cause a decrease in the stock market price index, and these results are the same across subsamples, even at the 10% significance level (see Tables A4–A6). Conversely, the null hypothesis that an oil price increase does not cause an increase in the stock market price index is rejected during four subsamples.

It is interesting that by using only three fewer observations, the null hypothesis of non-causality would be rejected at the 10% significance level, in contrast to the result for the entire sample period, which does not reject the underlying null hypothesis (see Figure 4 and Table A7).

**Figure 3.** The time plot of the variables, along with the cumulative partial components for positive and negative parts. Notes: The notation ln*Ot* represents the oil price index, and ln*St* represents the US stock market price index for all shares. The corresponding sign indicates the positive and negative components.

**Figure 4.** The time plot of the causality test results for the positive components at the 10% significance level.

#### **4. Conclusions**

Tests for causality in the Wiener–Granger method are regularly used in empirical research regarding time-series data in different scientific disciplines. A popular extension of this approach is the asymmetric causality testing approach, as developed by Hatemi-J (2012). However, this approach is static by nature. A pertinent issue within this context is whether the potential asymmetric causal impacts between the underlying variables in a model are steady or not over the selected time span. In order to throw light on this issue, the current paper suggests implementing asymmetric causality tests across time to see whether the potential asymmetric causal impact is time dependent or not. This can be achieved by using subsamples via two different approaches.

An application is provided in order to investigate the potential causal connection of oil prices with the stock prices of the world's largest financial market within a time-varying setting. The results of the dynamic symmetric causality tests show that the oil prices do not cause the level of the market price index, regardless of the subsample size. However, when the dynamic asymmetric causality tests are implemented, the results show that positive oil price changes cause a positive price change in the stock market using certain subsamples. In fact, if only three fewer observations are used compared to the full sample size, the results show that there is causality from the oil price increase to the stock market price increase. Conversely, if the full sample size is used (i.e., only three more degrees of freedom), no causality is found. This shows that it can indeed be important to make use of the dynamic causality tests in order to see whether or not the causality result is robust across time.

It should be pointed out that an alternative method for estimating and testing the time-varying asymmetric causality tests is to make use of the [33] filter within a multivariate setting. However, this method might not be operational if the dimension of the VAR model is rather high and/or the lag order is large.

The time-varying asymmetric causality tests results can shed light on whether the causal connection between the variables of interest is general or time dependent. This has important practical implications. If the causal connection changes across time, then the decision or policy based on this causal impact needs to be time dependent as well. This is the case because a static strategy is likely to be inefficient within a dynamic environment.

In the end, the following English proverb, quoted by Wiener (1956) [1], says it all.

*"For want of a nail, the shoe was lost; For want of a shoe, the horse was lost; For want of a horse, the rider was lost; For want of a rider, the battle was lost; For want of a battle, the kingdom was lost!"*

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** FRED DATA BASE, https://fred.stlouisfed.org (accessed on 26 June 2021).

**Acknowledgments:** This paper was presented at the 8th International Conference on Time Series and Forecasting (ITISE 2022), Spain. The author would like to thank the participants for their comments. The usual disclaimer applies, nevertheless.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **Appendix A**

**Table A1.** Test results for multivariate normality and multivariate ARCH in the VAR model.


Notes: (1) ln*Ot* signifies the oil price index and ln*St* indicates the US stock price index for all shares. The vector $[(lnS_t)^{+}, (lnO_t)^{+}]$ denotes the cumulative partial sum of the positive changes, and the vector $[(lnS_t)^{-}, (lnO_t)^{-}]$ indicates the cumulative partial sum of the negative changes. (2) The multivariate test of [36] was implemented for testing the null hypothesis of multivariate normality in the residuals in each VAR model. (3) The multivariate test of [37] was conducted for testing the null hypothesis of no multivariate ARCH(1).

**Table A2.** Dynamic symmetric causality test results at the 5% significance level. (H0: The oil price does not cause the stock market price index).



**Table A2.** *Cont.*

DENOTATIONS: SSP: the subsample period; CV: the critical value; TVpCV: the test value per the critical value; TVpCV = (test value)/(bootstrap critical value at the given significance level). If the value of TVpCV > 1, it implies that the null hypothesis of no causality is rejected at the given significance level.

**Table A3.** Dynamic symmetric causality test results at the 10% significance level. (H0: The oil price does not cause the stock market price index).


DENOTATIONS: SSP: the subsample period; CV: the critical value; TVpCV: the test value per the critical value; TVpCV = (test value)/(bootstrap critical value at the given significance level). If the value of TVpCV > 1, it implies that the null hypothesis of no causality is rejected at the given significance level.


**Table A4.** Dynamic asymmetric causality test results. (H0: An oil price increase does not cause an increase in the stock market price index).

DENOTATIONS: SSP: the subsample period; CV: the critical value; TVpCV: the test value per the critical value; TVpCV = (test value)/(bootstrap critical value at the given significance level). If the value of TVpCV > 1, it implies that the null hypothesis of no causality is rejected at the given significance level.

**Table A5.** Dynamic asymmetric causality test results at the 5% significance level. (H0: An oil price decrease does not cause a decrease in the stock market price index).



**Table A5.** *Cont.*

DENOTATIONS: SSP: the subsample period; CV: the critical value; TVpCV: the test value per the critical value; TVpCV = (test value)/(bootstrap critical value at the given significance level). If the value of TVpCV > 1, it implies that the null hypothesis of no causality is rejected at the given significance level.

**Table A6.** Dynamic asymmetric causality test results at the 10% significance level. (H0: An oil price decrease does not cause a decrease in the stock market price index).


DENOTATIONS: SSP: the subsample period; CV: the critical value; TVpCV: the test value per the critical value; TVpCV = (test value)/(bootstrap critical value at the given significance level). If the value of TVpCV > 1, it implies that the null hypothesis of no causality is rejected at the given significance level.


**Table A7.** Dynamic asymmetric causality test results at the 5% significance level. (H0: An oil price increase does not cause an increase in the stock market price index.).

DENOTATIONS: SSP: the subsample period; CV: the critical value; TVpCV: the test value per the critical value; TVpCV = (test value)/(bootstrap critical value at the given significance level). If the value of TVpCV > 1, it implies that the null hypothesis of no causality is rejected at the given significance level.

#### **References**


### *Proceeding Paper* **Coarse Grain Spectral Analysis for the Low-Amplitude Signature of Multiperiodic Stellar Pulsators †**

**Sebastià Barceló Forteza 1,***∗***, Javier Pascual-Granado 2, Juan Carlos Suárez 1, Antonio García Hernández <sup>1</sup> and Mariel Lares-Martiz <sup>2</sup>**


**Abstract:** Coarse Grain Spectral Analysis (CGSA) can explain the possible multiscaling nature of the thousands of low-amplitude peaks observed in the power spectra of some pulsating stars. Space-based observations allowed the scientific community to find this kind of structure thanks to their long duration, high photometric precision, and high duty cycle compared to observations from the ground. Although these time series are far from perfect (outliers, trends, gaps, etc.), we used our own data preprocessing method, known as the 2K+1 stage interpolation method, to improve the background noise by up to a factor of 14, avoiding spurious effects. We applied both techniques, the 2K+1 stage method and the CGSA analysis, to shed some light on a real problem regarding stellar seismology: finding the physical nature of the low-amplitude signature of multiperiodic stellar pulsators.

**Keywords:** time series analysis; data preprocessing methods; spectrum analysis; multiscaling; applications in real problem (stellar seismology)

### **1. Introduction: Harmonic or Fractal Time Series?**

The observed flux in a classical multiperiodic pulsating star time series can be considered as the contribution of several harmonic terms *i*,

$$F \approx \sum\_{i} A\_{i} \sin \left( 2\pi \nu\_{i} t + \phi\_{i} \right) + N \tag{1}$$

characterized by their frequency *νi*, amplitude *Ai*, and phase *φi*; and noise (*N*). Once space-based telescopes allowed long-duration, high-precision, high-duty-cycle time series to be obtained, the number of detected low-amplitude harmonics increased explosively [1]. The debate about their nature includes not only a physical origin but also cascade errors due to flawed analysis [2]. One of these cases may be the presence of a self-affine signal, *y*(*t*),

$$F \approx \sum\_{i} A\_{i} \sin \left(2\pi\nu\_{i}t + \phi\_{i}\right) + \lambda^{H} y(\lambda t) \tag{2}$$

where *λ* is the scale factor and *H* is the so-called Hausdorff exponent, characterizing long-term correlations and the type of self-affinity in a time series. This term has to be understood in a statistical sense: the scaling relationship holds when appropriate measures are taken on mean values over pairs of points at the same distance, or over equal-length subseries or windows. White noise is included as a particular case.
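
To make Eqs. (1) and (2) concrete, the following minimal sketch (Python with NumPy) builds a toy light curve from a few sinusoids plus a self-affine term generated with a common spectral-synthesis recipe, i.e., power-law Fourier amplitudes with random phases so that P(ν) ∝ ν^−(2H+1). All frequencies, amplitudes, the cadence, and the Hurst value are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def self_affine(n, hurst=0.7):
    """Toy self-affine term of Eq. (2): power-law Fourier amplitudes
    (power ~ nu^-(2H+1), an fBm-like recipe) with random phases."""
    f = np.fft.rfftfreq(n)
    amp = np.zeros_like(f)
    amp[1:] = f[1:] ** (-(2 * hurst + 1) / 2.0)   # amplitude = sqrt(power)
    y = np.fft.irfft(amp * np.exp(1j * rng.uniform(0, 2 * np.pi, f.size)), n)
    return y / np.std(y)

# Harmonic part of Eq. (1): a few illustrative frequencies (d^-1),
# amplitudes, and phases; real pulsators show hundreds of such terms.
t = np.arange(0.0, 27.4, 2.0 / 60.0 / 24.0)       # ~27 d at 2-min cadence
nus = np.array([12.3, 15.8, 19.1])
amps = np.array([1.0, 0.6, 0.3])
phis = np.array([0.0, 1.1, 2.5])
harmonic = sum(a * np.sin(2 * np.pi * v * t + p)
               for a, v, p in zip(amps, nus, phis))

# Toy flux of Eq. (2): harmonics plus a (down-scaled) self-affine signal.
flux = harmonic + 0.2 * self_affine(t.size, hurst=0.7)
```

A toy series of this kind is used below only to illustrate the coarse-graining idea behind CGSA.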

Using Coarse Grain Spectral Analysis (CGSA), we can separate the harmonic from the fractal contribution and calculate their percentages in the time series [3]. Nevertheless, a constant cadence without gaps is required, which cannot be achieved in real, raw observations.
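
The separation performed by CGSA rests on a simple asymmetry under rescaling, which the following self-contained sketch illustrates (Python with NumPy; the series and noise model are illustrative, and this demonstrates the underlying idea rather than reproducing the estimator of [3]): when a series is coarse-grained by keeping every second sample, a harmonic peak moves in normalized frequency, whereas a self-affine power-law background keeps approximately the same shape. This is the asymmetry on which the fractal-percentage estimator of [3] is built.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 4096
k = 300                                      # sinusoid sits exactly in bin k

# Toy series: one harmonic term plus self-affine (1/f^2-like) noise.
t = np.arange(n)
harmonic = np.sin(2 * np.pi * k * t / n)
f = np.fft.rfftfreq(n)
shape = np.zeros_like(f)
shape[1:] = f[1:] ** -1.0                    # amplitude ~ f^-1  ->  power ~ f^-2
noise = np.fft.irfft(shape * np.exp(1j * rng.uniform(0, 2 * np.pi, f.size)), n)
x = harmonic + 0.3 * noise / np.std(noise)

x2 = x[::2]                                  # coarse-grained copy (scale factor 2)

p, p2 = np.abs(np.fft.rfft(x)) ** 2, np.abs(np.fft.rfft(x2)) ** 2
f2 = np.fft.rfftfreq(n // 2)

# Harmonic component: its normalized frequency (cycles per sample) doubles
# under coarse-graining, so it no longer lines up with itself.
print("harmonic in x :", f[np.argmax(p[1:]) + 1])     # ~ k/n
print("harmonic in x2:", f2[np.argmax(p2[1:]) + 1])   # ~ 2k/n

# Self-affine component: the power-law background keeps (approximately)
# the same log-log slope, i.e., it is statistically invariant under rescaling.
def bg_slope(power, freq, peak_bin):
    keep = np.abs(np.arange(power.size) - peak_bin) > 5
    keep[0] = False                          # drop the zero-frequency bin
    return np.polyfit(np.log(freq[keep]), np.log(power[keep]), 1)[0]

print("background slope of x :", bg_slope(p, f, k))    # ~ -2
print("background slope of x2:", bg_slope(p2, f2, k))  # ~ -2
```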


#### **2. Methodology: Fractal Analysis of a Gapped Time Series**

To study the power spectra of these stars, we subtracted the pulsations with an improved method based on iterative sine-wave fitting [4], called the 2K+1 stage method [5,6]. We used the subtracted signal,

$$S\_{i} \equiv \frac{rms\_{i} - rms\_{i+1}}{rms\_{0}},\tag{3}$$

where *rms<sub>i+1</sub>* is the root mean square of the residuals, to stop the iterative process and finish each stage [5]. This stop criterion avoids a cascade error, since its value should always be positive. The first 2K interpolation stages fill the gaps with the information subtracted in the previous stage, but start the frequency analysis from the beginning, avoiding spurious peaks due to minimum differences (see Figure 1). The last stage considers all data points for the frequency analysis. Three stages allow an improvement of the background noise by up to a factor of 14 for real, high-duty-cycle light curves (≈0.9) [7]. More stages are needed for lower duty cycles. Figure 1 shows the improvement in the detection of peaks for simulated time series with ≈1400 typical oscillations and different expected duty cycles.

**Figure 1.** Number of detected peaks at each stage for our interpolation method. Each line is a simulated time series with ≈1400 typical oscillations from high (0.95, red) to low (0.68, purple) duty cycles. From [6].
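
As an illustration of the kind of iterative sine-wave fitting that the stop criterion in Eq. (3) controls, the following is a minimal, evenly-sampled sketch (Python with NumPy): the dominant periodogram peak is fitted and subtracted, the rms of the residuals is recomputed, and the loop stops once *Si* is no longer sufficiently positive. The FFT-based peak search, the single-frequency least-squares step, and all names are our simplifying assumptions; they do not reproduce the authors' 2K+1 stage implementation, and gaps are ignored here (the real data first require the interpolation stages described above).

```python
import numpy as np

def prewhiten(t, flux, s_min=1e-3, max_peaks=1000):
    """Iterative prewhitening with the Eq. (3) stop criterion,
    S_i = (rms_i - rms_{i+1}) / rms_0, assuming even sampling."""
    resid = flux - np.mean(flux)
    rms0 = rms_prev = np.std(resid)
    dt = t[1] - t[0]
    peaks = []
    for _ in range(max_peaks):
        # Locate the dominant frequency in the residual periodogram.
        power = np.abs(np.fft.rfft(resid)) ** 2
        freqs = np.fft.rfftfreq(resid.size, d=dt)
        nu = freqs[np.argmax(power[1:]) + 1]
        # Least-squares fit of a single sinusoid at that frequency.
        design = np.column_stack([np.sin(2 * np.pi * nu * t),
                                  np.cos(2 * np.pi * nu * t)])
        coef, *_ = np.linalg.lstsq(design, resid, rcond=None)
        model = design @ coef
        s_i = (rms_prev - np.std(resid - model)) / rms0
        if s_i <= s_min:          # stop: the gain is zero, negative, or tiny
            break
        resid = resid - model
        rms_prev = np.std(resid)
        peaks.append((nu, np.hypot(*coef), np.arctan2(coef[1], coef[0]), s_i))
    return peaks, resid
```

For instance, `peaks, resid = prewhiten(t, flux)` applied to the toy flux above returns the subtracted frequencies with their amplitudes, phases, and *Si* values, which can then be ranked as in the next step.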

At the final stage, we were able to properly study a larger number of lower-amplitude harmonics with lower *Si*. The CGSA should increase if we subtracted a legitimate harmonic [8]. However, it should decrease if the subtracted peak belongs to a fractal contribution, since in that case a new sinusoid is added instead of subtracted.

We calculated the CGSA after the subtraction of all peaks down to consecutively lower *Si* levels to observe whether the remaining contribution is harmonic, especially at the lowest levels (log *Si* ≤ −2), which correspond to the low-amplitude signature of this kind of pulsating star [9].
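
The step just described can be sketched as a small loop (Python with NumPy) that rebuilds the residual after removing every peak above successively lower log *Si* thresholds and evaluates a spectral diagnostic of what remains, which is how a Figure-2-type curve is traced. The peak-list format matches the prewhitening sketch above; `diagnostic` stands for the CGSA fractal percentage of [3], and the commented call with a hypothetical `cgsa_fraction` routine is only a placeholder.

```python
import numpy as np

def residual_after_threshold(t, flux, peaks, log_s_cut):
    """Subtract every fitted sinusoid whose log10(S_i) exceeds the cut,
    keeping the lower-S_i peaks (and the noise) in the residual."""
    resid = flux - np.mean(flux)
    for nu, amp, phase, s_i in peaks:
        if np.log10(s_i) > log_s_cut:
            resid = resid - amp * np.sin(2 * np.pi * nu * t + phase)
    return resid

def cgsa_curve(t, flux, peaks, diagnostic, log_s_grid):
    """Evaluate the chosen spectral diagnostic of the residual at each
    successive log S_i level (the quantity plotted in Figure 2)."""
    return [diagnostic(residual_after_threshold(t, flux, peaks, cut))
            for cut in log_s_grid]

# Example (with `peaks` from the prewhitening sketch and some routine
# `cgsa_fraction` implementing the estimator of [3]):
#   curve = cgsa_curve(t, flux, peaks, cgsa_fraction,
#                      np.arange(-1.0, -3.5, -0.5))
```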

#### **3. Results: Applying the Method to a Real Pulsating Star**

We analysed a real star observed with the TESS space telescope for approximately one year (see Figure 2). Around 65% of the remaining signal becomes fractal before the low-amplitude peaks are subtracted. Nevertheless, the CGSA does not decrease when those low-amplitude signals are subtracted. Therefore, no fractal signature is detected down to the noise level. Repeating this analysis for more multiperiodic stars observed by TESS will allow us to characterize these variations and ascertain their nature [6].

**Figure 2.** CGSA for the remaining signature after subtracting all peaks with signals higher than log *Si* (black). The red dashed–dotted line points to the remaining signature of low-amplitude peaks and noise after removing intermediate-amplitude variations.

**Author Contributions:** Conceptualization, S.B.F., J.P.-G., J.C.S.; methodology, S.B.F., J.P.-G.; investigation, all authors; original draft preparation, S.B.F.; review and editing, all authors. All authors have read and agreed to the published version of the manuscript.

**Funding:** S.B.F. and J.C.S. received financial support from the Spanish State Research Agency (AEI) Projects No. PID2019-107061GB-C64: "Contribution of the UGR to the PLATO2.0 space mission. Phases C/D-1". J.P.-G. and M.L.-M. acknowledge funding support from Spanish public funds for research under project ESP2017-87676-C5-5-R. A.G.H. acknowledges funding support from 'European Regional Development Fund/Junta de Andalucía-Consejería de Economía y Conocimiento' under project E-FQM-041-UGR18 by Universidad de Granada.

**Acknowledgments:** The authors wish to thank the *TESS* team, whose efforts made these results possible. Funding for the TESS mission is provided by the NASA Explorer Program. S.B.F. also thanks the resources received from the PLATO project collaboration with Centro de Astrobiología (PID2019-107061GB-C61). J.P.-G. and M.L.-M. acknowledge financial support from the State Agency for Research of the Spanish MCIU through the "Center of Excellence Severo Ochoa" award to the Instituto de Astrofísica de Andalucía (SEV-2017-0709).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:

CGSA Coarse Grain Spectral Analysis
TESS Transiting Exoplanet Survey Satellite

#### **References**


MDPI
St. Alban-Anlage 66
4052 Basel, Switzerland
Tel. +41 61 683 77 34; Fax +41 61 302 89 18
www.mdpi.com

*Engineering Proceedings* Editorial Office
E-mail: engproc@mdpi.com
www.mdpi.com/journal/engproc

ISBN 978-3-0365-5452-5