*Proceeding Paper* **PV Energy Prediction in 24 h Horizon Using Modular Models Based on Polynomial Conversion of the L-Transform PDE Derivatives in Node-by-Node-Evolved Binary-Tree Networks †**

**Ladislav Zjavka \* and Václav Snášel**


† Presented at the 8th International Conference on Time Series and Forecasting, Gran Canaria, Spain, 27–30 June 2022.

**Abstract:** Accurate daily photovoltaic (PV) power predictions are challenging, as near-ground atmospheric processes involve complicated chaotic interactions among local factors (ground temperature, cloudiness structure, humidity, visibility, etc.). Fluctuations in solar irradiance resulting from cloud structure dynamics are influenced by many uncertain parameters, which can be described by differential equations. Recent artificial intelligence (AI) computational tools allow us to transform and post-validate forecast data from numerical weather prediction (NWP) systems to estimate PV power generation in relation to on-site local specifics. However, local NWP models are usually produced every six hours to simulate the progress of the main weather quantities in a medium-scale target area. Their delivery delay usually amounts to several hours, further reducing the operational quality required in PV plants. All-day prediction models perform better if they are developed with the latest historical weather and PV data. The differential polynomial neural network (D-PNN) is a recently designed computational method, based on a new learning approach, which allows us to represent the complicated data relations contained in local weather patterns and to account for irregular phenomena. D-PNN combines two-input variables to split the partial differential equation (PDE), defined in a general order k with n variables, into partition elements of two-input node PDEs of recognized order and type. The node-determined sub-PDEs can be easily converted using operator calculus (OC), in several types of predefined conversion schemes, to define unknown node functions expressed in the form of Laplace images. Applying the inverse L-transform formula to the L-converts yields the original functions. D-PNN elicits a progressive modular tree structure to assess, one by one, the optimal PDE node solutions to be inserted in the summed output of the overall expanded computing model. Statistical modular models result from learning schemes applied to pre-adjusted day data records from various observational localities. After testing, they are applied to the remaining unseen daily series of known data to compute estimations of the clear-sky index (CSI) in 24 h input-delayed time sequences.

**Keywords:** modelling dynamics; differential learning; multinomial tree; operator calculus; conversion scheme; Laplace image

#### **1. Introduction**

Stochastic photovoltaic power (PVP) production can hardly be predicted when relying solely on NWP data, which cannot fully account for local anomalies of the near-ground surface terrain [1]. NWP utilities try to integrate their output with sky- or ground-produced image data patterns to localize and identify the development of cloud structure formations on a regional scale [2]. Medium-horizon NWP forecasts can be fused with clear-sky data modelling and re-analyzed to determine the optimal interval sizes of training samples.


Distribution data based on forecast errors, obtained for various cloudiness situations, can be used to define probability intervals in prediction days and to represent inhomogeneous weather states. An increase in the module temperature of photovoltaic panels results in a decrease in effective power production and irradiance conversion [3]. AI computing is not focused on the direct recognition of particular physical developments on multilayer atmosphere scales; it usually uses the available data records in modelling or analyzing local atmospheric progress. Statistical methods can be applied in short-term prediction, or they need to combine their data processing with NWP forecasting over a larger day horizon. Irradiance inputs and target PVP output sequences in a day-determined range can be used in statistical analyses or AI learning. Statistical models were applied to 24–48 h medium-scale data to post-process the system forecasts and compute PVP at all input–output-related sequence times. The conversion of NWP data has some drawbacks; model functionality depends greatly on the applicability and reliability of the forecast data being processed [4]. The quality of AI-based learning predictions can differ essentially; the statistical procedure must undergo appropriate initialization and testing. Many AI methods show limitations, for example, model unfitting or inadequate generalization related to the local constraints of PVP plant designs. AI solutions can process additional data features or use extra units (cloudiness motion, spectral analysis) to recognize the character of fluctuations or type structures in cloud formation processes. Popular hybrid techniques fuse different computing or processing approaches, additionally combining their unique key model features to improve inadequate performance. Multistep forecasting can be direct or iterative (or a combination). In the first, the values are forecasted all at once at different times; in the second, iteratively predicted data from previous time steps are supplied to the input vectors at the following computing time sequences. Additional solar features can be used, e.g., clearness factor, cloudiness index, aerosol and visibility parameters, turbidity, sun angle, or azimuth. Summation ensembles form a weighted model output to compute the final result according to the predetermined weather type or probabilistic reliability. If single-model forecasts differ to a large degree, the overall output is given a significant uncertainty. In contrast, when the results of the ensemble models show a large similarity, the uncertain character of the model output is greatly reduced. Probability intervals in the model output define an uncertainty range for computing reliable data estimations under different initial constraint statistics densities.

The new differential learning is used to gradually elicit structures based on the form of a polynomial neural network (PNN), initially using one node, to partition and transform the general PDE of n variables into summation modules of single PDEs in blocks of two-variable nodes. The rational components are obtained in the form of Laplace transform images of unknown node functions, formed according to operator calculus (OC) procedures [5]. The inverse L-operation is applied to the node-produced rational components, based on the OC expression. The unknown node originals are obtained and summed in the overall output of the n-variable PDE model. D-PNN combines and extracts the most applicable node input couples in each layer, step by step, to generate modular subcomponents from the originally defined PDEs and to extend the iterative solution with them until the output error minimum is reached. The testing procedure can employ the external complement scheme to allow adaptation and adjustment of those node components minimizing all computed errors in training and testing defined by the modelling constraints of a problem [6]. Historical observations with a 24 h delay between input and output data are processed to pre-assess the optimal lengths of daily intervals, which enable us to obtain operable statistical predictions in a day horizon. Finally, the tested models compute their output sequences in the defined time horizon in relation to the processed last-day input data and the learned patterns included in the determined data sample range.

#### **2. 24 h Sequenced PV-Output Prediction Based on Data Record Statistics**

AI regression was used to train the 24 h delay between input–output data patterns (blue, left), selected in the optimal day-range sequences by the initial assessment model. The statistical models were developed in multisite data processing to apply the last known observation input series to forecast the complete daylight cycle of PV power production in the applied 24 h trained prediction horizon (red, right). PVP production in the next 24 h is calculated by processing series at the related times in the sequence data of the previous day. Initial assessment models examine data intervals, gradually increasing their number of days, to pre-define the optimal training process. Their computed output is compared one by one with the time-related data of the target CSI in the reserved part, to obtain the best accuracy. The error minima determine the approximate number of applicable day records that can be used in training. The recognized day-data patterns mostly show a degree of similarity to the latest observational hours used in the final testing. This procedure can successfully obtain up-to-date models that can process the unknown last-day input series (Figure 1) in most day-experiment instances.

**Figure 1.** CSI day-sequence forecasting by models (red **right**) resulting from training for the initially assessed day-data patterns in an all-day input–output horizon (blue **left**).
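The initial assessment loop described above can be outlined in code. The following is a minimal sketch (not the authors' implementation), assuming half-hourly records (48 per day), a reserved last day for comparison, and generic `fit`/`predict` callables standing in for the simplified assessment model:

```python
import numpy as np

def assess_training_window(X, y, fit, predict, day_len=48,
                           candidate_days=range(3, 30)):
    """Pre-assess the optimal number of training days (illustrative sketch).

    X, y    -- time-ordered inputs and target CSI, `day_len` records per
               day (48 half-hour samples); the last day is reserved.
    fit     -- callable(X_train, y_train) -> model (any regressor).
    predict -- callable(model, X_test) -> predicted CSI sequence.
    Returns the day count whose model reaches the lowest RMSE on the
    reserved day, i.e., the error minimum described above.
    """
    X_res, y_res = X[-day_len:], y[-day_len:]            # reserved last day
    best_days, best_rmse = None, np.inf
    for days in candidate_days:                          # growing interval
        n = days * day_len
        model = fit(X[-day_len - n:-day_len], y[-day_len - n:-day_len])
        rmse = np.sqrt(np.mean((predict(model, X_res) - y_res) ** 2))
        if rmse < best_rmse:
            best_days, best_rmse = days, rmse
    return best_days
```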

The clear-sky index (CSI) is a relative solar parameter defined as the ratio of the real to the ideal PVP under a totally clear sky at any time of the daily cycle. It is necessary for computing the model output in training and prediction [7], regardless of changes in the actual values directly related to the PVP cycles in time. PVP patterns change considerably as a result of different atmospheric conditions and local anomalies (e.g., cloudiness type, humidity character, wind gusts, etc.). Sudden overnight changes in the previous pattern can cause difficulties in training. The statistically developed model may be completely out of date, unable to process unrecognized data series in prediction. NWP data processing can be applied in these sudden cases [5]. Figure 2 shows the frontal change in PVP patterns on 3 May.

**Figure 2.** Model development, testing and CSI final output computing (red, right) related to changing PVP day patterns in the fixed 24 h input–output time delay (blue, left).
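The CSI conversion used above can be computed with a short sketch; the helper name and the night-time masking rule are assumptions of this illustration, not part of the original method:

```python
import numpy as np

def clear_sky_index(pv_measured, pv_clear_sky, eps=1e-6):
    """CSI = measured PVP / ideal clear-sky PVP (illustrative helper).

    Night-time values (clear-sky power ~ 0) are masked to 0 to avoid
    division by zero -- an assumption of this sketch."""
    pv_measured = np.asarray(pv_measured, dtype=float)
    pv_clear_sky = np.asarray(pv_clear_sky, dtype=float)
    return np.where(pv_clear_sky > eps,
                    pv_measured / np.maximum(pv_clear_sky, eps), 0.0)

# A predicted CSI sequence converts back to power via the next-day
# clear-sky curve: pvp_forecast = csi_forecast * pv_clear_sky_next_day
```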

#### **3. PDE Conversion and L-Transformation Using Operator Calculus**

D-PNN evolves binary-tree structures to partition the generally defined PDE of *n* variables into determined-order sub-PDEs of two variables (using data samples). The simple PDEs are L-transformed using OC definitions into rational components, forming the Laplace images of the node-searched unavailable summation term series. The inverse members determined by L-restoration are used to calculate the overall output to model the separable function of *n* variables, initially defined only by the PDE [8].

$$a + bu + \sum\_{i=1}^{n} c\_i \frac{\partial u}{\partial \mathbf{x}\_i} + \sum\_{i=1}^{n} \sum\_{j=1}^{n} d\_{ij} \frac{\partial^2 u}{\partial \mathbf{x}\_i \partial \mathbf{x}\_j} + \dots = \mathbf{0} \tag{1}$$

$$
u = \sum\_{k=1}^{\infty} u\_k \tag{2}
$$

where the following are defined: *u*(*x*1, *x*2, ... , *xn*)—unknown separable function of *n* input variables; *a*, *b*, *ci*, *dij*, ...—weights of terms; *uk*—partial sum functions.

The general form of PDE (1) can be used to describe a problem-separable function *u* of *n* inputs, which can be formulated as an unlimited convergent series (2) of simplified two-variable functions *uk*, initially defined by sub-PDEs (3) as an equality in eight variables.

$$F\left(x\_1, x\_2, u, \frac{\partial u}{\partial x\_1}, \frac{\partial u}{\partial x\_2}, \frac{\partial^2 u}{\partial x\_1^2}, \frac{\partial^2 u}{\partial x\_1 \partial x\_2}, \frac{\partial^2 u}{\partial x\_2^2}\right) = 0 \tag{3}$$

where *uk* are node partial sum functions of an unknown separable function *u*.

The adapted polynomial conversion using OC formulas for the derivatives of *f*(*t*), described by an ordinary differential equation (ODE), is based on the proposition that the Laplace transformation is applicable in the case of known initial conditions (4).

$$L\left\{f^{(n)}(t)\right\} = p^n F(p) - \sum\_{i=1}^{n} p^{n-i} f\_{0+}^{(i-1)} \qquad L\{f(t)\} = F(p) \tag{4}$$

where the following are defined: *f*(*t*), *f'*(*t*), ... , *f* (*n*) (*t*)—originals continuous in <0+, ∞>; *p, t—*complex and real variables.
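For instance, for a second-order derivative (n = 2), the conversion formula (4) gives the following concrete instance (an illustration added here, not taken from the original):

$$L\left\{f''(t)\right\} = p^2 F(p) - p\, f\_{0+} - f'\_{0+}$$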

Polynomial conversion applied to ODE derivatives results in a set of Equations (4), where the Laplace transform *F*(*p*) is formulated in the complex variable *p*. *F*(*p*) is separated into the rational term (5) to define the L-image of the function *f*(*t*). The inverse OC transform of the rational term restores the original *f*(*t*) of a real variable *t* (6).

$$F(p) = \frac{P(p)}{Q(p)} = \frac{Bp + C}{p^2 + ap + b} = \sum\_{k=1}^{n} \frac{A\_k}{p - a\_k} \tag{5}$$

where the following are defined: *B*, *C*, *Ak*—coefficients of elementary fractions; *a*, *b*—polynomial parameters.

$$F(p) = \frac{P(p)}{Q(p)} = \sum\_{k=1}^{n} \frac{P(a\_k)}{Q'(a\_k)} \frac{1}{p - a\_k} \qquad f(t) = \sum\_{k=1}^{n} \frac{P(a\_k)}{Q'(a\_k)} e^{a\_k \cdot t} \tag{6}$$

where the following are defined: *ak*—simple real roots of the multinomial *Q*(*p*); *F*(*p*)—L-transform image.
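As a short worked instance of (5) and (6) (an illustration added here, not taken from the original): for $Q(p) = p^2 + 3p + 2$ with simple real roots $a\_1 = -1$, $a\_2 = -2$ and $P(p) = 1$, the expansion gives

$$F(p) = \frac{1}{(p+1)(p+2)} = \frac{1}{p+1} - \frac{1}{p+2} \quad \Rightarrow \quad f(t) = e^{-t} - e^{-2t}.$$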

The rational components (8), formed to express the Laplace image terms of the unavailable *uk* functions (2), result from the GMDH polynomials (7) composed in binary nodes of a PNN tree structure (Figure 3) in the PDE conversion procedure. The inverse L-operation (8) is necessary in the PNN nodes according to the OC definition (6). The sum of the originals determined by binary nodes in the partial *uk* is used to calculate the overall model *u* (2) [8]. Each D-PNN node, using the GMDH output, groups possible simple or composite member solutions (8) of specific two-input sub-PDEs. The selected components are included in the D-PNN output sum to better converge to the target output.

$$y = a\_0 + a\_1 \mathbf{x}\_{\dot{i}} + a\_2 \mathbf{x}\_{\dot{j}} + a\_3 \mathbf{x}\_{\dot{i}} \mathbf{x}\_{\dot{j}} + a\_4 \mathbf{x}\_{\dot{i}}^2 + a\_5 \mathbf{x}\_{\dot{j}}^2 \tag{7}$$

where the following are defined: *xi, xj*—two input variables of neuron nodes.

$$y\_i = w\_i \frac{b\_0 + b\_1 \mathbf{x}\_1 + b\_2 \text{sig}(\mathbf{x}\_1^2) + b\_3 \mathbf{x}\_2 + b\_4 \text{sig}(\mathbf{x}\_2^2)}{a\_0 + a\_1 \mathbf{x}\_1 + a\_2 \mathbf{x}\_2 + a\_3 \mathbf{x}\_1 \mathbf{x}\_2 + a\_4 \text{sig}(\mathbf{x}\_1^2) + a\_5 \text{sig}(\mathbf{x}\_2^2)} \cdot e^{\phi} \tag{8}$$

where the following are defined: *ϕ* = arctg(*x*1*/x*2)—phase representation of the two input variables; *x*1, *x*2—input variables; *ai*, *bi*—polynomial parameters; *wi*—weights; *sig*—sigmoidal transform.

**Figure 3.** Block nodes produce and group simple and composite PDE member solutions.
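As an illustration of the neuron output (8), a minimal sketch follows; the choice of the logistic function for *sig* and the use of `arctan2` for the phase are assumptions of this sketch, not the authors' implementation:

```python
import numpy as np

def node_neuron_output(x1, x2, a, b, w=1.0):
    """One two-input D-PNN neuron in the form of Equation (8) -- a sketch.

    a -- denominator coefficients (a0..a5); b -- numerator (b0..b4);
    `sig` is taken here as the logistic sigmoid (an assumption)."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    num = b[0] + b[1]*x1 + b[2]*sig(x1**2) + b[3]*x2 + b[4]*sig(x2**2)
    den = (a[0] + a[1]*x1 + a[2]*x2 + a[3]*x1*x2
           + a[4]*sig(x1**2) + a[5]*sig(x2**2))
    phi = np.arctan2(x1, x2)        # phase arctg(x1/x2) of the two inputs
    return w * num / den * np.exp(phi)
```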

The Euler formulation of complex conjugates (9) represents the conversion of *f*(*t*) in OC (6). The radius *r* is defined as the rational element, while the angle arctg(*x*2*/x*1) of the two real inputs relates to the inverse L-restoration of *F*(*p*).

$$p = \underbrace{\mathbf{x}\_1}\_{\text{Re}} + i \cdot \underbrace{\mathbf{x}\_2}\_{\text{Im}} = \sqrt{\mathbf{x}\_1^2 + \mathbf{x}\_2^2} \cdot e^{i \cdot \arctan(\frac{\mathbf{x}\_2}{\mathbf{x}\_1})} = r \cdot e^{i \cdot \boldsymbol{\varphi}} = r \cdot (\cos \boldsymbol{\varphi} + i \cdot \sin \boldsymbol{\varphi}) \tag{9}$$

#### **4. PDE Partition in Backward Tree Structures of D-PNN Layers**

D-PNN forms progressive binary-tree structures, using composite processing functions (7) in PNN nodes, by extending/modifying the last added/processed layer with selected block nodes, one by one. Node blocks in subsequent tree layers can form composite term (CT) products, in addition to one-fraction neurons (8). CTs are products consisting of adjustable neurons, i.e., sub-PDE converts, selected in the back-attached production blocks of the backward-linked tree structure layers (Figure 4). CTs represent composite sub-PDE solutions of the node-unavailable *uk* function series in the form of a product that includes images of external and internal functions, commonly expressed by the differentiation rules (11).

$$F(\mathbf{x}\_1, \mathbf{x}\_2, \dots, \mathbf{x}\_n) = f(z\_1, z\_2, \dots, z\_m) = f(\phi\_1(\mathbf{X}), \phi\_2(\mathbf{X}), \dots, \phi\_m(\mathbf{X})) \tag{10}$$

$$\frac{\partial F}{\partial \mathbf{x}\_k} = \sum\_{i=1}^m \frac{\partial f(z\_1, z\_2, \dots, z\_m)}{\partial z\_i} \cdot \frac{\partial \phi\_i(\mathbf{X})}{\partial \mathbf{x}\_k} \,\, k = 1, \,\dots, n \tag{11}$$

For example, a block in the third layer can form additional CTs using products of sub-PDE ratio converts (12), that is, the simple neuron images of two and four back-linked tree blocks in the previous two layers (Figure 4).

$$y\_{31} = w\_{31} \cdot \frac{b\_0 + b\_1 \mathbf{x}\_{21} + b\_2 \mathbf{x}\_{21}^2 + b\_3 \mathbf{x}\_{22} + b\_4 \mathbf{x}\_{22}^2}{a\_0 + a\_1 \mathbf{x}\_{21} + a\_2 \mathbf{x}\_{22} + a\_3 \mathbf{x}\_{21} \mathbf{x}\_{22} + a\_4 \mathbf{x}\_{21}^2 + a\_5 \mathbf{x}\_{22}^2} \cdot \frac{b\_0 + b\_1 \mathbf{x}\_{12} + b\_2 \mathbf{x}\_{12}^2}{a\_0 + a\_1 \mathbf{x}\_{11} + a\_2 \mathbf{x}\_{12} + a\_3 \mathbf{x}\_{11} \mathbf{x}\_{12} + a\_4 \mathbf{x}\_{11}^2 + a\_5 \mathbf{x}\_{12}^2} \cdot \frac{P\_{12}(\mathbf{x}\_1, \mathbf{x}\_2)}{Q\_{12}(\mathbf{x}\_1, \mathbf{x}\_2)} \cdot e^{\phi\_{12}} \tag{12}$$

where the following are defined: *Qij*, *Pij*—output and reduced multinomials of *n*-th and (*n* − 1)-th degree; *ykp*—*p*-th composite term (CT); *ϕ*21 = arctg(*x*11*/x*13); *ϕ*31 = arctg(*x*21*/x*22).

The number of possible CT combinations in blocks doubles with each back-joined preceding layer (Figure 4). Neurons in each subsequent layer can produce more composite partial model components, using the block node outputs and PDE converts.

**Figure 4.** D-PNN searches for the most valuable input couple node blocks to generate the summed PDE components, neurons and CTs, which can be inserted in the overall model.

The summation model output *Y* is the arithmetic mean of the active neurons and CTs produced in node blocks, which simplifies the adjustment and selection of PDE terms (13).

$$Y = \frac{1}{k} \sum\_{i=1}^{k} y\_i \tag{13}$$

where *k =* the number of active neurons or CTs (node PDE solutions).

Multi-objective optimization procedures can be used to improve the performance of the back-production of selected neurons and CTs in the tree-like block structure (Figure 4).

D-PNN searches for the most relevant combination couples in each input layer, analogously to the principles of GMDH [6], to form and rearrange adequate PDE components acceptable to the model. Polynomial coefficients and term weights are partially optimized using the gradient method [9] in each iteration tree cycle. The algorithm randomly skips the block nodes, one by one, to select and update the applied neurons or CTs involved in the complete model. The training error is calculated in relation to a continuously performed test based on the external complement restraint evaluation. This approach allows inserting or adapting only those PDE components that comply with the testing restrictions. The root mean squared error (RMSE) is the gradually minimized measure of the model approximation ability at each step of training (Figure 1).
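The node-by-node selection with the testing restriction might be sketched as follows; the candidate representation and the joint train/test acceptance rule are simplifying assumptions, not the exact D-PNN algorithm:

```python
import numpy as np

def rmse(y_hat, y):
    return np.sqrt(np.mean((y_hat - y) ** 2))

def select_components(candidates, y_train, y_test):
    """Greedy one-by-one insertion of node PDE components -- a sketch.

    candidates -- list of (train_out, test_out) output arrays, one per
    neuron/CT produced in block nodes. A component enters the summation
    model (Equation (13): arithmetic mean of active terms) only if it
    lowers both the training and testing RMSE, in the spirit of the
    external complement evaluation.
    """
    active, best_tr, best_te = [], np.inf, np.inf
    for tr_out, te_out in candidates:
        trial = active + [(tr_out, te_out)]
        e_tr = rmse(np.mean([t for t, _ in trial], axis=0), y_train)
        e_te = rmse(np.mean([t for _, t in trial], axis=0), y_test)
        if e_tr < best_tr and e_te < best_te:
            active, best_tr, best_te = trial, e_tr, e_te
    return active
```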

#### **5. PVP Prediction Using the AI Pre-Determined Training Sequences in 24 h Horizon**

PVP was forecasted at the Starojická Lhota plant located in the north-east Moravian region of the Czech Republic, using historical measurements of ground environment temperature, PVP, and irradiance along with multisite spatial meteorological data of wind parameters from three nearby regional wind farms located in Maletín, Drahany, and Veselí nad Moravou. These data sets were supplemented with free-access airport weather observations (avg. ground temperature, avg. humidity, avg. atm. pressure, avg. wind parameters and visibility conditions) of two ground-based stations located in Brno-Tuřany and Ostrava-Mošnov (Figure 5). These standard variables were used in the development of models to predict the complete day-ahead PVP cycle data with the 24 h input–output delay. Detailed minute power measurements of the PV plant and wind farms were averaged and extrapolated to correspond to the half-hour meteorological data series of the airport weather observation stations [10].

**Figure 5.** PVP plant and wind farm locations for the measurements and meteo-observations.

Figure 6 shows the PVP forecast results of the D-PNN, the Statistics and Machine Learning Tool-Box (SMLT) for Matlab regression [11], and persistent models in the 24 h horizon on specific demonstration days of the examined 2-week interval from 12–25 May 2011. The SMLT day-ahead final forecasting models were chosen considering their test error minima, obtained on the reserved part of the available data set. The optimal training days, including applicable data patterns, were initially predetermined by examination of data in the previous, gradually increasing training-day intervals, using a simplified model form, the same as with D-PNN (Figure 1). The SMLT forecast models resulting from Gaussian Process Regression (GPR), Support Vector Machine (SVM), and Ensemble Boosted/Bagged Tree (EBT), using the same input–output data sample variables in the next-day 24 h CSI sequence prediction, obtained testing RMSE minima which differ only slightly. Persistent comparative models, averaging the CSI processing series over the determined number of last-day intervals, were used to estimate the approximation series at the corresponding PVP cycle day time. These oversimplified forecast benchmark solutions can be considered as daily error-reference minima.

**Figure 6.** St. Lhota, 16 May 2011: variable cloudiness (cloudy and sunny periods alternate); RMSE: D-PNN = 60.09; Persistent = 89.51; SMLT = 78.1 kWp 30 min.

The ratios of the averaged PVP series and the difference values show the relative variability in the daily PV patterns (Figure 7). Slight variations in the data indicate a smooth course of PVP cycles in settled sunny weather. These phenomena are contrary to the ramping fluctuations in PVP day series, mostly in cloudy weather (Figure 6). The prediction errors in day-ahead PVP estimations may be related to the relative ratios and absolute differences obtained on PVP data series (Figure 7); they can denote the pattern complexity of daily PVP cycle series in consideration of the PV power average/maxima. Figure 8 presents the mean coefficient of determination R2 in the prediction of PV power 24 h in advance. The D-PNN and SMLT models perform better in the case of different estimations of the optimal training data sample sequences in comparison to the benchmark reference solutions (Figure 9), i.e., a varying number of training data records can result in analogous prediction series (Figure 8).

**Figure 7.** The daily average PV series ratios are dimensionless and characterize daily data patterns regardless of the absolute power values, 12–25 May 2011.

**Figure 8.** The 2-week daily average PVP prediction coefficient of determination R2: D-PNN = 0.84, Persistent = 0.787, SMLT = 0.79.

The reference models mostly perform worse in changeable cloudy weather; however, lower errors in the 24 h persistent prediction can be obtained on days with a higher degree of similarity in patterns, indicated by smooth PVP cycles (no ramping events). The applied AI models show slight inaccuracies in the specific condition where the last-day unsettled changeable weather intervals are suddenly broken by an overnight front resulting in a sunny-day character (the second week of the examined data period). SMLT models obtain more accurate prediction data, as compared to D-PNN, on days of settled stable weather conditions that include a few previous days. The AI predictions can sometimes obtain an increase in the final model accuracy in the early afternoon (Figure 6). These phenomena are directly related to the particular weather patterns and characteristics of training data samples on forecast days. The increase in forecast errors in the first week of data is related to unsettled weather conditions resulting in more complex data patterns not fully applicable to AI learning. The output errors denote the operability of the finally tested prediction models in processing the reserved last observed data sequences applied in the 24 h horizon.

**Figure 9.** Daily initialization time of the 24 h PVP prediction models.

#### **6. Conclusions**

Estimated day-sequence optima in the training data eliminate sudden variations and rapid changes in local weather situations. New approaches applicable to a more adequate selection of data sample records can try to recognize the frontal overnight breaks in weather progress. The following data intervals can then be applied in training, in place of the initial model identification scheme. Data patterns can be re-analyzed over a longer time interval (if the observations are available) to extract adequate training data records according to pattern similarity, one by one, in order to obtain the test error minima, not necessarily considering the sequence in day time.

**Author Contributions:** Conceptualization, L.Z. and V.S.; methodology, L.Z.; software, L.Z.; validation, V.S.; formal analysis, L.Z.; investigation, V.S.; resources, V.S.; data curation, L.Z.; writing—original draft preparation, L.Z.; writing—review and editing, L.Z.; visualization, L.Z.; supervision, V.S.; project administration, V.S.; funding acquisition, V.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data and software: https://drive.google.com/drive/folders/1ZAw8KcvDEDM-i7ifVe_hDoS35nI64-Fh.

**Acknowledgments:** The work was supported by SGS, VSB—Technical University of Ostrava, Czech Republic, under the grant "Parallel processing of Big Data IX" [No. SGS2022/12].

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Proceeding Paper* **Modelling the Number of Daily Stock Transactions Using a Novel Time Series Model †**

**Sunecher Yuvraj**

Department of Accounting, Finance and Economics, Faculty of Business Management and Finance, University of Technology Mauritius, Port-Louis 11108, Mauritius; ysunecher@umail.utm.ac.mu

† Presented at the 8th International Conference on Time Series and Forecasting, Gran Canaria, Spain, 27–30 June 2022.

**Abstract:** This paper focusses on the impact of COVID-19 on the Stock Exchange of Mauritius (SEM) by modelling the number of daily stock transactions of two banks. Hence, a non-stationary bivariate integer-valued autoregressive and moving average of order 1 (BINARMA (1,1)) process with COM-Poisson (CMP) innovations (BINARMA (1,1) CMP) is introduced. The conditional maximum likelihood (CML) approach is used to estimate the model parameters. The novel model is applied to the intra-day trading of two banking stocks.

**Keywords:** non-stationary; autoregressive; moving average; COM-Poisson; CML

#### **1. Introduction**

The Stock Exchange of Mauritius Ltd. (SEM, Port-Louis, Mauritius) started trading on 30 March 1989 as a private limited company with the responsibility of promoting a proficient and well-regulated stock market in Mauritius. The SEM changed its status on 6 October 2008 to operate as a public company and has all these years left no stone unturned to be the leading stock exchange market on the African continent. To date, the SEM has positioned itself as an essential capital-raising platform for nearly 61 companies operating in the financial, construction, leisure, agricultural and other sectors of the economy. In its internationalization process, the SEM has set up a multi-currency listing, trading and financial platform and has modernized its listing framework with different multi-asset-class financial products. In 2010, the SEM embarked on a new journey by changing its strategic direction and started an internationalization of its operational and regulatory framework. To date, the market capitalization and the annual turnover of the SEM are approximately USD 7.5 billion and USD 302 billion, respectively.

The start of the financial year 2020–2021 was affected by the direct impact of COVID-19, followed by the first nationwide sanitary confinement from March 2020 to May 2020. With the lifting of the confinement, the Mauritian economy gradually started its recovery pathway. However, a second nationwide confinement was announced in March 2021 due to the resurgence of COVID-19 cases, with a partial de-confinement plan in place from May 2021 up till now. Undeniably, the impact of COVID-19 was so severe that the SEM has been navigating mostly in the red zone during this pandemic period. In the year 2021, the market started recovering following the announcements of the vaccines against COVID-19, but the recovery was again affected by the second confinement. Hence, modelling the number of intra-day transactions on the stock market is of utmost importance. Several researchers have also elaborated on the potential covariates that affect these daily stock transactions: the impact of news entering the market, the time-of-day effect and the day effect [1,2]. The time-of-day effect and the impact of news on the market have been shown to influence the intensity of daily trading on the stock market [3]. However, up till now, there have not been any studies that have incorporated the cross-correlation between two competing companies affected by the above covariates as well as the COVID-19 news effect.


Quoreshi [4,5] was the first author to model the number of intra-day transactions in the literature, developing a BINMA(p) process and applying it to the number of intra-day transactions of AstraZeneca and Ericsson B based on the Generalized Poisson distribution of the innovation series, due to the over-dispersed nature of the data. Several authors have recently introduced bivariate processes under non-stationary negative binomial (NB) and CMP innovations in the literature, under either autoregressive or moving average structures (see Jowaheer et al. [6], Mamodekhan et al. [7] and Sunecher et al. [8,9]). As for the estimation of parameters, Quoreshi [4,5] estimated the regression effects using the feasible generalized least squares (FGLS) technique and concluded that the estimates of the model parameters are efficient, but the efficiency of these estimates has been questioned (see Sunecher et al. [8]). Another estimation method frequently used in the literature is the generalized quasi-likelihood (GQL) method, which has proved to yield more reliable estimates than FGLS. However, the likelihood-based approach provides the best estimates [10].

Based on the above findings, this paper proposes a novel bivariate integer-valued autoregressive and moving average of order 1 (BINARMA (1,1)) process under non-stationary COM-Poisson (CMP) innovation series where the model parameters are estimated using the conditional maximum likelihood (CML) approach. This novel model is then applied to the intra-day series of two banks listed on the SEM.

Hence, the paper is laid out as follows: In Section 2, the BINARMA (1,1) process with CMP innovations is developed. In Section 3, the CML approach is derived, and Section 4 presents the forecasting equations. In Section 5, the BINARMA (1,1) model is applied to the number of intra-day transactions of two banks listed on the SEM and is compared with two other competing models. The conclusion is presented in the last section.

#### **2. The Non-Stationary BINARMA (1,1) Process with COM-Poisson Innovations (BINARMA (1,1) CMP)**

Consider

$$Y\_t^{[1]} = \gamma\_{11} \circ Y\_{t-1}^{[1]} + \gamma\_{12} \circ R\_{t-1}^{[1]} + R\_t^{[1]} \tag{1}$$

$$Y\_t^{[2]} = \gamma\_{21} \circ Y\_{t-1}^{[2]} + \gamma\_{22} \circ R\_{t-1}^{[2]} + R\_t^{[2]} \tag{2}$$

where $\gamma\_{kj} \in (0, 1)$ and '$\circ$' denotes the mutually independent binomial thinning operators [11,12] such that $\gamma\_{kj} \circ Y\_{t-1}^{[k]} = \sum\_{i=1}^{Y\_{t-1}^{[k]}} Z\_i$ with $Z\_i \sim \text{Bernoulli}(\gamma\_{kj})$, so that $\{\gamma\_{kj} \circ Y\_{t-1}^{[k]} \,|\, Y\_{t-1}^{[k]}\} \sim \text{Binomial}(Y\_{t-1}^{[k]}, \gamma\_{kj})$.

As for the innovation terms, $\text{Corr}(R\_t^{[1]}, R\_t^{[2]}) = \alpha\_{12}$, where $(R\_t^{[1]}, R\_t^{[2]})$ follows a bivariate COM-Poisson distribution with $R\_t^{[k]} \sim \text{COM-Poisson}(\lambda\_t^{[k]}, \tilde{\nu}\_k)$. Note that $\lambda\_t^{[k]} = (\theta\_t^{[k]})^{1/\tilde{\nu}\_k} - \frac{\tilde{\nu}\_k - 1}{2\tilde{\nu}\_k}$, where $\theta\_t^{[k]} = \exp(\mathbf{x}\_t' \boldsymbol{\beta}^{[k]})$ with $\mathbf{x}\_t = [x\_{t1}, x\_{t2}, \dots, x\_{tj}, \dots, x\_{tp}]$ and $\boldsymbol{\beta}^{[k]} = [\beta\_1^{[k]}, \beta\_2^{[k]}, \dots, \beta\_j^{[k]}, \dots, \beta\_p^{[k]}]$ for $k \in \{1, 2\}$.
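A minimal simulation sketch of the thinning operator and the recursions (1)–(2) follows; for simplicity, the CMP innovations are replaced here by Poisson draws (the equi-dispersed CMP special case, ν̃ = 1), which is an assumption of this illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def thin(gamma, count):
    """Binomial thinning: gamma o Y is Binomial(Y, gamma) given Y."""
    return rng.binomial(count, gamma)

def binarma_step(y_prev, r_prev, gam, lam):
    """One step of the BINARMA(1,1) recursions (1)-(2) -- a sketch.

    gam = [[g11, g12], [g21, g22]] thinning parameters; lam holds the
    innovation means (Poisson here, standing in for CMP)."""
    r_new = rng.poisson(lam)                 # R_t^[1], R_t^[2]
    y1 = thin(gam[0][0], y_prev[0]) + thin(gam[0][1], r_prev[0]) + r_new[0]
    y2 = thin(gam[1][0], y_prev[1]) + thin(gam[1][1], r_prev[1]) + r_new[1]
    return np.array([y1, y2]), r_new

# usage: simulate a short path from zero initial values
gam = [[0.4, 0.2], [0.3, 0.1]]
lam = np.array([2.0, 3.0])
y, r = np.zeros(2, dtype=int), np.zeros(2, dtype=int)
for t in range(5):
    y, r = binarma_step(y, r, gam, lam)
```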

Based on the above conditions, the moments are derived as follows:

$$\begin{aligned} E(Y\_t^{[1]}) &= E(\gamma\_{11} \circ Y\_{t-1}^{[1]} + \gamma\_{12} \circ R\_{t-1}^{[1]} + R\_t^{[1]}) \\ &= E(\gamma\_{11} \circ Y\_{t-1}^{[1]}) + E(\gamma\_{12} \circ R\_{t-1}^{[1]}) + E(R\_t^{[1]}) \\ &= \gamma\_{11} E(Y\_{t-1}^{[1]}) + \gamma\_{12} E(R\_{t-1}^{[1]}) + E(R\_t^{[1]}) \\ \mu\_t^{[1]} &= E(Y\_t^{[1]}) = \gamma\_{11} \mu\_{t-1}^{[1]} + \gamma\_{12} \lambda\_{t-1}^{[1]} + \lambda\_t^{[1]} \end{aligned} \tag{3}$$

$$\begin{aligned} E(Y\_t^{[2]}) &= E(\gamma\_{21} \circ Y\_{t-1}^{[2]} + \gamma\_{22} \circ R\_{t-1}^{[2]} + R\_t^{[2]}) \\ &= E(\gamma\_{21} \circ Y\_{t-1}^{[2]}) + E(\gamma\_{22} \circ R\_{t-1}^{[2]}) + E(R\_t^{[2]}) \\ &= \gamma\_{21} E(Y\_{t-1}^{[2]}) + \gamma\_{22} E(R\_{t-1}^{[2]}) + E(R\_t^{[2]}) \end{aligned}$$

$$\mu\_t^{[2]} = E(Y\_t^{[2]}) = \gamma\_{21} \mu\_{t-1}^{[2]} + \gamma\_{22} \lambda\_{t-1}^{[2]} + \lambda\_t^{[2]} \tag{4}$$

$$\begin{split} \text{Var}(Y\_t^{[1]}) &= \text{Var}(\gamma\_{11} \circ Y\_{t-1}^{[1]} + \gamma\_{12} \circ R\_{t-1}^{[1]} + R\_t^{[1]}) \\ &= \text{Var}(\gamma\_{11} \circ Y\_{t-1}^{[1]}) + \text{Var}(\gamma\_{12} \circ R\_{t-1}^{[1]}) + \text{Var}(R\_t^{[1]}) + 2\text{Cov}(\gamma\_{11} \circ Y\_{t-1}^{[1]}, \gamma\_{12} \circ R\_{t-1}^{[1]}) \\ &= \gamma\_{11}(1 - \gamma\_{11}) \mu\_{t-1}^{[1]} + \gamma\_{11}^2 \text{Var}(Y\_{t-1}^{[1]}) + \gamma\_{12}(1 - \gamma\_{12}) \lambda\_{t-1}^{[1]} + \gamma\_{12}^2 \left[ \frac{\lambda\_{t-1}^{[1]}}{\tilde{\nu}\_1} + \frac{\tilde{\nu}\_1 - 1}{2\tilde{\nu}\_1^2} \right] \\ &\quad + \left[ \frac{\lambda\_{t-1}^{[1]}}{\tilde{\nu}\_1} + \frac{\tilde{\nu}\_1 - 1}{2\tilde{\nu}\_1^2} \right] + 2\gamma\_{11}\gamma\_{12} \text{Cov}(Y\_{t-1}^{[1]}, R\_{t-1}^{[1]}) \\ &= \gamma\_{11}(1 - \gamma\_{11}) \mu\_{t-1}^{[1]} + \gamma\_{11}^2 \text{Var}(Y\_{t-1}^{[1]}) + \gamma\_{12}(1 - \gamma\_{12}) \lambda\_{t-1}^{[1]} \\ &\quad + (1 + \gamma\_{12}^2 + 2\gamma\_{11}\gamma\_{12}) \left[ \frac{\lambda\_{t-1}^{[1]}}{\tilde{\nu}\_1} + \frac{\tilde{\nu}\_1 - 1}{2\tilde{\nu}\_1^2} \right], \end{split} \tag{5}$$

where

$$\begin{split} \text{Cov}(Y\_{t-1}^{[1]}, R\_{t-1}^{[1]}) &= \text{Cov}(\gamma\_{11} \circ Y\_{t-2}^{[1]} + \gamma\_{12} \circ R\_{t-2}^{[1]} + R\_{t-1}^{[1]}, R\_{t-1}^{[1]}) \\ &= \text{Cov}(R\_{t-1}^{[1]}, R\_{t-1}^{[1]}) \\ &= \text{Var}(R\_{t-1}^{[1]}) \\ &= \frac{\lambda\_{t-1}^{[1]}}{\tilde{\nu}\_{1}} + \frac{\tilde{\nu}\_{1} - 1}{2\tilde{\nu}\_{1}^{2}}. \end{split} \tag{6}$$

Similarly,

$$\begin{split} \text{Var}(Y\_t^{[2]}) &= \text{Var}(\gamma\_{21} \circ Y\_{t-1}^{[2]}) + \text{Var}(\gamma\_{22} \circ R\_{t-1}^{[2]}) + \text{Var}(R\_t^{[2]}) + 2\text{Cov}(\gamma\_{21} \circ Y\_{t-1}^{[2]}, \gamma\_{22} \circ R\_{t-1}^{[2]}) \\ &= \gamma\_{21}(1 - \gamma\_{21}) \mu\_{t-1}^{[2]} + \gamma\_{21}^2 \text{Var}(Y\_{t-1}^{[2]}) + \gamma\_{22}(1 - \gamma\_{22}) \lambda\_{t-1}^{[2]} \\ &\quad + (1 + \gamma\_{22}^2 + 2\gamma\_{21}\gamma\_{22}) \left[ \frac{\lambda\_{t-1}^{[2]}}{\tilde{\nu}\_2} + \frac{\tilde{\nu}\_2 - 1}{2\tilde{\nu}\_2^2} \right]. \end{split} \tag{7}$$

As for the covariance for the same series,

$$\begin{split} \text{Cov}(Y\_t^{[1]}, Y\_{t+h}^{[1]}) &= \text{Cov}[Y\_t^{[1]}, (\gamma\_{11} \circ Y\_{t+h-1}^{[1]} + \gamma\_{12} \circ R\_{t+h-1}^{[1]} + R\_{t+h}^{[1]})] \\ &= \gamma\_{11} \text{Cov}[Y\_t^{[1]}, Y\_{t+h-1}^{[1]}] \\ &= \gamma\_{11} \text{Cov}[Y\_t^{[1]}, (\gamma\_{11} \circ Y\_{t+h-2}^{[1]} + \gamma\_{12} \circ R\_{t+h-2}^{[1]} + R\_{t+h-1}^{[1]})] \\ &= \gamma\_{11}^2 \text{Cov}[Y\_t^{[1]}, Y\_{t+h-2}^{[1]}] \\ &\vdots \\ &= \gamma\_{11}^{h-1} \text{Cov}[Y\_t^{[1]}, (\gamma\_{11} \circ Y\_t^{[1]} + \gamma\_{12} \circ R\_t^{[1]} + R\_{t+1}^{[1]})] \\ &= \gamma\_{11}^h \text{Var}(Y\_t^{[1]}) + \gamma\_{11}^{h-1} \gamma\_{12} \text{Var}(R\_t^{[1]}) \\ &= \gamma\_{11}^h \text{Var}(Y\_t^{[1]}) + \gamma\_{11}^{h-1} \gamma\_{12} \left[ \frac{\lambda\_t^{[1]}}{\tilde{\nu}\_1} + \frac{\tilde{\nu}\_1 - 1}{2\tilde{\nu}\_1^2} \right] \end{split} \tag{8}$$

and

$$\text{Cov}(Y\_t^{[2]}, Y\_{t+h}^{[2]}) = \gamma\_{21}^h \text{Var}(Y\_t^{[2]}) + \gamma\_{21}^{h-1} \gamma\_{22} \left[ \frac{\lambda\_t^{[2]}}{\tilde{\nu}\_2} + \frac{\tilde{\nu}\_2 - 1}{2\tilde{\nu}\_2^2} \right]. \tag{9}$$

The cross-covariances are derived as follows:

$$\begin{split} \text{Cov}(Y\_t^{[1]}, Y\_t^{[2]}) &= \text{Cov}(\gamma\_{11} \circ Y\_{t-1}^{[1]} + \gamma\_{12} \circ R\_{t-1}^{[1]} + R\_t^{[1]}, \gamma\_{21} \circ Y\_{t-1}^{[2]} + \gamma\_{22} \circ R\_{t-1}^{[2]} + R\_t^{[2]}) \\ &= \gamma\_{11}\gamma\_{21} \text{Cov}(Y\_{t-1}^{[1]}, Y\_{t-1}^{[2]}) + \gamma\_{11}\gamma\_{22} \text{Cov}(Y\_{t-1}^{[1]}, R\_{t-1}^{[2]}) \\ &\quad + \gamma\_{12}\gamma\_{21} \text{Cov}(R\_{t-1}^{[1]}, Y\_{t-1}^{[2]}) + \gamma\_{12}\gamma\_{22} \text{Cov}(R\_{t-1}^{[1]}, R\_{t-1}^{[2]}) + \text{Cov}(R\_t^{[1]}, R\_t^{[2]}) \\ &= \gamma\_{11}\gamma\_{21} \text{Cov}(Y\_{t-1}^{[1]}, Y\_{t-1}^{[2]}) + (\gamma\_{11}\gamma\_{22} + \gamma\_{12}\gamma\_{21} + \gamma\_{12}\gamma\_{22})\, a \sqrt{\lambda\_{t-1}^{[1]}} \sqrt{\lambda\_{t-1}^{[2]}} \\ &\quad + a \sqrt{\lambda\_t^{[1]}} \sqrt{\lambda\_t^{[2]}}, \end{split} \tag{10}$$

$$\begin{split} \text{Cov}(Y\_t^{[1]}, Y\_{t+h}^{[2]}) &= \text{Cov}[Y\_t^{[1]}, (\gamma\_{21} \circ Y\_{t+h-1}^{[2]} + \gamma\_{22} \circ R\_{t+h-1}^{[2]} + R\_{t+h}^{[2]})] \\ &= \gamma\_{21} \text{Cov}[Y\_t^{[1]}, Y\_{t+h-1}^{[2]}] \\ &= \gamma\_{21} \text{Cov}[Y\_t^{[1]}, (\gamma\_{21} \circ Y\_{t+h-2}^{[2]} + \gamma\_{22} \circ R\_{t+h-2}^{[2]} + R\_{t+h-1}^{[2]})] \\ &= \gamma\_{21}^2 \text{Cov}[Y\_t^{[1]}, Y\_{t+h-2}^{[2]}] \\ &\vdots \\ &= \gamma\_{21}^{h-1} \text{Cov}[Y\_t^{[1]}, (\gamma\_{21} \circ Y\_t^{[2]} + \gamma\_{22} \circ R\_t^{[2]} + R\_{t+1}^{[2]})] \\ &= \gamma\_{21}^h \text{Cov}(Y\_t^{[1]}, Y\_t^{[2]}) + \gamma\_{21}^{h-1} \gamma\_{22} \text{Cov}(R\_t^{[1]}, R\_t^{[2]}) \\ &= \gamma\_{21}^h \text{Cov}(Y\_t^{[1]}, Y\_t^{[2]}) + \gamma\_{21}^{h-1} \gamma\_{22}\, a \sqrt{\lambda\_t^{[1]}} \sqrt{\lambda\_t^{[2]}} \end{split} \tag{11}$$

and

$$\text{Cov}(Y\_t^{[2]}, Y\_{t+h}^{[1]}) = \gamma\_{11}^h \text{Cov}(Y\_t^{[1]}, Y\_t^{[2]}) + \gamma\_{11}^{h-1} \gamma\_{12} a \sqrt{\lambda\_t^{[1]}} \sqrt{\lambda\_t^{[2]}}.\tag{12}$$

**Remark 1.** *Under stationary moment conditions, replacing t* = 1 *in Equations (3)–(7) and Equation (10), we have*

$$
\mu\_1^{[1]} = E(Y\_1^{[1]}) = \frac{(\gamma\_{12} + 1)\lambda\_1^{[1]}}{1 - \gamma\_{11}} \tag{13}
$$

$$
\mu\_1^{[2]} = E(Y\_1^{[2]}) = \frac{(\gamma\_{22} + 1)\lambda\_1^{[2]}}{1 - \gamma\_{21}} \tag{14}
$$

$$\begin{split} Var(Y\_1^{[1]}) &= \left(\frac{1}{1-\gamma\_{11}^2}\right) \{\gamma\_{11}(1-\gamma\_{11})\mu\_1^{[1]} + \gamma\_{12}(1-\gamma\_{12})\lambda\_1^{[1]} \\ &\quad + (1+\gamma\_{12}^2+2\gamma\_{11}\gamma\_{12}) \left[\frac{\lambda\_1^{[1]}}{\tilde{\nu}\_1} + \frac{\tilde{\nu}\_1-1}{2\tilde{\nu}\_1^2}\right] \}, \end{split} \tag{15}$$

$$\begin{split} Var(Y\_1^{[2]}) &= \left(\frac{1}{1-\gamma\_{21}^2}\right) \{\gamma\_{21}(1-\gamma\_{21})\mu\_1^{[2]} + \gamma\_{22}(1-\gamma\_{22})\lambda\_1^{[2]} \\ &\quad + (1+\gamma\_{22}^2+2\gamma\_{21}\gamma\_{22}) \left[\frac{\lambda\_1^{[2]}}{\tilde{\nu}\_2} + \frac{\tilde{\nu}\_2-1}{2\tilde{\nu}\_2^2}\right] \}, \end{split} \tag{16}$$

$$Cov(Y\_1^{[1]}, Y\_1^{[2]}) = \frac{(\gamma\_{11}\gamma\_{22} + \gamma\_{12}\gamma\_{21} + \gamma\_{12}\gamma\_{22} + 1)a\sqrt{\lambda\_1^{[1]}}\sqrt{\lambda\_1^{[2]}}}{1 - \gamma\_{11}\gamma\_{21}}\tag{17}$$

**Remark 2.** *Using the initial values for t* = 1 *in Equations (13)–(17), we compute the values of* $\mu\_t^{[1]}$, $\mu\_t^{[2]}$, $Var(Y\_t^{[1]})$, $Var(Y\_t^{[2]})$ *and* $Cov(Y\_t^{[1]}, Y\_t^{[2]})$ *in Equations (3)–(11) iteratively for t* = 2, . . . , *T.*

#### **3. Conditional Maximum Likelihood Method**

In this section, we derive the CML method for estimating the parameters of the BINARMA (1,1) model based on thinning and convolution properties [13]. The conditional density of the proposed BINARMA (1,1) model with COM-Poisson innovations is derived as follows:

$$f\_1(k) = \sum\_{j\_1=0}^{k} \binom{y\_{t-1}^{[1]}}{j\_1} \binom{r\_{t-1}^{[1]}}{k - j\_1} \gamma\_{11}^{j\_1} (1 - \gamma\_{11})^{y\_{t-1}^{[1]} - j\_1} \gamma\_{12}^{k - j\_1} (1 - \gamma\_{12})^{r\_{t-1}^{[1]} - k + j\_1} \tag{18}$$

$$f\_2(s) = \sum\_{j\_2=0}^{s} \binom{y\_{t-1}^{[2]}}{j\_2} \binom{r\_{t-1}^{[2]}}{s - j\_2} \gamma\_{21}^{j\_2} (1 - \gamma\_{21})^{y\_{t-1}^{[2]} - j\_2} \gamma\_{22}^{s - j\_2} (1 - \gamma\_{22})^{r\_{t-1}^{[2]} - s + j\_2} \tag{19}$$

and a bivariate distribution of the innovation terms $f\_3(r\_t^{[1]} = y\_t^{[1]} - k, r\_t^{[2]} = y\_t^{[2]} - s) = P(R\_t^{[1]} = r\_t^{[1]}, R\_t^{[2]} = r\_t^{[2]})$, where

$$f\_3(r\_t^{[1]} = y\_t^{[1]} - k, r\_t^{[2]} = y\_t^{[2]} - s) = \left[ \frac{(\lambda\_t^{[1]})^{y\_t^{[1]} - k}}{((y\_t^{[1]} - k)!)^{\tilde{\nu}\_1}} \right] \left[ \frac{1}{Z(\lambda\_t^{[1]}, \tilde{\nu}\_1)} \right] \left[ \frac{(\lambda\_t^{[2]})^{y\_t^{[2]} - s}}{((y\_t^{[2]} - s)!)^{\tilde{\nu}\_2}} \right] \left[ \frac{1}{Z(\lambda\_t^{[2]}, \tilde{\nu}\_2)} \right] \tag{20}$$

where the normalizing constant $Z(\lambda\_t^{[k]}, \tilde{\nu}\_k) = \sum\_{j=0}^{\infty} \frac{(\lambda\_t^{[k]})^j}{(j!)^{\tilde{\nu}\_k}}$. The conditional density is written as $f((y\_t^{[1]}, y\_t^{[2]}) | (y\_{t-1}^{[1]}, y\_{t-1}^{[2]}, r\_{t-1}^{[1]}, r\_{t-1}^{[2]}), \boldsymbol{\theta}) = \sum\_{k=0}^{g\_1} \sum\_{s=0}^{g\_2} f\_1(k)\, f\_2(s)\, f\_3(r\_t^{[1]} = y\_t^{[1]} - k, r\_t^{[2]} = y\_t^{[2]} - s)$,

where $\boldsymbol{\theta} = [\gamma\_{11}, \gamma\_{12}, \gamma\_{21}, \gamma\_{22}, \tilde{\nu}\_1, \tilde{\nu}\_2, \boldsymbol{\beta}^{[k]}]$ is the vector of unknown parameters, $g\_1 = \min(y\_t^{[1]}, y\_{t-1}^{[1]})$ and $g\_2 = \min(y\_t^{[2]}, y\_{t-1}^{[2]})$.
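The normalizing constant Z and the CMP log-pmf entering f3 can be evaluated by truncating the infinite series; the truncation length below is an ad hoc choice of this sketch, not part of the paper:

```python
import numpy as np

def cmp_log_z(lam, nu, terms=200):
    """log Z(lambda, nu) = log sum_j lambda^j / (j!)^nu, truncated series.

    `terms=200` is an ad hoc truncation, ample for moderate lambda."""
    j = np.arange(terms)
    log_fact = np.cumsum(np.log(np.maximum(j, 1)))      # log(j!)
    log_terms = j * np.log(lam) - nu * log_fact
    m = log_terms.max()
    return m + np.log(np.exp(log_terms - m).sum())      # log-sum-exp

def cmp_log_pmf(r, lam, nu):
    """log P(R = r) for a COM-Poisson innovation, as used in f3 / Eq. (20)."""
    log_fact_r = np.sum(np.log(np.arange(1, r + 1)))    # log(r!)
    return r * np.log(lam) - nu * log_fact_r - cmp_log_z(lam, nu)
```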

The conditional likelihood function is given by

$$L(\boldsymbol{\theta}|\boldsymbol{y}) = \prod\_{t=1}^{T} f((y\_t^{[1]}, y\_t^{[2]}) | (y\_{t-1}^{[1]}, y\_{t-1}^{[2]}, r\_{t-1}^{[1]}, r\_{t-1}^{[2]}), \boldsymbol{\theta}) \tag{21}$$

and after maximizing the log-likelihood in Equation (22),

$$\log[L(\theta|y)] = \sum\_{t=1}^{T} \log f((y\_t^{[1]}, y\_t^{[2]}) | (y\_{t-1}^{[1]}, y\_{t-1}^{[2]}, r\_{t-1}^{[1]}, r\_{t-1}^{[2]}), \theta) \tag{22}$$

we obtain the maximum likelihood estimators of *θ* for some starting value of *y***0**.

#### **4. Forecasting Equations**

Based on the proposed model, the forecasting equations are derived as follows:

$$E(Y\_{t+1}^{[1]}|y\_t^{[1]}, r\_t^{[1]}) = E(\gamma\_{11} \circ Y\_t^{[1]}|y\_t^{[1]}) + E(\gamma\_{12} \circ R\_t^{[1]}|r\_t^{[1]}) + E(R\_{t+1}^{[1]})$$

$$= \gamma\_{11}y\_t^{[1]} + \gamma\_{12}r\_t^{[1]} + \lambda\_{t+1}^{[1]} \tag{23}$$

$$\begin{split} E(Y\_{t+1}^{[2]}|y\_t^{[2]},r\_t^{[2]}) &= E(\gamma\_{21} \circ Y\_t^{[2]}|y\_t^{[2]}) + E(\gamma\_{22} \circ R\_t^{[2]}|r\_t^{[2]}) + E(R\_{t+1}^{[2]})\\ &= \gamma\_{21}y\_t^{[2]} + \gamma\_{22}r\_t^{[2]} + \lambda\_{t+1}^{[2]} \end{split} \tag{24}$$

and

$$\begin{split} \text{Var}(Y\_{t+1}^{[1]}|y\_t^{[1]}, r\_t^{[1]}) &= \text{Var}(\gamma\_{11} \circ Y\_t^{[1]}|y\_t^{[1]}) + \text{Var}(\gamma\_{12} \circ R\_t^{[1]}|r\_t^{[1]}) + \text{Var}(R\_{t+1}^{[1]})\\ &= \gamma\_{11}(1-\gamma\_{11})y\_t^{[1]} + \gamma\_{12}(1-\gamma\_{12})r\_t^{[1]} + \frac{\lambda\_{t+1}^{[1]}}{\tilde{\nu}\_1} + \frac{\tilde{\nu}\_1 - 1}{2\tilde{\nu}\_1^2} \end{split} \tag{25}$$

$$\begin{split} \text{Var}(Y\_{t+1}^{[2]}|y\_t^{[2]}, r\_t^{[2]}) &= \text{Var}(\gamma\_{21} \circ Y\_t^{[2]}|y\_t^{[2]}) + \text{Var}(\gamma\_{22} \circ R\_t^{[2]}|r\_t^{[2]}) + \text{Var}(R\_{t+1}^{[2]})\\ &= \gamma\_{21}(1-\gamma\_{21})y\_t^{[2]} + \gamma\_{22}(1-\gamma\_{22})r\_t^{[2]} + \frac{\lambda\_{t+1}^{[2]}}{\tilde{\nu}\_2} + \frac{\tilde{\nu}\_2 - 1}{2\tilde{\nu}\_2^2} \end{split} \tag{26}$$
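A compact sketch of the point forecasts (23)–(24) and variances (25)–(26) follows; the function and argument names are illustrative, not from the paper:

```python
import numpy as np

def forecast_one_step(y_t, r_t, gam, lam_next, nu):
    """Point forecasts (23)-(24) and variances (25)-(26) -- a sketch.

    y_t, r_t -- last observed counts and fitted innovations (length 2);
    gam      -- [[g11, g12], [g21, g22]] thinning parameters;
    lam_next -- innovation means lambda_{t+1}; nu -- CMP dispersions.
    """
    y_t, r_t = np.asarray(y_t), np.asarray(r_t)
    g1 = np.array([gam[0][0], gam[1][0]])     # gamma_11, gamma_21
    g2 = np.array([gam[0][1], gam[1][1]])     # gamma_12, gamma_22
    lam_next, nu = np.asarray(lam_next), np.asarray(nu)
    mean = g1 * y_t + g2 * r_t + lam_next
    var_r = lam_next / nu + (nu - 1.0) / (2.0 * nu ** 2)   # Var(R_{t+1})
    var = g1 * (1 - g1) * y_t + g2 * (1 - g2) * r_t + var_r
    return mean, var
```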

#### **5. Modelling Daily Stock Transactions**

This section focusses on the number of daily stock transactions of the two most eminent banking institutions in Mauritius, namely Mauritius Commercial Bank Group Limited (MCB) and State Bank of Mauritius Holdings Ltd. (SBMH), which are listed on the SEM. The daily stock transactions refer to the number of times stocks are bought and sold at the prevailing price during the trading session. MCB and SBMH are licensed by the Bank of Mauritius and have the biggest market share in the country. MCB was founded in 1838 and is the oldest and largest banking institution in Mauritius, while SBMH is the second largest commercial bank, established in 1973. The total assets of MCB are nearly USD 15.8 billion, with a market capitalization on the SEM of USD 1.5 billion. MCB is owned by almost 22,000 domestic and foreign shareholders, has over 1.1 million individual and institutional clients and employs approximately 3700 staff. On the other hand, the total assets of SBMH are nearly USD 6.6 billion, with a market capitalization on the SEM of USD 253 million. SBMH is owned by almost 18,518 domestic and international shareholders, has over 0.75 million individual and institutional customers and employs approximately 2845 staff. The COVID-19 pandemic since the year 2020 has caused unprecedented disruptions and created innumerable challenges for both commercial banks. In the wake of this difficult time, many investors of the SEM have been negatively affected, and the market has been navigating in the red zone for quite some time because of the uncertainty prevailing due to the COVID-19 pandemic. MCB and SBMH have not been spared by the pandemic and their performance on the SEM has been affected negatively since its start. Hence, it is of utmost importance to model the number of daily stock transactions of these two banks and provide reliable estimates to investors so that they can decide whether to hold or sell the shares of MCB and SBMH.

The transactions of MCB and SBMH must be inter-related, as they operate in the same sector, namely the banking sector, and provide the same line of services and financial activities. Thus, we collected data from several brokers on the number of daily transactions of these two banking institutions from 4 October to 10 December 2021 over 30 min intervals. As far as the covariates are concerned, based on previous research [1,3–5], the following variables were identified as those influencing the number of daily stock transactions on the SEM: the intervention of any COVID-19 news affecting the financial market, the time-of-day effect and the day effect.

Hence, in this section, we analyze the intra-day stock transactions of MCB and SBMH using a novel time series model, namely the BINARMA (1,1) model with CMP innovations. The stock market data on the number of daily transactions were collected from the SEM for MCB and SBMH at 30 min intervals from 4 October to 10 December 2021, amounting to 450 paired observations. In the same line, the covariates that influence these daily stock transactions were recorded as follows: information on COVID-19 news (*xt*1), where 1 refers to any new COVID-19 information influencing the stock trading of SBMH and MCB and 0 to no COVID-19 information; the Friday effect (*xt*2), where 1 refers to trading conducted on Fridays and 0 to trading conducted on Mondays, Tuesdays, Wednesdays and Thursdays; and the time-of-day effect (*xt*3), where 1 refers to trading effected during the time period 12:00–13:30 and 0 to trading between 09:00 and 12:00. Normally, the SEM operates from Monday to Friday only, between 09:00 and 13:30, and is closed on public holidays. Based on the 450 paired observations, the time series plots and the descriptive statistics are shown in Figures 1–4 and Table 1.
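Under the covariate coding just described, and the log link used later for the fitted innovation means, a minimal sketch is (names are illustrative):

```python
import numpy as np

def innovation_mean(beta, covid_news, friday, midday):
    """lambda_t = exp(b0 + b1*x_t1 + b2*x_t2 + b3*x_t3) under the coding
    above: x_t1 COVID-19 news dummy, x_t2 Friday dummy, x_t3 time-of-day
    dummy (1 for the 12:00-13:30 session)."""
    b0, b1, b2, b3 = beta
    return np.exp(b0 + b1 * covid_news + b2 * friday + b3 * midday)

# e.g., a Friday morning interval with COVID-19 news entering the market:
# lam = innovation_mean(beta_hat, covid_news=1, friday=1, midday=0)
```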

**Figure 1.** Time series plot for SBMH.

**Figure 2.** ACF plot for SBMH.

**Figure 3.** Time series plot for MCB.

**Figure 4.** ACF plot for MCB.

**Table 1.** Summary statistics for the intra-day transactions of SBMH and MCB.


From Table 1, we can conclude that the SBMH data series is slightly over-dispersed while the MCB data series is slightly under-dispersed. Both series have an average sample lag-1 correlation, with a sample cross-correlation of 0.1617, which confirms the existence of a relationship both between and within the two series. As for the ACF plots, we observe that for both series lag-1 has the highest peak. Thus, the proposed BINARMA (1,1) model with CMP innovations is used to model the in-sample daily trading of SBMH ($Y\_t^{[1]}$) and MCB ($Y\_t^{[2]}$) between 4 October and 3 December 2021, totalling 405 paired observations, while the out-sample from 5 December to 10 December 2021 is used to validate the model. We also apply the bivariate integer-valued autoregressive model of order 1 with CMP innovations (BINAR(1)CMP) developed by Jowaheer et al. [6] and the bivariate integer-valued moving average model of order 1 with CMP innovations (BINMA(1)CMP) developed by Mamodekhan et al. [7] to the intra-day series. Under $\lambda\_t^{[k]} = \exp(\hat{\beta}\_0^{[k]} + \hat{\beta}\_1^{[k]} x\_{t1} + \hat{\beta}\_2^{[k]} x\_{t2} + \hat{\beta}\_3^{[k]} x\_{t3})$, the estimates of the covariate effects under the application of the three models are shown in Table 2.

**Table 2.** Intra-day transactions for MCB and SBMH: Estimates of the regression parameters.


From the regression coefficients in Table 2, we observe that the estimates of the covariates obtained using BINARMA (1,1) CMP have lower standard errors than those obtained using BINAR(1)CMP and BINMA(1)CMP; hence, we interpret only the estimates of the BINARMA (1,1) CMP model. We can also notice that all the explanatory variables are significant, thus confirming their influence on the number of intra-day transactions of MCB and SBMH. Since the COVID-19 pandemic started in the year 2020, any news entering the domestic market pertaining to confinement, the number of COVID-19 cases in Mauritius, potential vaccines against COVID-19, the gradual recovery of the economy and the lifting of the confinement has caused an increase in the number of intra-day stock transactions of both MCB and SBMH. For SBMH, as news filters into the market, we expect an increase in stock transactions of 30.1 percent, and of 27.8 percent for MCB. From these figures, we can conclude that there is more trading in SBMH stocks than MCB stocks, as MCB is a more robust bank than SBMH. The other two explanatory variables, namely the Friday and time effects, have also affected the number of intra-day transactions of MCB and SBMH, but their influence was lower than the COVID-19 news effect. From the correlations in Table 3, we can confirm that there exists a relationship within and between the number of daily trades of MCB and SBMH.


Using the forecasting Equations (23) and (24), we compute the one-step-ahead forecasts of the number of stock trades of SBMH and MCB based on the out-sample trading data from 5 December to 10 December 2021, totalling 45 paired observations; the corresponding root mean square errors (RMSEs) for the three models with CMP innovations are shown in Table 4:

**Table 3.** Intra-day transactions for MCB and SBMH: Estimates of the dependence parameters.

**Table 4.** RMSEs for the out-sample number of intra-day transactions of MCB and SBMH.


From Table 4, we observe that the BINARMA (1,1) CMP provides lower RMSEs than BINAR(1)CMP and BINMA(1)CMP.

#### **6. Conclusions**

This paper considers the modeling of the intra-day transactions of the two most prestigious banking companies in Mauritius, MCB and SBMH, and how the COVID-19 pandemic has affected the stock transactions of these two commercial banks. Since the time series data of one bank are over-dispersed and those of the other under-dispersed, we develop a BINARMA (1,1) process with CMP innovation terms to model these data. In this paper, a CML approach is used to estimate the regression and correlation parameters for the two series. This novel BINARMA (1,1) CMP model together with the CML approach was then applied to estimate the regression and correlation effects of the intra-day transactions, where it was found to yield significant estimates for the time-of-trade, Friday and COVID-19 news effects. The forecasting equations were also developed, and they yield reliable estimates for the volume of transactions for the two series based on the real figures.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


### *Proceeding Paper* **Improving the Predictive Power of Historical Consistent Neural Networks †**

**Rockefeller Rockefeller 1,2,\*, Bubacarr Bah 1,2,‡, Vukosi Marivate 3,‡ and Hans-Georg Zimmermann 4,‡**


**Abstract:** The Historical Consistent Neural Networks (HCNN) are an extension of the standard Recurrent Neural Networks (RNN): they allow the modeling of highly interacting dynamical systems across multiple time scales. HCNN do not draw any distinction between inputs and outputs, but model observables embedded in the dynamics of a large state space. In this paper, we propose to improve the predictive power of the (Vanilla) HCNN using three methods: (1) HCNN with Partial Teacher Forcing, (2) HCNN with Sparse State Transition Matrix, and (3) a Long Short-Term Memory Formulation of HCNN. We investigated the effect of these long-memory improvement methods on three chaotic time series mathematically generated from the Rabinovich–Fabrikant, Rossler and Lorenz systems. To complement our study, we compared the accuracy of the different HCNN variants with well-known recurrent neural network methods such as the Vanilla RNN and LSTM for the same prediction tasks. Overall, our results show that the Vanilla HCNN is superior to RNN and LSTM, and even more so when the above long-memory extensions (1), (2) and (3) are included. We demonstrate that (1) and (3) are superior for the modeling of our chaotic dynamical systems. We show that for these deterministic systems, the ensembles are narrowed.

**Keywords:** recurrent neural networks; historical consistent neural networks; time series forecasting; chaotic dynamical systems

#### **1. Introduction**

Over recent years, data-driven approaches, including deep learning techniques, have played an instrumental role in the way we model, predict, and control dynamical systems [1]. Thanks to modern mathematical methods and the availability of data and computational resources, Neural Networks have been increasingly used to understand complex systems (non-linear and high-dimensional systems) [2]. In 1989, Hornik, Stinchcombe, and White proved through the Universal Approximation theorem that multi-layer feedforward networks are universal approximators [3]. The universal approximation property for RNN is stated in [2]. Recurrent neural networks (or RNN) are a family of neural networks designed for processing sequential data. They are increasingly being used to understand, analyze and forecast the evolution of complex dynamical systems due to the explicit modeling of time and memory they offer [4]. RNN fulfill the universal approximation properties and allow the identification of dynamical systems in the form of high-dimensional, non-linear state space models [5]. Simple in architecture yet equipped with sophisticated learning algorithms, they have emerged as first-class candidates for modeling dynamical systems [6].

**Citation:** Rockefeller, R.; Bah, B.; Marivate, V.; Zimmermann, H.-G. Improving the Predictive Power of Historical Consistent Neural Networks. *Eng. Proc.* **2022**, *18*, 36. https://doi.org/10.3390/ engproc2022018036

Academic Editors: Ignacio Rojas, Hector Pomares, Olga Valenzuela, Fernando Rojas and Luis Javier Herrera

Published: 27 June 2022


The benefits they offer in dealing with the typical challenges associated with forecasting in general make them suitable for learning non-linear dependencies from observed time series data [7].

Throughout the training process, RNN rely on external inputs, which makes them suitable for the modeling of open dynamical systems. However, they assume constant environmental conditions from the present time on, which makes them temporally inconsistent [7]. HCNN do not have this inconsistency between past and future modeling. Introduced by [8], HCNN is based upon the assumption that dynamical systems must be seen in the context of large systems in which various (non-linear) dynamics interact with each other in time. HCNN do not only model the individual dynamics of interest, but also the external drivers in the same manner, by embedding them in the dynamics of a large state space [8].

This paper makes three contributions: Firstly, we modeled three well-known chaotic dynamical systems using the Vanilla HCNN, namely the Lorenz, Rabinovich–Fabrikant and Rossler systems. Secondly, we improved the forecast accuracy and the length of the forecast horizon of the Vanilla HCNN using three methods: HCNN with Partial Teacher Forcing, HCNN with a sparsity constraint on the state transition matrix, and a Long Short-Term Memory Formulation of HCNN. Thirdly, we ran a comparative analysis between those different strands of HCNN and well-known deep learning models such as the Vanilla Recurrent Neural Network (RNN) and a Long Short-Term Memory-based model (LSTM). The rest of this paper is organized as follows.

Sections 2 and 3 provide, respectively, a review of the mathematical description of RNN and an architectural description of HCNN together with its learning algorithm. In Section 4, we discuss the intuition behind, and the architecture of, the different HCNN improvement methods. The focus of Section 5 is on the data generation for the three chaotic dynamical systems. Section 6 presents the different results and a comparative analysis between the aforementioned methods and the existing well-known recurrent neural networks, namely RNN and LSTM. In Section 7, we demonstrate that our results are reproducible for different HCNN instances. Finally, we present the conclusion and future work in Section 8.

#### **2. Reminder of Recurrent Neural Networks for Dynamical Systems**

Let us consider, as in Figure 1, a dynamical system driven by an external signal *ut*.

**Figure 1.** RNN Identification of a (folded) dynamical system using a discrete time description.

$$s\_t \quad = \quad f(s\_{t-1}, u\_t) \tag{1}$$

$$y\_t \quad = \quad g(s\_t) \tag{2}$$

Let us assume that, at each time *t*, an output *yt* is recorded. A dynamical system can be described for discrete time grids by a state space model, consisting of a state transition and an output equation, respectively. The recursive Equation (1) describes the current state of the system *st* with respect to the previous state *st*−<sup>1</sup> and the external signals *ut*. The expected output *yt* is computed as a function of the current state of the system (2). Key to the success of RNN is their ability to generalize well, owing to the fact that they are trained using parameter sharing [9]. Without loss of generality, we can approximate the state space model with the state transition (3) and the related output Equation (4):

$$s\_t \quad = \quad \tanh(As\_{t-1} + Bu\_t) \tag{3}$$

$$y\_t \quad = \quad Cs\_t \tag{4}$$

where *A*, *B* and *C* are the weight matrices for the hidden-to-hidden, input-to-hidden and hidden-to-output connections, respectively. This makes a simple Recurrent Neural Network (RNN) with recurrent connections between hidden units across the whole time range [4]. By performing a finite unfolding in time, we transform the temporal equations above into the spatial architecture shown in Figure 2 below [8].

**Figure 2.** Vanilla RNN architecture.
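As a minimal illustration of Equations (3) and (4), the following sketch unfolds the state-space recursion over a finite horizon; the dimensions and randomly initialized weight matrices are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)
dim_s, dim_u, dim_y, T = 8, 2, 1, 50

A = rng.normal(scale=0.3, size=(dim_s, dim_s))  # hidden-to-hidden weights
B = rng.normal(scale=0.3, size=(dim_s, dim_u))  # input-to-hidden weights
C = rng.normal(scale=0.3, size=(dim_y, dim_s))  # hidden-to-output weights

s = np.zeros(dim_s)
u = rng.normal(size=(T, dim_u))      # external drivers u_t
y = np.empty((T, dim_y))
for t in range(T):
    s = np.tanh(A @ s + B @ u[t])    # state transition, Equation (3)
    y[t] = C @ s                     # output equation, Equation (4)
```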

The Vanilla RNN explains the dynamics observed in the *yt* at each time point by splitting its complexity into two parts: the externally driven part, represented by the external influences *ut*, and the autonomously driven part (or hidden dynamics), represented by the internal states *st*. If the internal states of the system play an important role in understanding the dynamics of the observables, then an overshooting extends the autonomous part of the system several steps into the future and enables a reliable forecast of the *yt*. For a given sequence of *ut* and computed *yt* values, we pair the corresponding observed values *y<sup>d</sup><sub>t</sub>* and find the optimal set of shared parameters (the matrices *A*, *B* and *C*) by solving the optimization problem in Equation (5) [7]:

$$\min\_{A,B,C} \frac{1}{T} \sum\_{t=1}^{T} ||y\_t - y\_t^d|| \tag{5}$$

The training of the RNN can be conducted using the error back-propagation through time (BPTT) algorithm. This is a natural extension of standard back-propagation that performs gradient descent on a network unfolded in time. More details on BPTT are given in [4]. However, despite its success over the past years, and especially on short-term forecasts, the missing external inputs *ut* in the future (which can be interpreted as a constant environment, i.e., *ut* ≡ 0) make a vanilla RNN temporally inconsistent [7].

#### **3. Historical Consistent Neural Networks**

The HCNN were designed to address the temporal inconsistency in RNN. A dynamical system is often viewed in the context of large systems in which various (non-linear) dynamics interact with one another in time. However, we can only measure/observe a small subset of those variables. Therefore, HCNN reconstruct (at least part of) the hidden variables in order to understand the dynamics of the whole system [7,10]. Here the input and output variables are combined and termed observables (*Y<sub>t</sub>* := (*u<sub>t</sub>*, *y<sub>t</sub>*), *Y<sub>t</sub>* ∈ ℝ<sup>*N*</sup>). Together with the hidden variables, they form the state of the system at each time *τ* (see Figure 3) and are treated by the model in the same manner. The corresponding state transition Equation (6) and output Equation (7) are provided below.

**Figure 3.** HCNN identification of a (folded) dynamical system using a discrete time description.

$$s\_t \quad = \quad A \tanh(s\_{t-1}) \tag{6}$$

$$Y\_t \quad = \quad [Id, 0] \, s\_t \tag{7}$$

The joint dynamics for all observables is characterized in the HCNN by the sequence of states *st*. The observables (*i* = 1, ... , *N*) are arranged on the first *N* state neurons of *st* and followed by non-observable (hidden) variables as subsequent neurons. The object [*Id*, 0] is a fixed matrix that reads out the observables from the state vector *st* [7]. At the initial time, the state *s*<sup>0</sup> is described as a bias/random vector and the matrix *A* contains the only free parameters [7].

Similar to a standard RNN, HCNN fulfils the universal approximation theorem, as highlighted in [7,10]. However, the lack of any input signals and an unfolding across the complete data makes it difficult to train in practice [8]. As proposed by [4], the models that have recurrent connections from their outputs leading back into the model may be trained with teacher forcing.

This is a procedure that emerges from the maximum likelihood criterion. It makes the best possible use of the data from the observables and therefore accelerates the training of the HCNN [2,10]. Throughout the fitting procedure, the teacher forcing mechanism introduces a hidden layer *rt* that is a copy of the internal state *st*, with the exception that its first *N* components, which correspond to the computed expected values *Yt*, are replaced with the observed values *Y<sup>d</sup><sub>t</sub>*, as shown in Equation (8).

$$\begin{array}{rcl} Y\_t &=& [Id, 0] \, s\_t \\ r\_t &=& s\_t - [Id, 0]^{\top} \left(Y\_t - Y\_t^{d}\right) \end{array} \tag{8}$$

$$s\_{t+1} \quad = \quad A \tanh(r\_t) \tag{9}$$

From the temporal equations above, we can also derive the spatial representation, through the resulting network architecture of the HCNN with integrated teacher forcing mechanism, as illustrated in Figure 4 below.

**Figure 4.** HCNN architecture with integrated teacher forcing mechanism.

At each time *t* during training, the output layer of the HCNN is replaced by a cluster that is given a fixed target value of zero. This forces the HCNN to create the expected values *Yt* at each time *t*, to compensate for the negative observed values −*Y<sup>d</sup><sub>t</sub>* coming from the top node [8]. The content of that cluster, i.e., (*Yt* − *Y<sup>d</sup><sub>t</sub>*), with a minus sign, is transferred to the upper part (the first *N* neurons) of the hidden layer *rt*. Furthermore, a copy of the state *st* is also transferred to the intermediate hidden layer *rt* on a component-by-component basis. As a result, the expected values *Yt* on the first *N* components of the hidden layer *rt* are replaced by the observed values [8], and the subsequent state *st*+<sup>1</sup> is computed using the state transition Equation (9).
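A minimal sketch of one teacher-forced HCNN step, combining Equations (7)–(9), is given below; the state dimension, number of observables, and random initialization of *A* are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, dim_s = 3, 20                                         # observables, state size
A = rng.normal(scale=0.2, size=(dim_s, dim_s))           # the only free parameters
Id0 = np.hstack([np.eye(N), np.zeros((N, dim_s - N))])   # read-out matrix [Id, 0]

def hcnn_step(s, Y_obs):
    Y = Id0 @ s                          # expectation, Equation (7)
    r = s - Id0.T @ (Y - Y_obs)          # teacher forcing correction, Equation (8)
    return Y, A @ np.tanh(r)             # next state, Equation (9)

s = rng.normal(size=dim_s)               # s_0 as a random vector
Y, s = hcnn_step(s, Y_obs=np.zeros(N))   # one step with dummy observations
```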

#### **4. Long-Term Memory Improvement Methods**

To improve the long-term memory of the Vanilla HCNN model, three different improvement methods have been designed: HCNN with Partial Teacher Forcing, HCNN with Large Sparse State Transition Matrices, and a Long Short-Term Memory Formulation. The intuitions behind each of the methods are given below.

#### *4.1. HCNN with Partial Teacher Forcing*

To enforce the long-term learning of the HCNN, we endow the output layers of the HCNN with a dropout filter, guided by a probability, as illustrated in Equation (10).

$$\text{dropout}\_{(p)}(x\_i) = \begin{cases} 0 & \text{with probability } p \\ x\_i & \text{with probability } 1-p \end{cases} \tag{10}$$

At time *t*, when the filter is activated (i.e., with probability *p*), the HCNN randomly suppresses elements in the time series that come from the cluster containing the difference between expectations and observations (*Yt* − *Y<sup>d</sup><sub>t</sub>*). Thus, in the upper part (the first *N* components) of the *rt* vector, the network is forced to replace the observations with its internal expectations. The architecture is represented in Figure 5 below.

**Figure 5.** HCNN architecture with partial teacher forcing mechanism.
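The following sketch adds the dropout filter of Equation (10) to the teacher forcing correction; as before, the dimensions, the dropout probability, and the initialization are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, dim_s, p = 3, 20, 0.25
A = rng.normal(scale=0.2, size=(dim_s, dim_s))
Id0 = np.hstack([np.eye(N), np.zeros((N, dim_s - N))])

def hcnn_step_ptf(s, Y_obs):
    Y = Id0 @ s
    keep = rng.random(N) >= p                 # drop each component with probability p
    r = s - Id0.T @ (keep * (Y - Y_obs))      # dropped components keep the expectation
    return Y, A @ np.tanh(r)
```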

#### *4.2. HCNN with Large Sparse State Transition Matrix*

HCNN may often use large state vectors to model large dynamical systems (number of observables > 100). During training time, the iteration with a fully connected state transition matrix *A* could cause an information overload, leading to two risks:


To overcome that, we can choose to make the transition matrix *A* sparse to a chosen degree. As a result, the spread of the information peak through the network is damped by the sparsity too. Another approach, proposed by [7], is a heuristic that sets the sparsity of *A* inversely proportional to the state dimensionality of the system, as shown in Equation (11) below:

$$\text{Sparsity}(A) \quad = \quad \min\left(1, \frac{50}{\dim(s)}\right) \tag{11}$$
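A rough sketch of this heuristic, assuming the sparsity pattern is realized as a fixed random mask on *A* (one possible reading of Equation (11)), is given below.

```python
import numpy as np

rng = np.random.default_rng(3)
dim_s = 100
density = min(1.0, 50 / dim_s)                   # fraction of retained weights, Eq. (11)
mask = rng.random((dim_s, dim_s)) < density      # fixed sparsity pattern
A = rng.normal(scale=0.2, size=(dim_s, dim_s)) * mask   # half-sparse A for dim(s) = 100
```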

#### *4.3. HCNN with LSTM Formulation*

The third approach to improving the long-term memory of the vanilla HCNN is through an exponential smoothing embedding of the HCNN with a learnable diagonal matrix [11], subject to the constraint 0 ≤ *D<sub>ii</sub>* ≤ 1. The resulting state transition Equation (12) and output Equation (13) are provided below:

$$\begin{array}{rcl} s\_t &=& (Id - D)s'\_{t-1} + DA \tanh(s'\_{t-1}) \\ &=& s'\_{t-1} + D\left(A \tanh(s'\_{t-1}) - s'\_{t-1}\right) \end{array} \tag{12}$$

$$Y\_t \quad = \quad [Id, 0] \, s\_t \tag{13}$$

where *s*′<sub>*t*−1</sub> = *TeacherForcing*(*s*<sub>*t*−1</sub>).
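As a rough illustration of the smoothed transition in Equation (12), the sketch below applies a diagonal *D* clipped to [0, 1]; dimensions and initialization are again illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
dim_s = 20
A = rng.normal(scale=0.2, size=(dim_s, dim_s))
D = np.clip(rng.random(dim_s), 0.0, 1.0)      # diagonal entries D_ii in [0, 1]

def lstm_hcnn_step(s_prime):
    # s_t = s'_{t-1} + D (A tanh(s'_{t-1}) - s'_{t-1}), Equation (12)
    return s_prime + D * (A @ np.tanh(s_prime) - s_prime)
```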

Built upon the ideas of the LSTM formulation of RNN [12], the resulting architecture of the LSTM formulation of HCNN is provided in Figure 6 below.

**Figure 6.** Architectural description of HCNN with an LSTM Formulation.


The next section will focus on the different dynamical systems that will be used to generate the data for the fitting procedure of the HCNN models.

#### **5. Experimental Setup**

The experiments we carried out in this work aimed at forecasting the dynamics of three fully observable, chaotic and deterministic systems, namely the Rabinovich–Fabrikant, Rossler and Lorenz systems. Their chaotic properties come from the fact that they are very sensitive to their initial conditions: the smallest changes completely modify the respective trajectories. They are also well known to show aperiodic behaviour that is apparently random and unpredictable [1,13].

#### *5.1. The Rabinovich–Fabrikant System*

The Rabinovich–Fabrikant system is represented by the system of Equation (14) below:

$$\begin{cases} \begin{array}{ll} \frac{dx}{dt} &= y(z - 1 + x^2) + \gamma x \\ \frac{dy}{dt} &= x(3z + 1 - x^2) + \gamma y \\ \frac{dz}{dt} &= -2z(\alpha + xy) \end{array} \end{cases} \tag{14}$$

where we set *α* = 0.2, *γ* = 0.1. Introduced by Mikhail Rabinovich and Anatoly Fabrikant, this set of equations describes the stochasticity arising from the modulation instability in a non-equilibrium dissipative medium [14]. The corresponding attractor is shown in Figure 7 below.

**Figure 7.** The Rabinovich–Fabrikant attractor and the corresponding time series, split into training and test data.

#### *5.2. The Rossler System*

The Rossler system is represented by the system of Equation (15) below:

$$\begin{cases} \begin{array}{ll} \frac{dx}{dt} &= -y - z \\ \frac{dy}{dt} &= x + ay \\ \frac{dz}{dt} &= b + z(x - c) \end{array} \end{cases} \tag{15}$$

where we set *a* = 0.2, *b* = 0.2 and *c* = 6.3. First studied by Otto Rössler in the 1970s, these non-linear ordinary differential equations define a continuous-time dynamical system that exhibits chaotic dynamics with fractal properties, as shown by the corresponding attractor in Figure 8 below [15].

**Figure 8.** The Rossler attractor and the corresponding time series, split into training and test data.

#### *5.3. The Lorenz System*

The Lorenz system is represented by the system of Equation (16) below:

$$\begin{cases} \begin{array}{ll} \frac{dx}{dt} &= \sigma(y-x) \\ \frac{dy}{dt} &= x(\rho-z)-y \\ \frac{dz}{dt} &= xy-\beta z \end{array} \end{cases} \tag{16}$$

where we set *σ* = 10, *ρ* = 28 and *β* = 2.667. The equations above describe the rate of change of three quantities, namely the rate of convection, the horizontal temperature variation and the vertical temperature variation. First studied by Edward Lorenz, the Lorenz system is a simplified mathematical model for atmospheric convection [16,17]. The graphical representation of its attractor is provided in Figure 9 below.

**Figure 9.** The Lorenz attractor and the corresponding time series, split into training and test data.

#### *5.4. Training Strategies*

We solved each of the systems above by numerical approximation, with the configurations summarized in Table 1 below.


**Table 1.** Configuration of the training data for each of the chaotic dynamical systems.

Here, the truncation parameter refers to the down-sampling applied to the initial sample size. On the Lorenz time series, for instance, we recorded values only every 8th time step and obtained a new sample size of 2000 observations, divided into 1400 observations for training and 100 observations for the forecast period, as shown in Figure 9 above. The truncation parameter was chosen carefully to ensure that, for each system, the shapes of the corresponding attractors are preserved from a graphical viewpoint, as exemplified in Figures 7–9 above.
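As an illustration of this data generation procedure, the sketch below integrates the Lorenz system of Equation (16) with SciPy and applies the down-sampling and split described above; the integration grid and initial condition are assumptions, with the exact configurations given in Table 1.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Standard chaotic Lorenz parameters, matching Equation (16)
sigma, rho, beta = 10.0, 28.0, 2.667

def lorenz(t, state):
    x, y, z = state
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

# Fine integration grid (16,000 steps), then keep every 8th value to obtain
# 2000 observations, split into 1400 training and 100 forecast points.
t_eval = np.arange(0.0, 160.0, 0.01)
sol = solve_ivp(lorenz, (0.0, 160.0), [1.0, 1.0, 1.0], t_eval=t_eval)
series = sol.y.T[::8][:2000]
train, test = series[:1400], series[1400:1500]
```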

The different HCNN models were instantiated with 20 variables: three accounting for the observables and 17 for hidden variables, modeled in the same manner, to explain the dynamics of the different chaotic dynamical systems at hand. For the HCNN model with sparsity constraints, we chose the state dimensionality value as 100 (dim(*s*) = 100), which implies that the transition matrix *A* will be half sparse, as stated in Equation (11) above. For the HCNN with Partial Teacher Forcing, the dropout filter was set with an incremental probability to ensure that the dropout probability reaches 25% at the end of the training. The diagonal matrix *D* of the LSTM Formulation of HCNN was constrained between 0 and 1 at each stage of the training. For comparison purposes, we also instantiated a Recurrent Neural Network model and an LSTM formulation of it, with 20 hidden states each.

#### *5.5. Evaluation Metrics*

As the evaluation metric, we chose the logarithm of the hyperbolic cosine (LogCosh) as our loss function, represented by Equation (17) below:

$$\text{LogCosh Loss} \quad = \quad \frac{1}{T} \sum\_{t=1}^{T} \frac{1}{p} \ln \cosh\left(p\left(Y\_t - Y\_t^d\right)\right) \tag{17}$$
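A minimal implementation of this loss is sketched below; the sharpness parameter *p* is set to 1 as an assumption, since its value is not specified here.

```python
import numpy as np

def logcosh_loss(Y_pred, Y_obs, p=1.0):
    # LogCosh loss of Equation (17), averaged over the time axis
    return np.mean(np.log(np.cosh(p * (Y_pred - Y_obs))) / p)
```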

#### **6. Results and Analysis**

The different models were trained and the results are shown in the different plots below.

#### *6.1. On the Rabinovich–Fabrikant System*

The plots below show both the actual observations and the predicted values of the three trajectories along the forecast period for each of the models, in pairs: the Vanilla HCNN and Vanilla RNN models (Figure 10), the HCNN with Partial Teacher Forcing and with Sparse Constraints models (Figure 11), and the LSTM Formulation of HCNN and of RNN models (Figure 12).

**Figure 10.** Generalization: Vanilla HCNN (**top**) and Vanilla RNN (**bottom**).

**Figure 11.** Generalization: HCNN pTF (**top**) and HCNN Large Sparse (**bottom**).

**Figure 12.** Generalization: HCNN with LSTM Form (**top**) and LSTM Based Model (**bottom**).

#### *6.2. On the Rossler System*

The plots below show both the actual observations and the predicted values of the three trajectories along the forecast period for each of the models, in pairs: the Vanilla HCNN and Vanilla RNN models (Figure 13), the HCNN with Partial Teacher Forcing and with Sparse Constraints models (Figure 14), and the LSTM Formulation of HCNN and of RNN models (Figure 15).

**Figure 13.** Generalization: Vanilla HCNN (**top**) and Vanilla RNN (**bottom**).

**Figure 14.** Generalization: HCNN pTF (**top**) and HCNN Large Sparse (**bottom**).

**Figure 15.** Generalization: HCNN with LSTM Form (**top**) and LSTM Based Model (**bottom**).

#### *6.3. On the Lorenz System*

The plots below show both the actual observations and the predicted values of the three trajectories along the forecast period for each of the models, in pairs: the Vanilla HCNN and Vanilla RNN models (Figure 16), the HCNN with Partial Teacher Forcing and with Sparse Constraints models (Figure 17), and the LSTM Formulation of HCNN and of RNN models (Figure 18).

**Figure 16.** Generalization: Vanilla HCNN (**top**) and Vanilla RNN (**bottom**).

**Figure 17.** Generalization: HCNN pTF (**top**) and HCNN Large Sparse (**bottom**).

**Figure 18.** Generalization: HCNN with LSTM Form (**top**) and LSTM Based Model (**bottom**).

For each of the experiments, the HCNN with Partial Teacher Forcing is the superior method. To summarize our findings, we grouped the results for every model in Table 2 below.


**Table 2.** Forecast error made by the different models on the different chaotic dynamical systems.

As a general remark, the Vanilla HCNN model outperformed the Vanilla RNN model on the three forecasting exercises. Out of the three HCNN improvement methods, not only is the HCNN with Partial Teacher Forcing the superior one, it also improves on the forecast error of the existing well-known RNN and LSTM models (provided by the PyTorch library).

#### **7. Ensemble Computations**

To ensure that our results are reproducible, we instantiated an ensemble of HCNN with 10 members and trained them simultaneously to forecast the dynamics of the Lorenz system. Figure 19 below shows the actual observations, the forecasts of the 10 instances of the HCNN model, and the median of the ensemble.

**Figure 19.** Vanilla HCNN ensemble computations.

Looking at the figure above, we can see that for the 10 different instances of the Vanilla HCNN, the shapes of the three time series are preserved. At each time step, we computed the median and average of the 10 forecasts. The results we obtained show a balance between both generalization errors, with values of 4.04 × 10<sup>−3</sup> for the ensemble's median forecast and 4.09 × 10<sup>−3</sup> for the ensemble's average forecast. This shows that HCNN are able to model high-dimensional non-linear systems in a consistent way and that both the ensemble's median and average forecasts are reliable candidates for forecasting the dynamics of such systems.
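The ensemble read-out used above can be sketched as follows, with random stand-in forecasts in place of the trained HCNN outputs.

```python
import numpy as np

rng = np.random.default_rng(5)

# Forecasts of shape (members, steps, variables) from the 10 HCNN instances;
# here random stand-ins replace the actual model outputs.
forecasts = rng.normal(size=(10, 100, 3))
median_forecast = np.median(forecasts, axis=0)   # per-time-step ensemble median
mean_forecast = forecasts.mean(axis=0)           # per-time-step ensemble average
```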

#### **8. Conclusions**

Throughout this paper, we have seen that HCNNs are able to model high-dimensional deterministic non-linear systems in a consistent way. The different improvement methods have been instrumental in enforcing a long-term learning of the dynamics of the multivariate time series generated from the Lorenz, Rossler and Rabinovich–Fabrikant systems. Among the three improvement methods, measured by the generalization error, the Partial Teacher Forcing method is the superior way to improve both the long memory and the extent of the forecast horizon. The Large Sparse HCNN method is mostly aligned with biological methods. The LSTM Formulation method has a linear sub-structure to overcome the vanishing/exploding gradient problems, which sometimes creates numerical instability. The results obtained from the ensemble computation graphically show that the shapes of the three time series remain close to each other throughout the whole forecast horizon. Hence, for different instances of HCNN, the results are ensured to be reproducible. Moving forward, it is worth noting that the datasets were generated mathematically, which gives those dynamical systems the attribute of being fully observable. Since the final goal of this ongoing research is the analysis of climate data, which consists of rather high-dimensional and noisy measurements, such systems are called partially observable. There is currently work in progress to extend these techniques to wind forecasting as a basis for wind turbine control. Analysing such systems will require transferring the long-memory insights gained from this experiment to that new task. Furthermore, the identification of the optimal HCNN meta-parameters and the formulation of additional improvement techniques for the learning of HCNN are also probable directions to look at. On the other hand, the ensemble forecasts (median and average of the ensemble forecasts) seem to be a promising direction to explore for such forecasting exercises.

**Author Contributions:** Conceptualization, R.R., B.B., H.-G.Z. and V.M.; methodology, R.R., B.B., H.-G.Z. and V.M.; software, R.R.; validation, R.R., B.B., H.-G.Z. and V.M.; formal analysis, R.R.; investigation, R.R.; resources, R.R.; data curation, R.R.; writing—original draft preparation, R.R.; writing—review and editing, R.R., B.B., H.-G.Z. and V.M.; visualization, R.R.; supervision, R.R., B.B., H.-G.Z. and V.M.; project administration, R.R., B.B., H.-G.Z. and V.M.; All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was carried out with the aid of a grant from the International Development Research Centre, Ottawa, Canada, and with financial support from the Government of Canada, provided through Global Affairs Canada (GAC).

**Acknowledgments:** A special thanks to the African Institute for Mathematical Sciences, AIMS South Africa, which is my host institution, through the Next Einstein Initiative.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Proceeding Paper* **Exploration of Different Time Series Models for Soccer Athlete Performance Prediction †**

**Siarhei Kulakou 1, Nourhan Ragab 1,2, Cise Midoglu 1,\*, Matthias Boeker 1, Dag Johansen 3, Michael A. Riegler 1,3 and Pål Halvorsen 1,2,4**


**Abstract:** Professional sports achievements combine not only the individual physical abilities of athletes but also many modern technologies in areas such as medicine, equipment production, nutrition, and physical and mental health monitoring. In this work, we address the problem of predicting soccer players' ability to perform, from subjective self-reported wellness parameters collected using a commercially deployed digital health monitoring system called PmSys. We use 2 years of data from two Norwegian female soccer teams, where players have reported daily ratings for their readiness-to-play, mood, stress, general muscle soreness, fatigue, sleep quality, and sleep duration. We explore various time series models with the goal of predicting readiness, employing both a univariate approach and a multivariate approach. We provide an experimental comparison of different time series models, such as purely recurrent models, models of mixed recursive convolutional types, ensembles of deep CNN models, and multivariate versions of the recurrent models, in terms of prediction performance, with a focus on detecting peaks. We use different input and prediction windows to compare the accuracy of next-day and next-week predictions. We also investigate the potential of using models built on data from the whole team for making predictions about individual players, as compared to using models built on the data from the individual player only. We tackle the missing data problem with various methods, including the replacement of all gaps with zeros, filling in repeated values, and removing all gaps and concatenating the arrays. Our case study on athlete monitoring shows that a number of time series analysis models are able to predict readiness with high accuracy in near real-time. Equipped with such insight, coaches and trainers can better plan individual and team training sessions, and perhaps avoid overtraining and injuries.

**Keywords:** athlete training; data imputation; injury prevention; performance prediction; self-reporting; soccer; time series analysis

#### **1. Introduction**

Team sports are gaining traction with association football (soccer) in the lead as the most watched sport on television [1]. In 2018, 3.572 billion viewers tuned in to watch the FIFA World Cup [2]. This is equivalent to half of the population on the globe [2]. Soccer is played by amateurs and professionals globally and is a sport that brings people together across the world.

For most professional sports achievements, athletes' individual physical abilities are combined with many modern technologies from fields such as medicine, equipment production, nutrition, and physical and mental health monitoring.

**Citation:** Kulakou, S.; Ragab, N.; Midoglu, C.; Boeker, M.; Johansen, D.; Riegler, M.A.; Halvorsen, P. Exploration of Different Time Series Models for Soccer Athlete Performance Prediction. *Eng. Proc.* **2022**, *18*, 37. https://doi.org/ 10.3390/engproc2022018037

Academic Editors: Ignacio Rojas, Hector Pomares, Olga Valenzuela, Fernando Rojas and Luis Javier Herrera

Published: 29 June 2022


Proper diet, rest, and training regimens, as well as continuous monitoring and analysis of wellness and performance metrics, can make a big difference for both individual and team sports. Soccer players constantly adhere to a strict nutrition plan, training process, and rest regime so that their body is in the required state at particular moments. The athletes' condition is influenced not only by the amount of consumed and burned calories, or the duration and intensity of the training process, but also by parameters such as the duration and quality of their sleep and their general mental state, including mood and stress level.

The increasing popularity and adoption of Machine Learning (ML) approaches have led to more evidence-based decision making [3]. In this context, it is possible to compile a set of parameters describing the general state of the athlete at a certain point in time, collect objective or subjective measurements of these parameters, and try to predict the state/behavior of the athlete's body in the near future using ML. For instance, according to Fuller et al. [4], an average soccer team can expect around 50 injuries per season. Using ML and time series analysis, it might be feasible to make evidence-based decisions for reducing injuries over a chosen period in the future. The desired outcome of implementing such technologies in sports science is that injuries will be reduced, performance will be improved, and better decisions will be made. Similar technologies have already been approved by organizations such as FIFA and are used by several teams in the Norwegian top league.

In this work, we address the problem of predicting soccer players' ability to perform, from subjective self-reported wellness parameters collected using a commercially deployed digital health monitoring system called PmSys. We focus on *readiness to play*, which is a measure of a player's ability to complete a training session or play in a match. We address the research question: Can we predict readiness for elite female soccer athletes using ML on data collected using an athlete monitoring system? More specifically, we try to answer the following. Which time series models are capable of capturing adequate information for accurate predictions? Is it more accurate to predict a player's readiness using data from an individual player or team-based data? How far back in the historical data should we go, and how far into the future can we predict? Does the training dataset size have an impact on the results? Which hyperparameter configurations result in the most accurate predictions?

We design and implement a software framework for undertaking time series predictions which can be configured extensively [5], and run multiple iterations of experiments to answer the above questions. Our results show that a number of time series analysis models are able to predict readiness with high accuracy in near real-time. Equipped with such insight, coaches and trainers can better plan individual and team training sessions, and perhaps avoid over training and injuries.

The rest of this paper is organized as follows. In Section 2, we provide background information and an overview of related work. In Section 3, we describe our dataset and methodology. In Section 4, we present selected results from our analysis of the above research questions, and discuss our findings. In Section 5, we conclude the paper.

#### **2. Background and Related Work**

#### *2.1. Time Series Prediction*

A time series is described as an ordered collection of data points where each data point is an observation in time [6–8]. Time series are sample realizations of stochastic processes [6]. Stochastic processes can be found in various fields, such as temperature measurements or a stock market index. Time series have an inherent dependence in time: successive observations are dependent and thus not randomly sampled [6]. Time as an additional dimension distinguishes time series data from the more generally known cross-sectional data. Time series analysis describes a set of statistical models that aim to model the underlying stochastic process. The objectives of time series analysis are forecasting and control [6]. Forecasting is considered a regression problem and is defined as the prediction of values based on the time series model beyond the present. The main tasks of time series analysis are: (i) to understand under the influence of which parameters the value of the time series is formed, and (ii) to build a mathematical model for each parameter or combination thereof. Any time series can be decomposed into trend, seasonal, cyclical, and random components. The first three components form the non-random part of the time series. The random component is present in any time series, but the components of the non-random constituent are not necessarily present in the time series structure [9].

The development of Recurrent Neural Networks (RNNs) made a significant contribution to the study of, and helps solve, regression problems associated with time series. RNNs are applicable in tasks where something holistic is broken into parts. One of the most popular types of recurrent neural networks is the Long Short-Term Memory (LSTM) network. Some of the most outstanding achievements of LSTM are the revolutionizing of speech recognition, outperforming traditional models in specific speech applications [10], improving large-vocabulary speech recognition [11,12], and breaking records for improved machine translation [13]. The LSTM extends traditional time series analysis models such as autoregressive integrated moving average models or exponential smoothing methods. Saimi et al. [14] demonstrated the superiority of the LSTM over the traditional approaches. In addition, in 2014, a gating mechanism called Gated Recurrent Units (GRU) was introduced for recurrent neural networks. Its effectiveness in solving problems of modeling music and speech signals was found to be comparable to that of long short-term memory [15]. Compared to LSTM, this mechanism has fewer parameters because there are no output gates [16].

#### *2.2. The PmSys Athlete Monitoring System*

PmSys is a digital monitoring system that was developed for the collection, storage, analysis, and visualization of athletes' health data [17,18]. The intention for creating this system was to replace the manual method of collecting information, and its storage on paper, with a digital one. According to the creators, the primary users are soccer players, coaches, and medical personnel [19]. The primary tool for information registration is a questionnaire that the players could easily and quickly fill out each day using a mobile application [20]. This approach allows athletes to manage their own time when filling out the questionnaire without the coach's insistent control of the process.

In previous work, several experiments have been carried out to see if ML can be used to forecast future values of soccer players' readiness to play. Wiik et al. [18] conducted a study with the purpose of reducing sports injuries and predicting readiness to play. Based on a dataset from two male high-division soccer teams in Norway (one team from January 2017 until late August 2017 including 19 players, the other team from February 2018 to mid June 2018 including 22 players, with an overall dataset of 6000 entries), they demonstrated the value of utilizing an LSTM RNN to predict reported training load. They were able to train the model to predict positive and negative peaks, where positive peaks are categorized as values above 8 and negative peaks as values below 3. Neither dataset had values for all days, so the issue of missing data had to be accounted for. The missing data were not replaced or deleted, to provide a realistic use case, as data gaps would always exist due to vacations, injury time, and other factors. To keep the results simple to reproduce and interpret, the model was kept small, which allowed the underlying possibilities in the data to be explored. To assess and confirm the data, two distinct methodologies were used. The initial strategy was to train on the entire team and then predict when the player would be ready to play. The second strategy involved training the model on the player to be predicted. When the entire team was used to train the model, the predictions were more accurate, and the graphs closely tracked the peaks. Wiik et al. [18] attempted another experiment with traditional ML methods such as linear regression and random forest, but the results were not significantly improved.

Another study, performed by Johansen et al. [21], demonstrated the impact of employing current technologies in sports to detect injuries and train optimally. This study showed the advantages of incorporating technology into elite athlete performance. The authors' decade of expertise has culminated in a smartphone-based application with a backend system for cutting-edge athlete monitoring. A cooperation between computer scientists, sport scientists, and medical professionals has helped to discover gaps that technology can address, allowing athletes to progress in the proper direction. PmSys was well received by both athletes and staff, making it simpler to advocate for earlier bedtimes and modified training days.

#### **3. Dataset and Methodology**

#### *3.1. Dataset*

Over the course of 2 years, players from 2 participating elite female soccer teams in the PmSys system have contributed to the collection of subjective reports through a mobile application, where they used questionnaires for registering their individual subjective responses to a list of metrics daily. In this work, we use a subset of these metrics: readiness-to-play, mood, stress, general muscle soreness, fatigue, sleep quality, and sleep duration.


#### *3.2. Proposed Investigations*

**Time series models:** We propose to benchmark different time series models, such as purely recurrent models (RNN, LSTM, GRU, RNNPlus, LSTMPlus, GRUPlus), models of mixed recursive convolutional types (RNN-FCNPlus, LSTM-FCNPlus, GRU-FCNPlus), an ensemble of deep CNN models (InceptionTime), and multivariate versions of the recurrent models (MRNN-FCNPlus, MLSTM-FCNPlus, MGRU-FCNPlus).

**Univariate vs. multivariate prediction:** A univariate time series has a single time-dependent variable. Using only the readiness parameter from our dataset to predict readiness is an example of a univariate time series. A multivariate time series, on the other hand, has more than one time-dependent variable. We propose to use mood, stress, soreness, and fatigue, along with readiness, as a multivariate time series, to compare the univariate vs. multivariate prediction performance of the readiness parameter.

**Window size:** We propose to compare the performance of running trained models on test data with different sliding window sizes, both in terms of the input window (indicating the amount of data used for making predictions) and the output window (indicating the amount of data to be predicted), as sketched in the example below.
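A minimal sketch of how such (input window, output window) training pairs can be built from a readiness series is given below; the window lengths are examples.

```python
import numpy as np

def sliding_windows(series, input_len=7, output_len=1):
    """Build (input window, output window) pairs from a 1-D series."""
    X, y = [], []
    for i in range(len(series) - input_len - output_len + 1):
        X.append(series[i:i + input_len])                      # history
        y.append(series[i + input_len:i + input_len + output_len])  # target
    return np.array(X), np.array(y)

# Example: 7 days of history to predict the next day
X, y = sliding_windows(np.arange(20, dtype=float))
```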

**Training on team vs. training on player:** Motivated by the findings of Wiik et al. [18], we propose to compare the accuracy of predicting the readiness parameter for a single player, using a model trained on data from the same player only versus using a model trained on data from the player's entire team.

**Hyperparameter configuration:** We propose to investigate the influence of hyperparameters on prediction accuracy, in particular batch size, number of epochs in training, and data shuffling.

**Data imputation:** As our dataset is composed of subjectively reported metrics, a common problem is missing data (i.e., responses not being recorded every day). Missing data cause time series to have gaps. Several solutions can address this challenge, as sketched in the example below: (i) replacing all gaps with 0, (ii) filling in the gaps with the value from the previous day, and (iii) removing all gaps and concatenating the data array.
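The three strategies can be sketched in pandas as follows, on a hypothetical six-day readiness series.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readiness series with missing reports
s = pd.Series([7, np.nan, 6, np.nan, np.nan, 8],
              index=pd.date_range("2021-01-01", periods=6))

filled_zero = s.fillna(0)      # (i) replace all gaps with 0
filled_prev = s.ffill()        # (ii) carry the previous day's value forward
concatenated = s.dropna()      # (iii) remove gaps and concatenate the array
```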

#### *3.3. Implementation*

Python is a suitable language for working with large data sets since it is easy to use and has many established machine learning libraries. We use **pandas** to extract data from files and for preprocessing. Pandas' data functionality is built on the NumPy library, a lower-level tool, and includes special techniques for working with numeric tables and time series. **Tsai** is a state-of-the-art deep learning library for time series and sequences (https://timeseriesai.github.io/tsai/ (accessed on 1 June 2022)). It is an open-source deep learning package built on top of Pytorch and fastai [22], focusing on state-of-the-art techniques for time series tasks such as classification, regression, forecasting, and imputation. This library contains a number of ready-to-use deep learning models that can be run directly in Python for time series prediction, and we use these. To test the performance of various models, we also use the process and system utilities library (**psutil**). Table 1 presents the specifications of the hardware and software components of the benchmark implementation of our proposed framework.


**Table 1.** System specifications for the benchmark implementation.

#### *3.4. Metrics*

**Prediction performance:** The metrics used to evaluate prediction models operate on continuous values and therefore differ slightly from the metrics used for classification tasks. The most popular metrics for regression models are Mean Squared Error (MSE), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). In the following, we primarily use MSE. MSE measures the average of the squared differences between the actual and predicted values over all data points. Since the differences are squared, negative errors are not compensated by positive ones. Compared to the MAE, MSE has several advantages: it emphasizes big mistakes over minor ones, and it is differentiable, which allows it to be used more efficiently to find minima or maxima using mathematical methods. The lower the MSE, the higher the prediction accuracy.

$$MSE = \frac{1}{n} \sum\_{i=1}^{n} (y\_i - \hat{y}\_i)^2$$

**System performance:** The primary metrics for measuring performance are usually the time spent on training and testing models, and processor, memory, disk, and Graphics Processing Unit (GPU) usage. We primarily focus on computation without the requirement for a GPU; therefore, we measure all parameters except GPU usage. *(i) Time usage:* Time is measured in seconds. For measuring the time to train and test each model, we use the time module (https://docs.python.org/3/library/time.html (accessed on 1 June 2022)) from the Python standard library, which provides various time-related functions. *(ii) CPU, memory and disk usage:* To measure CPU, memory, and disk usage, we use a cross-platform library called psutil (https://pypi.org/project/psutil/ (accessed on 1 June 2022)). This library has a set of utilities for obtaining information about system performance. Since our processor consists of 8 cores, we use the utility cpu\_percent() to obtain the load of each core as a percentage. We also use psutil to measure memory and disk usage: the virtual\_memory() utility function provides information about various memory parameters, and we are interested in the parameter used, which shows the amount of memory used at a given time; the disk\_usage() function also has the parameter used, which provides the used disk space at a given time. We measure used memory and disk in GiB. We measure CPU, memory, and disk usage at an interval of one second.
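One measurement tick of these metrics can be sketched with psutil as follows.

```python
import psutil

# One measurement tick of the system metrics described above
cpu_per_core = psutil.cpu_percent(interval=1, percpu=True)   # % load per core
mem_used_gib = psutil.virtual_memory().used / 2**30          # used memory, GiB
disk_used_gib = psutil.disk_usage("/").used / 2**30          # used disk space, GiB
print(cpu_per_core, mem_used_gib, disk_used_gib)
```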

#### **4. Selected Results**

#### *4.1. Benchmarking Different Time Series Models*

We ran the 13 different models listed in Section 3 on data from one team, predicting the readiness parameter for a single player using the model trained on the player's own data. Initial values for the hyperparameters batch size and number of epochs were chosen based on examples in the Tsai code repository (https://github.com/timeseriesAI/tsai/tree/main/tutorial\_nbs (accessed on 1 June 2022)). For the sliding window size, multiple values (3, 5, 7, 14, 21, 28, 35, 42, 49) were considered. Missing data were treated both with and without gaps. Table 2 presents the best-performing models and their corresponding MSE for univariate prediction. The best results were obtained using a sliding window of size 3 and data without gaps. The initial value for the number of epochs (200) was suitable for almost all models, except for the GRU, where overfitting was observed during training and the value was reduced to 100. Almost all models could predict the ground-truth contour of readiness over time quite well with a slight deviation, except for GRUPlus. Some models were also able to predict positive and negative peaks successfully, especially InceptionTime, MGRU-FCNPlus, MLSTM-FCNPlus, RNN-FCNPlus, MRNN-FCNPlus, GRU-FCNPlus, and LSTM-FCNPlus.


**Table 2.** MSE values for top models. Univariate prediction for single player, input window size: 3, no zero-padding.

#### *4.2. Univariate vs. Multivariate Prediction*

We ran the same configurations to carry out the initial experiments for the multivariate scenario, with the difference that multivariate time series consisting of readiness, mood, stress, soreness, and fatigue are used, where the target parameter to predict is readiness. After initial observations of multivariate prediction, 200 epochs turned out to be too high and influenced the prediction results. As the plots of the training and validation loss functions for various models showed, many of them were already trained after 20 epochs; therefore, we decided to correct the number of epochs for all models and rerun the initial experiments with sliding window sizes of 3, 5, 7, 28, and 49.

Table 3 presents the best-performing models and their corresponding MSE for multivariate prediction with a higher number of epochs (top) and a lower number of epochs (bottom). With a large number of epochs, only three neural networks had an MSE below 2. Where the MSE for the top ten models for univariate prediction was within the range 1.191–1.480, the MSE for only the four best models with multivariate prediction was within 1.325–1.603, which means that univariate prediction was more accurate. The best results for both multivariate and univariate prediction were obtained for a sliding window of size 3 and data without gaps. Overfitting influenced multivariate prediction to a greater extent. Similar to univariate prediction, InceptionTime was among the best models, with LSTMPlus and InceptionTime predictions closest to the ground truth overall, and the plots of predicted against actual values of readiness for most models adequate for following peaks.

**Table 3.** MSE for top models. Multivariate prediction for single player, input window size: 3, no zero-padding. Number of epochs: 200 (top), 20 (bottom).


#### *4.3. Input and Output Window Size*

To investigate how the input and output window sizes influence prediction in a more practical context, we focused on daily and weekly predictions, as might be relevant to soccer clubs during a game season. We ran experiments using a period of 2 years to train on the entire team and predict the readiness for one player, with 7, 14, and 21 days as the input window and 1 and 7 days as the output window. Table 4 presents the MSE values for univariate prediction with LSTMPlus as a representative model, with different window sizes, for Team A. We note that similar results are obtained for Team B, and that the MSE value increases for each day predicted further into the future (see Figure 1), which was against our initial expectation of a somewhat weekly pattern (and therefore a reduced MSE for 7 days). The increase in inaccuracy is possibly due to readiness being a continuous rather than periodically peaking parameter, meaning that the days immediately leading up to a given day are important. If the peaks were more periodic, the predictions for days that are multiple days ahead (e.g., the same day of the week in the upcoming week) would be possible without relying on the whole of the leading week.

**Figure 1.** MSE per predicted day. Training on the entire team and testing on single player from Team B, number of epochs: 30, batch size: 5, input window: 21, output window: 7.


**Table 4.** MSE for LSTMPlus with different input and output window sizes (Team A).

#### *4.4. Training on Team vs. Training on Player*

Previous related work has shown that training on the entire team and predicting readiness for a single player can give promising results [18,21,23]. In Table 5, we present the results from experiments on Team A with 30 epochs and a batch size of 5, comparing the performance of models trained on the entire team and models trained on a single player for predicting the readiness of that player. Overall, training on the entire team and predicting for one player shows lower MSE values.

**Table 5.** MSE for LSTMPlus trained on single player vs. trained on entire team (Team A).


Figure 2 demonstrates the viability of training on the entire team, where the predictions follow the actual values quite accurately and the peaks are anticipated correctly. We also observe that the MSE value increases as the input window increases, possibly indicating that predicting the next day using data from the last week (input window: 7, output window: 1) might be the optimal use case for our scenario.


**Figure 2.** Actual and predicted values for readiness for player from Team A. Training on entire team, number of epochs: 30, batch size: 5. (**a**) input window: 7; (**b**) input window 21.

#### *4.5. Hyperparameter Configuration*

Model optimization depends on hyperparameters, which include the number of epochs, batch size, and data shuffling. We pick a single player from a team, modify the hyperparameters, and train the model on the entire team to predict for the player using univariate prediction with LSTMPlus. Table 6 presents the MSE values for different hyperparameter settings. The results showed that enabling shuffling led to a reduced MSE. The plots of actual and predicted values, however, were almost identical for different values of the other hyperparameters. In all cases, the predicted peaks were quite close to the actual values. The only hyperparameter modification that reduced the MSE value was enabling data shuffling, and its influence was not significant.


**Table 6.** MSE for LSTMPlus with different hyperparameter values.

#### *4.6. System Performance*

Table 7 reports system performance metrics as recorded on the benchmark implementation specified in Table 1 for selected models for univariate and multivariate prediction.


**Table 7.** System performance metrics for benchmark implementation: univariate (top) and multivariate (bottom) prediction.

InceptionTime and LSTM have the best training time (200 epochs) for multivariate and univariate prediction, respectively. The testing time for all models (7100 data points) turned out to be approximately the same, and we see that testing proceeded very quickly. To calculate the CPU usage, we used a utility that measured the utilization for each core separately. The CPU utilization was measured only in the process of training the models. As a result, it turned out that the usage increased several times at the start of the training process and remained at the level of 75–100% (multivariate) and 65–100% (univariate) for all models. After the end of the training, the CPU utilization also immediately decreased. The values of memory and disk usage in the training process turned out to be insignificant. The minimum and maximum amount of memory and disk usage differed by about a few hundred MB. Overall, our preliminary analysis indicates that a framework as we have proposed in this work can be run in near real time without requiring a lot of resources.

#### **5. Conclusions**

In this work, we addressed the problem of predicting soccer players' ability to perform, from subjective self-reported wellness parameters collected using a commercially deployed digital health monitoring system called PmSys. We focused on readiness to play, which is a measure of a player's ability to complete a training session or play in a match, and experimented with a software framework for undertaking time series predictions to derive insights regarding the influence of different factors on prediction accuracy.

As an initial study, we investigated and presented our results regarding the following: (1) We tested various time series models to see which were capable of capturing adequate information for accurate predictions. (2) We compared univariate and multivariate approaches to see if we can leverage different wellness parameters in the dataset to predict readiness. (3) We looked at the impact of different input and output window sizes to see how far back in the historical data it is necessary to go for training, and how far into the future it is possible to predict with acceptable accuracy. (4) We looked at how training on a single player versus training on the entire team affected the results, to see if prediction frameworks can leverage data from teams for making predictions on individual players. (5) We investigated the influence of different hyperparameters, such as the number of epochs, batch size, and data shuffling, on prediction accuracy.

We found that most of the tested ML models are able to yield reasonable prediction accuracy, provided that training parameters are adjusted to avoid overfitting. In terms of performance, the best model for multivariate prediction turned out to be InceptionTime, and for univariate prediction, LSTM. We could not find evidence that multivariate predictions perform better within the context of our dataset and scenario; however, further analysis of the other parameters available in the PmSys dataset is necessary. We saw that daily predictions performed better than weekly predictions, indicating that the readiness parameter might not be as strongly periodic as initially assumed.

Training on the entire team and predicting for a single player resulted in more accurate overall results and correct prediction of peaks, especially when a team has high consistency across players. This is a first step toward determining the system's generalizability so that it may be used in other sports. Hyperparameter tuning did not have significant influence within our testing range, except for data shuffling, which improved performance.

In this work, we have addressed various research aspects independently, at times using different parts of the PmSys dataset; however, many investigations need to be continued in a more systematic manner over the complete and extended dataset for a more comprehensive understanding of the potentially complex interplay between factors influencing prediction performance. Dataset and pipeline standardization, including the evaluation of libraries other than Tsai, is also necessary for a large-scale analysis covering a wide range of parameter values and system settings. Other future work includes investigating the contextual relevance of the samples (e.g., what effect training on off-season vs. on-season, weekday vs. weekend, or match day vs. regular training data has on the predictions), more advanced methods for handling missing data, and a deeper analysis of multivariate prediction with an augmented set of supplementary wellness and health parameters.

Overall, we demonstrated that it is possible to make relatively accurate predictions about elite soccer player readiness using ML algorithms such as LSTM and the kind of datasets collected by athlete monitoring platforms such as PmSys. ML-based time series prediction pipelines, integrated with such platforms, can have a substantial impact in the fields of sports science and computer science, and our approach can be extended to various health, wellness, injury, and training parameters, as well as to different sports.

**Author Contributions:** Conceptualization, D.J., M.A.R., and P.H.; methodology, D.J., M.A.R., P.H., and C.M.; software, S.K. and N.R.; validation, S.K., N.R., P.H., and C.M.; investigation, S.K., N.R., P.H., and C.M.; resources, D.J., M.A.R., and P.H.; data curation, M.B., C.M., and P.H.; writing—original draft preparation, C.M.; writing—review and editing, C.M., M.B., and P.H.; visualization, N.R.; supervision, C.M., D.J., M.A.R., and P.H.; project administration, P.H.; funding acquisition, D.J., M.A.R., and P.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** The work is partially funded by Tromsø Research Foundation. The funding body had no role in the study itself or in the writing of this paper.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study. The study was approved by the Norwegian Privacy Data Protection Authority (reference number: 296155) and exempted from further user consent because the data were fully anonymous. All metadata were removed, and all files were renamed to randomly generated file names before the original PmSys system exported the files from the cloud server. The study was also exempted from approval from the Regional Committee for Medical and Health Research Ethics—South East Norway and the Regional Committee for Medical and Health Research Ethics—Northern Norway, since the collection of the data did not include a bio-bank or medical or health data related to illness, nor did it interfere with the normal operation of the players. Since the data are anonymous, the dataset is publicly shareable under Norwegian and General Data Protection Regulation (GDPR) laws.

**Data Availability Statement:** The resources for reproducing the results can be found at https://github.com/simula/pmsys (accessed on 26 June 2022).

**Conflicts of Interest:** Authors D.J. and P.H. both own shares in the Forzasys AS company developing among other things data collection systems and solutions for AI-based analysis of sports data. The PmSys system from Forzasys was used to collect the subjective reports, but there is no commercial interest from Forzasys regarding this publication and dataset. Otherwise, the authors declare no competing interests.

#### **References**


*Proceeding Paper* **The Bootstrap for Testing the Equality of Two Multivariate Stochastic Processes with an Application to Financial Markets †**

**Ángel López-Oriona 1,\* and José A. Vilar 1,2**


**Abstract:** The problem of testing the equality of generating processes of two multivariate time series is addressed in this work. To this end, we construct two tests based on a distance measure between stochastic processes. The metric is defined in terms of the quantile cross-spectral densities of both processes. A proper estimate of this dissimilarity is the cornerstone of the proposed tests. Both techniques are based on the bootstrap. Specifically, extensions of the moving block bootstrap and the stationary bootstrap are used for their construction. The approaches are assessed in a broad range of scenarios under the null and the alternative hypotheses. The results from the analyses show that the procedure based on the stationary bootstrap exhibits the best overall performance in terms of both size and power. The proposed techniques are used to answer the question regarding whether or not the dotcom bubble crash of the 2000s permanently impacted global market behavior.

**Citation:** López-Oriona, Á.; Vilar, J.A. The Bootstrap for Testing the Equality of Two Multivariate Stochastic Processes with an Application to Financial Markets. *Eng. Proc.* **2022**, *18*, 38. https://doi.org/10.3390/ engproc2022018038

Academic Editors: Ignacio Rojas, Hector Pomares, Olga Valenzuela, Fernando Rojas and Luis Javier Herrera

Published: 8 July 2022



**Keywords:** multivariate time series; quantile cross-spectral density; frequency domain; moving block bootstrap; stationary bootstrap; dotcom bubble

#### **1. Introduction**

Comparison of time series often arises in multiple fields including machine learning, finance, economics, computer science, biology, medicine, physics, and psychology, among many others. For instance, it is not uncommon for an investor to have to determine if two particular assets show the same dynamic behavior over time based on historical data. In the same way, a physician often needs to find out to what extent two ECG signals recorded from different subjects exhibit similar patterns. There exist a wide variety of tools that have been used for these and similar purposes, including cluster analysis [1], classification [2], outlier detection [3], and comparisons through hypothesis tests [4]. It is worth highlighting that these techniques have mainly focused on univariate time series (UTS) [5], while the study of multivariate time series (MTS) has been given limited consideration [6].

In the context of hypothesis tests for time series, spectral quantities have played an important role. Specifically, testing for the equality of spectral densities has found substantial interest in the literature. Ref. [7] proposed a test for comparing spectral densities of stationary time series with unequal sample sizes. The procedure generalizes the class of tests presented in [8], which are based on an estimate of the *L*2-distance between the spectral density and its best approximation under the null hypothesis. Ref. [9] constructed a nonparametric test for the equality of spectral density matrices based on an *L*2-type statistic.

This work is devoted to constructing procedures to test for the equality of the so-called quantile cross-spectral density (QCD) between two independent MTS. Specifically, let $\mathbf{X}_t^{(1)}$ and $\mathbf{X}_t^{(2)}$ be two independent, $d$-variate, real-valued, strictly stationary stochastic processes. We fix a frequency $\omega \in [-\pi, \pi]$ and a pair of probability levels $\tau, \tau' \in [0, 1]$, and we denote the corresponding QCD matrices by $\mathfrak{f}^{(i)}(\omega, \tau, \tau')$, $i = 1, 2$. The hypotheses we consider can be stated as

$$H_0: \mathfrak{f}_{\mathbf{X}_t^{(1)}} = \mathfrak{f}_{\mathbf{X}_t^{(2)}} \quad \text{against} \quad H_1: \mathfrak{f}_{\mathbf{X}_t^{(1)}} \neq \mathfrak{f}_{\mathbf{X}_t^{(2)}}, \tag{1}$$

where $\mathfrak{f}_{\mathbf{X}_t^{(1)}}$ and $\mathfrak{f}_{\mathbf{X}_t^{(2)}}$ are the corresponding sets of QCD matrices defined as

$$\mathfrak{f}_{\mathbf{X}_t^{(i)}} = \{ \mathfrak{f}^{(i)}(\omega, \tau, \tau'),\ \omega \in [-\pi, \pi],\ \tau, \tau' \in [0, 1] \}, \tag{2}$$

where $i = 1, 2$. In order to perform the test in (1), we rely on a distance measure between stationary stochastic processes, the so-called $d_{QCD}$, which has already been utilized in several MTS data mining tasks [6,10–12]. This metric is simply the Euclidean distance between two complex vectors constructed by concatenating the terms in each collection of matrices (2) for some finite set of frequencies and probability levels. Hence, the null hypothesis in (1) is equivalent to the distance $d_{QCD}$ being zero for every such set, which makes an estimate of this metric an appropriate tool to carry out the test in (1). The high ability of $d_{QCD}$ to detect every possible discrepancy between stochastic processes was shown in our previous work [10].

Two methods to perform the test in (1) are introduced in this manuscript. They are based on the moving block bootstrap (MBB) (see [13,14]) and the stationary bootstrap (SB) (see [15]). Both approaches are compared in terms of size and power by means of a broad simulation study. Finally, the tests are applied to answer the question regarding whether or not the dotcom bubble burst of 2000 changed the global behavior of financial markets.

The rest of the paper is organized as follows. The distance *dQCD* between stochastic processes is defined in Section 2. The two techniques to carry out the test in (1) are presented in Section 3. The results from the simulation study performed to compare the proposed tests are reported in Section 4. Section 5 contains the financial application and Section 6 concludes.

#### **2. A Distance Measure between Stochastic Processes**

Let $\{\mathbf{X}_t, t \in \mathbb{Z}\} = \{(X_{t,1}, \dots, X_{t,d}), t \in \mathbb{Z}\}$ be a $d$-variate real-valued strictly stationary stochastic process. Denote by $F_j$ the marginal distribution function of $X_{t,j}$, $j = 1, \dots, d$, and by $q_j(\tau) = F_j^{-1}(\tau)$, $\tau \in [0, 1]$, the corresponding quantile function. Fix $l \in \mathbb{Z}$ and an arbitrary pair of quantile levels $(\tau, \tau') \in [0, 1]^2$, and consider the cross-covariance of the indicator functions $I\{X_{t,j_1} \le q_{j_1}(\tau)\}$ and $I\{X_{t+l,j_2} \le q_{j_2}(\tau')\}$ given by

$$\gamma_{j_1,j_2}(l, \tau, \tau') = \operatorname{Cov}\big(I\{X_{t,j_1} \le q_{j_1}(\tau)\},\ I\{X_{t+l,j_2} \le q_{j_2}(\tau')\}\big),$$

for $1 \le j_1, j_2 \le d$. Taking $j_1 = j_2 = j$, the function $\gamma_{j,j}(l, \tau, \tau')$, with $(\tau, \tau') \in [0, 1]^2$, the so-called quantile autocovariance function (QAF) of lag $l$, generalizes the traditional autocovariance function.
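As an illustration only (not the smoothed estimator introduced below), the quantile cross-covariance above can be estimated empirically by plugging in empirical quantiles and the sample covariance of the indicator series:

```python
# Minimal sketch of an empirical gamma_{j1,j2}(l, tau, tau'); the plug-in
# estimator used here is an illustration, not the paper's CCR-periodogram.
import numpy as np

def quantile_cross_cov(X, j1, j2, l, tau, tau_p):
    """Empirical quantile cross-covariance for a series X of shape (T, d)."""
    q1 = np.quantile(X[:, j1], tau)        # empirical q_{j1}(tau)
    q2 = np.quantile(X[:, j2], tau_p)      # empirical q_{j2}(tau')
    a = (X[: X.shape[0] - l, j1] <= q1).astype(float)  # I{X_{t,j1} <= q1}
    b = (X[l:, j2] <= q2).astype(float)                # I{X_{t+l,j2} <= q2}
    return float(np.mean(a * b) - a.mean() * b.mean())

# The QAF of lag l is the diagonal case j1 = j2:
# rng = np.random.default_rng(0); X = rng.standard_normal((500, 2))
# print(quantile_cross_cov(X, 0, 0, 1, 0.5, 0.5))
```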

Under suitable summability conditions (mixing conditions), the Fourier transform of the cross-covariances is well-defined and the *quantile cross-spectral density* (QCD) is given by

$$\mathfrak{f}_{j_1,j_2}(\omega, \tau, \tau') = \frac{1}{2\pi} \sum_{l=-\infty}^{\infty} \gamma_{j_1,j_2}(l, \tau, \tau')\, e^{-\mathrm{i} l \omega}, \tag{3}$$

for $1 \le j_1, j_2 \le d$, $\omega \in \mathbb{R}$ and $\tau, \tau' \in [0, 1]$. Note that $\mathfrak{f}_{j_1,j_2}(\omega, \tau, \tau')$ is complex-valued, so it can be represented in terms of its real and imaginary parts, which will be denoted by $\Re(\mathfrak{f}_{j_1,j_2}(\omega, \tau, \tau'))$ and $\Im(\mathfrak{f}_{j_1,j_2}(\omega, \tau, \tau'))$, respectively.

For fixed quantile levels $(\tau, \tau')$, QCD is the cross-spectral density of the bivariate process $(I\{X_{t,j_1} \le q_{j_1}(\tau)\}, I\{X_{t,j_2} \le q_{j_2}(\tau')\})$. Therefore, QCD measures dependence between two components of the multivariate process over different ranges of their joint distribution and across frequencies. This quantity can be evaluated for every couple of components on a range of frequencies $\Omega$ and of quantile levels $\mathcal{T}$ in order to obtain a complete representation of the process, i.e., consider the set of matrices

$$\mathfrak{f}_{\mathbf{X}_t}(\Omega, \mathcal{T}) = \{ \mathfrak{f}(\omega, \tau, \tau'),\ \omega \in \Omega,\ \tau, \tau' \in \mathcal{T} \},$$

where $\mathfrak{f}(\omega, \tau, \tau')$ denotes the $d \times d$ complex matrix given by

$$\mathfrak{f}(\omega, \tau, \tau') = \big( \mathfrak{f}_{j_1,j_2}(\omega, \tau, \tau') \big)_{1 \le j_1, j_2 \le d}.$$

Representing $\mathbf{X}_t$ through $\mathfrak{f}_{\mathbf{X}_t}$, complete information on the general dependence structure of the process is available. Comprehensive discussions about the favorable properties of the quantile cross-spectral density are given in [10,16].

According to the prior arguments, a dissimilarity measure between two multivariate processes, $\mathbf{X}_t^{(1)}$ and $\mathbf{X}_t^{(2)}$, could be established by comparing their representations in terms of the QCD matrices, $\mathfrak{f}_{\mathbf{X}_t^{(1)}}$ and $\mathfrak{f}_{\mathbf{X}_t^{(2)}}$, evaluated on a common range of frequencies and quantile levels. Specifically, for a given set of $K$ different frequencies $\Omega = \{\omega_1, \dots, \omega_K\}$ and $r$ quantile levels $\mathcal{T} = \{\tau_1, \dots, \tau_r\}$, each stochastic process $\mathbf{X}_t^{(u)}$, $u = 1, 2$, is characterized by means of a set of $r^2$ vectors $\{\Psi_{\tau_i,\tau_{i'}}^{(u)}, 1 \le i, i' \le r\}$ given by

$$\Psi_{\tau_i,\tau_{i'}}^{(u)} = \big( \Psi_{1,\tau_i,\tau_{i'}}^{(u)}, \dots, \Psi_{K,\tau_i,\tau_{i'}}^{(u)} \big), \tag{4}$$

where each $\Psi_{k,\tau_i,\tau_{i'}}^{(u)}$, $k = 1, \dots, K$, consists of a vector of length $d^2$ formed by rearranging by rows the elements of the matrix $\mathfrak{f}(\omega_k, \tau_i, \tau_{i'})$.

Once the set of $r^2$ vectors $\Psi_{\tau_i,\tau_{i'}}^{(u)}$ is obtained, they are all concatenated in a vector $\Psi^{(u)}$ in the same way as the vectors $\Psi_{k,\tau_i,\tau_{i'}}^{(u)}$ constitute $\Psi_{\tau_i,\tau_{i'}}^{(u)}$ in (4). Then, we define the dissimilarity between $\mathbf{X}_t^{(1)}$ and $\mathbf{X}_t^{(2)}$ by means of

$$d_{QCD}(\mathbf{X}_t^{(1)}, \mathbf{X}_t^{(2)}) = \big\| \Psi^{(1)} - \Psi^{(2)} \big\|, \tag{5}$$

where $\|v\| = \big(\sum_{k=1}^{n} |v_k|^2\big)^{1/2}$, with $v = (v_1, \dots, v_n)$ being an arbitrary complex vector in $\mathbb{C}^n$, and $|\cdot|$ stands for the modulus of a complex number. Note that $d_{QCD}$ in (5) can also be expressed as

$$d_{QCD}(\mathbf{X}_t^{(1)}, \mathbf{X}_t^{(2)}) = \Big[ \big\| \Re(\Psi^{(1)}) - \Re(\Psi^{(2)}) \big\|^2 + \big\| \Im(\Psi^{(1)}) - \Im(\Psi^{(2)}) \big\|^2 \Big]^{1/2},$$

where $\Re(\cdot)$ and $\Im(\cdot)$ denote the element-wise real and imaginary part operators, respectively. Since, in practice, we only have finite-length realizations of the stochastic processes $\mathbf{X}_t^{(1)}$ and $\mathbf{X}_t^{(2)}$, the value of $d_{QCD}$ is unknown and a proper estimate must be obtained.
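Given the QCD matrices of both processes on a common grid, computing $d_{QCD}$ reduces to a Euclidean norm; the following sketch assumes a particular array layout for the grid of frequencies and quantile levels:

```python
# Minimal sketch of d_QCD; the (K, r, r, d, d) layout is an assumption, and
# any fixed, common ordering of the entries yields the same distance.
import numpy as np

def d_qcd(f1: np.ndarray, f2: np.ndarray) -> float:
    """Euclidean distance between the concatenated QCD representations.

    f1, f2: complex arrays of shape (K, r, r, d, d) holding the matrices
    f(omega_k, tau_i, tau_i') of each process on a common grid.
    """
    psi1, psi2 = f1.ravel(), f2.ravel()     # build Psi(1) and Psi(2)
    # For a complex vector, np.linalg.norm returns (sum |v_k|^2)^(1/2),
    # which matches the modulus-based norm in (5).
    return float(np.linalg.norm(psi1 - psi2))
```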

Let $\{\mathbf{X}_1, \dots, \mathbf{X}_T\}$ be a realization from the process $(\mathbf{X}_t)_{t \in \mathbb{Z}}$, so that $\mathbf{X}_t = (X_{t,1}, \dots, X_{t,d})$, $t = 1, \dots, T$. For arbitrary $j_1, j_2 \in \{1, \dots, d\}$ and $(\tau, \tau') \in [0, 1]^2$, the authors of [16] propose to estimate $\mathfrak{f}_{j_1,j_2}(\omega, \tau, \tau')$ by considering a smoothed cross-periodogram based on the indicator functions $I\{\hat{F}_{T,j}(X_{t,j}) \le \tau\}$, where $\hat{F}_{T,j}(x) = T^{-1} \sum_{t=1}^{T} I\{X_{t,j} \le x\}$ denotes the empirical distribution function of $X_{t,j}$. This approach extends to the multivariate case the estimator proposed by [17] in the univariate setting. More specifically, the *rank-based copula cross-periodogram* (CCR-periodogram) is defined by

$$I_{T,R}^{j_1,j_2}(\omega, \tau, \tau') = \frac{1}{2\pi T}\, d_{T,R}^{j_1}(\omega, \tau)\, d_{T,R}^{j_2}(-\omega, \tau'),$$

where

$$d_{T,R}^{j}(\omega, \tau) = \sum_{t=1}^{T} I\{\hat{F}_{T,j}(X_{t,j}) \le \tau\}\, e^{-\mathrm{i}\omega t}.$$

The smoothed CCR-periodogram takes the form

$$\hat{G}_{T,R}^{j_1,j_2}(\omega, \tau, \tau') = \frac{2\pi}{T} \sum_{s=1}^{T-1} W_T\Big(\omega - \frac{2\pi s}{T}\Big)\, I_{T,R}^{j_1,j_2}\Big(\frac{2\pi s}{T}, \tau, \tau'\Big), \tag{6}$$

where

$$W_T(\mu) = \sum_{v=-\infty}^{\infty} \frac{1}{h_T}\, W\Big(\frac{\mu + 2\pi v}{h_T}\Big),$$

with $h_T > 0$ being a sequence of bandwidths such that $h_T \to 0$ and $T h_T \to \infty$ as $T \to \infty$, and $W$ a real-valued, even weight function with support $[-\pi, \pi]$. Consistency and asymptotic performance of the smoothed CCR-periodogram $\hat{G}_{T,R}^{j_1,j_2}(\omega, \tau, \tau')$ are established in Theorem S4.1 of [16].
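A compact sketch of the estimator follows; it computes the CCR-periodogram at the Fourier frequencies via the FFT and applies a simple box kernel in place of the generic weight function $W$ (both choices are assumptions made for illustration):

```python
# Sketch of the CCR-periodogram and a box-kernel smoothing step; the box
# kernel and FFT phase convention are simplifying assumptions.
import numpy as np

def ccr_periodogram(X, j1, j2, tau, tau_p):
    """I_{T,R}^{j1,j2}(omega_s, tau, tau') at omega_s = 2*pi*s/T,
    for a series X of shape (T, d)."""
    T = X.shape[0]
    r1 = (np.argsort(np.argsort(X[:, j1])) + 1) / T   # F-hat_{T,j1}(X_{t,j1})
    r2 = (np.argsort(np.argsort(X[:, j2])) + 1) / T
    d1 = np.fft.fft(r1 <= tau)        # d_{T,R}^{j1}(omega_s, tau)
    d2 = np.fft.fft(r2 <= tau_p)
    # d^{j2}(-omega) is the conjugate of d^{j2}(omega) for real indicators
    return d1 * np.conj(d2) / (2 * np.pi * T)

def smoothed_ccr(I, T, h):
    """Kernel-smoothed periodogram as in (6), with a box kernel of
    bandwidth h averaged over neighboring Fourier frequencies."""
    m = max(1, int(np.ceil(h * T / (2 * np.pi))))
    kernel = np.ones(2 * m + 1) / (2 * m + 1)
    return np.convolve(I, kernel, mode="same")
```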

By considering the smoothed CCR-periodogram in every component of the vectors $\Psi^{(1)}$ and $\Psi^{(2)}$, we obtain their estimated counterparts $\hat{\Psi}^{(1)}$ and $\hat{\Psi}^{(2)}$, which allow us to construct a consistent estimate of $d_{QCD}$ by defining

$$\hat{d}_{QCD}(\mathbf{X}_t^{(1)}, \mathbf{X}_t^{(2)}) = \big\| \hat{\Psi}^{(1)} - \hat{\Psi}^{(2)} \big\|. \tag{7}$$

The quantity $\hat{d}_{QCD}(\mathbf{X}_t^{(1)}, \mathbf{X}_t^{(2)})$ has been successfully applied to perform clustering of MTS in crisp [6] and fuzzy [10–12] frameworks.

#### **3. Testing for Equality of Quantile Cross-Spectral Densities of Two MTS**

In this section, two procedures to address the problem of testing (1) are constructed. They are based on the distance $d_{QCD}$ defined in (5). Both approaches consider well-known bootstrap methods for dependent data. The key principle is to draw pseudo-time series capturing the dependence structure in order to approximate the distribution of $\hat{d}_{QCD}$ under the null hypothesis.

#### *3.1. A Test Based on the Moving Block Bootstrap*

In this section, we introduce a bootstrap test based on a modification of the classical moving block bootstrap (MBB) method proposed by [13,14]. MBB generates replicates of the time series by joining blocks of fixed length, which have been drawn randomly with replacement from among blocks of the original realizations. This approach allows us to mimic the underlying dependence structure without assuming specific parametric models for the generating processes.

Given two realizations of the $d$-dimensional stochastic processes $\mathbf{X}_t^{(1)}$ and $\mathbf{X}_t^{(2)}$, denoted by $\mathbf{X}^{(1)} = \{\mathbf{X}_1^{(1)}, \dots, \mathbf{X}_T^{(1)}\}$ and $\mathbf{X}^{(2)} = \{\mathbf{X}_1^{(2)}, \dots, \mathbf{X}_T^{(2)}\}$, respectively, the procedure proceeds as follows.

Step 1. Fix a positive integer, *b*, representing the block size, and take *k* equal to the smallest integer greater than or equal to *T*/*b*.

Step 2. For each realization, define the block $B_j^{(i)} = \{\mathbf{X}_j^{(i)}, \dots, \mathbf{X}_{j+b-1}^{(i)}\}$, for $j = 1, \dots, q$, with $q = T - b + 1$. Let $\mathcal{B} = \{B_1^{(1)}, \dots, B_q^{(1)}, B_1^{(2)}, \dots, B_q^{(2)}\}$ be the set of all blocks, those coming from $\mathbf{X}^{(1)}$ and those coming from $\mathbf{X}^{(2)}$.

Step 3. Draw two sets of $k$ blocks, $B^{(i)*} = \{B_1^{(i)*}, \dots, B_k^{(i)*}\}$, $i = 1, 2$, with equiprobable distribution from the set $\mathcal{B}$. Note that each $B_j^{(i)*}$, $j = 1, \dots, k$, $i = 1, 2$, is an MTS of length $b$.

Step 4. Construct the pseudo-time series $\mathbf{X}^{(i)*}$ by concatenating the blocks in $B^{(i)*}$ and keeping the first $T$ temporal components, $i = 1, 2$. Compute the bootstrap version $\hat{d}_{QCD}^{\,*}$ of $\hat{d}_{QCD}$ based on the pseudo-time series $\mathbf{X}^{(1)*}$ and $\mathbf{X}^{(2)*}$.

Step 5. Repeat Steps 3 and 4 a large number $B$ of times to obtain the bootstrap replicates $\hat{d}_{QCD}^{\,(1)*}, \dots, \hat{d}_{QCD}^{\,(B)*}$.

Step 6. Given a significance level $\alpha$, compute the quantile of order $1-\alpha$, $q_{1-\alpha}^{*}$, based on the set $\{\hat{d}_{QCD}^{\,(1)*}, \dots, \hat{d}_{QCD}^{\,(B)*}\}$. Then, the decision rule consists of rejecting the null hypothesis $H_0$ if $\hat{d}_{QCD}(\mathbf{X}_t^{(1)}, \mathbf{X}_t^{(2)}) > q_{1-\alpha}^{*}$.

Note that, by considering the whole set of blocks $\mathcal{B}$ in Step 2, both pseudo-time series $\mathbf{X}^{(1)*}$ and $\mathbf{X}^{(2)*}$ contain information about the original series $\mathbf{X}^{(1)}$ and $\mathbf{X}^{(2)}$ in equal measure. This way, the bootstrap procedure is able to correctly approximate the distribution of the test statistic $\hat{d}_{QCD}$ under the null hypothesis even if this hypothesis is not true.
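Steps 1–4 of the scheme can be sketched as follows (a minimal illustration assuming NumPy; `mbb_pair` is a hypothetical helper name):

```python
# Sketch of MBB Steps 1-4: pool overlapping blocks of length b from both
# realizations and rebuild two pseudo-series of length T each.
import numpy as np

def mbb_pair(X1, X2, b, rng):
    T = X1.shape[0]
    k = -(-T // b)                                  # ceil(T / b)
    blocks = [X[j:j + b] for X in (X1, X2)          # pool all q = T - b + 1
              for j in range(T - b + 1)]            # blocks of both series
    def one_pseudo():
        idx = rng.integers(len(blocks), size=k)     # draw k blocks uniformly
        return np.concatenate([blocks[i] for i in idx])[:T]
    return one_pseudo(), one_pseudo()

# rng = np.random.default_rng(0)
# Y1, Y2 = mbb_pair(X1, X2, b=round(len(X1) ** (1 / 3)), rng=rng)
```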

From now on, we will refer to the test presented in this section as MBB.

#### *3.2. A Test Based on the Stationary Bootstrap*

The second bootstrap mechanism to approximate the distribution of $\hat{d}_{QCD}$ is an adaptation of the classical stationary bootstrap (SB) proposed by [15]. This resampling method is aimed at overcoming the lack of stationarity of the MBB procedure. Note that the distance measure $d_{QCD}$ is well-defined only for stationary processes, so it is desirable that a bootstrap technique based on this metric generates stationary pseudo-time series.

Given two $d$-dimensional realizations, denoted by $\mathbf{X}^{(1)} = \{\mathbf{X}_1^{(1)}, \dots, \mathbf{X}_T^{(1)}\}$ and $\mathbf{X}^{(2)} = \{\mathbf{X}_1^{(2)}, \dots, \mathbf{X}_T^{(2)}\}$, from the stochastic processes $\mathbf{X}_t^{(1)}$ and $\mathbf{X}_t^{(2)}$, respectively, the SB method proceeds as follows.

Step 1. Fix a positive real number $p \in (0, 1]$.

Step 2. Consider the pooled set $\widetilde{\mathbf{X}} = \{\mathbf{X}^{(1)}, \mathbf{X}^{(2)}\}$. Draw randomly two temporal observations from $\widetilde{\mathbf{X}}$. Note that each one of these observations is of the form $\mathbf{X}_{j_i}^{(k_i)}$ for some $k_i = 1, 2$ and $j_i = 1, \dots, T$, $i = 1, 2$. Observation $\mathbf{X}_{j_i}^{(k_i)}$ is the first element of the pseudo-series $\mathbf{X}^{(i)*}$, $i = 1, 2$.

Step 3. For $i = 1, 2$, given the last observation $\mathbf{X}_{j_i}^{(k_i)}$, the next bootstrap replication in $\mathbf{X}^{(i)*}$ is defined as $\mathbf{X}_{j_i+1}^{(k_i)}$ with probability $1-p$, and drawn from the set $\widetilde{\mathbf{X}}$ with probability $p$. When $j_i = T$, the selected observation is $\mathbf{X}_1^{(2)}$ if $k_i = 1$ and $\mathbf{X}_1^{(1)}$ if $k_i = 2$.

Step 4. Repeat Step 3 until the pseudo-series $\mathbf{X}^{(1)*}$ and $\mathbf{X}^{(2)*}$ contain $T$ observations. Based on the pseudo-series $\mathbf{X}^{(1)*}$ and $\mathbf{X}^{(2)*}$, compute the bootstrap version $\hat{d}_{QCD}^{\,*}$ of $\hat{d}_{QCD}$.

Step 5. Repeat Steps 3–4 $B$ times to obtain $\hat{d}_{QCD}^{\,(1)*}, \dots, \hat{d}_{QCD}^{\,(B)*}$.

Step 6. Given a significance level $\alpha$, compute the quantile of order $1-\alpha$, $q_{1-\alpha}^{*}$, based on the set $\{\hat{d}_{QCD}^{\,(1)*}, \dots, \hat{d}_{QCD}^{\,(B)*}\}$. Then, the decision rule consists of rejecting the null hypothesis $H_0$ if $\hat{d}_{QCD}(\mathbf{X}_t^{(1)}, \mathbf{X}_t^{(2)}) > q_{1-\alpha}^{*}$.

It is worth remarking that, as with the MBB procedure, a proper approximation of the distribution of $\hat{d}_{QCD}$ under the null hypothesis is also ensured here, owing to the use of the pooled time series $\widetilde{\mathbf{X}}$ in the generating mechanism.
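A minimal sketch of the SB generating mechanism (Steps 2–4) might read as follows, with `sb_pseudo` a hypothetical helper returning one pseudo-series:

```python
# Sketch of the SB mechanism: geometric block lengths with parameter p,
# drawing from the pooled series and wrapping X(1) -> X(2) -> X(1).
import numpy as np

def sb_pseudo(X1, X2, p, rng):
    """One SB pseudo-series of length T drawn from the pooled realizations."""
    pool, T = (X1, X2), X1.shape[0]
    k, j = rng.integers(2), rng.integers(T)    # Step 2: random first element
    out = []
    while len(out) < T:
        out.append(pool[k][j])
        if rng.random() < p:                   # Step 3: start a new block
            k, j = rng.integers(2), rng.integers(T)
        elif j + 1 < T:
            j += 1                             # continue the current block
        else:
            k, j = 1 - k, 0                    # wrap to the other series
    return np.asarray(out)
```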

From now on, we will refer to the test presented in this section as SB.

#### **4. Simulation Study**

In this section, we carry out a set of simulations with the aim of assessing the finite-sample performance of the testing procedures presented in Section 3. After describing the simulation mechanism, the main results are discussed.

#### *4.1. Experimental Design*

The effectiveness of the testing methods was examined with pairs of MTS realizations, $\mathbf{X}^{(1)} = \{\mathbf{X}_1^{(1)}, \dots, \mathbf{X}_T^{(1)}\}$ and $\mathbf{X}^{(2)} = \{\mathbf{X}_1^{(2)}, \dots, \mathbf{X}_T^{(2)}\}$, simulated from bivariate processes selected to cover different dependence structures. Specifically, three types of generating models were considered, namely VARMA processes, nonlinear processes, and dynamic conditional correlation models [18]. In all cases, the deviation from the null hypothesis of equal underlying processes was established by means of differences in the coefficients of the generating models. In each scenario, the degree of deviation between the simulated realizations is regulated by a specific parameter $\delta$ included in the formulation of the models. The specific generating models concerning each scenario are given below, taking into account that, unless otherwise stated, the error process $(\varepsilon_{t,1}, \varepsilon_{t,2})$ consists of iid realizations following a bivariate Gaussian distribution.

**Scenario 1**. VAR(1) models given by

$$\begin{pmatrix} X_{t,1} \\ X_{t,2} \end{pmatrix} = \begin{pmatrix} 0.1+\delta & 0.1+\delta \\ 0.1+\delta & 0.1+\delta \end{pmatrix} \begin{pmatrix} X_{t-1,1} \\ X_{t-1,2} \end{pmatrix} + \begin{pmatrix} \varepsilon_{t,1} \\ \varepsilon_{t,2} \end{pmatrix}.$$

**Scenario 2**. TAR (threshold autoregressive) models given by

$$\begin{pmatrix} X_{t,1} \\ X_{t,2} \end{pmatrix} = \begin{pmatrix} (0.9-\delta)X_{t-1,2}\, I_{\{|X_{t-1,1}| \le 1\}} + (\delta-0.3)X_{t-1,1}\, I_{\{|X_{t-1,1}| > 1\}} \\ (0.9-\delta)X_{t-1,1}\, I_{\{|X_{t-1,2}| \le 1\}} + (\delta-0.3)X_{t-1,2}\, I_{\{|X_{t-1,2}| > 1\}} \end{pmatrix} + \begin{pmatrix} \varepsilon_{t,1} \\ \varepsilon_{t,2} \end{pmatrix}.$$

**Scenario 3**. GARCH models of the form $(X_{t,1}, X_{t,2}) = (\sigma_{t,1}\varepsilon_{t,1}, \sigma_{t,2}\varepsilon_{t,2})$, with

$$\sigma_{t,1}^2 = 0.01 + 0.05\, X_{t-1,1}^2 + 0.94\, \sigma_{t-1,1}^2,$$

$$\sigma_{t,2}^2 = 0.5 + 0.2\, X_{t-1,2}^2 + 0.5\, \sigma_{t-1,2}^2,$$

$$\begin{pmatrix} \varepsilon_{t,1} \\ \varepsilon_{t,2} \end{pmatrix} \sim N\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho_t \\ \rho_t & 1 \end{pmatrix} \right],$$

where the correlation between the standardized shocks is given by $\rho_t = 0.9 - \delta$.

The series $\mathbf{X}^{(1)}$ is always generated by taking $\delta = 0$, while $\mathbf{X}^{(2)}$ is generated using different values of $\delta$, thus allowing us to obtain simulation schemes under the null hypothesis, when $\delta = 0$ also for $\mathbf{X}^{(2)}$, and under the alternative hypothesis otherwise.
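For concreteness, a sketch of the Scenario 1 generator under these conventions (NumPy, with a burn-in period added as an implementation choice):

```python
# Sketch of the Scenario 1 generator: a bivariate VAR(1) whose four
# coefficients all equal 0.1 + delta, with iid bivariate Gaussian errors.
import numpy as np

def simulate_var1(T, delta, rng, burn=100):
    A = np.full((2, 2), 0.1 + delta)
    x = np.zeros(2)
    out = np.empty((T, 2))
    for t in range(-burn, T):          # burn-in removes start-up effects
        x = A @ x + rng.standard_normal(2)
        if t >= 0:
            out[t] = x
    return out

rng = np.random.default_rng(0)
X1 = simulate_var1(300, 0.0, rng)      # null series (delta = 0)
X2 = simulate_var1(300, 0.2, rng)      # alternative series (delta = 0.2)
```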

In each trial, $B = 200$ bootstrap replicates were considered to approximate the distribution of the test statistic under the null hypothesis. In all cases, we selected the bandwidth $h_T = T^{-1/3}$ to compute $\hat{d}_{QCD}$ and its bootstrap replicates. This choice ensures the consistency of the smoothed CCR-periodogram as an estimate of QCD (Theorem S4.1 in [16]). As for the two key hyperparameters, we chose $b = T^{1/3}$ and $p = T^{-1/3}$ for the block size in MBB and the probability in SB, respectively, since both values led to the best overall behavior of the two procedures in our numerical experiments. Note that these choices are also consistent with the related literature. For instance, ref. [19] addressed the issue of selecting $b$ in the context of bias and variance bootstrap estimation, concluding that the optimal block size is of order $T^{1/3}$. Since the mean block size in SB corresponds to $1/p$, it is then reasonable to select $p$ of order $T^{-1/3}$.
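Putting the pieces together, the decision rule shared by both tests (Steps 5–6) can be sketched generically; `stat` and `resample` are hypothetical stand-ins for $\hat{d}_{QCD}$ and for the MBB or SB pseudo-series generator:

```python
# Generic sketch of Steps 5-6: reject H0 when the observed statistic
# exceeds the (1 - alpha) quantile of the bootstrap replicates.
import numpy as np

def bootstrap_test(X1, X2, stat, resample, B=200, alpha=0.05, rng=None):
    """`stat(X1, X2)` computes d-hat_QCD; `resample(X1, X2, rng)` returns a
    pair of pseudo-series (e.g., mbb_pair, or two sb_pseudo draws)."""
    rng = rng or np.random.default_rng()
    d_obs = stat(X1, X2)
    reps = np.array([stat(*resample(X1, X2, rng)) for _ in range(B)])
    return d_obs > np.quantile(reps, 1 - alpha)   # True = reject H0

# Hyperparameters as in the text, with T the series length:
# b = round(T ** (1 / 3)); p = T ** (-1 / 3); h_T = T ** (-1 / 3)
```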

Simulations were carried out for different values of series length *T*. Our results show that both bootstrap procedures exhibit relatively high power when low-to-moderate sample sizes are used. However, larger sample sizes are necessary to reach a reasonable approximation of the nominal level. For this reason, the results included in the next section correspond to *T* ∈ {500, 1000}, in the case of the null hypothesis, and *T* ∈ {100, 200, 300}, in the case of the alternative hypothesis. In all cases, the results were obtained for a significance level *α* = 0.05.

#### *4.2. Results and Discussion*

The results under the null hypothesis are summarized in Table 1, where the simulated rejection probabilities of the proposed bootstrap tests are displayed.


**Table 1.** Simulated rejection probabilities under the null hypothesis for *α* = 0.05.

Table 1 clearly shows that the two bootstrap techniques exhibit different behaviors under the null hypothesis. The MBB method provides rejection probabilities greater than expected for both values of *T*; in fact, the deviation from the theoretical significance level is more marked when *T* = 1000, particularly for Scenario 3. The SB technique seems to match the significance level quite well in all the analyzed scenarios, which makes this test the most accurate one in terms of size approximation.

The estimated rejection probabilities under the set of considered alternative hypotheses are provided in Table 2.

**Table 2.** Simulated rejection probabilities of the bootstrap tests under several alternative hypotheses determined by the deviation parameter *δ*.


In short, MBB shows the best performance in terms of power but an overrejecting behavior in terms of size.

#### **5. Case Study: Did the Dotcom Bubble Change the Global Market Behavior?**

This section is devoted to analyzing the effect that the dotcom bubble crash produced over the global economy. Specifically, the described bootstrap procedures are used to determine whether this landmark event had a permanent effect on the behavior of financial markets worldwide.

#### *5.1. The Dotcom Bubble Crash*

Historically, the dotcom bubble was a rapid rise in U.S. technology stock equity valuations, exacerbated by investments in Internet-based companies during the bull market in the late 1990s. The value of equity markets grew substantially during this period, with the Nasdaq index rising from under 1000 to more than 5000 between the years 1995 and 2000. Things started to change in 2000, and the bubble burst between 2001 and 2002 with equities entering a bear market [20]. The crash that followed saw the Nasdaq index tumble from a peak of 5048.62 on 10 March 2000, to 1139.90 on 4 October 2002, a 76.81% fall [21]. By the end of 2001, most dotcom stocks had gone bust.

Concerning the time period of the dotcom bubble, most authors consider it to have taken place between 1996 and 2000 [22]. In addition, the bubble-burst period is assumed to span 2000 to 2002, since, as stated above, the Nasdaq index had fallen by 76.81% by 4 October 2002.

#### *5.2. The Considered Data*

To analyze the effects of the dotcom bubble on the global economy, we considered three well-known stock market indexes: the S&P 500, the FTSE 100, and the Nikkei 225 (see Figure 1).


We focus on the trivariate time series formed by the daily stock prices of the three previous indexes. The data were sourced from the finance section of the Yahoo website (https://es.finance.yahoo.com, accessed on 20 July 2021). As our goal is to determine whether the dotcom bubble distorted the global market behavior, we split this MTS into two separate periods: before and after the bubble-burst period. To this end, we consider the periods from 1987 to 2002 and from 2003 to 2018. In addition, we only select dates corresponding to trading days for all three indexes, thus forming two periods of the same length. Based on these considerations, the first period covers the simultaneous trading days from 2 January 1987 to 25 July 2002, and the second period includes the simultaneous trading days from 26 July 2002 to 28 December 2018. In this way, each MTS is constituted by 3928 daily observations.

Since the series of closing prices are not stationary in mean, we take the first difference of the natural logarithm of the original values, thus obtaining series of so-called daily returns, which are depicted in Figure 1. The new series exhibit common characteristics of financial time series, the so-called "stylized facts", such as heavy tails, volatility clustering, and leverage effects.
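The return transformation is a one-liner; in this sketch the price matrix is a synthetic stand-in for the actual (3928 × 3) matrix of closing prices:

```python
# Sketch of the return transformation: first differences of log prices.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the (3928, 3) matrix of daily closing prices of
# the S&P 500, FTSE 100 and Nikkei 225 on common trading days.
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(3928, 3)), axis=0))
returns = np.diff(np.log(prices), axis=0)   # r_t = log(P_t) - log(P_{t-1})
```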

Two MTS were constructed by considering simultaneously the three UTS in Figure 1 before and after the dotcom bubble crash (vertical line). Then, the equality of the generating processes of both MTS was checked using the bootstrap tests proposed throughout the manuscript based on *B* = 500 bootstrap replicates.

#### *5.3. Results*

The *p*-values obtained by means of the methods MBB and SB were all 0. Therefore, both bootstrap techniques indicate rejection of the null hypothesis at any reasonable significance level. This suggests that the whole MTS exhibits a different dependence structure in each of the considered periods. A direct implication of this fact could be that the dotcom bubble crash in the early 2000s provoked a permanent change in the behavior of the global economy.

**Figure 1.** Daily returns of the S&P 500 (top panel), FTSE 100 (middle panel), and Nikkei 225 (bottom panel) stock market indexes from 2 January 1987 to 28 December 2018. The vertical line indicates the end of the dotcom bubble burst.

#### **6. Conclusions**

In this work, we addressed the problem of testing the equality of the stochastic processes generating two multivariate time series. For that purpose, we first defined a distance measure between multivariate processes based on comparing the quantile cross-spectral densities, called $d_{QCD}$. Then, two tests considering a proper estimate of this dissimilarity ($\hat{d}_{QCD}$) were proposed. Both approaches are based on bootstrap techniques. Their behavior under the null and the alternative hypotheses was analyzed through a simulation study. The techniques were also used to answer the question regarding whether or not the dotcom bubble crash of the 2000s affected global market behavior.

**Author Contributions:** Conceptualization, Á.L.-O. and J.A.V.; methodology, Á.L.-O. and J.A.V.; software, Á.L.-O.; writing—review and editing, Á.L.-O. and J.A.V. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research has been supported by MINECO (MTM2017-82724-R and PID2020-113578RB-100), the Xunta de Galicia (ED431C-2020-14), and "CITIC" (ED431G 2019/01).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors thank ITISE 2022 organisers for allowing them to submit this paper to the proceedings.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

