*Editorial* **Time Series Modelling**

**Christian H. Weiß**

Department of Mathematics and Statistics, Helmut Schmidt University, 22043 Hamburg, Germany; weissc@hsu-hh.de

**Keywords:** time series; models

Time series consist of data observed sequentially in time, and they are assumed to stem from an underlying stochastic process. The scope of time series approaches thus covers models for stochastic processes as well as inferential procedures for model fitting, model diagnostics, forecasting, and various other applications. While time series data have been collected for a relatively long time in history (one may recall the famous time series on sunspot numbers), the development of methods and stochastic models for such time series is more recent. Indeed, one of the motivations for announcing the Special Issue in 2020 was the fact that this year can be considered a twofold 'anniversary year' of time series modeling. On the one hand, the correlogram, the autoregressive (AR), and the moving-average (MA) models for time series, all of which are nowadays part of any course on time series analysis and covered by any statistical software, date back to the 1920s (mainly driven by G. U. Yule, G. T. Walker, and E. E. Slutzky; see Nie and Wu [1] for a detailed discussion). On the other hand, the first comprehensive textbook on time series was published by Box and Jenkins [2] in 1970, so 2020 allowed the celebration of both the semi-centennial and centennial anniversary at the same time. In keeping with this anniversary, it was indeed possible to collect articles on a wide range of topics in this Special Issue: stochastic models for time series as well as methods for their analysis, univariate and multivariate time series, real-valued and discrete-valued time series, applications of time series methods to forecasting and statistical process control, and software implementations of methods and models for time series. The remainder of this editorial provides a brief summary of the contributions to this Special Issue, grouping the articles thematically.

Roughly one-half of the contributed articles deal with real-valued time series (thus having a continuous range). In Nono et al. [3], an entropy-based Student's *t*-process dynamical model is proposed for dealing with non-Gaussian and non-linear univariate time series, whose relevance is demonstrated by an application to financial time series. The paper by Davidescu et al. [4] is centered around the time series of Romanian unemployment rates, which serves as the base for comparing the forecast performance of several well-established time series models. Not a single time series, but a large collection of univariate time series is considered by Lindstrom et al. [5], who use functional kernel density estimation for uncovering anomalous time series within such a collection. They apply their approaches to time series on aviation safety events as provided by the International Air Transport Association. Another data-intensive application area is electrical power forecasting, where both statistical and machine-learning methods are used. Vivas et al. [6] provide a systematic review of both types of methods (as well as of hybrid models) regarding forecast performance. Multiple time series are also considered by Sundararajan et al. [7], but now with a focus on multivariate time series having unequal dimensions. They propose and investigate a frequency-specific spectral ratio statistic, which is used to uncover differences in the spread of spectral information in a pair of such time series, and which is applied to data from stroke experiments. Another article on multivariate time series, following types of integrated vector ARMA models, is the one by Bauer and Buschmeier [8], who investigate estimators resulting from canonical variate analysis as well as novel cointegration tests. For illustration, they present an application to hourly electricity consumption data.

**Citation:** Weiß, C.H. Time Series Modelling. *Entropy* **2021**, *23*, 1163. https://doi.org/10.3390/e23091163

Received: 25 August 2021 Accepted: 1 September 2021 Published: 4 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

The final article presented in the group of real-valued time series also constitutes a bridge to the next group of articles—namely, to those on discrete-valued time series. Nüßgen and Schnurr [9] consider a multivariate long-range dependent Gaussian time series, but they analyze its dependence structure based on discrete ordinal patterns derived thereof. The estimators of ordinal pattern dependence are analyzed asymptotically and within a simulation study.

The second half of the contributed articles deals with time series having a discrete range, or more precisely, with count time series, where the observations are count values from the set of non-negative integers. A common approach to adapt the ARMA model known for real-valued time series to the case of count time series consists of substituting the multiplications within the ARMA recursive model by types of thinning operators; see Chapters 2–3 in Weiß [10] for detailed background. The resulting integer-valued counterparts to the ordinary AR and MA models are then referred to as INAR and INMA models, respectively. In Huang and Zhu [11], a new type of the classical INAR model using binomial thinning is proposed, where the innovations follow the one-parameter Bell distribution. Stochastic properties and estimation approaches are investigated, and applications to time series consisting of crime counts and strike counts are presented. Liu and Zhu [12], by contrast, develop an extension of the INAR model, where a new type of thinning operator is used, relying on the extended binomial distribution. The resulting model is able to flexibly adapt to different types of dispersion behavior, which is also demonstrated by several real-data examples. Furthermore, Yu and Wang [13] consider an extension of the binomial thinning operator, achieved by allowing for a dependent counting series, and this time, the operator is used within the class of INMA models. Properties of, and estimation for, this new type of INMA model are investigated, and they are illustrated by an application to a crime-counts time series. While the three aforementioned articles consider stationary and linear count time series, the contribution by Liu et al. [14] deals with non-stationary and non-linear time series as obtained from the periodic self-exciting threshold INAR model. Properties and estimation are discussed, and an application to monthly counts of claimants is presented. In Li et al. 
[15], again an INAR model is considered (using a randomized binomial thinning operator); however, now the main focus is not on the model itself, but on an approach for statistical process control. The authors use a cumulative sum chart for process monitoring, discuss its performance evaluation, and apply it to a crime-counts time series. Contrary to the aforementioned papers, the articles by Kim et al. [16] and Shapovalova et al. [17] refer to multivariate count time series. For a bivariate count time series following an integer-valued generalized AR conditional heteroscedastic (INGARCH) model, Kim et al. [16] propose a minimum density power divergence estimator being robust against outliers. The asymptotics of the estimator are investigated, and an application to bivariate crime counts is presented. Shapovalova et al. [17] consider two types of models for multivariate count time series: a log-linear multivariate INGARCH model and a non-linear state-space model. These models serve as a base for a forecast performance comparison. As real-world applications, count time series about bank failures and transactions are used. Last but not least, Stapper [18] developed a comprehensive software package (in the Julia language) for count time series modeling. The package fits different types of INARMA and INGARCH models, and it offers functions for model diagnostics, forecasting, etc. In his paper, Stapper [18] illustrates the application and the potential of "CountTimeSeries.jl" with several real-data examples and simulation experiments.

**Acknowledgments:** The Guest Editor is grateful to all authors for their contributions to this Special Issue, to the anonymous peer-reviewers for carefully reading the submissions as well as for their constructive feedback.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


### *Article* **Entropy Based Student's** *t***-Process Dynamical Model**

**Ayumu Nono 1,\*, Yusuke Uchiyama 2 and Kei Nakagawa 3**

- k-nakagawa@nomura-am.co.jp

**Abstract:** Volatility, which represents the magnitude of fluctuating asset prices or returns, is used in problems of finance to design optimal asset allocations and to calculate the price of derivatives. Since volatility is unobservable, it is identified and estimated by latent variable models known as volatility fluctuation models. Almost all conventional volatility fluctuation models are linear time-series models and thus have difficulty capturing nonlinear and/or non-Gaussian properties of volatility dynamics. In this study, we propose an entropy-based Student's *t*-process dynamical model (ETPDM) as a volatility fluctuation model combining both nonlinear dynamics and non-Gaussian noise. The ETPDM estimates its latent variables and intrinsic parameters by robust particle filtering based on a generalized H-theorem for relative entropy. To test the performance of the ETPDM, we implement numerical experiments for financial time-series and confirm the robustness for a small number of particles by comparison with conventional particle filtering.

**Keywords:** finance; volatility fluctuation; Student's *t*-process; entropy based particle filter; relative entropy

**Citation:** Nono, A.; Uchiyama, Y.; Nakagawa, K. Entropy Based Student's *t*-Process Dynamical Model. *Entropy* **2021**, *23*, 560. https:// doi.org/10.3390/e23050560

Academic Editor: Christian H. Weiß

Received: 30 March 2021 Accepted: 27 April 2021 Published: 30 April 2021


#### **1. Introduction**

Asset allocation and pricing derivatives have been studied in both academia and industry as significant problems in financial engineering and quantitative finance. For these problems, various methodologies have been developed based on the variation of asset returns. In an idealized situation, the variation of returns has been assumed to follow the Gaussian distribution [1]. However, it is known that the variation of returns follows non-Gaussian distributions with fat tails [2]. To explain this observed fact, volatility, which quantifies the magnitude of fluctuating returns, has been introduced and utilized. Volatility, in particular, is often used as an indicator for constructing asset allocations that focus on macroeconomic fundamentals, and there are many related studies. Both researchers and investors have begun to develop mathematical models of volatility fluctuations. For example, Yuhuang et al. investigated the impact of fundamental data on oil price volatility by focusing on time-varying skewness and kurtosis [3]. Hou et al. studied volatility spillovers between the Chinese fuel oil futures market and the stock index futures market, taking into account the time-varying characteristics of the markets [4].

In general, volatility is defined as the variance of a conditional Gaussian distribution for the variation of returns, namely, given as a latent variable in the literature of Bayesian statistical modeling. Based on this idea, various time-series models for the dynamics of asset returns have been developed and proposed. Such time-series models are generally called volatility fluctuation models, on which forecasting, state estimation and smoothing can be implemented.

In recent years, volatility fluctuation models based on machine learning techniques have been proposed [5]. Since volatility is a latent variable, it is necessary for machine learning models to incorporate latent variables into their own methodology. The Gaussian process, a Bayesian learning model [6], is one candidate, and its applications to several problems in finance have been reported [7,8]. The Student's *t*-process is an extension of the Gaussian process for non-Gaussian distributed data such as asset returns. It has been proposed [9] and applied to the analysis of financial time-series and asset allocations, and it has been confirmed to yield robust estimates [10].

This study extends the Student's *t*-process latent variable model to a dynamic latent variable model incorporating the structure of time-series. To estimate the dynamic latent variables, we use the particle filter method [11]. Conventional particle filters suffer from weight bias and the particle impoverishment problem (PIP), which directly affect the estimation accuracy [12]. To address these issues, the merging particle filter [13] and the Monte Carlo filter particle filter [14] have been proposed. However, these methods are computationally expensive because they need a large number of particles. Therefore, we use an entropy-based particle filter (EBPF), which constructs a parametric prior distribution based on the generalized H-theorem for relative entropy [15]. It is expected to prevent the bias of particle weights and the loss of particle diversity while reducing the computational cost. Using the EBPF in this experiment and comparing it with conventional methods, we confirm that it is effective for problems in finance.

In summary, to estimate robustly and to avoid the particle filter's problems, we combine the Student's *t*-process dynamical model and the EBPF. We call the proposed model an entropy-based Student's *t*-process dynamical model (ETPDM) in the following, and we will verify its usefulness. The remainder of this paper is organized as follows: Section 2 introduces related statistical and machine learning models. In Section 3, we derive and propose the ETPDM with its filtering method. In Section 4, we show the performance of volatility estimation using the proposed method and discuss the results. Section 5 is devoted to our conclusions and future perspectives.

#### **2. Related Work**

#### *2.1. Volatility Fluctuation Models*

One of the most basic and widely used volatility fluctuation models is the GARCH model [16], given as follows:

$$x\_t \sim \mathcal{N}(0, \sigma\_t^2), \tag{1}$$

$$\sigma\_t^2 = \alpha\_0 + \sum\_{j=1}^q \alpha\_j \sigma\_{t-j}^2 + \sum\_{i=1}^p \beta\_i x\_{t-i}^2, \tag{2}$$

where $x\_t$ is a time-dependent random variable sampled from a Gaussian distribution with mean 0 and variance $\sigma\_t^2$, and the time evolution of the variance is given by Equation (2). The parameters $\alpha\_j$ and $\beta\_i$ take positive values, which can be estimated from observed data. The positive integers $p$ and $q$ are the orders of the two regression terms, so this model is known as the GARCH($p$, $q$) model. For the sake of simplicity, the order parameters are often fixed as $p = q = 1$. Various families of GARCH models have been developed and proposed in the area of econometrics and quantitative finance [17]. For instance, an asymmetric effect has been introduced into a multivariate GARCH model [18,19].
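The recursion in Equations (1) and (2) is easy to simulate. Below is a minimal sketch for the GARCH(1,1) case, following the text's convention that $\alpha$ multiplies the lagged variances and $\beta$ the lagged squared observations; the parameter values and function name are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def simulate_garch11(T, alpha0=0.05, alpha1=0.9, beta1=0.05, seed=0):
    """Simulate a GARCH(1,1) path:
    sigma_t^2 = alpha0 + alpha1 * sigma_{t-1}^2 + beta1 * x_{t-1}^2,
    x_t ~ N(0, sigma_t^2), as in Equations (1)-(2) with p = q = 1."""
    rng = np.random.default_rng(seed)
    x = np.zeros(T)
    sigma2 = np.zeros(T)
    # start at the unconditional variance (requires alpha1 + beta1 < 1)
    sigma2[0] = alpha0 / (1.0 - alpha1 - beta1)
    x[0] = rng.normal(0.0, np.sqrt(sigma2[0]))
    for t in range(1, T):
        sigma2[t] = alpha0 + alpha1 * sigma2[t - 1] + beta1 * x[t - 1] ** 2
        x[t] = rng.normal(0.0, np.sqrt(sigma2[t]))
    return x, sigma2
```

Starting the recursion at the unconditional variance $\alpha\_0/(1-\alpha\_1-\beta\_1)$ assumes a covariance-stationary process, i.e., $\alpha\_1 + \beta\_1 < 1$.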

#### *2.2. Gaussian Process*

For any finite number of vectors $\{x\_1, x\_2, \cdots, x\_n\}$ and a stochastic process $f(\cdot)$, if the joint probability density function of $\{f(x\_1), f(x\_2), \cdots, f(x\_n)\}$ is a Gaussian distribution, $f(\cdot)$ is called a Gaussian process [6]. Since the Gaussian process samples an infinite-dimensional vector, the mean function $m(\cdot)$ and the covariance function $K(\cdot, \cdot)$ are introduced as follows:

$$m(\mathbf{x}) = \mathbb{E}[f(\mathbf{x})],\tag{3}$$

$$K(\mathbf{x}, \mathbf{x}') = \mathbb{E}[(f(\mathbf{x}) - m(\mathbf{x}))(f(\mathbf{x}') - m(\mathbf{x}'))^\mathrm{T}].\tag{4}$$

Then, given a matrix $X = [x\_1, x\_2, \cdots, x\_n]^\mathrm{T}$, $p(f\,|\,X) = \mathcal{N}(m(X), K(X, X))$ is the probability density function of the Gaussian process. When we explicitly state that the stochastic process $f$ is sampled from the Gaussian process, we write $f \sim \mathcal{GP}(m, K)$. Without loss of generality, the mean function of the Gaussian process is often assumed to be zero. The covariance function is represented by the kernel function $k(\cdot, \cdot)$, which is a positive symmetric bivariate function, satisfying

$$K(\mathbf{x}, \mathbf{x}') = k(\mathbf{x}, \mathbf{x}'). \tag{5}$$

Hence, *K*(*X*, *X*) is a positive definite symmetric matrix. As a kernel function, for example, the radial basis function

$$k\_{\rm RBF}(\mathbf{x}, \mathbf{x}') = \varkappa \exp(-l^{-2}||\mathbf{x} - \mathbf{x}'||^2) \tag{6}$$

is often used. Here, $\varkappa$ and $l$ are hyperparameters.

For a pair of observed data $\mathcal{D} = \{(x\_1, y\_1), (x\_2, y\_2), \cdots, (x\_n, y\_n)\}$, let $X = [x\_1, x\_2, \cdots, x\_n]^\mathrm{T}$ and $Y = [y\_1, y\_2, \cdots, y\_n]^\mathrm{T}$. The hyperparameters of the Gaussian process can be estimated by gradient and Monte Carlo methods on $\mathcal{D}$. From the trained Gaussian process, the prediction $Y^\* = [y\_1^\*, y\_2^\*, \cdots, y\_m^\*]^\mathrm{T}$ for an unknown input $X^\* = [x\_1^\*, x\_2^\*, \cdots, x\_m^\*]^\mathrm{T}$ is sampled from the conditional Gaussian distribution $\mathcal{N}(f^\*, K^\*)$. The mean function $f^\*$ and the covariance function $K^\*$ of the conditional Gaussian distribution are given by

$$f^\* = m\_X + K\_{X^\*,X} K\_{X,X}^{-1} (Y - m\_X), \tag{7}$$

$$K^\* = K\_{X^\*, X^\*} - K\_{X^\*, X} K\_{X, X}^{-1} K\_{X, X^\*}.\tag{8}$$

It is seen that the mean and covariance functions of the Gaussian process propagate the information of previously observed data to predicted values.
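The predictive Equations (7) and (8) under the zero-mean convention can be sketched as follows; the small jitter added to the diagonal is a numerical-stability assumption, not part of the model, and all function names are illustrative:

```python
import numpy as np

def rbf_kernel(A, B, kappa=1.0, l=1.0):
    """k(x, x') = kappa * exp(-||x - x'||^2 / l^2), as in Equation (6)."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return kappa * np.exp(-d2 / l**2)

def gp_predict(X, Y, Xs, kappa=1.0, l=1.0, jitter=1e-8):
    """Zero-mean GP predictive mean and covariance, Equations (7)-(8)."""
    Kxx = rbf_kernel(X, X, kappa, l) + jitter * np.eye(len(X))
    Ksx = rbf_kernel(Xs, X, kappa, l)
    Kss = rbf_kernel(Xs, Xs, kappa, l)
    alpha = np.linalg.solve(Kxx, Y)                    # K_{X,X}^{-1} Y
    f_star = Ksx @ alpha                               # Eq. (7), m = 0
    K_star = Kss - Ksx @ np.linalg.solve(Kxx, Ksx.T)   # Eq. (8)
    return f_star, K_star
```

With noise-free observations, the predictive mean at the training inputs reproduces the observed values up to the jitter, which is exactly the "propagation of information" noted in the text.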

#### *2.3. Student's t-Process*

In the Gaussian process, the probability density function is assumed to be the Gaussian distribution. Thus, when we apply the Gaussian process to data following a probability distribution with fat tails, such as financial time-series, accurate estimation is not possible. A model that extends the Gaussian process to such data is the Student's *t*-process [9]. The Student's *t*-process is a stochastic process $f(\cdot)$ whose finite-dimensional distribution is a Student's *t*-distribution with $\nu$ degrees of freedom, defined as follows:

$$\mathcal{T}(m, K, \nu) = \frac{\Gamma\left(\frac{\nu+n}{2}\right)}{[(\nu-2)\pi]^{\frac{n}{2}}\Gamma\left(\frac{\nu}{2}\right)|K\_{X,X}|^{\frac{1}{2}}} \left[1 + \frac{1}{\nu-2}(Y - m\_X)^\mathrm{T} K\_{X,X}^{-1}(Y - m\_X)\right]^{-\frac{\nu+n}{2}}.\tag{9}$$

Here, $m(\cdot)$ and $K(\cdot, \cdot)$ are the mean and covariance functions, respectively, and $\Gamma(\cdot)$ is the gamma function. When the stochastic process $f(\cdot)$ is a Student's *t*-process, it is denoted by $f \sim \mathcal{TP}(m, K; \nu)$. As with the Gaussian process, the mean function of the Student's *t*-process is often assumed to be zero without loss of generality.

The predictive distribution of the Student's *t*-process is also a Student's *t*-distribution $\mathcal{T}(m^\*, K^\*, \nu^\*)$, where the degrees of freedom, mean, and covariance functions are updated as follows:

$$\nu^\* = \nu + n, \tag{10}$$

$$m^\* = m\_X + K\_{X^\*,X} K\_{X,X}^{-1} (Y - m\_X), \tag{11}$$

$$K^\* = \frac{\nu + \beta - 2}{\nu + n - 2} \left[ K\_{X^\*,X^\*} - K\_{X^\*,X} K\_{X,X}^{-1} K\_{X,X^\*} \right],$$

$$\beta = (Y - m\_X)^\mathrm{T} K\_{X,X}^{-1} (Y - m\_X). \tag{12}$$

Unlike the Gaussian process, in the Student's *t*-process, we can confirm that the effect of the number of data points is reflected in the update equations for the degrees of freedom and the covariance function.
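The update Equations (10)–(12) can be sketched as follows (zero-mean convention; the code uses the scale factor $(\nu + \beta - 2)/(\nu + n - 2)$ of the standard Student's *t*-process predictive covariance, and the function name is illustrative):

```python
import numpy as np

def tp_predict(Kxx, Ksx, Kss, Y, nu):
    """Student's t-process predictive update with zero mean, Equations (10)-(12).
    Kxx = K_{X,X}, Ksx = K_{X*,X}, Kss = K_{X*,X*}."""
    n = len(Y)
    sol = np.linalg.solve(Kxx, Y)          # K_{X,X}^{-1} Y
    m_star = Ksx @ sol                     # Eq. (11), m = 0
    beta = float(Y @ sol)                  # Eq. (12)
    nu_star = nu + n                       # Eq. (10)
    scale = (nu + beta - 2.0) / (nu + n - 2.0)
    K_star = scale * (Kss - Ksx @ np.linalg.solve(Kxx, Ksx.T))
    return nu_star, m_star, K_star
```

The factor depending on $\beta$ is exactly how the observed data influence the predictive covariance, which has no analogue in the Gaussian-process update of Equation (8).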

#### *2.4. Student's t-Process Latent Variable Model*

In the Student's *t*-process latent variable model, the input matrix $X$ is given as a latent variable. Assume that the observed data $y \in \mathbb{R}^D$ and the latent variable $x \in \mathbb{R}^Q$ are related as $y = f(x)$ by the Student's *t*-process $f \sim \mathcal{TP}(m, K; \nu)$. When we let $Y \in \mathbb{R}^{D \times N}$ be the sequence of $N$ observed data and $X \in \mathbb{R}^{Q \times N}$ be the sequence of $N$ latent variables, we can define the following model as the Student's *t*-process latent variable model [10]:

$$p(Y|X) = \frac{\Gamma\left(\frac{\nu+D}{2}\right)}{[(\nu-2)\pi]^{\frac{D}{2}}\Gamma\left(\frac{\nu}{2}\right)|K\_{X,X}|^{\frac{1}{2}}} \left[1 + \frac{1}{\nu-2}(Y - m\_X)^\mathrm{T} K\_{X,X}^{-1}(Y - m\_X)\right]^{-\frac{\nu+D}{2}}.\tag{13}$$

Since the Student's *t*-distribution converges to the Gaussian distribution in the limit of *ν*→∞, we can see that the Student's *t*-process latent variable model embraces the Gaussian process latent variable model [20].

#### **3. Proposed Model**

#### *3.1. Student's t-Process Dynamical Model*

Since volatility fluctuations cannot be observed directly, they are modeled as dynamic latent variables, as in the family of GARCH models, most of which are given by linear time-series models [18]. To describe the nonlinear dynamics of volatility fluctuations, we extend the Student's *t*-process latent variable model to dynamic latent variables, namely, the Student's *t*-process dynamical model (TPDM), which is expected to be robust for both observables and latent variables with outliers.

Suppose $p\_t$ represents an asset price at time $t$; then the log-return is given by $r\_t = \log(p\_t / p\_{t-1})$. Let $\sigma\_t^2$ denote the volatility of $r\_t$. Here, for an observable $r\_t$ and a latent variable $\sigma\_t^2$, we provide a volatility fluctuation model by a TPDM as follows:

$$r\_t \sim \mathcal{T}(0, \sigma\_t^2; \nu), \tag{14}$$

$$v\_t \equiv \log \sigma\_t^2, \tag{15}$$

$$v\_t = f(v\_{t-1}, r\_{t-1}) + \varepsilon\_t, \tag{16}$$

$$\varepsilon\_t \sim \mathcal{N}(0, \sigma\_n^2), \tag{17}$$

where the observable $r\_t$ is centered at 0 and follows a Student's *t*-distribution with $\nu$ degrees of freedom, whose scale parameter is given by $\sigma\_t^2$. The dynamic latent variable $v\_t$ is defined by Equation (15) so that its range is the whole real line. The time evolution of the dynamic latent variable $v\_t$ is given by Equation (16) with a Gaussian white noise whose variance is $\sigma\_n^2$. The stochastic process $f$ on the right-hand side of Equation (16) follows a Student's *t*-process given by

$$f \sim \mathcal{TP}(m, K; \nu), \tag{18}$$

$$m(\xi\_{t-1}) = a v\_{t-1} + b r\_{t-1}, \tag{19}$$

$$k(\xi\_{t-1}, \xi\_{t-1}') = \gamma \exp\left(-l^{-2}\|\xi\_{t-1} - \xi\_{t-1}'\|^2\right), \tag{20}$$

where *ξ<sup>t</sup>* = (*xt*, *vt*) , and the hyper parameters are *θ* = (*ν*, *σn*, *a*, *b*, *γ*, *l*). Given a series of observed data *r*1:*<sup>T</sup>* = {*r*1,*r*2, ···,*rT*}, it is possible to obtain the volatility fluctuations by estimating a series of dynamic latent variables *v*1:*<sup>T</sup>* = {*v*1, *v*2, ···, *vT*}.

#### *3.2. Particle Filter*

The particle filter is a method of state estimation by Monte Carlo sampling, where a large number of particles approximates posterior distributions. Hence, it can be applied to nonlinear systems, where posterior distributions are intractable [21]. For $N$ particles, let $\{v\_{1:t-1}^i\}\_{i=1}^N$ and $W\_{t-1}^i$ ($i = 1, 2, \cdots, N$) be the realizations of the dynamic latent variables and their associated weights up to time $t-1$, respectively. The weights are normalized so that $\sum\_{i=1}^N W\_{t-1}^i = 1$. With these values, the posterior distribution $p(v\_{1:t-1}|r\_{1:t-1})$ at time $t-1$ can be approximated as follows [22]:

$$p(v\_{1:t-1}|r\_{1:t-1}) \approx \sum\_{i=1}^{N} W\_{t-1}^i \, \delta\left(v\_{1:t-1} - v\_{1:t-1}^i\right), \tag{21}$$

where $\delta(\cdot)$ is the Dirac delta function. In other words, the posterior distribution is approximated by a mixture of delta functions.

It is, however, known that an insufficient number of particles fails to approximate the posterior distribution because of ensemble degeneracy. Indeed, the particle weights become unbalanced and biased toward a tiny number of particles as the time step progresses [11,12]. To overcome this problem, a huge number of particles is used for filtering processes in practice.
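For orientation, a single step of a generic bootstrap particle filter, with placeholder transition and likelihood functions, looks as follows; this is a textbook sketch under our own naming, not the paper's exact implementation:

```python
import numpy as np

def particle_filter_step(particles, weights, observation, transition, likelihood, rng):
    """One bootstrap-filter step: propagate, reweight by the likelihood,
    normalize, and resample (multinomial)."""
    particles = transition(particles, rng)              # sample from the transition density
    weights = weights * likelihood(observation, particles)
    weights = weights / weights.sum()                   # normalize so the weights sum to 1
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    # after resampling, all weights are reset to the uniform value 1/N
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```

The degeneracy described above shows up here as the normalized `weights` concentrating on a few indices, so that `idx` repeatedly selects the same particles.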

#### *3.3. Entropy-Based Particle Filter*

In the use of the conventional particle filter, it is necessary to sample a large number of particles to achieve accuracy, which increases the estimation time. For online estimation, a short run time is desired. For this purpose, we introduce a particle filter that is robust for a small number of particles.

Let us reconsider approximating the probability density function for the dynamic latent variable, $Q(v, t)$, called the background distribution. In the conventional particle filter, the background distribution is approximated by a mixture of delta functions. This approximation works well only when the background distribution exhibits an extremely sharp peak, since the delta function has no width and the approximated distribution has mass only at single points.

To improve the accuracy for the approximation of the background distribution, we replace the mixture of the delta functions with that of Gaussian distributions as

$$\hat{Q}(v, t) = \sum\_{i=1}^{M} W\_t^i \, \mathcal{N}(\mu\_t^i, \sigma\_t^{2,i}), \tag{22}$$

where $\mu\_t^i$ and $\sigma\_t^{2,i}$ ($1 \le i \le M$) are the means and variances of the Gaussian distributions at time $t$. Unlike the delta function, the Gaussian distribution has a certain width. Hence, the mixture of Gaussian distributions is capable of fitting properly to data with large variance and fat tails.
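The Gaussian-mixture background distribution of Equation (22) can be evaluated pointwise as in the following sketch (the component means, variances, and weights are assumed to be given; the function name is ours):

```python
import numpy as np

def mixture_density(v, weights, means, variances):
    """Evaluate Q_hat(v) = sum_i W_i * N(v; mu_i, sigma2_i) of Equation (22)
    at one or more points v."""
    v = np.atleast_1d(np.asarray(v, dtype=float))[:, None]
    comp = np.exp(-0.5 * (v - means[None, :]) ** 2 / variances[None, :])
    comp /= np.sqrt(2.0 * np.pi * variances[None, :])
    return comp @ weights  # mixture-weighted sum over components
```

In contrast to the delta-function mixture of Equation (21), this density is strictly positive everywhere, which is what makes it usable for distributions with heavy tails.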

With the use of finite samples from the background distribution $Q(v, t)$, the posterior/filter distribution $P(v, t)$ is inferred by the minimum principle for relative entropy, which is known as the entropy-based particle filter [15]. The relative entropy (Kullback–Leibler divergence) between the filter distribution $P(v, t)$ and the background distribution $Q(v, t)$ is defined as follows [23–25]:

$$H[P|Q] = \int\_{\Omega\_v} P(v, t) \log \left( \frac{P(v, t)}{Q(v, t)} \right) dv,\tag{23}$$

where $\Omega\_v$ is the domain of the dynamic latent variable $v\_t$. Owing to the properties of the relative entropy as a quasi-distance between probability density functions, the filter distribution is obtained as the minimizer of the relative entropy in Equation (23). Combined with the entropy-based particle filter, the state estimation of the ETPDM is implemented. An overview of its algorithm is given in Algorithm 1.
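When $P$ and $Q$ are available only through samples and log-densities, the relative entropy of Equation (23) can be approximated by Monte Carlo, as in this illustrative sketch (function names are our own):

```python
import numpy as np

def relative_entropy_mc(samples_p, logpdf_p, logpdf_q):
    """Monte Carlo estimate of H[P|Q] = E_P[log(P/Q)], Equation (23),
    using samples drawn from P."""
    s = np.asarray(samples_p)
    return np.mean(logpdf_p(s) - logpdf_q(s))
```

For two unit-variance Gaussians whose means differ by $\mu$, the exact value is $\mu^2/2$, which gives a simple sanity check for the estimator.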

#### **Algorithm 1** Entropy Based Student's *t*-Process Dynamical Model (ETPDM)

**Require:** Initial particles $\{X\_0^i\}\_{i=1}^M$ and initial particle weights $W\_0^i = 1/M$
**Ensure:** $\sum\_{i=1}^M W\_t^i = 1$ at any time $t$

1: **while** there are observations to be assimilated **do**

2: Compute importance weights proportional to the likelihood of the observation $r\_t$:

$$W\_t^i \propto p(r\_t \,|\, X\_t^i). \tag{24}$$

3: According to the weights $\{W\_t^i\}$, resample $M$ particles $\{X\_t^j\}\_{j=1}^M$. Then compute the filter distribution $Q\_t'$ at time $t$:

$$Q\_t' = \sum\_{j=1}^{M} W\_t^j \, \mathcal{N}(X\_t^j). \tag{25}$$

4: At the same time, estimate the expected state $v\_t$ by extracting a finite number of samples $\{v^k\}$ from the background density $Q\_t$:

$$v\_t = \mathbb{E}(v^k). \tag{26}$$

5: With the stochastic process $f \sim \mathcal{TP}(m, k; \nu)$, generate new particles:

$$\{X\_{t+1}^i\} = f(X\_t^i, r\_t). \tag{27}$$

6: Then predict the distribution $\hat{Q}$ at time $t+1$:

$$\hat{Q}\_{t+1} = \sum\_{i=1}^{M} W\_t^i \, \mathcal{N}(X\_{t+1}^i). \tag{28}$$

7: **return** the log-likelihood of the estimate, $p(r\_t|v\_t)$.

8: **end while**

#### **4. Numerical Experiments**

In this section, we implement numerical experiments to validate the ETPDM on the time-series of a foreign exchange rate. As a dataset, we use the USD/JPY exchange rate in 2010, sampled every 1 min, 30 min, and 1 h. Figures 1–3 show the time-series of the log-return of the USD/JPY exchange rate $r\_t$ and the volatility fluctuations estimated by the ETPDM, the conventional particle filter for the GARCH model (cp-GARCH), and the conventional particle filter for the Student's *t*-process dynamical model (cp-TPDM), respectively. The warm-up period of the estimations is $0 \le t \le 20$, where the volatility values are zero. In Figure 1a, intermittent jumps are observed, which are evidence of the non-Gaussian behavior of $r\_t$. Indeed, the estimated volatility fluctuations show higher peaks at the same time points as the intermittent jumps in Figure 1b–d. This means that all of the models capture the nature of the volatility fluctuations of the USD/JPY exchange rate effectively. Moreover, the same can be said for the other sampling rates, 30 min and 1 h, in Figures 2 and 3, which means that these models can be applied to data of any sampling rate. Previous volatility estimation studies used the GARCH model with various estimation methods; typical examples are the particle filter [26] and Markov chain Monte Carlo simulation [27]. In all of these studies, including this experiment, the GARCH model has been implemented successfully.

**Figure 1.** Overview of estimation results in 1-min chart.

**Figure 2.** Overview of estimation results in 30-min chart.

**Figure 3.** Overview of estimation results in 1-h chart.

Figure 4 shows the estimated log-likelihoods of the ETPDM, the cp-GARCH, and the cp-TPDM. The likelihood tends to be higher for the ETPDM, the cp-TPDM, and the cp-GARCH, in that order. As mentioned in Section 2, the TPDM is a superordinate model that encompasses the Gaussian process dynamical model (GPDM), and the fact that the likelihood of the GARCH was at a lower level than that of the TPDM is consistent with the results of previous studies comparing the GARCH and the GPDM [26]. The likelihood is the most reliable indicator for quantifying model performance, and the ETPDM had the best performance of the three models. Moreover, in the case of the cp-TPDM, the log-likelihoods scatter around −0.7 in the range of particle numbers from 10 to 500 without convergence. This means that the performance of the conventional particle filter is insufficient for the given data. On the other hand, the log-likelihoods of the ETPDM converge well for particle numbers larger than 100 in Figure 4b, which indicates that the ETPDM is robust for smaller numbers of particles.

To investigate the effectiveness of the particle filtering, we introduce the effective particle rate

$$R_{\mathrm{eff}} = \frac{1}{N\sum_{i=1}^{N} \left(W_t^i\right)^2} \tag{29}$$

as a measure of the bias of the sampled particles. It attains its maximum value *R<sub>eff</sub>* = 1 when the weights are uniformly distributed as *W<sub>t</sub><sup>i</sup>* = 1/*N* (*i* = 1, 2, ..., *N*). In Figure 5a, the effective particle rates of the cp-TPDM scatter over the whole range of particle numbers from 10 to 500. This poor performance stems from the weight bias problem of particle filtering: the conventional particle filter can hardly overcome the particle impoverishment problem, even when the number of particles is increased. In contrast, for the ETPDM, as shown in Figure 5b, the effective particle rate converges to above 50%. This is the expected advantage of the ETPDM, which stems from the finite band of each Gaussian distribution as a component of the prior distribution. Thus, the ETPDM provides accurate estimation with fewer particles and would therefore contribute to efficient online estimation. Turning to the remaining comparison, the cp-GARCH also looks good in terms of the effective particle rate. However, when we also consider the likelihood in Figure 4a, its high effective particle rate did not translate into performance, because the cp-GARCH was less suitable for this problem than the ETPDM. This is further evidence that the ETPDM has greater potential.
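Equation (29) is straightforward to implement. The sketch below (the function name `effective_particle_rate` is ours) verifies the two limiting cases discussed above: uniform weights, and a fully degenerate weight vector in which one particle carries all the weight:

```python
import numpy as np

def effective_particle_rate(weights):
    """Effective particle rate of Eq. (29): R_eff = 1 / (N * sum_i (W_i)^2).
    Equals 1 for uniform weights W_i = 1/N and drops to 1/N when a single
    particle carries all the weight (severe particle impoverishment)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise so the rate is invariant to the weight scale
    return 1.0 / (len(w) * np.sum(w**2))

N = 500
print(effective_particle_rate(np.full(N, 1.0 / N)))  # 1.0 (uniform weights)
degenerate = np.zeros(N)
degenerate[0] = 1.0
print(effective_particle_rate(degenerate))           # 0.002 = 1/N (degenerate)
```

Tracking this quantity over time, as in Figure 5, directly exposes weight-bias problems that raw likelihood curves can hide.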

**Figure 4.** Estimated log-likelihoods of (**a**) the cp-GARCH and the cp-TPDM and (**b**) the ETPDM.

**Figure 5.** Effective particle rates of (**a**) the cp-GARCH and the cp-TPDM and (**b**) the ETPDM.

To validate the robustness of the estimation of intermittent return dynamics, we investigate the degrees of freedom *ν* of the Student's *t*-process dynamical models. For this purpose, we split the return fluctuations into two time windows: a low-volatility window (50 ≤ *t* ≤ 250) and a high-volatility window (360 ≤ *t* ≤ 560). The descriptive statistics of the return fluctuations in the two windows are shown in Table 1. As seen in the table, the kurtoses in both time windows are larger than 3; that is, the corresponding return fluctuations follow non-Gaussian statistics. Prior research has confirmed that when a dataset follows a Gaussian distribution, the strengths of models that excel at robust estimation do not come into play [28]. Therefore, a dataset that follows a non-Gaussian distribution is appropriate for the purpose of this experiment. Figure 6 exhibits the log-likelihoods of the cp-TPDM and the ETPDM in (a) the low-volatility window and (b) the high-volatility window. In these figures, the log-likelihoods of the ETPDM have maxima in 6 ≤ *ν* ≤ 7 in both time windows, whereas the estimations by the cp-TPDM are unstable. This result demonstrates the robustness of the ETPDM.
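The kurtosis criterion used above (kurtosis > 3 signalling heavier-than-Gaussian tails) can be checked as follows; the helper function and the data are illustrative, not the paper's USD/JPY returns:

```python
import numpy as np

def kurtosis(x):
    """Sample kurtosis (Pearson's definition); equals 3 for Gaussian data."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**4))

# A return series with one intermittent jump among otherwise calm moves is
# strongly leptokurtic (illustrative values only).
calm_with_jump = np.r_[np.zeros(99), 10.0]
print(kurtosis(calm_with_jump) > 3.0)  # True: heavy-tailed, non-Gaussian
```

A window passing this check, like both windows in Table 1, is exactly the non-Gaussian setting in which robust estimators such as the ETPDM are expected to pay off.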

**Figure 6.** Log-likelihoods of the cp-TPDM and the ETPDM in (**a**) the low-volatility window and (**b**) the high-volatility window.

**Table 1.** Descriptive statistics of the return fluctuations in the two time windows.


#### **5. Conclusions**

In this study, we proposed the ETPDM to implement robust estimation of the dynamical latent variables of nonlinear and non-Gaussian fluctuations. To estimate the dynamic latent variables and hyperparameters, an entropy-based particle filter with a Gaussian mixture distribution was adopted. To validate the performance of the ETPDM, we carried out numerical experiments on the return fluctuations of a foreign exchange rate, comparing it with the cp-GARCH and the cp-TPDM. As a result, we confirmed the advantages of the ETPDM: (i) good convergence properties, (ii) a high effective particle rate, and (iii) robustness for a small number of particles.

Owing to these advantages, the ETPDM is applicable to online volatility estimation for asset allocation and derivative pricing over short time spans. As the basis of the background distribution, we employed the Gaussian distribution in our numerical experiments; nevertheless, the framework of the entropy-based particle filter can be extended to other probability density functions. Additionally, this approach can be adapted to other time-series data beyond asset prices; for example, it has the potential to be applied to control engineering problems such as self-position estimation. These are directions for future work.

**Author Contributions:** Conceptualization, Y.U.; methodology, A.N.; software, A.N.; validation, A.N.; formal analysis, A.N.; investigation, A.N.; resources, A.N.; data curation, K.N.; writing—original draft preparation, A.N.; writing—review and editing, Y.U. and K.N.; visualization, A.N.; project administration, Y.U. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** The data that support the findings of this study are available from the corresponding author, Nono, A., upon reasonable request.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

