1. Introduction
Learning stochastic processes is a fast-growing research area in machine learning, as a considerable number of machine learning problems involve the time variable as a component of datasets [1,2,3,4,5]. Due to the nature of the time index, stochastic processes often possess “path features” [6]. This additional information provided by the time index makes performing machine learning on stochastic processes quite different from doing so on other types of objects. Several new techniques have been developed alongside the study of such machine learning problems [7,8,9,10]. In this paper, we focus our study on a particular unsupervised learning problem: clustering of stochastic processes. Recently, clustering of distribution stationary ergodic time series was motivated and discussed in [11,12]. Later, Peng et al. [13] extended Khaleghi et al.’s [12] consistent clustering algorithms in order to cluster covariance stationary ergodic (discrete-time or continuous-time) stochastic processes. Please note that a distribution stationary ergodic stochastic process is not necessarily covariance stationary. Therefore, Peng et al. [13] enlarged the class of stochastic processes to which the consistent clustering algorithms can be applied.
The motivation for this paper lies in the fact that assuming the observed data to follow a covariance stationary stochastic process still seems unrealistic. This paper therefore tries to overcome this issue by developing a promising algorithm to cluster a more general class of stochastic processes, the so-called locally asymptotically self-similar processes. More precisely, the key path features of the observed stochastic processes are assumed to be known, as stated in Assumption 1 (see Section 2). Compared to stationarity, Assumption 1 is a much weaker constraint, since in many fields it has been shown that the observed paths are sampled from functions (or functionals) of well-known locally asymptotically self-similar processes. For example, dynamics in financial markets (equity returns and interest rates) can be described based on geometric Brownian motions (gBm); long-range dependent or self-similar phenomena are often modeled by fractional Brownian motions (fBm) [14]. Long-term financial indexes and curves, such as the S&P 500, Dow Jones, NASDAQ, interest rates, VIX rates, and currency exchange rates, can be modeled using multifractional Brownian motions (mBm) [15,16,17,18,19,20]. Modeling events using locally asymptotically self-similar processes is also widely found in other fields such as geology, biology, power, and energy [21,22]. Recently, there has been growing investigation into how to test and estimate locally asymptotically self-similar processes, and into how to apply machine learning analysis to such processes [23,24,25,26,27,28,29,30]. However, as far as we know, there has not yet been a study on clustering locally asymptotically self-similar processes in the literature. This paper thus aims at shedding some light on clustering such processes.
Contrary to conventional clustering of finite-dimensional data, clustering based on the paths’ features of the processes largely removes the noise, since it captures the structural features of the observed paths. Therefore, a nice dissimilarity measure should be one that well characterizes the path features. In this context, “nice” refers to the property that the computational complexity and the prediction errors caused by over-fitting are expected to be largely reduced. Moreover, with some path features, consistency of the clustering algorithm [11,12] may be obtained. Among all stochastic process features, we focus on characterizing the property of ergodicity in this paper. However, a similar analysis can be made for other patterns of process features, such as seasonality, the Markov property and the martingale property.
Ergodicity [31] is a typical feature possessed by several well-known processes and is applied in financial time series analysis. It is tightly related to other process features, such as stationarity, long-term memory, and self-similarity [32,33]. In [12,13], it is shown that both distribution ergodicity and covariance ergodicity lead to asymptotically consistent algorithms for clustering processes. In this paper, we take one step further and relax the condition of ergodicity to “local asymptotic ergodicity” [34], obtaining so-called “approximately asymptotically consistent algorithms” for clustering processes with this path property. This setting covers such a large class of processes that it includes the well-known Lévy processes, some self-similar processes and some multifractional processes [34].
Each stochastic process clustering problem involves handling data, defining clusters, measuring dissimilarities, and finding groups efficiently; the paper is therefore organized as follows. Section 2 is devoted to introducing the class of locally asymptotically self-similar processes to which our clustering approaches apply. In Section 3, a covariance-based dissimilarity measure is suggested, and in Section 4, the approximately asymptotically consistent algorithms for clustering both offline and online datasets are designed. A simulation study is performed in Section 5, where the algorithms are applied to cluster multifractional Brownian motions (mBm), an excellent representative of the class of locally asymptotically self-similar processes. In Section 6, we perform cluster analysis on global equity return data. Conventionally, stock returns of countries in the same region are considered to have similar patterns due to common regional economic factors. However, recent empirical evidence shows that, as financial market globalization increases, global economic clusters switch from “geographical centroids” to “economic development centroids”. Our clustering algorithms show how “geography” and “economic development” jointly impact the equity returns of countries or regions. Treating the equities as stochastic processes makes the cluster analysis more promising. Finally, Section 7 concludes and provides future prospects.
2. A Class of Locally Asymptotically Self-Similar Processes
Self-similar processes are a class of processes that are invariant in distribution under suitable scalings of time [35,36,37]. These processes have been successfully used to model various time-scaling random phenomena observed in high-frequency data, especially financial and geological data.
Definition 1 (Self-similar process). A stochastic process $\{Z(t)\}_{t\in T}$ (here the time index set $T$ is not necessarily continuous) is self-similar with self-similarity index $H>0$ if, for all $a>0$, all $n\in\mathbb{N}$ and all $t_1,\ldots,t_n\in T$ such that $at_1,\ldots,at_n\in T$,
$$\left(Z(at_1),\ldots,Z(at_n)\right)\overset{d}{=}\left(a^{H}Z(t_1),\ldots,a^{H}Z(t_n)\right),\qquad (1)$$
where $\overset{d}{=}$ denotes the equality in joint probability distribution of two finite-dimensional random vectors.
When $T=[0,\infty)$, for instance, it follows from (1) that for any $a>0$ and $t\ge 0$,
$$Z(at)\overset{d}{=}a^{H}Z(t).\qquad (2)$$
Therefore, taking $t=0$ at both hand sides of (2) yields $Z(0)\overset{d}{=}a^{H}Z(0)$ for all $a>0$; letting $a\to 0^{+}$ then gives
$$Z(0)=0\ \text{almost surely.}\qquad (3)$$
Self-similar processes are generally not distribution stationary, but their increment processes can be distribution stationary (any finite subset’s joint distribution is invariant under time shifts) or covariance stationary (the mean and covariance structure exist and are invariant under time shifts). From now on, we restrict our setting to stochastically continuous-time self-similar processes only [37]; i.e., a process $\{Z(t)\}_{t\ge0}$ is stochastically continuous at $t_0\ge0$ if, for every $\varepsilon>0$,
$$\lim_{t\to t_0}\mathbb{P}\left(|Z(t)-Z(t_0)|>\varepsilon\right)=0.$$
This assumption is weaker than almost sure continuity. The process $\{Z(t)\}_{t\ge0}$ is called (stochastically) continuous over $[0,\infty)$ if it is continuous at each $t_0\ge0$. For $h>0$, we call $\{Z(t+h)-Z(t)\}_{t\ge0}$ the increment process (or simply the increments) of $\{Z(t)\}_{t\ge0}$. If all increment processes of a continuous-time self-similar process are covariance stationary, its covariance structure can be explicitly given as below:
Theorem 1. Let $\{Z(t)\}_{t\ge0}$ be a self-similar process with index $H\in(0,1)$ and with covariance stationary increments. Then, for all $t\ge0$,
$$\mathbb{E}[Z(t)]=0,\qquad (4)$$
and, for all $s,t\ge0$,
$$\mathrm{Cov}\left(Z(s),Z(t)\right)=\frac{\mathbb{E}[Z(1)^2]}{2}\left(s^{2H}+t^{2H}-|t-s|^{2H}\right).\qquad (5)$$
Theorem 1 can be obtained by replacing the distribution stationary increments assumption in Theorem 1.2 in [36] with the covariance stationary increments assumption. We briefly provide the proof below.
Proof. We first prove (4). On one hand, using the fact that the increments of $\{Z(t)\}_{t\ge0}$ are covariance stationary and (3), we obtain
$$\mathbb{E}[Z(2t)]-\mathbb{E}[Z(t)]=\mathbb{E}[Z(2t)-Z(t)]=\mathbb{E}[Z(t)-Z(0)]=\mathbb{E}[Z(t)].\qquad (6)$$
On the other hand, since $\{Z(t)\}_{t\ge0}$ is self-similar with index $H$, we have
$$\mathbb{E}[Z(2t)]=2^{H}\,\mathbb{E}[Z(t)].\qquad (7)$$
Putting together (6) and (7) and the fact that $H\neq1$, we necessarily have $\mathbb{E}[Z(t)]=0$ for all $t\ge0$. (4) is proved.
For proving (5), we first observe that, for $0\le s\le t$,
$$\mathrm{Cov}\left(Z(s),Z(t)\right)=\mathbb{E}[Z(s)Z(t)]=\frac{1}{2}\left(\mathbb{E}[Z(s)^2]+\mathbb{E}[Z(t)^2]-\mathbb{E}\left[(Z(t)-Z(s))^2\right]\right).\qquad (8)$$
Next, we can see from the facts that $\{Z(t)\}_{t\ge0}$ is self-similar with index $H$, that its increments are covariance stationary, and (4), that
$$\mathbb{E}[Z(u)^2]=u^{2H}\,\mathbb{E}[Z(1)^2]\ \text{for all}\ u\ge0.\qquad (9)$$
The covariance stationarity yields $\mathbb{E}[(Z(t)-Z(s))^2]=\mathbb{E}[Z(t-s)^2]$. (5) then follows from (8) and (9). Theorem 1 is proved. □
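As a quick numerical sanity check of (5) (an illustrative sketch only; the function names are ours), recall that standard Brownian motion is self-similar with $H=1/2$ and has covariance stationary increments, and that (5) with $\mathbb{E}[Z(1)^2]=1$ reduces to the Brownian covariance $\min(s,t)$:

```python
import math
import random

def thm1_cov(s, t, H, ez1_sq=1.0):
    """Covariance formula (5): E[Z(1)^2]/2 * (s^2H + t^2H - |t - s|^2H)."""
    return 0.5 * ez1_sq * (s**(2 * H) + t**(2 * H) - abs(t - s)**(2 * H))

# Exact check: for H = 1/2 (standard Brownian motion, E[Z(1)^2] = 1),
# formula (5) reduces to min(s, t), the Brownian covariance.
for s, t in [(0.3, 0.7), (1.0, 2.5), (4.0, 4.0)]:
    assert abs(thm1_cov(s, t, 0.5) - min(s, t)) < 1e-12

# Monte Carlo check: estimate Cov(Z(s), Z(t)) from simulated Brownian paths.
random.seed(0)
dt, n_paths = 0.01, 10_000
s_idx, t_idx = 30, 70           # s = 0.3, t = 0.7 on the grid
acc = 0.0
for _ in range(n_paths):
    z, zs = 0.0, 0.0
    for i in range(1, t_idx + 1):
        z += random.gauss(0.0, math.sqrt(dt))
        if i == s_idx:
            zs = z              # value of the path at time s
    acc += zs * z
est = acc / n_paths
assert abs(est - thm1_cov(0.3, 0.7, 0.5)) < 0.03   # true value: 0.3
```

The Monte Carlo estimate agrees with the closed form, illustrating that (5) pins down the whole covariance structure from the single constant $\mathbb{E}[Z(1)^2]$.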
We highlight that, contrary to Theorem 1.2 in [36], the covariance stationary increment process of $\{Z(t)\}_{t\ge0}$ in Theorem 1 is not necessarily distribution stationary. This fact inspires us to relax the distribution stationarity of the processes to covariance stationarity in the forthcoming Assumption 1. Below, we introduce a natural extension of self-similar processes, the so-called locally asymptotically self-similar processes [34,38,39].
Definition 2 (Locally asymptotically self-similar process). A continuous-time stochastic process $\{X(t)\}_{t\ge0}$, with its index $H(t)$ being a continuous function valued in $(0,1)$, is called locally asymptotically self-similar if, for each $t\ge0$, there exists a non-degenerate self-similar process $\{Y_t(u)\}_{u\ge0}$ with self-similarity index $H(t)$, such that
$$\left\{\frac{X(t+\tau u)-X(t)}{\tau^{H(t)}}\right\}_{u\ge0}\xrightarrow[\tau\to0^{+}]{}\left\{Y_t(u)\right\}_{u\ge0},\qquad (10)$$
where the convergence is in the sense of all the finite-dimensional distributions. In (10), $\{Y_t(u)\}_{u\ge0}$ is called the tangent process of $\{X(t)\}_{t\ge0}$ at $t$ [38,39]. Moreover, it is shown (see Theorem 3.8 in [39]) that, if $\{Y_t(u)\}_{u\ge0}$ is unique in law, it is then self-similar with index $H(t)$ and has distribution stationary increments. The local asymptotic self-similarity thus generalizes the conventional self-similarity, in the sense that any non-degenerate self-similar process with distribution stationary increments is locally asymptotically self-similar and its tangent process is itself. Furthermore, in a weaker sense, it is not difficult to show the following:
Proposition 1. Let $\{X(t)\}_{t\ge0}$ be a continuous-time self-similar process with self-similarity index $H\in(0,1)$ and with covariance stationary increments. Then all its tangent processes share the same mean and covariance function.
Proof. Since $\{X(t)\}_{t\ge0}$ is locally asymptotically self-similar, by definition at each $t\ge0$ there exists a tangent process $\{Y_t(u)\}_{u\ge0}$ such that
$$\left\{\frac{X(t+\tau u)-X(t)}{\tau^{H}}\right\}_{u\ge0}\xrightarrow[\tau\to0^{+}]{}\left\{Y_t(u)\right\}_{u\ge0}.\qquad (11)$$
Next we show that $\{Y_t(u)\}_{u\ge0}$’s mean and covariance structure are uniquely determined.
Since $\{X(t)\}_{t\ge0}$ has covariance stationary increments, for any $u\ge0$, $\tau>0$, define the scaled increments
$$Y_{t,\tau}(u):=\frac{X(t+\tau u)-X(t)}{\tau^{H}}.$$
Again by the fact that $\{X(t)\}_{t\ge0}$ has covariance stationary increments, using (4) in Theorem 1 we obtain
$$\mathbb{E}[Y_{t,\tau}(u)]=0\ \text{for all}\ u\ge0,\qquad (12)$$
and by (5) in Theorem 1, we have for $u,v\ge0$ and $\tau>0$,
$$\mathrm{Cov}\left(Y_{t,\tau}(u),Y_{t,\tau}(v)\right)=\frac{\mathbb{E}[X(1)^2]}{2}\left(u^{2H}+v^{2H}-|u-v|^{2H}\right),\qquad (13)$$
which depends on neither $t$ nor $\tau$.
It follows from (11)–(13) that, for all $u,v\ge0$,
$$\mathbb{E}[Y_t(u)]=0\ \text{and}\ \mathrm{Cov}\left(Y_t(u),Y_t(v)\right)=\frac{\mathbb{E}[X(1)^2]}{2}\left(u^{2H}+v^{2H}-|u-v|^{2H}\right).\qquad (14)$$
(14) implies that all tangent processes of $\{X(t)\}_{t\ge0}$ possess zero mean and equal covariance functions. Moreover, it is easy to derive from (14) that these tangent processes have covariance stationary increments. Proposition 1 is proved. □
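The key step above, that (13) depends on neither $t$ nor $\tau$, can be checked numerically from the covariance formula (5) alone (an illustrative sketch with our own function names): expanding $\mathrm{Cov}(X(t+\tau u)-X(t),\,X(t+\tau v)-X(t))$ by bilinearity and normalizing by $\tau^{2H}$ reproduces the right-hand side of (13) exactly:

```python
def ss_cov(s, t, H, ex1_sq=1.0):
    """Covariance (5) of a self-similar process with covariance stationary increments."""
    return 0.5 * ex1_sq * (s**(2 * H) + t**(2 * H) - abs(t - s)**(2 * H))

def scaled_increment_cov(t, tau, u, v, H):
    """Cov(Y_{t,tau}(u), Y_{t,tau}(v)) computed from (5) by bilinearity."""
    a, b = t + tau * u, t + tau * v
    raw = ss_cov(a, b, H) - ss_cov(a, t, H) - ss_cov(t, b, H) + ss_cov(t, t, H)
    return raw / tau**(2 * H)

H, u, v = 0.7, 1.0, 2.0
target = 0.5 * (u**(2 * H) + v**(2 * H) - abs(u - v)**(2 * H))  # RHS of (13)
for t in [0.5, 1.0, 3.0]:
    for tau in [0.1, 0.01, 0.001]:
        assert abs(scaled_increment_cov(t, tau, u, v, H) - target) < 1e-7
```

The agreement is exact up to floating-point error, for every choice of $t$ and $\tau$, which is precisely why the limit (11) inherits the covariance in (14).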
We remark from Proposition 1 that the tangent processes of $\{X(t)\}_{t\ge0}$ may not be unique in law, but their finite-dimensional subsets have unique first- and second-order moments.
Based on the above discussion, throughout this paper we assume that the observed datasets are sampled from a known number of continuous-time processes satisfying the following condition:
Assumption 1. The processes are locally asymptotically self-similar; their tangent processes’ increment processes are autocovariance ergodic.
In Assumption 1, the autocovariance ergodicity means that the sample autocovariance function of the covariance stationary process converges in squared mean to the autocovariance function of the process; i.e., a zero-mean (which is the case for the tangent processes’ increments) continuous-time process $\{X(t)\}_{t\ge0}$ is autocovariance ergodic if it is covariance stationary and satisfies, for each $h\ge0$,
$$\frac{1}{T}\int_0^{T}X(t)X(t+h)\,\mathrm{d}t\ \xrightarrow[T\to\infty]{\mathcal{L}^2}\ \mathbb{E}[X(0)X(h)],\qquad (15)$$
where $\xrightarrow{\mathcal{L}^2}$ denotes the mean squared convergence: $\xi_T\xrightarrow{\mathcal{L}^2}\xi$ if $\mathbb{E}[(\xi_T-\xi)^2]\to0$ as $T\to\infty$. Please note that the above convergence (15) also yields the corresponding convergence in mean and in probability.
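In practice one works with discretized sample paths, so the time average in (15) is approximated by a discrete sample autocovariance. The following sketch (purely illustrative; the AR(1) model and all names are our assumptions, not part of the setting above) checks this convergence on a simple covariance stationary ergodic Gaussian sequence, whose theoretical autocovariance at lag $h$ is $\varphi^{h}/(1-\varphi^{2})$:

```python
import random

random.seed(1)
phi, n = 0.6, 100_000
# Simulate a stationary zero-mean AR(1): X_{k+1} = phi * X_k + eps_k,
# started from its stationary distribution N(0, 1 / (1 - phi^2)).
x = [random.gauss(0.0, (1.0 / (1.0 - phi**2)) ** 0.5)]
for _ in range(n - 1):
    x.append(phi * x[-1] + random.gauss(0.0, 1.0))

def sample_autocov(x, h):
    """Discrete analogue of (1/T) * int_0^T X(t) X(t+h) dt for a zero-mean path."""
    m = len(x) - h
    return sum(x[k] * x[k + h] for k in range(m)) / m

for h in (0, 1, 2):
    theory = phi**h / (1.0 - phi**2)   # autocovariance of the AR(1) at lag h
    assert abs(sample_autocov(x, h) - theory) < 0.1
```

As the path length grows, the sample autocovariance concentrates around the theoretical one, which is the behavior that (15) formalizes for continuous time.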
Thus, Assumption 1 says that the observed processes’ tangent processes have covariance stationary increments. Typical examples of locally asymptotically self-similar processes satisfying Assumption 1 are fractional Brownian motion (fBm) [40], multifractional Brownian motion (mBm) [41,42,43] and the generalized multifractional Brownian motion introduced in [44]. Below, we focus our attention on mBm, which is used in our simulation study and real-world application (see Section 5 and Section 6). The mBm is a paradigmatic example of both multifractional stochastic processes and locally asymptotically self-similar processes. It naturally extends the classical fBm by allowing its Hurst parameter to vary with time. The mBm was introduced independently by Peltier and Lévy-Véhel [41] and Benassi et al. [42], using, respectively, a moving-average integral representation and a harmonizable integral representation of fBm. These two types of mBm share several core features, and their precise connection was studied by Stoev and Taqqu [43], who show that the two types of mBm generally have different correlation structures. The most recent discussion of the definition of mBm is also given in [43], where a general class of multifractional Gaussian processes is defined that includes the above two types of mBm as particular cases. In this paper, we adopt the definition of mBm through the so-called harmonizable integral representation (see (1.3) in [43] or see [42,44]). Please note that our analysis and approaches are valid for all other versions of mBm in the literature.
Definition 3 (Multifractional Brownian motion). A multifractional Brownian motion $\{W_{H(t)}(t)\}_{t\ge0}$ is a continuous-time Gaussian process defined by
$$W_{H(t)}(t):=\int_{\mathbb{R}}\frac{e^{it\xi}-1}{|\xi|^{H(t)+1/2}}\,\widetilde{B}(\mathrm{d}\xi),$$
where $\widetilde{B}$ denotes a complex-valued Gaussian measure (see Proposition 2.1 in [43]) satisfying
$$\int_{\mathbb{R}}f(s)\,B(\mathrm{d}s)=\int_{\mathbb{R}}\widehat{f}(\xi)\,\widetilde{B}(\mathrm{d}\xi)\ \text{for all}\ f\in L^2(\mathbb{R}),$$
with $\widehat{f}$ being the Fourier transform of $f$ and $\{B(s)\}_{s\in\mathbb{R}}$ being a standard Brownian motion.
The Hurst functional parameter $H:[0,\infty)\to(0,1)$ is a Hölder function with exponent $\beta>\sup_{t\ge0}H(t)$. Subject to this constraint, the paths of mBm are almost surely continuous functions.
Theorem 4.1 in [43] gives the covariance function of $\{W_{H(t)}(t)\}_{t\ge0}$: for $s,t\ge0$,
$$\mathrm{Cov}\left(W_{H(s)}(s),W_{H(t)}(t)\right)=D\left(H(s),H(t)\right)\left(s^{H(s)+H(t)}+t^{H(s)+H(t)}-|t-s|^{H(s)+H(t)}\right),$$
where
$$D(x,y):=\frac{\sqrt{\Gamma(2x+1)\,\Gamma(2y+1)\,\sin(\pi x)\,\sin(\pi y)}}{2\,\Gamma(x+y+1)\,\sin\left(\pi(x+y)/2\right)}.$$
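A simple consistency check on the covariance of mBm from Theorem 4.1 in [43], with the normalization factor written as $D(x,y)$ above (an illustrative sketch; the function names are ours): when $H(\cdot)\equiv H$ is constant, $D(H,H)=1/2$, so the expression reduces to the fBm covariance in (5) with $\mathbb{E}[Z(1)^2]=1$:

```python
import math

def D(x, y):
    """Normalization factor in the mBm covariance (Theorem 4.1 in [43])."""
    num = math.sqrt(math.gamma(2 * x + 1) * math.gamma(2 * y + 1)
                    * math.sin(math.pi * x) * math.sin(math.pi * y))
    den = 2.0 * math.gamma(x + y + 1) * math.sin(math.pi * (x + y) / 2.0)
    return num / den

def mbm_cov(s, t, H):
    """Covariance of mBm with Hurst function H(.) evaluated at times s and t."""
    hs, ht = H(s), H(t)
    return D(hs, ht) * (s**(hs + ht) + t**(hs + ht) - abs(t - s)**(hs + ht))

# Constant Hurst function: D(H, H) = 1/2, so mBm covariance = fBm covariance.
for h in (0.2, 0.5, 0.8):
    assert abs(D(h, h) - 0.5) < 1e-12
    fbm = 0.5 * (0.3**(2 * h) + 0.9**(2 * h) - 0.6**(2 * h))
    assert abs(mbm_cov(0.3, 0.9, lambda t: h) - fbm) < 1e-12
```

This reduction confirms that the multifractional covariance is a genuine extension of the self-similar case treated in Theorem 1.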
It is known that the pointwise Hölder exponent (pHe) of $\{W_{H(t)}(t)\}_{t\ge0}$ is almost surely equal to $H(t)$ at each $t$ [42]. Recall that, for a continuous-time nowhere differentiable process $\{Y(t)\}_{t\ge0}$, its local Hölder regularity can be measured by the pHe $\rho_Y(t)$ defined by: for each $t\ge0$,
$$\rho_Y(t):=\sup\left\{\rho\ge0:\ \limsup_{\tau\to0}\frac{|Y(t+\tau)-Y(t)|}{|\tau|^{\rho}}=0\right\}.$$
For a continuous but non-differentiable function, the pHe measures its “local roughness”: the smaller the pHe is at time $t$, the more “fractal” the path appears around $t$. When $H(\cdot)$ becomes a constant $H$, mBm reduces to an fBm with Hurst parameter $H$. More generally, it can be seen from [34] that mBm is locally asymptotically self-similar and satisfies Assumption 1. Its tangent process at each $t$ is an fBm $\{B_{H(t)}(u)\}_{u\ge0}$ with index $H(t)$:
$$\left\{\frac{W_{H(t+\tau u)}(t+\tau u)-W_{H(t)}(t)}{\tau^{H(t)}}\right\}_{u\ge0}\xrightarrow[\tau\to0^{+}]{}\left\{C(H(t))\,B_{H(t)}(u)\right\}_{u\ge0},$$
where the scaling factor $C(H(t))$ is a deterministic function depending only on $H(t)$
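Since the pHe of standard Brownian motion equals $1/2$ everywhere almost surely, a crude regression-based estimate can illustrate the notion numerically (a sketch of our own, not one of the estimators from the cited literature): regressing the log mean absolute increment against the log scale recovers the exponent, because $\mathbb{E}|B(t+\tau)-B(t)|=\sqrt{2\tau/\pi}\sim\tau^{1/2}$:

```python
import math
import random

random.seed(2)
n_paths, n = 8, 2**14
dt = 1.0 / n

# Mean absolute increment at dyadic scales, averaged over independent
# Brownian paths on [0, 1]; it scales like tau^H with H = 1/2.
scales = [2**j for j in range(3, 9)]            # lags in grid steps
mean_abs = [0.0] * len(scales)
for _ in range(n_paths):
    b = [0.0]
    for _ in range(n):
        b.append(b[-1] + random.gauss(0.0, math.sqrt(dt)))
    for i, lag in enumerate(scales):
        m = n - lag
        mean_abs[i] += sum(abs(b[k + lag] - b[k]) for k in range(m)) / m

xs = [math.log(lag * dt) for lag in scales]
ys = [math.log(v / n_paths) for v in mean_abs]

# Least-squares slope of log E|increment| versus log tau estimates the exponent.
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
H_hat = (sum((a - mx) * (c - my) for a, c in zip(xs, ys))
         / sum((a - mx) ** 2 for a in xs))
assert abs(H_hat - 0.5) < 0.05
```

For an mBm, the same kind of local regression around a time $t$ would target $H(t)$ instead of a global constant, which is exactly what the pHe captures.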
In the literature, several studies focus in particular on statistical inference problems around the pHe of processes and their applications; these studies are likewise motivated by modeling with locally asymptotically self-similar processes. We refer the readers to [45,46,47,48,49,50].
As one of the most natural extensions of fBm, mBm has broad applications at present. Unlike fBm, mBm allows its Hurst parameter H to change with time, which makes it possible to model different regimes of a stochastic process with one single model. For example, during a financial crisis, asset volatility may rise significantly, while it is much lower in calm periods. Likewise, empirical evidence shows that there have been periods of different volatilities in both exchange rates and interest rates. An fBm (or any self-similar process) is unable to capture such phenomena, which motivates researchers to introduce mBm into finance as an alternative to, or improvement of, fBm.
The assumption of covariance stationarity inspires us to introduce a covariance-based dissimilarity measure between the sample paths, in order to capture the level of differences between the two corresponding covariance stationary processes. Later we show that the assumption of autocovariance-ergodicity is sufficient for the clustering algorithms to be approximately asymptotically consistent.