1. Introduction
A time series, a common data type, is a sequence of discrete data sampled from a target at a fixed frequency over a period. A fundamental task on time series is to measure the similarity between two given series, which is critical to downstream tasks such as classification [1,2,3,4,5], clustering [6,7,8,9,10] and pattern recognition [11,12,13,14]. The dynamic time warping (DTW) [15] algorithm and its variants [16,17,18] are competent in similarity evaluation [19].
Given series X and Y, if they are of the same length N, then the similarity S can be described as Expression (1), where the distance operator stands for the Euclidean distance, and x_i and y_i are the ith nodes of X and Y, respectively. More generally, however, the lengths of X and Y may differ. A key feature of DTW is that it can deal with two series of different lengths.
Let N and M be the lengths of X and Y, respectively. DTW finds the similarity by maintaining a two-dimensional cumulative distance matrix (CDM) D, as shown in Expression (2). The algorithm calculates each element of D in row-major order (i.e., from left to right, from top to bottom), starting from D[1][1] and ending at D[N][M], according to Expression (3), where d(x_i, y_j) is the distance between two nodes. After the traversal, D[N][M] holds the value of the similarity. The matching results (in other words, the optimal warping path) can be determined from the CDM.
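The traversal described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard DTW recurrence from the literature (each element is the node distance plus the minimum of its three already-computed neighbours), scalar nodes, and 0-based indexes; the function name `dtw` is chosen here for illustration only.

```python
import math


def dtw(x, y):
    """Classic DTW similarity of two 1-D series (smaller means more similar)."""
    n, m = len(x), len(y)
    # D[i][j] is the cumulative distance of the best warping path ending at (i, j).
    D = [[math.inf] * m for _ in range(n)]
    for i in range(n):            # row-major traversal of the CDM
        for j in range(m):
            d = abs(x[i] - y[j])  # Euclidean distance between two scalar nodes
            if i == 0 and j == 0:
                D[i][j] = d       # the path always starts at the first pair
                continue
            best = min(
                D[i - 1][j] if i > 0 else math.inf,
                D[i][j - 1] if j > 0 else math.inf,
                D[i - 1][j - 1] if i > 0 and j > 0 else math.inf,
            )
            D[i][j] = d + best
    return D[n - 1][m - 1]        # the last element holds the similarity
```

Because the recurrence only looks at the previous row and the current row, the full matrix is needed only when the warping path must be traced back afterwards.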
For the evaluation of series with different lengths, as depicted in Figure 1, DTW aims to find the optimal alignment between X and Y [20], and a node in X may be matched with multiple nodes in Y (and vice versa). However, if too many nodes (marked within a green dotted circle in Figure 1) are matched with the same one (marked within a red solid circle in Figure 1), which is unreasonable in real cases, the result is the well-known pathological alignment problem of DTW.
To solve that, Zhang et al. [21] presented a state-of-the-art method named dynamic time warping under limited warping path length (LDTW). By limiting the length of the warping path in a third dimension (see Figure 2), the pathological alignment problem can be relieved. As a result, LDTW improves the accuracy over other variants [22,23,24,25] on the benchmark platform offered by the University of California, Riverside (UCR) [26]. However, it also leads to a much higher time and space cost.
To reduce the complexity of LDTW, an alternating matrix whose size is much smaller than the three-dimensional CDM used in LDTW is presented, and an evolutionary tree is introduced to represent the warping paths as well. The main contributions of this paper are twofold:
- (1) A two-channel matrix with an alternating scheme is proposed for similarity calculation.
- (2) A chain tree with an evolutionary scheme is proposed to find the optimal warping path simultaneously with the similarity calculation.
The rest of this paper is organized as follows. The preliminaries are given in Section 2. Section 3 presents the proposed method. The experiments and results are shown in Section 4. Section 5 concludes the work.
3. The Proposed Method
DTW and its variants generally have two goals: finding (1) the similarity and (2) the optimal warping path of two given time series. This section presents our solution to each in turn.
3.1. The Alternating Matrix Based Similarity Calculation
The primary innovation of the proposed method is the use of a two-channel matrix with an alternating scheme, which replaces the three-dimensional CDM of LDTW and saves considerable memory.
As illustrated in Figure 3, the proposed matrix has two channels, indicated by Dpre and Dcur, respectively. It can be seen as a subset of the three-dimensional CDM that travels over the CDM space step by step during the similarity calculation. In each step, the data in Dpre are the calculated result of the previous step and are kept to participate in the calculation of the current step, which happens in Dcur. The last thing to accomplish in each step is to alternate the roles of the two channels; in other words, Dcur (or Dpre) in Step i becomes Dpre (or Dcur) in Step i+1, which is the main reason why we call our matrix the alternating matrix (AM).
The calculation workflow can be seen in Figure 4. The system takes the two series as input and outputs the similarity S, which equals a specific element of the AM. The core step is the update of the AM, which is described in Algorithm 1. In the beginning, the algorithm traverses Y and the warping path dimension, as shown in Steps 1 to 4, where minS and maxS are the bounds of the range computed by the functions MinStep() and MaxStep(), respectively. Readers can find the calculation details in Ref. [21]. Step 5 specifies how an element Dcur[j][s], as shown in Figure 3, is determined by the pre-calculated Dpre[j][s−1], Dpre[j−1][s−1] and Dcur[j−1][s−1]. Channel Dpre is reset in Step 9 before the alternating process, because it becomes Dcur in the next round of iteration. The iteration stops when i becomes larger than N.
Algorithm 1: AM Update |
Input: X, Y, N, M, D, i, cur, pre, LUB |
Output: updated D |
1 | for j from 1 to M do |
2 | minS←MinStep(i, j), maxS←MaxStep(i, j, N, M, LUB) |
3 | if minS < maxS do |
4 | for s from minS to maxS do |
5 | Dcur[j][s]←d(x_i, y_j) + min{Dpre[j][s−1], Dpre[j−1][s−1], Dcur[j−1][s−1]} |
6 | end for |
7 | end if |
8 | end for |
9 | reset Dpre |
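The alternating scheme can be illustrated with a short Python sketch. It is not the authors' implementation: `min_step` and `max_step` are hypothetical stand-ins derived from simple path-length reasoning rather than taken from Ref. [21], indexes are 0-based, nodes are scalars, `lub` is assumed to bound the number of moves along the path, and the range test uses an inclusive bound so that the very first element is handled inside the loop.

```python
import math


def min_step(i, j):
    # Assumed lower bound: diagonal moves reach (i, j) in max(i, j) moves.
    return max(i, j)


def max_step(i, j, n, m, lub):
    # Assumed upper bound: at most i + j moves, leaving enough budget
    # to still reach the last element (n - 1, m - 1).
    return min(i + j, lub - max(n - 1 - i, m - 1 - j))


def ldtw_alternating(x, y, lub):
    """Length-limited DTW similarity using two alternating channels."""
    n, m = len(x), len(y)
    inf = math.inf
    # Each channel is indexed by [j][s]: position in y, number of moves so far.
    pre = [[inf] * (lub + 1) for _ in range(m)]
    cur = [[inf] * (lub + 1) for _ in range(m)]
    for i in range(n):
        for j in range(m):
            d = abs(x[i] - y[j])
            for s in range(min_step(i, j), max_step(i, j, n, m, lub) + 1):
                if s == 0:            # only the first element has zero moves
                    cur[j][s] = d
                    continue
                left = cur[j - 1][s - 1] if j > 0 else inf
                diag = pre[j - 1][s - 1] if j > 0 else inf
                cur[j][s] = d + min(pre[j][s - 1], diag, left)
        # Reset the old channel, then alternate the roles of the two channels.
        for row in pre:
            row[:] = [inf] * (lub + 1)
        pre, cur = cur, pre
    # After the final swap the last computed channel is `pre`.
    return min(pre[m - 1])
```

With a generous bound the result matches unrestricted DTW, while a tight bound forces a shorter path: for `x = [0, 0, 0, 5]` and `y = [0, 5, 5, 5]` the sketch returns 0 with `lub=5`, 5 with `lub=4`, and 10 with `lub=3`. Memory is two channels of size M × (LUB + 1) regardless of N.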
3.2. The Evolutionary Chain Tree Based Optimal Warping Path Determination
Besides the similarity, we can also find the corresponding warping path, which shows the matching pairs of the two series. To achieve that, a chain tree with an evolutionary scheme is proposed. We also modified the structure of the AM so that each element holds not only a value but also a pointer.
For example, the nodes and links of the chain tree are shown as dots and arrows in Figure 5, and six AM elements are drawn as cubes. Each cube is divided into two parts: the top part is the pointer domain, which leads to the corresponding tree node, while the bottom part is the value domain, which stores the cumulative distance.
The above tree is referred to as the evolutionary chain tree (ECT) because we use a chain tree to represent the warping paths, and the tree grows and is pruned dynamically during the process. The use of the ECT is another major contribution of this work.
With the ECT, the workflow demonstrated in Figure 4 can be extended to the updated version shown in Figure 6. The main differences are marked as grey blocks, which include the growing and pruning of the ECT and the retrieval of the optimal warping path.
3.2.1. Growing
The ECT grows after each update step of the AM. Specifically, as soon as the computation in Dcur finishes, tree nodes are created and linked to the ECT. Each tree node is initialized as a structure p, shown in Expression (9), where p.prior is the pointer that leads to the prior tree node. The description of p.data is given later.
If a node p is initialized and linked from the AM element Dcur[j][s], as shown in Figure 7a, the next question is which node is its precursor. According to Step 5 in Algorithm 1, Dcur[j][s] is partially determined by the minimum among Dpre[j][s−1], Dpre[j−1][s−1] and Dcur[j−1][s−1]. Therefore, the precursor of p is the tree node linked from the minimum among Dpre[j][s−1], Dpre[j−1][s−1] and Dcur[j−1][s−1] as well. These processes are shown in Steps 5 to 7 of Algorithm 2.
The data term of a tree node p is a four-digit binary value. The higher two digits are defined in Table 1 and serve as a clue for recovering the X and Y indexes of the nodes on the optimal warping path, since these indexes are not stored. Specifically, the retrieval of the optimal warping path starts from the tree node linked from the AM element that holds the similarity and follows the pointers backwards to the first node. Because the indexes of the last node are known, the higher two digits make it easy to find the indexes of the rest. The lower two digits stand for the number of successors of the node, which is no more than three, as shown in Figure 7b. The lower two digits are crucial to the pruning process introduced in the next section. Step 8 in Algorithm 2 describes the process related to the data term accordingly.
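Algorithm 2: ECT Growing |
Input: N, M, D, i, cur, pre, LUB |
Output: updated D |
1 | for j from 1 to M do |
2 | minS←MinStep(i, j), maxS←MaxStep(i, j, N, M, LUB) |
3 | if minS < maxS do |
4 | for s from minS to maxS do |
5 | create tree node p and link it from Dcur[j][s] |
6 | q←min{Dpre[j][s−1], Dpre[j−1][s−1], Dcur[j−1][s−1]} |
7 | p.prior←tree node linked from q |
8 | set the higher two digits of p.data according to Table 1, increase the lower two digits of p.prior.data by one |
9 | end for |
10 | end if |
11 | end for |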
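The node structure, the growing step and the backward retrieval can be sketched together in Python. This is an illustration under stated assumptions only: the move codes in `STEP_BACK` are hypothetical stand-ins for Table 1, which is not reproduced here, and the names `Node`, `grow` and `retrieve` are chosen for this sketch.

```python
class Node:
    """ECT node: a pointer to its precursor plus a four-bit data term."""
    __slots__ = ("prior", "data")

    def __init__(self, prior=None, move=0b00):
        self.prior = prior
        # Higher two bits: move code; lower two bits: successor count (starts at 0).
        self.data = move << 2


# Hypothetical move codes: (offset in X index, offset in Y index) back to the precursor.
STEP_BACK = {0b01: (1, 0), 0b10: (0, 1), 0b11: (1, 1)}


def grow(prior, move):
    """Create a node, link it to its precursor, and count it as a successor."""
    node = Node(prior, move)
    if prior is not None:
        prior.data += 1  # lower two bits; a node has at most three successors
    return node


def retrieve(last, i, j):
    """Follow the prior pointers from the last node, rebuilding the index pairs."""
    path = [(i, j)]
    p = last
    while p.prior is not None:
        di, dj = STEP_BACK[p.data >> 2]
        i, j = i - di, j - dj
        path.append((i, j))
        p = p.prior
    return path[::-1]


# Usage: a four-node chain ending at indexes (3, 2).
root = grow(None, 0b00)
a = grow(root, 0b11)
b = grow(a, 0b01)
c = grow(b, 0b11)
path = retrieve(c, 3, 2)  # [(0, 0), (1, 1), (2, 1), (3, 2)]
```

Storing a two-bit move code per node instead of two indexes keeps each node small, at the cost of one backward walk to recover the indexes.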
3.2.2. Pruning
As the ECT grows, some branches become inactive. Figure 8a demonstrates such a case, where two branches stop growing after new nodes have been added to the ECT. Those branches can be pruned to save memory; the pruning result is shown in Figure 8b.
In our method, the pruning starts from the leaf nodes drawn as circles in Figure 8a. They can be found from Dpre, as shown in Step 5 of Algorithm 3. If the lower two digits of their data term equal 0b00, they have no successor and therefore need to be removed.
Algorithm 3: ECT Pruning |
Input: N, M, D, i, cur, pre, LUB |
Output: updated D |
1 | for j from 1 to M do |
2 | minS←MinStep(i−1, j), maxS←MaxStep(i−1, j, N, M, LUB) |
3 | if minS < maxS do |
4 | for s from minS to maxS do |
5 | p←tree node linked from Dpre[j][s] |
6 | while lower(p.data) equal to 0b00 do |
7 | q←p, p←p.prior, p.data--, delete q |
8 | end while |
9 | end for |
10 | end if |
11 | end for |
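The inner loop of Algorithm 3 (Steps 6 to 8) can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the `Node` class and the `prune` function are named for this sketch, a guard for the root node is added as an assumption, and "delete q" is approximated by unlinking the node so that Python's garbage collector can reclaim it.

```python
class Node:
    """ECT node with a prior pointer and a four-bit data term."""

    def __init__(self, prior=None, move=0b00):
        self.prior = prior
        self.data = move << 2      # lower two bits hold the successor count
        if prior is not None:
            prior.data += 1        # register this node as a successor


def prune(leaf):
    """Remove a dead branch upwards from a leaf; return how many nodes were removed."""
    removed = 0
    p = leaf
    # A node with lower two bits equal to 0b00 has no successor and can go.
    while p is not None and (p.data & 0b11) == 0b00:
        q, p = p, p.prior
        if p is not None:
            p.data -= 1            # the precursor loses one successor
        q.prior = None             # unlink q so it can be reclaimed
        removed += 1
    return removed


# Usage: the branch root -> b -> b2 is dead, the branch root -> a -> c is still growing.
root = Node()
a = Node(root, 0b11)
b = Node(root, 0b01)
b2 = Node(b, 0b10)                 # leaf from the previous step, no successor
c = Node(a, 0b11)                  # node created in the current step
removed = prune(b2)                # removes b2 and then b, stops at root
```

Pruning stops at the first ancestor that still has another successor, so live branches such as root -> a -> c are left untouched.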
Figure 9a shows the final ECT obtained by applying the proposed method to the SyntheticControl dataset. Without pruning, it would look like the one shown in Figure 9b. Figure 9c shows the optimal warping path.
5. Discussion
Thanks to the proposed alternating matrix, the memory cost is greatly reduced compared with the LDTW method. The price of this reduction is the need for an additional data structure to maintain the warping paths, as well as a new strategy for optimal warping path retrieval. We solve that problem with the proposed evolutionary chain tree, which costs a little extra time and space, but this overhead is negligible compared with the gains. The proposed method still outperforms LDTW by a wide margin.
Another issue is the choice of LUB, which is the only parameter in this method. The usage and setting criteria of LUB in our work follow the idea introduced by the LDTW algorithm [21]. In experiments, we found that different values of LUB may slightly alter the accuracy, but the final space cost is insensitive to it, as shown in the ablation experiment. Therefore, to get a fairer comparison, we adopted the same method as [21] for LUB to keep the parameter settings comparable.