A Novel Time-Sensitive Composite Similarity Model for Multivariate Time-Series Correlation Analysis
Abstract
1. Introduction
- A time-series segmentation approach designed for DTW is proposed. When DTW is used to compute the distance between two time series, important points such as extrema need to be aligned. We therefore design a segmentation that cuts a time series according to the number of eligible peaks and troughs, and we define counting rules that ignore tiny fluctuations.
- A time-sensitive composite similarity model is proposed that considers more sequential stock parameters and combines them with the traditional ‘rise or fall together’ similarity measure to quantify the similarity between stocks.
- The prediction accuracy of the proposed model is compared with that of traditional similarity measures to validate its effectiveness, and the experimental results also reflect the time sensitivity of the composite model.
2. Related Work
2.1. Correlation
2.2. Stock Time-Series Correlation
2.3. Dynamic Time Warping
- Monotonicity. Points must be aligned in time order. For example, in Figure 5a, the black dotted lines connect pairs of points that are aligned with each other, whereas the pair of points connected by the red dotted line cannot be aligned.
- Continuity. To ensure that every point in both sequences is matched, no distance computation may be skipped; the cumulative distances must be computed continuously. In step 2 of the loss-matrix computation, Dc(i, j) depends on Dc(i − 1, j − 1), Dc(i − 1, j) and Dc(i, j − 1). If, for example, Dc(i − 1, j) were skipped, min(Dc(i − 1, j − 1), Dc(i − 1, j), Dc(i, j − 1)) could not be evaluated, and therefore neither could Dc(i, j).
- Boundary conditions. The start and end points of one time series must be aligned with the start and end points of the other, and both series must be matched in the same direction, from start point to end point. For example, in Figure 5b, the pairs of points connected by the red dotted lines must be aligned with each other when the DTW distance between the two series is computed.
- Slope constraints. The slope of the warping path can be constrained to prevent a single point in one time series from being aligned with too many points in the other, as happens in Figure 5c.
- Warping windows. The best-matching path generally lies near the diagonal, as in Figure 6, so it is often sufficient to search only within a window around the diagonal.
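The constraints above can be illustrated with a minimal DTW sketch built on the recurrence Dc(i, j) = d(i, j) + min(Dc(i − 1, j − 1), Dc(i − 1, j), Dc(i, j − 1)). It assumes an absolute-difference local distance d(i, j) and uses a Sakoe-Chiba band (the `window` parameter) as the warping window; the function name and these choices are illustrative assumptions, and slope constraints are not implemented.

```python
import math
from typing import Optional, Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float],
                 window: Optional[int] = None) -> float:
    """Hypothetical DTW sketch: absolute-difference local cost and an
    optional Sakoe-Chiba band of half-width `window` (None = no band)."""
    n, m = len(x), len(y)
    if window is None:
        window = max(n, m)
    # The band must be at least |n - m| wide, or the end cell is unreachable.
    window = max(window, abs(n - m))
    # D[i][j] is the cumulative cost of the best path aligning x[:i] with y[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0  # boundary condition: the path starts at (x[0], y[0])
    for i in range(1, n + 1):
        # Warping window: only cells with |i - j| <= window are filled.
        for j in range(max(1, i - window), min(m, i + window) + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Monotonicity and continuity: each cell is reached only from
            # its three predecessor cells, none of which may be skipped.
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]  # boundary condition: the path ends at (x[-1], y[-1])
```

For example, `dtw_distance([1, 2, 3], [2, 3, 4])` returns 2.0 because the warping path may reuse points, whereas `window=0` forces the diagonal path and returns 3.0, the lock-step sum of absolute differences.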
3. DTW-Based Temporal Composite Similarity Model
3.1. Peaks and Troughs Time-Series Segmentation (PTS)
- Step 1: Backtrack from the last point of X, X[m], and take the first peak found as the starting point; denote this point X[start].
- Step 2: Continue backtracking X from a peak X[i]. If the previous step was Step 1, X[i] is X[start]; if Step 2 is being repeated, X[i] is a peak located before X[start]. Find the trough next to X[i] and denote it X[j]. If the deviation between the values of X[i] and X[j] is larger than the threshold δ, like the peak-trough deviation marked by the red line in Figure 9, the pair is counted as an eligible fluctuation. Otherwise, like the peak-trough deviation marked by the blue line in Figure 9, continue backtracking to find the next trough that satisfies the condition.
- Step 3: Continue backtracking to find a new peak before the trough obtained in Step 2, and repeat Step 2 until the number of eligible fluctuations reaches nef (the number of eligible fluctuations, set according to the segmentation requirement). Suppose the last peak found is X[end]; then X[end…start] is the sequence used as the input to DTW.
Algorithm 1. Algorithm for PTS
Input: Sequence ▷ the original time series
       δ ▷ peak and trough deviation threshold
       nef ▷ the number of eligible fluctuations
Output: newSequence
fluctuation = 0
i = len(Sequence) − 1
j = 0
while i >= 2 do
    i = i − 1
    if Sequence[i − 1] <= Sequence[i] and Sequence[i] >= Sequence[i + 1] then ▷ find a peak
        j = i
        while j >= 2 do
            j = j − 1
            if Sequence[j − 1] >= Sequence[j] and Sequence[j] <= Sequence[j + 1] and abs(Sequence[i] − Sequence[j]) >= δ then ▷ find a trough that forms an eligible fluctuation with the peak found before
                fluctuation = fluctuation + 1
                i = j
                break
            end if
        end while
    end if
    if fluctuation == nef then
        break
    end if
end while
newSequence ← Sequence[j:]
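Algorithm 1 can be transcribed into Python almost line for line. This is a sketch rather than the authors' code: the function name `pts_segment` is hypothetical, the threshold test keeps the pseudocode's `>=` (the step-by-step description says "larger than"), and, as in the pseudocode, the returned segment is `Sequence[j:]`, the tail starting at the last trough examined.

```python
from typing import List, Sequence


def pts_segment(seq: Sequence[float], delta: float, nef: int) -> List[float]:
    """Hypothetical transcription of Algorithm 1 (PTS): backtrack from the end
    of `seq` until `nef` eligible peak-trough fluctuations have been counted."""
    fluctuation = 0
    i = len(seq) - 1
    j = 0
    while i >= 2:
        i -= 1
        # A peak is a point not lower than either neighbour.
        if seq[i - 1] <= seq[i] >= seq[i + 1]:
            j = i
            while j >= 2:
                j -= 1
                is_trough = seq[j - 1] >= seq[j] <= seq[j + 1]
                # Only a trough deviating from the peak by at least delta counts,
                # so tiny fluctuations are skipped.
                if is_trough and abs(seq[i] - seq[j]) >= delta:
                    fluctuation += 1
                    i = j  # resume the peak search just before this trough
                    break
        if fluctuation == nef:
            break
    # Tail of the series starting at the last trough examined.
    return list(seq[j:])
```

For example, with `delta=2` the series `[1, 5, 2, 6, 3, 7, 4]` yields the tail `[3, 7, 4]` for `nef=1` and `[2, 6, 3, 7, 4]` for `nef=2`.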
3.2. Time-Sensitive Composite Similarity Model
Algorithm 2. Calculation of the number of days rising or falling together
Input: changesequence1, changesequence2 ▷ change sequences of stock1 and stock2
Output: roft ▷ number of days rising or falling together
roft = 0
if len(changesequence1) > len(changesequence2) then
    days = len(changesequence2)
else
    days = len(changesequence1)
end if
for i = 0 to days − 1 do
    if changesequence1[i] > 0 and changesequence2[i] > 0 then
        roft = roft + 1
    end if
    if changesequence1[i] < 0 and changesequence2[i] < 0 then
        roft = roft + 1
    end if
    if changesequence1[i] == 0 and changesequence2[i] == 0 then
        roft = roft + 1
    end if
end for
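A compact Python sketch of Algorithm 2 follows; the function name is hypothetical. Because the three cases (both up, both down, both unchanged) are mutually exclusive, they can be merged into a single condition, and only the overlapping days of the two change sequences are compared.

```python
from typing import Sequence


def rise_or_fall_together(changes1: Sequence[float],
                          changes2: Sequence[float]) -> int:
    """Hypothetical sketch of Algorithm 2: count the days on which two
    change sequences move in the same direction or both stay unchanged."""
    days = min(len(changes1), len(changes2))  # compare only overlapping days
    roft = 0
    for i in range(days):
        a, b = changes1[i], changes2[i]
        if (a > 0 and b > 0) or (a < 0 and b < 0) or (a == 0 and b == 0):
            roft += 1
    return roft
```

For example, `rise_or_fall_together([0.5, -0.2, 0.0, 0.1], [0.3, 0.4, 0.0, -0.1, 0.9])` returns 2: only the first four days are compared, and the two sequences agree on day 1 (both rise) and day 3 (both unchanged).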
4. Performance Evaluation
4.1. Experimental Setup
4.2. Results and Discussion
4.3. Verification of Generality
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Feature Name | Description | Value |
---|---|---|
stockcode | code of a stock | 600028.SH |
tradedate | trading date | 2020-01-02 |
close | closing price of a day | 5.170 |
turnover_rate | turnover rate | 0.130 |
volume_ratio | volume ratio | 2.200 |
pe | price-to-earnings ratio | 9.922 |
pe_ttm | price-to-earnings trailing twelve months ratio | 13.49 |
pb | price-to-book ratio (total market value/net assets) | 0.860 |
ps | price-to-sales ratio | 0.220 |
ps_ttm | price-to-sales trailing twelve months ratio | 0.210 |
No. | Stock Code | Stock Name |
---|---|---|
1 | 600000.SH | Shanghai Pudong Development Bank |
2 | 600004.SH | Baiyun Airport |
3 | 600009.SH | Shanghai Airport |
4 | 600010.SH | Inner Mongolia Baotou Steel Union |
5 | 600011.SH | Huaneng Power International |
6 | 600015.SH | Hua Xia Bank |
7 | 600016.SH | China Minsheng Banking Corp., Ltd. |
8 | 600018.SH | Shanghai International Port (Group) Co., Ltd. |
9 | 600019.SH | Baoshan Iron & Steel |
10 | 600023.SH | Zheneng Electric Power Co., Ltd. |
11 | 600025.SH | Huaneng Hydropower |
12 | 600027.SH | Huadian Power International |
13 | 600028.SH | Sinopec |
14 | 600029.SH | China Southern Airlines |
15 | 600030.SH | Citic Securities |
Statedate | Stockcode | Close | Turnover_Rate | Volume_Ratio | PE | (PETTM, PB, PS) | PSTTM |
---|---|---|---|---|---|---|---|
6 June 2019 | 600028 | 5.480 | 0.070 | 0.850 | 10.516 | … | 0.22 |
10 June 2019 | 600028 | 5.540 | 0.100 | 1.310 | 10.632 | … | 0.22 |
11 June 2019 | 600028 | 5.300 | 0.150 | 1.750 | 10.171 | … | 0.21 |
12 June 2019 | 600028 | 5.260 | 0.080 | 0.810 | 10.094 | … | 0.21 |
13 June 2019 | 600028 | 5.250 | 0.080 | 0.820 | 10.075 | … | 0.21 |
… | … | … | … | … | … | … | … |
31 December 2019 | 600028 | 5.110 | 0.090 | 2.040 | 9.806 | … | 0.20 |
No. | Stockcode | Composite Similarity |
---|---|---|
1 | 601618 | 12.1626 |
2 | 600606 | 11.4819 |
3 | 600068 | 11.1654 |
4 | 601669 | 11.0645 |
5 | 000630 | 11.0280 |
6 | 600362 | 8.6147 |
7 | 000709 | 8.0107 |
8 | 600297 | 7.9419 |
9 | 000100 | 7.0628 |
10 | 600104 | 6.4541 |
No. | Stockcode | Composite Similarity |
---|---|---|
1 | 603259 | 0.9996 |
2 | 601066 | 0.9260 |
3 | 600518 | 0.9106 |
4 | 600196 | 0.8926 |
5 | 603986 | 0.7938 |
6 | 002602 | 0.7108 |
7 | 601828 | 0.6393 |
8 | 002450 | 0.4152 |
9 | 002411 | 0.3485 |
10 | 002252 | 0.1839 |
Type | Top 100 | Top 150 |
---|---|---|
composite similarity | 0.830 | 0.753 |
number of days rising or falling together | 0.760 | 0.746 |
DTW (close) | 0.680 | 0.706 |
average rate of whole sample | 0.719 |
Type | 1 Day | 2 Days | 3 Days |
---|---|---|---|
composite similarity | 0.830 | 0.500 | 0.490 |
number of days rising or falling together | 0.760 | 0.460 | 0.460 |
DTW (close) | 0.680 | 0.420 | 0.420 |
average rate of whole sample | 0.719 | 0.321 | 0.318 |
Value of nef | Value of δ | 1 Day | 2 Days | 3 Days |
---|---|---|---|---|
10 | 0.05 | 0.78 | 0.46 | 0.45 |
10 | 0.10 | 0.79 | 0.49 | 0.48 |
10 | 0.15 | 0.78 | 0.49 | 0.48 |
10 | 0.20 | 0.80 | 0.50 | 0.49 |
10 | 0.25 | 0.80 | 0.51 | 0.50 |
10 | 0.30 | 0.83 | 0.50 | 0.49 |
10 | 0.35 | 0.78 | 0.49 | 0.48 |
10 | 0.40 | 0.74 | 0.44 | 0.44 |
Weight of 8 Features | 1 Day | 2 Days | 3 Days |
---|---|---|---|
[0.3,0.1,0.1,0.1,0.1,0.1,0.1,0.1] | 0.77 | 0.46 | 0.45 |
[0.1,0.3,0.1,0.1,0.1,0.1,0.1,0.1] | 0.78 | 0.45 | 0.45 |
[0.1,0.1,0.3,0.1,0.1,0.1,0.1,0.1] | 0.83 | 0.50 | 0.49 |
[0.1,0.1,0.1,0.3,0.1,0.1,0.1,0.1] | 0.79 | 0.44 | 0.43 |
[0.1,0.1,0.1,0.1,0.3,0.1,0.1,0.1] | 0.79 | 0.47 | 0.46 |
[0.1,0.1,0.1,0.1,0.1,0.3,0.1,0.1] | 0.77 | 0.47 | 0.46 |
[0.1,0.1,0.1,0.1,0.1,0.1,0.3,0.1] | 0.77 | 0.45 | 0.44 |
[0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.3] | 0.78 | 0.45 | 0.44 |
Feature | Value of δ | Value of nef |
---|---|---|
close | 0.05 | 35 |
turnover rate | 0.05 | 10 |
volume ratio | 0.05 | 60 |
pe | 0.1 | 30 |
pe_ttm | 0.1 | 30 |
pb | 0.005 | 50 |
ps | 0.005 | 25 |
ps_ttm | 0.005 | 24 |
Type | Top 50_Rate | Top 100_Rate | Top 150_Rate |
---|---|---|---|
composite similarity | 0.96 | 0.9 | 0.873 |
number of days rising or falling together | 0.86 | 0.85 | 0.86 |
DTW(close) | 0.92 | 0.89 | 0.86 |
average rate of whole sample | 0.866 |
Time | Aqi | PM 2_5 | PM 10 | CO | NO2 | (O3, SO2, Temp) | Humi |
---|---|---|---|---|---|---|---|
1 January 2020 | 62 | 35 | 56 | 0.8 | 49 | … | 36.583 |
2 January 2020 | 80 | 51 | 80 | 1.2 | 64 | … | 41.875 |
3 January 2020 | 82 | 50 | 72 | 1.2 | 65 | … | 46.750 |
4 January 2020 | 74 | 43 | 66 | 1.1 | 59 | … | 44.542 |
5 January 2020 | 83 | 61 | 73 | 1.3 | 66 | … | 70.958 |
… | … | … | … | … | … | … | … |
31 January 2021 | 89 | 66 | 83 | 1.1 | 34 | … | 77.269 |
Feature | Meaning |
---|---|
time | time index |
pm2_5 | Particulate Matter 2.5 |
pm10 | Particulate Matter 10 |
no2 | Nitrogen Dioxide |
co | Carbon Monoxide |
o3 | Ozone |
so2 | Sulfur Dioxide |
temp | Temperature |
humi | Humidity |
No. | City | Composite Similarity |
---|---|---|
1 | Langfang | 2.8641 |
2 | Zhangjiakou | 1.6770 |
3 | Baoding | 1.5155 |
4 | Tianjin | 1.4962 |
5 | Hengshui | 1.4928 |
6 | Dalian | 1.4920 |
7 | Dongying | 1.4423 |
8 | Chengde | 1.3974 |
9 | Qingdao | 1.3807 |
Type | Top 50_Rate |
---|---|
composite similarity | 0.880 |
number of days rising or falling together | 0.700 |
DTW(temperature) | 0.860 |
average rate of whole sample | 0.862 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liang, M.; Wang, X.; Wu, S. A Novel Time-Sensitive Composite Similarity Model for Multivariate Time-Series Correlation Analysis. Entropy 2021, 23, 731. https://doi.org/10.3390/e23060731