Side-Length-Independent Motif (SLIM): Motif Discovery and Volatility Analysis in Time Series—SAX, MDL and the Matrix Profile
Abstract
1. Introduction
1.1. Literature Review
1.2. Contribution
2. Methodology
2.1. Underlying Algorithms
2.1.1. Symbolic Aggregate Approximation (SAX)
2.1.2. Minimum Description Length (MDL)
2.1.3. Matrix Profile (MP)
2.2. Combined Methodology
2.2.1. Application of MDL to SAX strings
2.2.2. Hyperparameter Selection: Influence of Alphabet Size Choice upon MDL Compression Rate
2.2.3. Motif Discovery
2.2.4. Independent Side-Length Motif Discovery Process
Algorithm 1: Side-Length-Independent Motif (SLIM) pseudo-code.
Data: Input raw time series
Result: Candidate motif locations with variable side-length
Step A: Transform raw input series into a suitable SAX representation
Step B: Compress the SAX series using MDL to create an MDL-SAX series
Step C: MDL-SAX series serves as input to the MP algorithm creating an MDL-SAX-MP series
while examining MDL-SAX-MP series do

end
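Steps A to C of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several simplifications are assumptions made here for brevity: SAX is applied point by point with no PAA averaging, the MDL step is approximated by collapsing runs of identical symbols, the Matrix Profile is computed by brute force with plain (not z-normalised) Euclidean distance, and the motif is taken to be the pair with the smallest profile value. All function names (`sax_symbols`, `run_length_compress`, `matrix_profile`, `slim_sketch`) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev


def sax_symbols(series, alphabet_size=5):
    """Step A (simplified): z-normalise, then map each point to a symbol 1..alphabet_size."""
    mu, sigma = mean(series), pstdev(series)  # assumes a non-constant series (sigma > 0)
    # Equiprobable breakpoints under a standard normal distribution.
    cuts = [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    return [1 + sum((x - mu) / sigma > c for c in cuts) for x in series]


def run_length_compress(symbols):
    """Step B (stand-in for MDL): collapse each run of identical symbols
    into a tuple (symbol, run_length, raw_start_index)."""
    runs = []
    for i, s in enumerate(symbols):
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1, i])
    return [tuple(r) for r in runs]


def matrix_profile(values, m):
    """Step C (brute force): for each length-m window, the Euclidean distance to,
    and the index of, its nearest non-overlapping window."""
    n = len(values) - m + 1
    profile, index = [], []
    for i in range(n):
        best, best_j = float("inf"), -1
        for j in range(n):
            if abs(i - j) < m:  # exclusion zone: skip overlapping (trivial) matches
                continue
            d = sqrt(sum((values[i + k] - values[j + k]) ** 2 for k in range(m)))
            if d < best:
                best, best_j = d, j
        profile.append(best)
        index.append(best_j)
    return profile, index


def slim_sketch(series, alphabet_size=5, m=3):
    """Chain Steps A-C and map the best motif pair back to raw index ranges."""
    runs = run_length_compress(sax_symbols(series, alphabet_size))
    compressed = [symbol for symbol, _, _ in runs]
    profile, index = matrix_profile(compressed, m)
    i = min(range(len(profile)), key=profile.__getitem__)
    j = index[i]

    def raw_span(k):
        # Half-open raw index range covered by m consecutive compressed symbols.
        _, last_len, last_start = runs[k + m - 1]
        return runs[k][2], last_start + last_len

    return raw_span(i), raw_span(j), profile[i]
```

Because each compressed symbol can stand for a different number of raw points, two windows of equal length in the compressed series may cover different raw lengths. For instance, `slim_sketch([0, 0, 0, 5, 5, 10, 10, 10, 5, 5, 0, 0, 0, 5, 10, 5, 0], alphabet_size=3, m=3)` returns the raw spans `(0, 8)` and `(10, 15)` with distance `0.0`: the same rise, eight points long on one side and five on the other.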
2.2.5. Advantages
- Permits identification of motif pairs in which the length of each side is independent.
- Properties of the underlying algorithms are inherited:
  - Dimensionality reduction of SAX (if required).
  - Efficiency and scalability of the MP.
- Is independent of SAX and MP versions used and so can take advantage of further improvements to these algorithms.
3. Results and Discussions
3.1. Finance
3.1.1. Side-Length-Independent Motif Discovery
3.1.2. Alternative Motif Identification Algorithms Comparison
3.1.3. Localised Volatility Analysis
3.2. Energy Sector
3.2.1. Side-Length-Independent Motif Discovery
3.2.2. Globalised Volatility Analysis
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
SLIM | Side-Length-Independent Motif |
SES | Simple Exponential Smoothing |
ARIMA | Autoregressive Integrated Moving Average |
SVR | Support Vector Regression |
WIG20 | Warsaw stock exchange index |
SAX | Symbolic Aggregate Approximation |
MDL | Minimum Description Length |
MP | Matrix Profile |
MPI | Matrix Profile Index |
SFA | Symbolic Fourier Approximation |
S&P500 | Standard and Poor’s 500 |
Raw Series Date | SAX Series Index | SAXVal | SAXValDiff | SymJoinNum | RawSeries Index |
---|---|---|---|---|---|
2 January 2009 | 97 | 5 | 1 | 3 | 254 |
7 January 2009 | 98 | 4 | −1 | 1 | 257 |
… | … | … | … | … | … |
28 January 2009 | 104 | 4 | 1 | 1 | 271 |
29 January 2009 | 105 | 3 | −1 | 2 | 272 |

MDLSAXSeriesIdx | SAXValDiffTotal | SymJoinNumTotal | SAXValAmplitude |
---|---|---|---|
1 | 25 | 12 | 15 |
2 | 26 | 11 | 14 |
3 | 26 | 10 | 17 |
… | … | … | … |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Cartwright, E.; Crane, M.; Ruskin, H.J. Side-Length-Independent Motif (SLIM): Motif Discovery and Volatility Analysis in Time Series—SAX, MDL and the Matrix Profile. Forecasting 2022, 4, 219-237. https://doi.org/10.3390/forecast4010013