Efficient Discovery of Periodic-Frequent Patterns in Columnar Temporal Databases
Abstract
1. Introduction
- (1) Zaki et al. [16] first discussed the importance of finding frequent patterns in columnar databases and described a depth-first search algorithm, called Equivalence CLass Transformation (ECLAT), to find frequent patterns in a columnar database. Unfortunately, this algorithm cannot be used directly to find periodic-frequent patterns in a columnar temporal database, because ECLAT completely disregards the temporal occurrence information of an item in the database.
- (2) The space of items in a database gives rise to an itemset lattice. The size of this lattice is $2^n - 1$, where $n$ represents the total number of items in the database. This lattice represents the search space for finding interesting patterns. Reducing this vast search space is a challenging task in pattern mining.
- This paper proposes a novel algorithm, called PF-ECLAT, to find periodic-frequent patterns in a columnar temporal database.
- To the best of our knowledge, this is the first algorithm that aims to find periodic-frequent patterns in a columnar temporal database. A key advantage of this algorithm over the state-of-the-art algorithms is that it can also be employed to find periodic-frequent patterns in a horizontal database.
- Experimental results on synthetic and real-world databases demonstrate that our algorithm is memory and runtime efficient and highly scalable.
- Finally, our algorithm’s usefulness was demonstrated with two case studies. The first case study is air pollution analytics, where the proposed algorithm was used to identify geographical areas in which people were regularly exposed to harmful air pollutants in the whole of Japan. The second case study is traffic congestion analytics, where our algorithm was employed to find the set of road segments in which congestion was regularly observed in a transportation network.
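To make the search-space challenge concrete: n items yield 2^n − 1 non-empty itemsets, so the lattice doubles in size with every additional item. The following minimal sketch checks this count by brute-force enumeration; the six-item alphabet and the function name `count_itemsets` are illustrative assumptions, not part of the paper.

```python
from itertools import combinations


def count_itemsets(items):
    """Count every non-empty itemset over `items` by brute-force enumeration."""
    return sum(
        1
        for k in range(1, len(items) + 1)
        for _ in combinations(items, k)
    )


# Hypothetical six-item alphabet, used only for illustration.
ITEMS = ["a", "b", "c", "d", "e", "f"]
LATTICE_SIZE = count_itemsets(ITEMS)

# The enumeration agrees with the closed form 2^n - 1.
assert LATTICE_SIZE == 2 ** len(ITEMS) - 1
```

Enumerating the lattice like this is only feasible for toy inputs, which is why pruning the search space matters.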
2. Related Work
2.1. Frequent Pattern Mining
2.2. Periodic-Frequent Pattern Mining
3. Periodic-Frequent Pattern Model
4. Proposed Algorithm
4.1. PF-ECLAT Algorithm
4.1.1. Finding One Length Periodic-Frequent Patterns
Algorithm 1 PeriodicFrequentItems(row database, minimum support, maximum periodicity)
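The body of Algorithm 1 is not reproduced here. As a rough sketch of what a one-length pass could look like, the code below scans a row database once, builds a timestamp list per item, and keeps the items that meet both thresholds. It assumes the periodic-frequent model commonly used in the literature [13]: support is the number of timestamps at which an item occurs, and periodicity is the largest gap between consecutive occurrences, also counting the gap from timestamp 0 to the first occurrence and from the last occurrence to the final timestamp of the database. The names `periodic_frequent_items`, `min_sup`, `max_per`, and `ROW_DB` are hypothetical.

```python
def periodic_frequent_items(row_db, min_sup, max_per):
    """Return {item: (ts_list, support, periodicity)} for qualifying items.

    row_db is a list of (timestamp, items) pairs sorted by timestamp.
    Assumption: periods run from timestamp 0 to the first occurrence,
    between consecutive occurrences, and from the last occurrence to the
    final timestamp of the database.
    """
    ts_lists = {}
    for ts, items in row_db:  # single scan of the row database
        for item in items:
            ts_lists.setdefault(item, []).append(ts)

    ts_final = row_db[-1][0] if row_db else 0
    result = {}
    for item, ts_list in ts_lists.items():
        points = [0] + ts_list + [ts_final]
        periodicity = max(b - a for a, b in zip(points, points[1:]))
        support = len(ts_list)
        if support >= min_sup and periodicity <= max_per:
            result[item] = (ts_list, support, periodicity)
    return result


# Small ten-row example database (timestamp, items).
ROW_DB = [
    (1, "abcf"), (2, "bd"), (3, "abcd"), (4, "abce"), (5, "cef"),
    (6, "abcd"), (7, "ab"), (8, "cd"), (9, "abcd"), (10, "bcf"),
]

ONE_LENGTH = periodic_frequent_items(ROW_DB, min_sup=5, max_per=3)
```

With these illustrative thresholds, items that occur too rarely are dropped on support alone, before periodicity is even relevant.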
4.1.2. Finding Periodic-Frequent Patterns Using PFP-List
Algorithm 2 PF-ECLAT(PFP-List)
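The body of Algorithm 2 is likewise not reproduced here. The sketch below illustrates the general ECLAT-style idea under stated assumptions: patterns are extended depth-first, the timestamp list of a longer pattern is the intersection of the lists of two patterns sharing a prefix, and a candidate is kept only if it meets both thresholds. Pruning is safe because a superset can only occur at a subset of its parents' timestamps, so its support cannot rise and its periodicity cannot fall. The periodicity definition (gaps measured from timestamp 0 and up to the final timestamp) and all names are assumptions for illustration.

```python
def periodicity(ts_list, ts_final):
    """Largest gap between consecutive occurrences, including both ends."""
    points = [0] + ts_list + [ts_final]
    return max(b - a for a, b in zip(points, points[1:]))


def pf_eclat(pfp_list, min_sup, max_per, ts_final):
    """Depth-first search over (itemset tuple, sorted ts_list) pairs.

    Every entry of pfp_list is assumed to satisfy both thresholds already.
    Returns {itemset: (support, periodicity)}.
    """
    patterns = {}

    def extend(candidates):
        for i, (items_i, ts_i) in enumerate(candidates):
            patterns[items_i] = (len(ts_i), periodicity(ts_i, ts_final))
            next_level = []
            for items_j, ts_j in candidates[i + 1:]:
                common = sorted(set(ts_i) & set(ts_j))
                # Keep the extension only if it meets both thresholds.
                if (len(common) >= min_sup
                        and periodicity(common, ts_final) <= max_per):
                    next_level.append((items_i + items_j[-1:], common))
            extend(next_level)

    extend(pfp_list)
    return patterns


# Hypothetical one-length patterns with their timestamp lists.
PFP_LIST = [
    (("a",), [1, 3, 4, 6, 7, 9]),
    (("b",), [1, 2, 3, 4, 6, 7, 9, 10]),
    (("c",), [1, 3, 4, 5, 6, 8, 9, 10]),
    (("d",), [2, 3, 6, 8, 9]),
]

PATTERNS = pf_eclat(PFP_LIST, min_sup=5, max_per=3, ts_final=10)
```

Converting each list to a set keeps the sketch short; a merge over the two sorted lists would avoid the re-sort on larger inputs.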
5. Experimental Results
5.1. Experimental Setup
5.2. Evaluation of PFP-Growth, PFP-Growth++, PS-Growth, and PF-ECLAT Algorithms by Varying Constraint
5.3. Evaluation of PFP-Growth, PFP-Growth++, PS-Growth, and PF-ECLAT Algorithms by Varying Constraint
5.4. Scalability Test
5.5. Case Study 1: Finding Areas Where People Have Been Regularly Exposed to Hazardous Levels of PM2.5 Pollutant
5.6. Case Study 2: Traffic Congestion Analytics
6. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- MySQL. Available online: https://www.mysql.com/ (accessed on 10 March 2021).
- PostGres. Available online: https://www.postgresql.org/ (accessed on 10 March 2021).
- SnowFlake. Available online: https://www.snowflake.com/ (accessed on 10 March 2021).
- BigQuery. Available online: https://cloud.google.com/bigquery (accessed on 10 March 2021).
- Brijs, T.; Swinnen, G.; Vanhoof, K.; Wets, G. Using Association Rules for Product Assortment Decisions: A Case Study. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 15–18 August 1999; pp. 254–260.
- Kiran, R.U.; Shrivastava, S.; Fournier-Viger, P.; Zettsu, K.; Toyoda, M.; Kitsuregawa, M. Discovering Frequent Spatial Patterns in Very Large Spatiotemporal Databases. In Proceedings of the 28th International Conference on Advances in Geographic Information Systems (SIGSPATIAL ’20), Seattle, WA, USA, 3–6 November 2020; Lu, C., Wang, F., Trajcevski, G., Huang, Y., Newsam, S.D., Xiong, L., Eds.; ACM: New York, NY, USA, 2020; pp. 445–448.
- Tran-The, H.; Zettsu, K. Discovering co-occurrence patterns of heterogeneous events from unevenly-distributed spatiotemporal data. In Proceedings of the 2017 IEEE International Conference on Big Data (BigData 2017), Boston, MA, USA, 11–14 December 2017; pp. 1006–1011.
- Agrawal, R.; Imieliński, T.; Swami, A. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, DC, USA, 26–28 May 1993; pp. 207–216.
- Han, J.; Cheng, H.; Xin, D.; Yan, X. Frequent Pattern Mining: Current Status and Future Directions. Data Min. Knowl. Discov. 2007, 15, 55–86.
- Aggarwal, C.C. Applications of Frequent Pattern Mining. In Frequent Pattern Mining; Springer International Publishing: Berlin/Heidelberg, Germany, 2014; pp. 443–467.
- Fournier-Viger, P.; Lin, J.C.W.; Kiran, R.U.; Koh, Y.S. A Survey of Sequential Pattern Mining. Data Sci. Pattern Recognit. 2017, 1, 54–77.
- Luna, J.M.; Fournier-Viger, P.; Ventura, S. Frequent itemset mining: A 25 years review. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1329.
- Tanbeer, S.K.; Ahmed, C.F.; Jeong, B.S.; Lee, Y.K. Discovering Periodic-Frequent Patterns in Transactional Databases. In Pacific-Asia Conference on Knowledge Discovery and Data Mining; Springer: Berlin/Heidelberg, Germany, 2009; pp. 242–253.
- Kiran, R.U.; Kitsuregawa, M. Novel Techniques to Reduce Search Space in Periodic-Frequent Pattern Mining. In Database Systems for Advanced Applications; Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B., Eds.; Springer International Publishing: Berlin/Heidelberg, Germany, 2014; pp. 377–391.
- Anirudh, A.; Kiran, R.U.; Reddy, P.K.; Kitsuregawa, M. Memory efficient mining of periodic-frequent patterns in transactional databases. In Proceedings of the 2016 IEEE Symposium Series on Computational Intelligence, Athens, Greece, 6–9 December 2016; pp. 1–8.
- Zaki, M.J. Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng. 2000, 12, 372–390.
- Ravikumar, P.; Likitha, P.; Kiran, R.U.; Watanobe, Y.; Zettsu, K. Towards Efficient Discovery of Periodic-Frequent Patterns in Columnar Temporal Databases. In Proceedings of the 2021 International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA/AIE), Kuala Lumpur, Malaysia, 26–29 July 2021. accepted and to be presented.
- Han, J.; Pei, J.; Yin, Y.; Mao, R. Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach. Data Min. Knowl. Discov. 2004, 8, 53–87.
- Amphawan, K.; Lenca, P.; Surarerks, A. Mining Top-K Periodic-Frequent Pattern from Transactional Databases without Support Threshold. In International Conference on Advances in Information Technology; Springer: Berlin/Heidelberg, Germany, 2009; pp. 18–29.
- Kiran, R.U.; Reddy, P.K. Towards efficient mining of periodic-frequent patterns in transactional databases. In International Conference on Database and Expert Systems Applications; Springer: Berlin/Heidelberg, Germany, 2010; pp. 194–208.
- Amphawan, K.; Surarerks, A.; Lenca, P. Mining Periodic-Frequent Itemsets with Approximate Periodicity Using Interval Transaction-Ids List Tree. In Proceedings of the 2010 Third International Conference on Knowledge Discovery and Data Mining, Phuket, Thailand, 9–10 January 2010; pp. 245–248.
- Kiran, R.U.; Reddy, P.K. An Alternative Interestingness Measure for Mining Periodic-Frequent Patterns. In Proceedings of the 16th International Conference on Database Systems for Advanced Applications - Volume Part I (DASFAA’11); Springer: Berlin/Heidelberg, Germany, 2011; pp. 183–192.
- Rashid, M.M.; Karim, M.R.; Jeong, B.S.; Choi, H.J. Efficient mining regularly frequent patterns in transactional databases. In International Conference on Database Systems for Advanced Applications; Springer: Berlin/Heidelberg, Germany, 2012; pp. 258–271.
- Fournier-Viger, P. SPMF: A Java Open-Source Data Mining Library. 2020. Available online: http://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php (accessed on 4 June 2020).
- National Center for Atmospheric Research, University Corporation for Atmospheric Research. Standardized Precipitation Index (SPI) for Global Land Surface (1949–2012); National Center for Atmospheric Research, University Corporation for Atmospheric Research: Boulder, CO, USA, 2013.
- JARTIC. JApan Road Traffic Information Center. 2020. Available online: https://www.jartic.or.jp (accessed on 11 November 2020).
- Times, T.J. Air Pollution Deaths in Japan. 2019. Available online: https://www.japantimes.co.jp/life/2019/05/11/environment/reading-air-tokyo-still-work-air-pollution (accessed on 12 December 2020).
- Ministry of the Environment Government of Japan. SORAMAME. Available online: http://soramame.taiki.go.jp/ (accessed on 12 December 2020).
- Kiran, R.U. PAttern MIning-Python Kit (PAMI-PyKit). 2020. Available online: https://github.com/udayRage/pami_pykit/tree/master/traditional/Eclat-pfp (accessed on 4 March 2021).
ts | Items | ts | Items |
---|---|---|---|
1 | abcf | 6 | abcd |
2 | bd | 7 | ab |
3 | abcd | 8 | cd |
4 | abce | 9 | abcd |
5 | cef | 10 | bcf |
ts | a | b | c | d | e | f | ts | a | b | c | d | e | f |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | 0 | 0 | 1 | 6 | 1 | 1 | 1 | 1 | 0 | 0 |
2 | 0 | 1 | 0 | 1 | 0 | 0 | 7 | 1 | 1 | 0 | 0 | 0 | 0 |
3 | 1 | 1 | 1 | 1 | 0 | 0 | 8 | 0 | 0 | 1 | 1 | 0 | 0 |
4 | 1 | 1 | 1 | 0 | 1 | 0 | 9 | 1 | 1 | 1 | 1 | 0 | 0 |
5 | 0 | 0 | 1 | 0 | 1 | 1 | 10 | 0 | 1 | 1 | 0 | 0 | 1 |
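The row table and the binary table above hold the same database in two layouts. A minimal sketch of the conversion, assuming each row is a (timestamp, items) pair and that items are single characters as in the tables, could look as follows; `to_binary_columns` and `ROW_DB` are hypothetical names.

```python
def to_binary_columns(row_db, items):
    """Return {item: [0/1 per row]} with rows in the order given by row_db."""
    return {
        item: [1 if item in set(present) else 0 for _, present in row_db]
        for item in items
    }


# The ten rows of the row table, as (timestamp, items) pairs.
ROW_DB = [
    (1, "abcf"), (2, "bd"), (3, "abcd"), (4, "abce"), (5, "cef"),
    (6, "abcd"), (7, "ab"), (8, "cd"), (9, "abcd"), (10, "bcf"),
]

COLUMNS = to_binary_columns(ROW_DB, "abcdef")
```

Each column can then be read or scanned independently of the others, which is the property a columnar layout is meant to provide.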
Item | TS-List |
---|---|
a | 1, 3, 4, 6, 7, 9 |
b | 1, 2, 3, 4, 6, 7, 9, 10 |
c | 1, 3, 4, 5, 6, 8, 9, 10 |
d | 2, 3, 6, 8, 9 |
e | 4, 5 |
f | 1, 5, 10 |
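Each TS-list above records the timestamps at which an item's binary column holds a 1. A small sketch of that step, assuming one 0/1 list per item aligned with a list of timestamps (the names `to_ts_lists`, `TIMESTAMPS`, and `COLUMNS` are hypothetical), is:

```python
def to_ts_lists(timestamps, columns):
    """Return {item: [timestamps where the item's column is 1]}."""
    return {
        item: [ts for ts, bit in zip(timestamps, bits) if bit]
        for item, bits in columns.items()
    }


TIMESTAMPS = list(range(1, 11))

# Two columns of the binary table, written out by hand for illustration.
COLUMNS = {
    "a": [1, 0, 1, 1, 0, 1, 1, 0, 1, 0],
    "e": [0, 0, 0, 1, 1, 0, 0, 0, 0, 0],
}

TS_LISTS = to_ts_lists(TIMESTAMPS, COLUMNS)
```

For sparse columns such as `e`, the TS-list is much shorter than the column it was derived from.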
S. No | Database | Type | Nature | Min. Transaction Length | Avg. Transaction Length | Max. Transaction Length | Total Transactions |
---|---|---|---|---|---|---|---|
1 | BMS-WebView-1 | Real | Sparse | 1 | 3 | 267 | 59,602 |
2 | Pollution | Real | Dense | 11 | 460 | 971 | 720 |
3 | Drought | Real | Dense | 6289 | 8341 | 10,122 | 766 |
4 | Congestion | Real | Sparse | 1 | 58 | 337 | 8928 |
5 | BMS-WebView-2 | Real | Sparse | 2 | 5 | 161 | 77,512 |
6 | T10I4D100K | Synthetic | Sparse | 2 | 11 | 29 | 100,000 |
7 | Kosarak | Real | Sparse | 2 | 9 | 2499 | 990,000 |
Short Biography of Authors
Penugonda Ravikumar is currently pursuing a Ph.D. in Computer and information systems at the University of Aizu, Aizu Wakamatsu, Fukushima, Japan on a deputation basis. He is an Assistant Professor in computer science and engineering at the IIIT – RK Valley, Rajiv Gandhi University of Knowledge Technologies, Andhra Pradesh, India. He received his Master of Engineering degree in computer science from the Indian Institute of Science, Bangalore, Karnataka, India. His current research interests include data mining, air pollution data analytics, traffic congestion data analytics, recommender systems, and time series classification. He has published several papers in reputed international conferences, such as IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), IEEE International Conference on Big Data (IEEE BigData), IEEE Symposium on Computational Intelligence and Data Mining (CIDM), International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA/AIE), and International Conference on Soft Computing and Machine Intelligence (ISCMI).
Palla Likhitha is pursuing a B.Tech in Computer science and engineering at the IIIT – RK Valley, Rajiv Gandhi University of Knowledge Technologies, Andhra Pradesh, India. She has published papers in IEEE BIG DATA 2020 and International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA/AIE).
Bathala Venus Vikranth Raj is pursuing a B.Tech in Computer science and engineering at the IIIT – RK Valley, Rajiv Gandhi University of Knowledge Technologies, Andhra Pradesh, India. At present, he is working on periodic-frequent pattern mining and spatial pattern mining.
Rage Uday Kiran is currently working as an Associate Professor at the University of Aizu, Aizu Wakamatsu, Fukushima, Japan. He also works as a researcher at the University of Tokyo, Tokyo, Japan. He received his PhD degree in computer science from International Institute of Information Technology, Hyderabad, Telangana, India. His current research interests include data mining, parallel computation, air pollution data analytics, traffic congestion data analytics, recommender systems and ICTs for Agriculture. He has published over 50 papers in refereed journals and international conferences, such as The Conference on Information and Knowledge Management (CIKM), International Conference on Extending Database Technology (EDBT), International Conference on Scientific and Statistical Database Management (SSDBM), The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Database Systems for Advanced Applications (DASFAA), and International Conference on Database and Expert Systems Applications (DEXA).
Yutaka Watanobe is currently a senior associate professor in the School of Computer Science and Engineering, University of Aizu, Japan. His research interests include visual programming language, data mining, and cloud robotics.
Koji Zettsu is a Director General of Big Data Integration Research Center of National Institute of Information and Communications Technology (NICT). He has been doing research and development of data analytics technology at NICT, and has been leading the Real Space Information Analytics Project since 2016 to implement a smart data platform based on data mining and AI. To promote industry-academia-government collaboration on the platform, he is also a leader of the Cross-Data Collaboration Project of the Smart IoT Acceleration Forum in Japan. He received his Ph.D. in Informatics from Kyoto University in 2005. His research interests are database systems, data mining, information retrieval and software engineering. He has served on numerous academic societies, conference committees and working groups.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ravikumar, P.; Likhitha, P.; Venus Vikranth Raj, B.; Uday Kiran, R.; Watanobe, Y.; Zettsu, K. Efficient Discovery of Periodic-Frequent Patterns in Columnar Temporal Databases. Electronics 2021, 10, 1478. https://doi.org/10.3390/electronics10121478