Requirements and Trade-Offs of Compression Techniques in Key–Value Stores: A Survey
Abstract
1. Introduction
2. Background
2.1. Compression in General
2.2. Compression Techniques Used by Key–Value Stores
2.2.1. Snappy
2.2.2. Zstd
2.2.3. LZ4 and LZ4HC
2.2.4. Gzip, Zlib, and Brotli
2.2.5. LZMA
2.2.6. LZO
2.3. Literature Review
3. Issues of Compression in Key–Value Stores
3.1. Internal Structure and Operations of Key–Value Stores
3.2. Compression in Key–Value Stores
3.3. Key Factors That Affect Compression in Key–Value Stores
4. Analysis
4.1. Compression Ratio
4.2. Impact on Compaction
4.3. Performance of Key–Value Stores
4.4. Impact on Resource Utilization
5. Lessons and Suggestions
5.1. Lessons
5.2. Suggestions
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jaranilla, C.; Choi, J. Requirements and Trade-Offs of Compression Techniques in Key–Value Stores: A Survey. Electronics 2023, 12, 4280. https://doi.org/10.3390/electronics12204280