Burrows–Wheeler Transform Based Lossless Text Compression Using Keys and Huffman Coding
Abstract
1. Introduction
2. Previous Works
3. Proposed Method
Algorithm 1: The proposed encoding procedure
Algorithm 2: The proposed decoding procedure
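For orientation, the Burrows–Wheeler transform named in the title can be sketched as below. This is the textbook transform using a naive rotation sort and a sentinel character, not the proposed encoding or decoding procedure; the function names `bwt` and `inverse_bwt` and the default `\0` sentinel are assumptions made for illustration.

```python
def bwt(text: str, sentinel: str = "\0") -> str:
    """Return the Burrows-Wheeler transform of text (naive rotation sort)."""
    # Assumption: the sentinel is a character that never occurs in the input
    # and sorts before every input character.
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Sort all cyclic rotations and read off the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, sentinel: str = "\0") -> str:
    """Invert bwt() by repeatedly prepending the last column and re-sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The original string is the rotation that ends with the sentinel.
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]


if __name__ == "__main__":
    transformed = bwt("banana", "$")
    print(transformed)                    # annb$aa
    print(inverse_bwt(transformed, "$"))  # banana
```

The transform groups equal characters into runs, which is what makes a later entropy coder such as Huffman coding more effective. The quadratic-or-worse cost of this naive version is acceptable only for short inputs; practical implementations build a suffix array instead.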
4. Experimental Results and Analysis
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
Texts | PAQ8n | Deflate | Bzip2 | Gzip | LZMA | LZW | Brotli | Proposed |
---|---|---|---|---|---|---|---|---|
1 | 1.582 | 1.548 | 1.335 | 1.455 | 1.288 | 1.313 | 1.608 | 1.924 |
2 | 1.497 | 1.427 | 1.226 | 1.394 | 1.214 | 1.283 | 1.544 | 1.935 |
3 | 1.745 | 1.655 | 1.460 | 1.574 | 1.338 | 1.399 | 1.692 | 1.925 |
4 | 1.523 | 1.463 | 1.261 | 1.382 | 1.200 | 1.268 | 1.531 | 1.899 |
5 | 1.493 | 1.408 | 1.228 | 1.390 | 1.195 | 1.170 | 1.625 | 1.949 |
6 | 1.242 | 1.228 | 1.051 | 1.199 | 1.057 | 1.036 | 1.250 | 1.429 |
7 | 1.154 | 1.040 | 1.026 | 1.061 | 1.000 | 0.946 | 1.287 | 1.448 |
8 | 1.566 | 1.430 | 1.316 | 1.465 | 1.298 | 1.254 | 1.783 | 1.893 |
9 | 1.295 | 1.265 | 1.092 | 1.219 | 1.050 | 1.275 | 1.380 | 1.536 |
10 | 1.495 | 1.371 | 1.307 | 1.419 | 1.216 | 1.174 | 1.511 | 1.629 |
11 | 1.455 | 1.309 | 1.219 | 1.373 | 1.168 | 1.134 | 1.466 | 1.632 |
12 | 1.497 | 1.306 | 1.249 | 1.370 | 1.222 | 1.209 | 1.580 | 1.773 |
13 | 1.369 | 1.201 | 1.126 | 1.250 | 1.097 | 1.092 | 1.493 | 1.660 |
14 | 1.595 | 1.407 | 1.336 | 1.462 | 1.321 | 1.305 | 1.637 | 1.773 |
15 | 1.559 | 1.302 | 1.243 | 1.380 | 1.249 | 1.227 | 1.492 | 1.788 |
16 | 2.401 | 2.082 | 2.214 | 2.121 | 1.888 | 1.559 | 2.269 | 2.466 |
17 | 1.380 | 1.211 | 1.353 | 1.302 | 1.113 | 1.103 | 1.428 | 1.903 |
18 | 1.755 | 1.537 | 1.477 | 1.585 | 1.401 | 1.394 | 1.782 | 1.931 |
19 | 1.507 | 1.370 | 1.261 | 1.417 | 1.247 | 1.234 | 1.542 | 1.815 |
20 | 2.020 | 1.744 | 2.010 | 1.783 | 1.596 | 1.430 | 1.941 | 2.033 |
Average | 1.643 | 1.486 | 1.418 | 1.504 | 1.325 | 1.288 | 1.667 | 1.884 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Rahman, M.A.; Hamada, M. Burrows–Wheeler Transform Based Lossless Text Compression Using Keys and Huffman Coding. Symmetry 2020, 12, 1654. https://doi.org/10.3390/sym12101654