A Comparison of Machine Learning-Based and Conventional Technologies for Video Compression
Abstract
1. Introduction
- This research applies classical methods and machine learning algorithms to video compression with minimal quality degradation;
- Six methods (H.265, VP9, AV1, CNN, RNN, and DAE) are compared on several metrics, including compression ratio, computational complexity, visual quality, and subjective user experience;
- Through extensive testing, we analyze the trade-offs between these performance indicators for each method, as well as the suitability of each method for different use cases.
2. Related Work
3. Materials and Methods
3.1. Video Compression
3.2. Popular Codecs for Video Encoding
3.3. Machine Learning Algorithms for Video Compression
- CNN
- RNN
- DAE
3.4. Evaluation Metrics
4. Results
4.1. Search and Description of the Dataset
4.2. Implementation of Algorithms
- constant QP = 20;
- fixed GOP size of 14 frames without B-frames.
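The two settings above can be expressed as encoder options. The following is a minimal sketch, assuming the encode is done with FFmpeg's `libx265` wrapper; the source does not state which tool or codec these settings were applied to. The helper `build_x265_command` and the file names are hypothetical. `qp`, `keyint`, `min-keyint`, `scenecut`, and `bframes` are x265 options forwarded through `-x265-params`.

```python
def build_x265_command(src, dst, qp=20, gop=14):
    """Return an FFmpeg argument list for a constant-QP, fixed-GOP encode without B-frames."""
    x265_params = ":".join([
        f"qp={qp}",           # constant quantization parameter
        f"keyint={gop}",      # maximum distance between keyframes
        f"min-keyint={gop}",  # same minimum, so the GOP length stays fixed
        "scenecut=0",         # no extra keyframes at scene changes
        "bframes=0",          # disable B-frames
    ])
    return ["ffmpeg", "-i", src, "-c:v", "libx265",
            "-x265-params", x265_params, dst]


# Hypothetical file names, for illustration only.
cmd = build_x265_command("input.mp4", "output_h265.mp4")
print(" ".join(cmd))
```

If FFmpeg is installed, the list can be passed to `subprocess.run(cmd, check=True)`; the sketch only builds the command so that it can be inspected first.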
- FFmpeg: A multimedia framework that allowed us to decode and encode video files using the VP9 codec.
- PyAV: A Python wrapper for FFmpeg that provides a user-friendly interface for manipulating video streams.
- NumPy: for efficient array operations and calculations.
- OpenCV: for reading and writing video files, as well as some image processing tasks.
- PSNR: Peak signal-to-noise ratio, which measures the quality of compressed video compared to the original.
- SSIM: Structural similarity index, which measures the similarity between the compressed video and the original.
- Bitrate: The average number of bits used to represent each frame in a compressed video.
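Two of the metrics above reduce to short formulas. The sketch below, assuming 8-bit frames held as NumPy arrays, computes PSNR as 10·log10(MAX² / MSE) and the average number of bits per frame as defined in the list. The function names are illustrative, not taken from the source. SSIM is omitted because it relies on local windowed statistics; scikit-image provides it as `skimage.metrics.structural_similarity`.

```python
import numpy as np


def psnr(original, compressed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two frames of equal shape."""
    original = np.asarray(original, dtype=np.float64)
    compressed = np.asarray(compressed, dtype=np.float64)
    mse = np.mean((original - compressed) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_value ** 2 / mse)


def bits_per_frame(file_size_bytes, n_frames):
    """Average number of bits spent on each frame of a compressed file."""
    return file_size_bytes * 8 / n_frames


# Toy example: every pixel differs by exactly 1, so MSE = 1.
reference = np.full((4, 4), 100, dtype=np.uint8)
distorted = reference + 1
print(f"PSNR: {psnr(reference, distorted):.2f} dB")
print(f"Bits per frame: {bits_per_frame(1_000_000, 250):.0f}")
```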
- Convolutional Layers (extract features from the input video clip using 3 × 3 filters).
- Max Pooling Layers (reduce the dimensionality of the extracted features).
- Fully Connected Layers (1024 neurons) (further process the features and compress the input data to a lower dimensionality).
- Fully Connected Layer (encoded representation) (represents the compressed form of the input video clip).
- Fully Connected Layers (1024 neurons) (process the encoded representation).
- Fully Connected Layer (1024 neurons) (further process the features).
- Fully Connected Layer (output layer) (reconstruct the original video clip).
- Mirrors encoder architecture (this implies that the decoder essentially reverses the operations performed by the encoder to reconstruct the original video clip).
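The layer list above can be illustrated with a forward pass. This is a minimal NumPy sketch with untrained random weights, not the author's implementation. The 3 × 3 filters and the 1024-neuron layers come from the text; the frame size, the number of filters, and the code size (`FRAME_SHAPE`, `N_FILTERS`, `CODE_DIM`) are assumptions. The decoder here uses only fully connected layers; mirroring the convolution and pooling stages is left out for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

FRAME_SHAPE = (1, 16, 16)                  # assumed: one 16x16 grayscale frame
N_FILTERS, HIDDEN, CODE_DIM = 8, 1024, 64  # 1024 is from the text; the rest are assumed


def conv3x3_relu(x, kernels):
    """3x3 'same' convolution plus ReLU. x: (C_in, H, W); kernels: (C_out, C_in, 3, 3)."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((kernels.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            patch = padded[:, dy:dy + h, dx:dx + w]
            out += np.einsum("oc,chw->ohw", kernels[:, :, dy, dx], patch)
    return np.maximum(out, 0.0)


def max_pool2x2(x):
    """2x2 max pooling; H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def fc(n_in, n_out):
    """Random weights and zero bias for one fully connected layer."""
    return rng.normal(0.0, 0.01, (n_in, n_out)), np.zeros(n_out)


def dense(x, layer, relu=True):
    w, b = layer
    y = x @ w + b
    return np.maximum(y, 0.0) if relu else y


flat = N_FILTERS * (FRAME_SHAPE[1] // 2) * (FRAME_SHAPE[2] // 2)
params = {
    "conv": rng.normal(0.0, 0.1, (N_FILTERS, FRAME_SHAPE[0], 3, 3)),
    "enc_fc": fc(flat, HIDDEN),
    "enc_code": fc(HIDDEN, CODE_DIM),
    "dec_fc1": fc(CODE_DIM, HIDDEN),
    "dec_fc2": fc(HIDDEN, HIDDEN),
    "dec_out": fc(HIDDEN, int(np.prod(FRAME_SHAPE))),
}


def encode(frame):
    """Convolution, pooling, 1024-neuron layer, then the encoded representation."""
    feats = max_pool2x2(conv3x3_relu(frame, params["conv"]))
    hidden = dense(feats.ravel(), params["enc_fc"])
    return dense(hidden, params["enc_code"], relu=False)


def decode(code):
    """Two 1024-neuron layers followed by the output layer."""
    hidden = dense(code, params["dec_fc1"])
    hidden = dense(hidden, params["dec_fc2"])
    return dense(hidden, params["dec_out"], relu=False).reshape(FRAME_SHAPE)


frame = rng.random(FRAME_SHAPE)
code = encode(frame)
recon = decode(code)
print(frame.size, "input values ->", code.size, "code values ->", recon.shape)
```

With random weights the reconstruction is meaningless; the sketch only shows how the data shrinks to the encoded representation and expands back.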
- Each fixed-length video segment is a sequence of 20 frames.
- The RNN encoder has 3 layers.
- Each layer has 100 hidden units.
- Segmentation of the input signal (each video is divided into segments of fixed length).
- Frame sequence encoding (the RNN encoder processes each segment as a frame sequence).
- Feature extraction (layers of the RNN encoder encode the frame sequences into a fixed-length vector representation).
- Compression (a fully connected layer compresses the encoded representations to a smaller size).
- Decoding (RNN decoder layers receive the compressed representations).
- Sequence recovery (RNN decoder layers recover the compressed video sequence).
- Output generation (the output layer generates the reconstructed video).
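The encoder side of these steps (segmentation, frame sequence encoding, feature extraction, compression) can be sketched as follows. This is an illustrative NumPy forward pass of a stacked Elman-style RNN with untrained random weights, assuming frames are flattened to vectors. The 20-frame segments, 3 layers, and 100 hidden units come from the text; `CODE_DIM`, the toy frame size, and the function names are assumptions. The decoder is omitted.

```python
import numpy as np

SEQ_LEN, N_LAYERS, HIDDEN = 20, 3, 100  # values stated in the text
CODE_DIM = 32                           # assumed size of the compressed vector


def segment(frames, seq_len=SEQ_LEN):
    """Split (T, D) frames into (N, seq_len, D) segments, dropping the remainder."""
    n = len(frames) // seq_len
    return frames[: n * seq_len].reshape(n, seq_len, *frames.shape[1:])


def init_rnn(input_dim, rng):
    """Random input weights, recurrent weights, and zero bias for each layer."""
    layers = []
    for i in range(N_LAYERS):
        d_in = input_dim if i == 0 else HIDDEN
        layers.append((
            rng.normal(0.0, 0.1, (d_in, HIDDEN)),
            rng.normal(0.0, 0.1, (HIDDEN, HIDDEN)),
            np.zeros(HIDDEN),
        ))
    return layers


def encode_segment(seg, layers):
    """Run the stacked RNN over one segment; return the top layer's final state."""
    states = [np.zeros(HIDDEN) for _ in layers]
    for frame in seg:
        inp = frame
        for i, (w_in, w_rec, bias) in enumerate(layers):
            states[i] = np.tanh(inp @ w_in + states[i] @ w_rec + bias)
            inp = states[i]  # output of this layer feeds the next one
    return states[-1]


rng = np.random.default_rng(0)
frames = rng.random((45, 64))   # toy data: 45 flattened frames of 64 pixels
segments = segment(frames)      # 2 full segments; the last 5 frames are dropped
layers = init_rnn(frames.shape[1], rng)
w_compress = rng.normal(0.0, 0.1, (HIDDEN, CODE_DIM))  # fully connected compression
codes = np.stack([encode_segment(s, layers) @ w_compress for s in segments])
print(segments.shape, "->", codes.shape)
```

Each 20-frame segment is reduced to one fixed-length vector regardless of its content, which is what the compression step in the list relies on.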
- Input Processing (the input video undergoes pre-processing steps such as resizing, cropping, and normalization).
- Feature Extraction (encoder layers extract features from the pre-processed input video).
- Dimension Reduction (encoder layers progressively reduce the dimensionality of the extracted features).
- Compression (the fully connected layer compresses the encoded representations into a low-dimensional representation).
- Decoding (decoder layers receive the low-dimensional representations).
- Feature Expansion (decoder layers expand the low-dimensional representations back to higher dimensions).
- Reconstruction (decoder layers reconstruct the original input video).
- Output Generation (the output layer generates the reconstructed video).
- The number of encoder layers is two.
- The number of decoder layers is also equal to two.
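The DAE pipeline described above can be sketched in the same spirit. The NumPy code below is an illustration with untrained random weights, not the author's model. It uses two encoder layers, a fully connected compression layer, two decoder layers, and an output layer. The layer widths, the 16 × 16 crop, and the function names are assumptions. Resizing is omitted; only center cropping and normalization are shown.

```python
import numpy as np


def preprocess(frame, size=16):
    """Center-crop an 8-bit grayscale frame to size x size and scale it to [0, 1]."""
    h, w = frame.shape
    top, left = (h - size) // 2, (w - size) // 2
    crop = frame[top:top + size, left:left + size]
    return crop.astype(np.float64).ravel() / 255.0


def make_layers(widths, rng):
    """One (weights, bias) pair per consecutive pair of layer widths."""
    return [(rng.normal(0.0, 0.05, (a, b)), np.zeros(b))
            for a, b in zip(widths[:-1], widths[1:])]


def forward(x, layers):
    """Apply the layers in order; ReLU on all but the last, which stays linear."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x


rng = np.random.default_rng(0)
# Assumed widths: 256 -> 128 -> 64 (two encoder layers) -> 16 (compression layer).
encoder = make_layers([256, 128, 64, 16], rng)
# Assumed widths: 16 -> 64 -> 128 (two decoder layers) -> 256 (output layer).
decoder = make_layers([16, 64, 128, 256], rng)

frame = rng.integers(0, 256, size=(20, 24), dtype=np.uint8)  # toy input frame
x = preprocess(frame)
code = forward(x, encoder)      # low-dimensional representation
recon = forward(code, decoder)  # reconstructed (flattened) frame
print(x.size, "->", code.size, "->", recon.size)
```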
5. Discussion of Results
6. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Method | Average PSNR (dB) | Average SSIM |
---|---|---|
H.265 | 39.17 | 0.78 |
VP9 | 38.19 | 0.79 |
AV1 | 38.72 | 0.80 |
CNN | 40.00 | 0.81 |
RNN | 42.05 | 0.82 |
DAE | 41.15 | 0.82 |
Video file size after compression (Mb), by method:
Video Number | Initial Size (Mb) | H.265 (Mb) | VP9 (Mb) | AV1 (Mb) | CNN (Mb) | RNN (Mb) | DAE (Mb) |
---|---|---|---|---|---|---|---|
1 | 50 | 13.64 | 13.32 | 13.34 | 12.31 | 9.80 | 10.71 |
2 | 87 | 21.44 | 21.12 | 21.14 | 20.11 | 15.91 | 18.42 |
3 | 120 | 27.84 | 27.52 | 26.54 | 26.51 | 20.91 | 24.32 |
4 | 75 | 18.14 | 17.82 | 17.10 | 16.92 | 13.82 | 15.51 |
5 | 56 | 14.04 | 13.72 | 13.04 | 12.71 | 10.13 | 11.61 |
6 | 94 | 22.94 | 22.62 | 21.64 | 21.62 | 17.12 | 19.70 |
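Compression ratio, one of the comparison metrics, follows directly from the file sizes in the table above as initial size divided by compressed size. The short Python sketch below computes it per video and averages it per method. The numbers are copied from the table; the variable and function names are illustrative.

```python
initial_mb = [50, 87, 120, 75, 56, 94]
compressed_mb = {
    "H.265": [13.64, 21.44, 27.84, 18.14, 14.04, 22.94],
    "VP9":   [13.32, 21.12, 27.52, 17.82, 13.72, 22.62],
    "AV1":   [13.34, 21.14, 26.54, 17.10, 13.04, 21.64],
    "CNN":   [12.31, 20.11, 26.51, 16.92, 12.71, 21.62],
    "RNN":   [9.80, 15.91, 20.91, 13.82, 10.13, 17.12],
    "DAE":   [10.71, 18.42, 24.32, 15.51, 11.61, 19.70],
}


def compression_ratios(initial, compressed):
    """Initial size divided by compressed size, one value per video."""
    return [i / c for i, c in zip(initial, compressed)]


ratios = {m: compression_ratios(initial_mb, sizes)
          for m, sizes in compressed_mb.items()}
mean_ratio = {m: sum(r) / len(r) for m, r in ratios.items()}

# Print methods from highest to lowest average compression ratio.
for method, value in sorted(mean_ratio.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{method}: {value:.2f}")
```

For video 1, for example, RNN gives 50 / 9.80 ≈ 5.10, while H.265 gives 50 / 13.64 ≈ 3.67.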
Initial Size (Mb) | CNN | RNN | DAE |
---|---|---|---|
50 | 1.611 | 0.928 | 0.074 |
87 | 2.812 | 1.632 | 0.129 |
120 | 3.841 | 2.243 | 0.176 |
75 | 2.410 | 1.408 | 0.109 |
56 | 1.792 | 1.056 | 0.082 |
94 | 3.008 | 1.762 | 0.141 |
Method Name | Advantages of Using the Method | Disadvantages of Using the Method |
---|---|---|
H.265 | A significant improvement in compression efficiency over its predecessor, H.264. It is supported by many devices and software, making it a widely accepted standard. It supports different video resolutions and frame rates. | In some cases, it may not be as effective as newer codecs such as VP9 and AV1. |
VP9 | It is open source and does not require royalties, which makes it an attractive option for many companies. Supports high-resolution video and a high frame rate. | It requires more computational resources to encode and decode compared to some other codecs. It is not as widespread as some other codecs, such as H.265. |
AV1 | It provides significantly better compression efficiency compared to older codecs like H.264 and even newer codecs like H.265 and VP9. It is open source and does not require royalties, which makes it an attractive option for many companies. It supports different video resolutions and frame rates. | It requires more computational resources to encode and decode compared to some other codecs. It is not as common as some other codecs, such as H.265. |
CNN | High compression efficiency due to the ability to extract spatial features from video frames. Relatively high coding and decoding speed compared to other deep learning methods. It can work with different resolutions and frame rates. | It may require large computational resources for training and deployment. It is not as effective at detecting temporal dependencies in video as RNN. |
RNN | It can effectively capture temporal dependencies in the video by treating the frames as a sequence. It can process video of variable length. It achieves high compression efficiency with good visual quality. | It may require more training time and computing resources than CNN. It can be sensitive to the length of the input sequence and the choice of hyperparameters. |
DAE | It allows you to achieve high compression efficiency with good visual quality. It can process video of variable length. It can be faster and more computationally efficient than other deep learning methods. | It may require more training data and computing resources compared to other methods. It may not be as effective in detecting temporal dependencies in video compared to RNN. |
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Mochurad, L. A Comparison of Machine Learning-Based and Conventional Technologies for Video Compression. Technologies 2024, 12, 52. https://doi.org/10.3390/technologies12040052