Learning Adaptive Quantization Parameter for Consistent Quality Oriented Video Coding
Abstract
1. Introduction
2. Background Works
2.1. RDO Modeling
2.2. Perceptual RDO for Video Quality Consistency
3. Proposed Method
3.1. Overall Coding Framework
3.2. VMAF-Based RDO Modeling
3.3. CNN Model for QP Map Prediction
3.3.1. Dataset Collecting and Labeling
3.3.2. Training CNN Model
- Preprocessing layers: Each 16 × 16 input macroblock (MB) is converted to grayscale, and its pixel values are normalized to the range [0, 1].
- Convolutional layers: The preprocessed MB is convolved with 4 × 4 kernels at the first convolutional layer and with 2 × 2 kernels at the later layers to extract higher-level features. Batch normalization is applied to the feature maps to stabilize learning and reduce the number of epochs needed. Pooling layers placed after the convolutional layers reduce the size of each feature map, and a dropout layer randomly drops features with a probability of 20%.
- Fully connected layers: The feature maps output by the convolutional layers are concatenated and flattened into a column vector, which is fed to three fully connected layers that combine the extracted features into the final output, the QP value. Because the reconstructed video must meet a target VMAF score, that score is appended to the feature vector as an external feature for the fully connected layers.
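The preprocessing step and the external VMAF feature described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the input is assumed to be 8-bit RGB, the BT.601 luma weights are one common grayscale conversion (the text does not say which is used), and scaling the target VMAF by 100 is a choice made here for illustration.

```python
def preprocess_mb(mb_rgb):
    """Convert a 16x16 RGB macroblock to grayscale values in [0, 1].

    Assumes 8-bit samples and BT.601 luma weights (both are assumptions).
    """
    if len(mb_rgb) != 16 or any(len(row) != 16 for row in mb_rgb):
        raise ValueError("expected a 16x16 macroblock")
    return [
        [(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for (r, g, b) in row]
        for row in mb_rgb
    ]


def append_target_vmaf(features, target_vmaf):
    """Append the target VMAF score to a flattened feature vector.

    The score is scaled from [0, 100] to [0, 1] (an assumption) so that it
    is on the same scale as the normalized pixel data.
    """
    if not 0.0 <= target_vmaf <= 100.0:
        raise ValueError("VMAF scores lie in [0, 100]")
    return list(features) + [target_vmaf / 100.0]


# Usage: a white macroblock maps to values of about 1.0, and a flattened
# 2 x 2 x 128 feature map grows from 512 to 513 entries.
white_mb = [[(255, 255, 255)] * 16 for _ in range(16)]
gray = preprocess_mb(white_mb)
fc_input = append_target_vmaf([0.0] * (2 * 2 * 128), target_vmaf=90.0)
print(len(gray), len(gray[0]), len(fc_input))
```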
4. Performance Evaluation
4.1. Test Methodology
4.2. RD Performance Evaluation and Discussion
4.3. Quality Level Expectation Assessment
4.4. Quality Consistency Evaluation
4.5. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Video Sequences | R-Squared of | R-Squared of |
---|---|---|
Hall | 0.95 | 0.93 |
City | 0.96 | 0.90 |
Foreman | 0.90 | 0.90 |
Crew | 0.84 | 0.94 |
Four-people | 0.92 | 0.94 |
Ice | 0.94 | 0.91 |
Kris | 0.89 | 0.91 |
Mobile | 0.99 | 0.88 |
Soccer | 0.91 | 0.97 |
Waterfall | 0.98 | 0.83 |
Average | 0.93 | 0.91 |
Video | a1 | b1 | c1 | d1 | a2 | b2 | c2 | d2 |
---|---|---|---|---|---|---|---|---|
Hall | −0.78 | 102.06 | −4632.9 | 74,387 | 0.0001 | −0.008 | 0.2214 | −0.9966 |
City | −0.51 | 94.62 | −5625.4 | 108,880 | 0.0006 | −0.054 | 1.595 | −14.522 |
Foreman | −0.87 | 118.40 | −5492.3 | 87,700 | 0.0002 | −0.025 | 1.193 | −2.5509 |
Crew | −3.29 | 416.98 | −17,942 | 26,502 | 0.0002 | −0.02 | 1.179 | −4.4885 |
Four-people | −0.86 | 175.53 | −6968 | 440,751 | 0.0002 | −0.014 | 1.106 | −2.7634 |
Ice | −3.53 | 308.97 | −5563 | 185,914 | 0.0002 | −0.014 | 0.277 | −2.4193 |
Kris | −2.81 | 187.33 | −2901.1 | 342,303 | 0.0001 | −0.014 | 0.344 | −2.2143 |
Mobile | −0.60 | 102.02 | −6010.9 | 141,990 | 0.0003 | −0.03 | 0.6236 | −8.5126 |
Soccer | −0.81 | 206.00 | −14,340 | 30,425 | 0.0003 | −0.034 | 0.6596 | −5.1124 |
Waterfall | −0.84 | 140.77 | −5980.7 | 140,316 | 0.0004 | −0.036 | 1.0176 | −8.4162 |
Average | −1.49 | 185.3 | −7716.02 | 157,916.8 | 0.0003 | −0.03 | 0.8216 | −5.1996 |
Layer (Type) | Output Size | Number of Parameters | Activation Function |
---|---|---|---|
Convolution 1 | 16 × 16 × 32 | 544 | Relu |
Batch Normalization 1 | 16 × 16 × 32 | 128 | |
Convolution 2 | 16 × 16 × 32 | 16,416 | Relu |
Batch Normalization 2 | 16 × 16 × 32 | 128 | |
Max Pooling 1 | 8 × 8 × 32 | 0 | |
Convolution 3 | 8 × 8 × 64 | 8256 | Relu |
Batch Normalization 3 | 8 × 8 × 64 | 256 | |
Convolution 4 | 8 × 8 × 64 | 16,448 | Relu |
Batch Normalization 4 | 8 × 8 × 64 | 256 | |
Max Pooling 2 | 4 × 4 × 64 | 0 | |
Convolution 5 | 4 × 4 × 128 | 32,896 | Relu |
Batch Normalization 5 | 4 × 4 × 128 | 512 | |
Convolution 6 | 4 × 4 × 128 | 65,664 | Relu |
Batch Normalization 6 | 4 × 4 × 128 | 512 | |
Max Pooling 3 | 2 × 2 × 128 | 0 | |
Fully connected 1 | 1024 | 526,336 | Relu |
Fully connected 2 | 512 | 524,800 | Relu |
Fully connected 3 | 192 | 98,496 | Relu |
Linear 4 | 1 | 193 | Linear |
Total | | 1,291,841 | |
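The parameter counts in the table can be cross-checked with simple arithmetic. The sketch below is not the authors' code: the kernel sizes are inferred from the reported counts (4 × 4 for the first two convolutions, 2 × 2 for the rest), batch normalization is assumed to carry four values per channel (scale, shift, running mean, running variance), and the first fully connected layer is assumed to take the 2 × 2 × 128 flattened features plus one extra input for the target VMAF score.

```python
def conv_params(kernel, in_ch, out_ch):
    """Weights plus one bias per output channel for a square 2-D convolution."""
    return kernel * kernel * in_ch * out_ch + out_ch


def batch_norm_params(channels):
    """Scale, shift, running mean, and running variance for each channel."""
    return 4 * channels


def dense_params(in_features, out_features):
    """Weight matrix plus one bias per output unit."""
    return in_features * out_features + out_features


FLATTENED = 2 * 2 * 128   # output size of Max Pooling 3
FC_INPUT = FLATTENED + 1  # assumption: one extra input for the target VMAF

# Kernel sizes below are inferred from the reported counts (assumption).
layers = [
    ("Convolution 1", conv_params(4, 1, 32)),
    ("Batch Normalization 1", batch_norm_params(32)),
    ("Convolution 2", conv_params(4, 32, 32)),
    ("Batch Normalization 2", batch_norm_params(32)),
    ("Convolution 3", conv_params(2, 32, 64)),
    ("Batch Normalization 3", batch_norm_params(64)),
    ("Convolution 4", conv_params(2, 64, 64)),
    ("Batch Normalization 4", batch_norm_params(64)),
    ("Convolution 5", conv_params(2, 64, 128)),
    ("Batch Normalization 5", batch_norm_params(128)),
    ("Convolution 6", conv_params(2, 128, 128)),
    ("Batch Normalization 6", batch_norm_params(128)),
    ("Fully connected 1", dense_params(FC_INPUT, 1024)),
    ("Fully connected 2", dense_params(1024, 512)),
    ("Fully connected 3", dense_params(512, 192)),
    ("Linear 4", dense_params(192, 1)),
]
total = sum(count for _, count in layers)

for name, count in layers:
    print(f"{name:<22} {count:>9,}")
print(f"{'Total':<22} {total:>9,}")
```

Under these assumptions every per-layer count and the total of 1,291,841 agree with the table. In particular, the 526,336 parameters of the first fully connected layer correspond to 513 inputs, which is consistent with one external feature being appended to the 512 flattened features.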
Video sequence | crf | BR (x.264) | VMAF (x.264) | BR (CADQ) | VMAF (CADQ) | BR (LAQP) | VMAF (LAQP) | BD-Rate (LAQP vs. x.264) | BD-VMAF (LAQP vs. x.264) | BD-Rate (LAQP vs. CADQ) | BD-VMAF (LAQP vs. CADQ) |
---|---|---|---|---|---|---|---|---|---|---|---|
Coastguard 352 × 288 | 29 | 339.26 | 96 | 469.73 | 92 | 524.46 | 100 | −2.15 | 0.53 | −16.21 | 3.55 |
Coastguard 352 × 288 | 32 | 241.02 | 84 | 280.23 | 85 | 275.41 | 90 | | | | |
Coastguard 352 × 288 | 35 | 110.28 | 73 | 153.42 | 76 | 130.91 | 75 | | | | |
Coastguard 352 × 288 | 37 | 75.22 | 65 | 102.87 | 69 | 80.09 | 64 | | | | |
Container 352 × 288 | 29 | 99.51 | 100 | 142.14 | 100 | 98.56 | 100 | 1.28 | 0.72 | −27.65 | 4.11 |
Container 352 × 288 | 32 | 63.21 | 99 | 81.15 | 96 | 63.6 | 99 | | | | |
Container 352 × 288 | 35 | 43.56 | 93 | 50.5 | 92 | 44.85 | 95 | | | | |
Container 352 × 288 | 37 | 34.9 | 87 | 39.19 | 87 | 35.52 | 89 | | | | |
Silent 352 × 288 | 29 | 131.84 | 100 | 143.89 | 100 | 107.64 | 100 | −4.08 | 4.28 | −1.60 | 2.20 |
Silent 352 × 288 | 32 | 91.84 | 98 | 94.58 | 98 | 85.24 | 100 | | | | |
Silent 352 × 288 | 35 | 63.91 | 88 | 60.5 | 91 | 61.45 | 93 | | | | |
Silent 352 × 288 | 37 | 50.22 | 80 | 46.85 | 84 | 45.23 | 85 | | | | |
Tempete 352 × 288 | 29 | 283.87 | 98 | 382.01 | 98 | 306.59 | 100 | −4.74 | 0.89 | −7.88 | 1.57 |
Tempete 352 × 288 | 32 | 187.94 | 91 | 217.23 | 92 | 217.2 | 95 | | | | |
Tempete 352 × 288 | 35 | 126.62 | 80 | 123.33 | 80 | 118.57 | 79 | | | | |
Tempete 352 × 288 | 37 | 98.92 | 72 | 88.69 | 72 | 83.34 | 71 | | | | |
Crew 1280 × 720 | 29 | 502.26 | 97 | 518.7 | 98 | 537.59 | 97 | −4.64 | 1.44 | −3.37 | 1.09 |
Crew 1280 × 720 | 32 | 348.33 | 88 | 313.61 | 95 | 376.43 | 91 | | | | |
Crew 1280 × 720 | 35 | 245.98 | 77 | 254.21 | 81 | 249.18 | 80 | | | | |
Crew 1280 × 720 | 37 | 194.04 | 67 | 195.07 | 71 | 185.69 | 69 | | | | |
Vidyo3 1280 × 720 | 29 | 512.69 | 100 | 499.98 | 100 | 495.39 | 100 | −5.82 | 1.65 | −3.51 | 0.41 |
Vidyo3 1280 × 720 | 32 | 362.5 | 97 | 398.7 | 99 | 372.61 | 98 | | | | |
Vidyo3 1280 × 720 | 35 | 255.7 | 88 | 253.45 | 90 | 241.93 | 90 | | | | |
Vidyo3 1280 × 720 | 37 | 201.24 | 80 | 204.13 | 80 | 205.69 | 80 | | | | |
Average | | | | | | | | −3.36 | 1.59 | −10.03 | 2.16 |
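For readers unfamiliar with the BD-Rate columns, the following is a minimal sketch of a Bjontegaard-style delta-rate computation with VMAF as the quality axis. It is an illustration under stated assumptions, not the evaluation script behind the table: it assumes exactly four rate/quality points per curve with distinct quality values, fits the cubic through the points by Lagrange interpolation, and uses hypothetical function names.

```python
import math


def _cubic_through(xs, ys):
    """Return the Lagrange polynomial through the four points as a callable."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return poly


def _integrate_cubic(f, a, b):
    """Simpson's rule, which is exact for polynomials up to degree three."""
    return (b - a) / 6.0 * (f(a) + 4.0 * f((a + b) / 2.0) + f(b))


def bd_rate(rate_ref, quality_ref, rate_test, quality_test):
    """Average bitrate difference (%) of the test curve against the reference.

    Negative values mean the test encoder needs less bitrate for the same
    quality. Each curve must have four points with distinct quality values.
    """
    for rates, quals in ((rate_ref, quality_ref), (rate_test, quality_test)):
        if len(rates) != 4 or len(quals) != 4 or len(set(quals)) != 4:
            raise ValueError("need four points with distinct quality values")
    log_ref = _cubic_through(quality_ref, [math.log(r) for r in rate_ref])
    log_test = _cubic_through(quality_test, [math.log(r) for r in rate_test])
    # Integrate only over the quality range covered by both curves.
    lo = max(min(quality_ref), min(quality_test))
    hi = min(max(quality_ref), max(quality_test))
    if hi <= lo:
        raise ValueError("the two curves do not overlap in quality")
    avg_log_diff = (
        _integrate_cubic(log_test, lo, hi) - _integrate_cubic(log_ref, lo, hi)
    ) / (hi - lo)
    return (math.exp(avg_log_diff) - 1.0) * 100.0


# Usage with synthetic data: the test curve uses 10% less bitrate at every
# quality level, so the delta rate is about -10%.
ref_rates, quality = [100.0, 200.0, 400.0, 800.0], [60.0, 70.0, 80.0, 90.0]
test_rates = [0.9 * r for r in ref_rates]
print(round(bd_rate(ref_rates, quality, test_rates, quality), 3))
```

Curves whose VMAF saturates at 100 for more than one crf value (as for some sequences in the table) have repeated quality values, so this simplified sketch rejects them; a full implementation would need to handle that case.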
Video Sequence | crf | x.264 | CADQ | LAQP |
---|---|---|---|---|
Coastguard | 29 | 5.57 | 6.68 | 0.59 |
Coastguard | 32 | 4.98 | 6.18 | 4.83 |
Coastguard | 35 | 5.41 | 5.34 | 5.10 |
Coastguard | 37 | 4.24 | 5.81 | 4.51 |
Container | 29 | 0.08 | 0.15 | 0.11 |
Container | 32 | 2.83 | 0.47 | 1.11 |
Container | 35 | 4.49 | 0.88 | 1.41 |
Container | 37 | 5.05 | 0.99 | 0.89 |
Silent | 29 | 0.11 | 0.74 | 0.00 |
Silent | 32 | 4.94 | 1.09 | 0.00 |
Silent | 35 | 12.94 | 1.59 | 1.34 |
Silent | 37 | 13.10 | 2.30 | 2.12 |
Tempete | 29 | 2.94 | 1.58 | 0.3 |
Tempete | 32 | 6.93 | 3.57 | 1.59 |
Tempete | 35 | 5.84 | 4.71 | 3.67 |
Tempete | 37 | 6.22 | 5.43 | 3.09 |
Crew | 29 | 10.15 | 15.67 | 10.60 |
Crew | 32 | 20.53 | 18.73 | 12.04 |
Crew | 35 | 26.30 | 23.73 | 15.32 |
Crew | 37 | 37.39 | 26.90 | 18.89 |
Vidyo3 | 29 | 1.24 | 1.53 | 1.19 |
Vidyo3 | 32 | 2.99 | 2.92 | 2.44 |
Vidyo3 | 35 | 7.23 | 5.21 | 2.49 |
Vidyo3 | 37 | 5.12 | 3.56 | 2.76 |
Average | | 8.19 | 6.07 | 4.02 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Vu, T.H.; Do, M.N.; Nguyen, S.Q.; PhiCong, H.; Sisouvong, T.; HoangVan, X. Learning Adaptive Quantization Parameter for Consistent Quality Oriented Video Coding. Electronics 2023, 12, 4905. https://doi.org/10.3390/electronics12244905