Multitask Learning Based Intra-Mode Decision Framework for Versatile Video Coding
Abstract
1. Introduction
2. Intra-Prediction
2.1. New Intra-Coding Tools
2.2. Intra-Mode Decision
3. Related Works
Proposal | SL | ML/DL | Intra | Partitioning | Achieved Results: Complexity Reduction (%) | Achieved Results: BD-BR (%) |
---|---|---|---|---|---|---|
[11] | × | × | 30.59 | 0.86 | ||
[6] | × | × | 12.00 | 0.40 | ||
[12] | × | × | 22.60–67.60 | 0.56–2.61 | ||
[13] | × | × | × | 46.00 | 0.91 | |
[7] | × | × | 7.00 | 0.09 | ||
[14] | × | × | 18.00–30.00 | 0.70 | ||
[15] | × | × | × | × | 54.91 | 0.93 |
[16] | × | × | × | × | 70.00 | 1.93 |
[17] | × | × | 46.60–69.80 | 0.86–2.57 |
4. Multitask Learning-Based Intra-Mode Decision Framework
4.1. Overall Presentation of the Proposed Framework
4.2. Dataset and Training Process
4.2.1. Training Dataset
4.2.2. Training Process
4.3. Experimental Setup
4.3.1. Performance of the MTL CNN
4.3.2. Complexity Reduction under VTM
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Cisco. Cisco Visual Networking Index: Forecast and Trends 2017–2022. 2018. Available online: https://cloud.report/whitepapers/ (accessed on 1 September 2020).
- JVET. AHG Report: Test Model Software Development (AHG3). Available online: https://jvet-experts.org/ (accessed on 1 March 2021).
- Bross, B.; Wang, Y.K.; Ye, Y.; Liu, S.; Chen, J.; Sullivan, G.J.; Ohm, J.R. Overview of the Versatile Video Coding (VVC) Standard and its Applications. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 3736–3764.
- Zhang, K.; Chen, Y.W.; Zhang, L.; Chien, W.J.; Karczewicz, M. An improved framework of affine motion compensation in video coding. IEEE Trans. Image Process. 2018, 28, 1456–1469.
- JVET. Algorithm description for Versatile Video Coding and Test Model 8. In Proceedings of the 17th Meeting ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, Brussels, Belgium, 17 January 2020.
- Park, J.; Kim, B.; Jeon, B. Fast VVC intra prediction mode decision based on block shapes. In Proceedings of the Applications of Digital Image Processing XLIII; SPIE: Bellingham, WA, USA, 2020; Volume 11510, pp. 581–593.
- Liu, Z.; Dong, M.; Guan, X.H.; Zhang, M.; Wang, R. Fast ISP coding mode optimization algorithm based on CU texture complexity for VVC. EURASIP J. Image Video Process. 2021, 2021, 1–14.
- Sullivan, G.J.; Ohm, J.R.; Han, W.J.; Wiegand, T. Overview of the High Efficiency Video Coding (HEVC) Standard. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1649–1668.
- Patel, D.; Lad, T.; Shah, D. Review on intra-prediction in high efficiency video coding (HEVC) standard. Int. J. Comput. Appl. 2015, 975, 12.
- Sullivan, G.J.; Wiegand, T. Rate-distortion optimization for video compression. IEEE Signal Process. Mag. 1998, 15, 74–90.
- Chen, Y.; Yu, L.; Wang, H.; Li, T.; Wang, S. A novel fast intra mode decision for versatile video coding. J. Vis. Commun. Image Represent. 2020, 71, 102849.
- Li, Y.; Yang, G.; Song, Y.; Zhang, H.; Ding, X.; Zhang, D. Early Intra CU Size Decision for Versatile Video Coding Based on a Tunable Decision Model. IEEE Trans. Broadcast. 2021, 67, 710–720.
- Cao, J.; Tang, N.; Wang, J.; Liang, F. Texture-Based Fast CU Size Decision and Intra Mode Decision Algorithm for VVC. In Proceedings of the International Conference on Multimedia Modeling, Daejeon, Republic of Korea, 5–8 January 2020; pp. 739–751.
- Ryu, S.; Kang, J. Machine learning-based fast angular prediction mode decision technique in video coding. IEEE Trans. Image Process. 2018, 27, 5525–5538.
- Zhang, Q.; Wang, Y.; Huang, L.; Jiang, B. Fast CU Partition and Intra Mode Decision Method for H.266/VVC. IEEE Access 2020, 8, 117539–117550.
- Yang, H.; Shen, L.; Dong, X.; Ding, Q.; An, P.; Jiang, G. Low complexity CTU partition structure decision and fast intra mode decision for versatile video coding. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 1668–1682.
- Tissier, A.; Hamidouche, W.; Mdalsi, S.B.D.; Vanne, J.; Galpin, F.; Menard, D. Machine Learning based Efficient QT-MTT Partitioning Scheme for VVC Intra Encoders. arXiv 2021, arXiv:2103.05319.
- Crawshaw, M. Multi-task learning with deep neural networks: A survey. arXiv 2020, arXiv:2009.09796.
- Zouidi, N.; Kessentini, A.; Belghith, F.; Masmoudi, N. Statistical analysis of the QTMT structure: Intra mode decision. In Proceedings of the 2020 IEEE 4th International Conference on Image Processing, Applications and Systems (IPAS), Genova, Italy, 9–11 December 2020; pp. 180–185.
- Saldanha, M.; Sanchez, G.; Marcon, C.; Agostini, L. Complexity Analysis Of VVC Intra Coding. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 25–28 October 2020; pp. 3119–3123.
- Agustsson, E.; Timofte, R. Ntire 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 126–135.
- JVET. VVCSoftware_VTM 10.2 Reference Software. 2020. Available online: https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/ (accessed on 1 March 2021).
- Wieckowski, A.; Ma, J.; Schwarz, H.; Marpe, D.; Wiegand, T. Fast Partitioning Decision Strategies for The Upcoming Versatile Video Coding (VVC) Standard. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 4130–4134.
- JVET. CE3: Intra Sub-Partitions Coding Mode. Available online: https://jvet-experts.org/ (accessed on 1 September 2020).
- JVET. CE3: Affine Linear Weighted Intra Prediction. Available online: https://jvet-experts.org/ (accessed on 1 September 2020).
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Zhang, Z.; Luo, P.; Loy, C.C.; Tang, X. Facial landmark detection by deep multi-task learning. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 94–108.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- JVET. JVET common test conditions and software reference configurations for SDR videos. In Proceedings of the 14th Meeting: Joint Video Experts Team of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, JVET-B1010, Geneva, Switzerland, 19–27 March 2019.
- Bjontegaard, G. Document VCEG-M33: Calculation of average PSNR differences between RD-curves. In Proceedings of the ITU-T VCEG Meeting, Austin, TX, USA, 2–4 April 2001.
- Hermann, T. Frugally Deep. 2018. Available online: https://github.com/Dobiasd/frugally-deep/ (accessed on 1 January 2022).
- Pfaff, J.; Filippov, A.; Liu, S.; Zhao, X.; Chen, J.; De-Luxán-Hernández, S.; Wiegand, T.; Rufitskiy, V.; Ramasubramonian, A.K.; Van der Auwera, G. Intra Prediction and Mode Coding in VVC. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 3834–3847.
- Saldanha, M.; Sanchez, G.; Marcon, C.; Agostini, L. Performance analysis of VVC intra coding. J. Vis. Commun. Image Represent. 2021, 79, 103202.
Coding Tool | Default Threshold: Precision | Default Threshold: Recall | Default Threshold: F1-Score | Best Threshold: Precision | Best Threshold: Recall | Best Threshold: F1-Score |
---|---|---|---|---|---|---|
Regular | 0.72 | 0.97 | 0.82 | 0.71 | 0.99 | 0.83 |
MIP | 0.41 | 0.56 | 0.48 | 0.38 | 0.67 | 0.49 |
MRL | 0.64 | 0.00 | 0.01 | 0.22 | 0.39 | 0.28 |
DC | 0.19 | 0.39 | 0.26 | 0.19 | 0.40 | 0.26 |
Planar | 0.35 | 0.61 | 0.45 | 0.33 | 0.73 | 0.45 |
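The F1-scores in the table above can be related to the precision and recall columns with a short check. The sketch below assumes the table uses the standard F1 definition (the harmonic mean of precision and recall); the function name `f1_score` and the dictionary layout are illustrative choices, not taken from the paper. The values are copied from the default-threshold columns.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table, default threshold.
default_threshold = {
    "Regular": (0.72, 0.97, 0.82),
    "MIP": (0.41, 0.56, 0.48),
    "MRL": (0.64, 0.00, 0.01),
    "DC": (0.19, 0.39, 0.26),
    "Planar": (0.35, 0.61, 0.45),
}

for tool, (p, r, reported) in default_threshold.items():
    # Inputs are rounded to two decimals, so small last-digit differences are expected.
    print(f"{tool}: recomputed F1 = {f1_score(p, r):.3f}, reported = {reported:.2f}")
```

Because the published precision and recall are rounded to two decimals, the recomputed values can differ from the reported ones in the last digit. The MRL row illustrates why the threshold matters: at the default threshold a recall close to zero pulls F1 close to zero despite a precision of 0.64, whereas the best threshold raises recall to 0.39 and F1 to 0.28.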
Class | Video | Chen, Y. et al. [11], VTM2.0: Complexity Reduction (%) | Chen, Y. et al. [11], VTM2.0: BD-BR (%) | Park, J. et al. [6], VTM9.0: Complexity Reduction (%) | Park, J. et al. [6], VTM9.0: BD-BR (%) | Li, Y. et al. [7], VTM8.0: Complexity Reduction (%) | Li, Y. et al. [7], VTM8.0: BD-BR (%) | Our Proposal, VTM10.2: Complexity Reduction (%) | Our Proposal, VTM10.2: BD-BR (%) |
---|---|---|---|---|---|---|---|---|---|
A1 | Campfire | 28.06 | 0.92 | 12.00 | 0.09 | - | - | 24.54 | 0.78 |
A1 | Tango2 | 23.39 | 0.93 | 11.00 | 0.09 | - | - | 23.13 | 0.98 |
A1 | FoodMarket4 | 20.13 | 0.64 | 10.00 | 0.09 | - | - | 22.43 | 0.91 |
A1 | Average | 23.86 | 0.83 | 11.00 | 0.09 | - | - | 23.37 | 0.89 |
A2 | CatRobot1 | 26.89 | 0.94 | 12.00 | 0.30 | - | - | 23.43 | 1.13 |
A2 | DaylightRoad2 | 32.99 | 0.98 | 11.00 | 0.49 | - | - | 24.64 | 1.59 |
A2 | ParkRunning3 | 20.32 | 0.67 | 9.00 | 0.07 | - | - | 20.63 | 0.59 |
A2 | Average | 26.73 | 0.86 | 10.67 | 0.29 | - | - | 22.89 | 1.10 |
B | MarketPlace | - | - | 12.00 | 0.13 | - | - | 25.99 | 1.02 |
B | RitualDance | - | - | 12.00 | 0.32 | - | - | 22.98 | 1.25 |
B | Cactus | 29.47 | 0.54 | 13.00 | 0.49 | 6.00 | 0.14 | 27.11 | 1.36 |
B | BasketBallDrive | 34.69 | 0.51 | 11.00 | 0.64 | 9.00 | 0.24 | 24.70 | 1.82 |
B | BQTerrace | 37.17 | 0.44 | 12.00 | 0.48 | 4.00 | 0.01 | 26.94 | 0.49 |
B | Average | 33.78 | 0.50 | 12.00 | 0.41 | 6.33 | 0.13 | 25.54 | 1.19 |
C | RaceHorses | 43.69 | 0.56 | 12.00 | 0.37 | 6.00 | 0.07 | 29.87 | 1.03 |
C | BasketBallDrill | 41.28 | 0.36 | 16.00 | 1.02 | 11.00 | 0.30 | 28.12 | 1.52 |
C | BQMall | 27.64 | 0.61 | 12.00 | 0.88 | 6.00 | 0.10 | 26.13 | 1.74 |
C | PartyScene | 43.69 | 0.56 | 13.00 | 0.49 | 4.00 | 0.01 | 27.97 | 1.24 |
C | Average | 39.34 | 0.50 | 13.25 | 0.69 | 6.75 | 0.48 | 28.02 | 1.38 |
D | RaceHorses | 28.05 | 0.73 | 14.00 | 0.39 | 5.00 | 0.12 | 28.74 | 2.04 |
D | BQSquare | 30.08 | 0.61 | 12.00 | 0.57 | 8.00 | 0.18 | 27.82 | 1.45 |
D | BlowingBubbles | 29.09 | 0.70 | 14.00 | 0.65 | 6.00 | 0.00 | 26.53 | 1.56 |
D | BasketBallPass | 26.49 | 0.49 | 12.00 | 0.66 | 8.00 | 0.04 | 22.96 | 1.42 |
D | Average | 28.43 | 0.90 | 13.00 | 0.57 | 6.75 | 0.085 | 26.51 | 1.62 |
E | FourPeople | 26.32 | 0.66 | - | - | 7.00 | 0.17 | 23.63 | 1.73 |
E | Johnny | 25.85 | 0.59 | - | - | 8.00 | 0.22 | 22.95 | 1.72 |
E | KristenAndSara | 26.77 | 0.59 | - | - | 8.00 | 0.10 | 23.50 | 1.95 |
E | Average | 26.31 | 0.61 | - | - | 7.67 | 0.16 | 23.36 | 1.80 |
A1–E | Average | 30.10 | 0.65 | 12.10 | 0.43 | 6.86 | 0.12 | 25.21 | 1.33 |
F | ArenaOfValor | - | - | - | - | - | - | 24.09 | 1.61 |
F | BasketBallDrillText | 24.96 | 0.44 | - | - | - | - | 24.24 | 1.66 |
F | SlideEditing | 32.33 | 0.84 | - | - | - | - | 17.67 | 1.89 |
F | SlideShow | 32.50 | 0.66 | - | - | - | - | 20.78 | 1.92 |
F | Average | 29.93 | 0.65 | - | - | - | - | 21.69 | 1.77 |
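The "Average" rows of the comparison table appear to be plain arithmetic means over the sequences of each class. The following minimal sketch reproduces the class A1 average for the "Our Proposal, VTM10.2" columns. It assumes the first numeric column is the complexity reduction in percent (its header symbol is missing in this copy, so this reading is based on the title of Section 4.3.2); the names `a1_rows` and `class_average` are hypothetical and used only for illustration.

```python
from statistics import mean

# Class A1 rows of the "Our Proposal, VTM10.2" columns, copied from the table:
# (sequence, complexity reduction in % [assumed meaning], BD-BR in %)
a1_rows = [
    ("Campfire", 24.54, 0.78),
    ("Tango2", 23.13, 0.98),
    ("FoodMarket4", 22.43, 0.91),
]


def class_average(rows):
    """Return the arithmetic mean of each numeric column, rounded to 2 decimals."""
    reduction = round(mean(r[1] for r in rows), 2)
    bd_br = round(mean(r[2] for r in rows), 2)
    return reduction, bd_br


print(class_average(a1_rows))
```

The same helper can be applied to any other class or column pair to cross-check a reported average against its per-sequence values. Small last-digit differences are possible where the published per-sequence values were themselves rounded.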
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zouidi, N.; Kessentini, A.; Hamidouche, W.; Masmoudi, N.; Menard, D. Multitask Learning Based Intra-Mode Decision Framework for Versatile Video Coding. Electronics 2022, 11, 4001. https://doi.org/10.3390/electronics11234001