Submit to this Journal Review for this Journal Propose a Special Issue

Article Menu

Share Help Cite Discuss in SciProfiles

Open AccessArticle

Peer-Review Record

Extreme Learning Machine-Enabled Coding Unit Partitioning Algorithm for Versatile Video Coding

Information 2023, 14(9), 494; https://doi.org/10.3390/info14090494

by Xiantao Jiang^1,*

, Mo Xiang¹, Jiayuan Jin¹ and Tian Song²

Reviewer 1:

Kai Zeng

Reviewer 2:

Ionut Schiopu

Reviewer 3: Anonymous

Information 2023, 14(9), 494; https://doi.org/10.3390/info14090494

Submission received: 7 July 2023 / Revised: 31 August 2023 / Accepted: 1 September 2023 / Published: 7 September 2023

Round 1

Reviewer 1 Report

This paper introduced a VVC encoding speed up method by using extreme learning machine (ELM) for mode selection acceleration. The proposed idea is interesting and fully supported by the reported experimental results. I just have a few questions as below to clarify some details.

- In Section 3, the ELM classifier was trained on the first frame of each GOP. Is that mean there is a training session required for encoding each GOP? What's the GOP size used in the experiment? What's the time required for the training? I believe this may has big impact to the encoding time saving. The standard testing sequences are usually 10s long, and there is usually just 1 GOP in the RA and LD encoding. In reality, what would be the encoding time saving if the testing sequence is 10mins and the GOP size is 4s?

- Does the ELM based partition apply to all the levels of CTU partitioning?

- In Fig. 1, the TTV and TTH CU partitioning illustration is wrong. It should be three even partitions. Please correct.

N/A

Author Response

REVIEWER 1

General Comments: This paper introduced a VVC encoding speed up method by using extreme learning machine (ELM) for mode selection acceleration. The proposed idea is interesting and fully supported by the reported experimental results. I just have a few questions as below to clarify some details.

Response: Thank you very much for giving us many useful comments and suggestions. In the revised version, we have tried our best to address all the issues, and all the text modifications/additions appear in red. Our responses to the comments are given below.

Comment #1: In Section 3, the ELM classifier was trained on the first frame of each GOP. Is that mean there is a training session required for encoding each GOP? What's the GOP size used in the experiment? What's the time required for the training? I believe this may has big impact to the encoding time saving. The standard testing sequences are usually 10s long, and there is usually just 1 GOP in the RA and LD encoding. In reality, what would be the encoding time saving if the testing sequence is 10mins and the GOP size is 4s?

Response: Thank you for your insightful questions and comments regarding the training process and its implications on the encoding time savings. In Section 3 of the manuscript, we indeed trained the ELM classifier on the first frame of each GOP (Group of Pictures). This training approach enables us to capture the initial characteristics of each GOP and inform the subsequent partitioning decisions.

Regarding your concerns, we understand the potential impact of the training session on the overall encoding time. To clarify, the training session is required once for each GOP, which does introduce an additional computational overhead. However, this upfront investment in training can lead to more efficient encoding decisions throughout the GOP, potentially saving time during the overall encoding process.

The GOP size used in our experiments was 32. Moreover, the time required for the training is little. The reason is that: (1) ELM is a single-layer feedforward neural network that trains rapidly. Compared to traditional iterative training algorithms, ELM requires a single weight initialization and output layer weight computation, resulting in faster training times. (2) ELM has low memory requirements, as it only needs to store weights and biases between the input and hidden layers. This makes ELM suitable for large-scale datasets and resource-constrained applications. Therefore, the impact of the time required for the training on experimental results is small.

In response to your example scenario of a 10-minute testing sequence with a GOP size of 4 seconds, we acknowledge that real-world encoding scenarios can vary significantly from the standard testing conditions. We will include a discussion on the potential encoding time savings for longer sequences and different GOP sizes.

As you suggested, the details about the impact of the time required for the training on experimental results have been described in the manuscript on paragraph 1 of Section 3.2.

Comment #2: Does the ELM based partition apply to all the levels of CTU partitioning?

Response: Thank you for your inquiry. Yes, the ELM (Extreme Learning Machine) based partitioning approach is implemented across all levels of CTU (Coding Tree Unit) partitioning. This method optimizes partition size determination within the coding tree, ensuring consistent application throughout various partitioning levels. We have addressed this clarification in the manuscript on paragraph 7 of Section 3.2, underlining the comprehensive usage of ELM-based partitioning across all CTU partitioning levels.

Comment #3: In Fig. 1, the TTV and TTH CU partitioning illustration is wrong. It should be three even partitions. Please correct.

Response: Thank you for bringing this to our attention. We apologize for the error in Fig. 1 regarding the TTV and TTH CU partitioning illustration. We will make the necessary correction to depict three even partitions as you've pointed out. Your feedback is valuable in ensuring the accuracy of our illustrations. The Fig. 1 has been corrected in the revised manuscript.

Reviewer 2 Report

Line 19-26. Please check the English language and rephrase.
The literature review starts quite abruptly. There is no overview presentation/clarification, just a basic revision of articles. It’s a very “brute” presentation. Very hard to follow or understand.
Please extend the literature review as there are quite many recent works that were ignored.
Line 52: “[12,13],. (1)”,
Please remove the second main contribution as it’s a direct result of the usage of the first contribution. Reporting results it’s not a novel contribution to the research world. Moreover, please provide more details of the first contribution. What exactly is the novelty? What is novel in this ELM model? Wasn’t already introduced in literature? Then what exactly is the novelty of the proposed method? It’s usage?
Please check each equation and add the necessary punctuation after each one.
In Table I, please explain why are the class A sequences not tested in this manuscript.
Please discuss the BDBR variation as there are quite large losses for some sequences.
Table 3, please compare with other more recent state-of-the-art methods, as there are quite many other articles published on this topic.
Figure 4, not that much information is reported with the 4 plots. A simple table would have helped much more.
What about some visual comparison? What exactly was predicted correctly and what was predicted wrong?

In general, the manuscript does not present too much novelty as it’s simply employing a well-known tool. Moreover, it’s lacking a comparison with more recent works.

Please check the English language as some sentences are quite strangely formulated.

Author Response

REVIEWER 2
Comment #1: Line 19-26. Please check the English language and rephrase.
Response: Certainly, here's the revised version of the provided text with improved English language and clarity:
"The new-generation Video Coding standard, VVC (Versatile Video Coding)/H.266, was formally established in July 2020. This standard introduces a range of novel coding tools aimed at enhancing coding performance while maintaining consistent reconstruction quality. The outcome of these improvements has resulted in a nearly 40% increase in coding efficiency [1]. It is important to note, however, that these enhancements in VVC have come with the trade-off of heightened encoding complexity. The rapid surge in computational complexity has led to amplified hardware costs for VVC encoders and posed challenges for real-time video encoding applications. Consequently, a significant research focus in the realm of video coding is dedicated to substantially reducing the computational complexity of VVC encoders, all the while minimizing potential compromises to video coding performance [2]."
Comment #2: The literature review starts quite abruptly. There is no overvie presentation，clarification, just a basic revision of articles. It’s a very “brute” presentation. Very hard to follow or understand.
Response: Thank you for your feedback on the literature review. I understand that you find the current presentation difficult to follow and lacking in clarity. To address these concerns, I'll work on restructuring and rephrasing the text to provide a more cohesive and comprehensible overview of the different methods in the field of CU fast partitioning:
"The optimization of CU (Coding Unit) fast partitioning has remained a prominent focus within the video coding domain. Approaches addressing this challenge can be broadly categorized into traditional techniques and machine learning-based methods.
(1)Traditional Methods:
In the realm of traditional methods, CU characteristics like texture information and depth are key determinants of CU division. Zhang et al. proposed an intra-CU partitioning algorithm employing global and local edge complexity, effectively minimizing encoding time [3]. A fast CU depth decision algorithm, introduced by Min et al., utilizes hypothesis testing to statistically analyze CU depth and decide optimal partitioning [4]. Sun et al. developed an efficient technique that leverages CU texture complexity to guide the division process, considering whether a CU should be further divided into sub-CUs [5]. An innovative approach by FAN et al. harnesses variance and gradient to expedite QTMT-based partitioning, thus reducing computational load [6].
(2)Machine Learning-Based Methods:
The second category embraces machine learning methods, encompassing decision trees, support vector machines (SVM), neural networks, and more, to expedite CU division. Among these, SVM has emerged as a widely adopted technique. Zhang et al. designed a rapid CU partitioning strategy leveraging an enhanced DAG-SVM classification model, framing the CU partitioning problem in H.266/VVC as a multi-class classification issue [7]. Cheng et al. introduced a decision-making algorithm based on SVM for CU division within VVC frames, utilizing entropy and texture contrast as indicators for division direction prediction [8]. Expanding beyond SVM, Tang et al. constructed a CNN model integrating a variable pooling layer, which adapts fluidly to various CU shapes and predicts the necessity of CU division [9]. Fu et al. devised a VVC intra coding algorithm that defers binary and ternary tree partitioning through Bayesian and RD cost methodologies [10]. Yang et al. proposed a low-complexity CTU partitioning approach, utilizing a decision tree to predict partitioning outcomes [11].
To enhance the presentation and coherence of our literature review, we have restructured the information to provide a more systematic overview of the various techniques employed in CU fast partitioning."
Please review the rephrased version, and if you have any further suggestions or adjustments, feel free to let me know.
Comment #3: Please extend the literature review as there are quite many recent works that were ignored.
Response: Certainly, I understand your concern, and we will diligently incorporate these recent works into the literature review section, allowing us to present a more thorough perspective on the advancements in CU fast partitioning.
We extend our gratitude for supplementing the literature review with the relevant references [12-17], and the ontributions of these papers are evaluated objectively in the manuscript on paragraph 3 of Section 1.
12. Shang, X., Li, G., Zhao, X., Zuo,Y., Low complexity inter coding scheme for Versatile Video Coding (VVC),
Journal of Visual Communication and Image Representation, 2023, 90, 103683.
13. Li, H., Zhang, P., Jin, B., Zhang, Q., Fast CU Decision Algorithm Based on Texture Complexity and CNN for
VVC, IEEE Access, 2023, 11, 35808-35817.
14.Wang, Y., Liu, Y., Zhao, J., Zhang, Q., Fast CU Partitioning Algorithm for VVC Based on Multi-Stage
Framework and Binary Subnets, IEEE Access, 2023, 11, 56812-56821.
15. A. Tissier, W. Hamidouche, S. B. D. Mdalsi, J. Vanne, F. Galpin and D. Menard, Machine Learning Based
Efficient QT-MTT Partitioning Scheme for VVC Intra Encoders, IEEE Transactions on Circuits and Systems for
Video Technology, 2023, 33(8), 4279-4293.
16. Shang, X., Li, G., Zhao, X., Han, H., Zuo, Y., Fast CU size decision algorithm for VVC intra coding. Multimedia
Tools and Applications, 2023, 82, 28301–28322.
17.Zhang, M., Hou, Y. and Liu, Z., An early CU partition mode decision algorithm in VVC based on variogram
for virtual reality 360 degree videos. EURASIP Journal on Image and Video Processing, 2023, 1, 9.
Comment #4: Line 52: “[12,13],. (1)”,
Response: Thank you for pointing out the formatting issue. It has been fixed in the revised manusrcipt.
Comment #5: Please remove the second main contribution as it’s a direct result of the usage of the first contribution. Reporting results it’s not a novel contribution to the research world. Moreover, please provide more details of the first contribution. What exactly is the novelty? What is novel in this ELM model? Wasn’t already introduced in literature? Then what exactly is the novelty of the proposed method? It’s usage?
Response: Thank you for your feedback and guidance regarding the main contributions. I'll address your concerns and provide more clarity on the first contribution while removing the redundant aspect from the second contribution.
The main contributions of this paper are as follows:
(1) Novel CU Size Decision Modeling and ELM Application:
In this work, we propose a distinctive approach by modeling the CU size decision as a classification problem and employing the ELM (Extreme Learning Machine) model for predicting the CU partitioning mode. The uniqueness lies in the utilization of the ELM model, which offers the advantage of predicting the CU partitioning mode without necessitating image feature extraction. This stands in contrast to traditional machine learning algorithms and enhances efficiency.
(2) Online Learning for Enhanced Prediction Accuracy:
Additionally, to further elevate the predictive accuracy, we incorporate an online learning method. By continuously adapting to the evolving dataset, this technique improves the prediction accuracy of our proposed approach.
Regarding the first contribution, the novelty lies in leveraging the ELM model for CU partitioning mode prediction. While machine learning has been used in similar contexts, the distinctiveness of our approach lies in the ELM's unique characteristics that eliminate the need for image feature extraction, leading to increased efficiency compared to traditional methods. This novel aspect contributes to the advancement of CU size decision modeling.
We appreciate your guidance to focus on the core novelty and avoid redundant reporting of results. We will ensure to emphasize the unique aspects of our proposed method and its novel application of the ELM model for CU partitioning mode prediction.
Comment #6: Please check each equation and add the necessary punctuation after each one.
Response: Response: Thank you for your suggestion. I certainly review each equation and ensure the appropriate punctuation is added after each one to enhance the clarity and correctness of the manuscript.
Comment #7: In Table I, please explain why are the class A sequences not tested in this manuscript.
Response: Thank you for your question regarding the exclusion of class A sequences (4096x2160) from the testing in our manuscript. We apologize for any confusion caused by this omission. Given the characteristics and size of Class A sequences, they may introduce factors that are not fully relevant to our research focus.Therefore, to ensure the consistency and accuracy of the study, we decided to exclude these sequences.
Moreover, Testing on a large scale requires resources and time.Given the high resolution and complexity of Class A sequences, testing them may require more computational resources and time than our research plan.Therefore, with limited resources and time, we selected test sequences that fit the scope of our study. We give the reason why we did not choose test sequence A in the manuscript on paragraph 1 of Section 4.
Comment #8: Please discuss the BDBR variation as there are quite large losses for some sequences.
Response: Thank you for bringing up the concern regarding the variation in BDBR (Bjøntegaard Delta Bit Rate) and the notable losses observed in certain sequences. We appreciate your feedback, and we acknowledge the importance of addressing this aspect in our discussion.
(1)Resolution is one of the important factors that affect the encoding effect.Higher resolution video sequences typically have more detail and information, which can lead to larger data volumes and higher bit rates during the encoding process.
(2)The degree of image motion in the video also has a significant impact on the encoding effect.Highly moving image sequences may cause more displacement and motion estimation errors, resulting in reduced coding effect.We will analyze in depth the extent of image motion in video sequences and explore how it interacts with changes in BDBR.
By discussing in terms of resolution and degree of image motion, we provide a more specific interpretation and analysis of BDBR variations to enhance the understanding and reliability of our proposed approach in the manuscript on paragraph 2 of Section 4.
Comment #9: Table 3, please compare with other more recent state-of-the-art methods, as there are quite many other articles published on this topic.
Response: Thank you for highlighting the need to compare our proposed method with other recent state-of-the-art techniques in Table 3. We value your suggestion, and we acknowledge the importance of providing a comprehensive comparison to showcase the relative performance of our approach.
We expand the comparison in Table 3 to include other relevant state-of-the-art methods that have been published on the same topic. By incorporating a broader range of recent approaches, we aim to present a more holistic perspective on the effectiveness of our proposed method.
In the revised manuscript, the comparsion results with references [12, 13, 26, 27] have been added, and the details about these methods have been described in the manuscript on paragraph 4 of Section 4.

Comment #10: Figure 4, not that much information is reported with the 4 plots. A simple table would have helped much more.
Response: Thank you for your feedback regarding Figure 4. We appreciate your suggestion to consider using a table format instead of the plots to convey information more effectively. We review the content presented in Figure 4 and evaluate the feasibility of representing the data in a table format.
As you suggested, in Fig.4, the R-D curve has been modified as a simple graph, and the advantages of the proposed has been described in the manuscript on paragraph 3 of Section 4.
Comment #11: What about some visual comparison? What exactly was predicted correctly and what was predicted wrong?
Response: Thank you for your suggestion to include a visual comparison that highlights the correct and incorrect predictions made by our method. As you suggested, we incorporate visual comparisons within the manuscript to illustrate the correctness of predictions on specific examples. These comparisons will provide concrete insights into the effectiveness of our method and enhance the overall clarity of our presentation. In the revised manuscript, the visual comparison for BasketballDrive sequence has been added in Fig.5. It can be see that, compared with the original image, the quality of the reconstructed image does not decrease significantly.

General Comments: In general, the manuscript does not present too much novelty as it’s simply employing a well-known tool. Moreover, it’s lacking a comparison with more recent works.
Response: We appreciate your feedback and take your comments seriously. While we understand your concern regarding the novelty of our manuscript, we'd like to clarify a few points. Our work aims to apply a well-known tool, the Extreme Learning Machine (ELM), to address specific challenges in a novel context. We have highlighted the unique aspects of our approach, including its application in CTU partitioning, which can contribute to the existing knowledge.
Regarding the lack of a comparison with more recent works, we acknowledge the importance of benchmarking our approach against the latest advancements. We will address this by incorporating a comprehensive comparison with relevant recent works in the field. This will allow us to provide a clearer understanding of the strengths and limitations of our proposed method in comparison to the state-of-the-art techniques.
We thank you for your insightful comments, and we will revise the manuscript accordingly to enhance its novelty and provide a robust comparison with recent works.

Author Response File: Author Response.docx

Reviewer 3 Report

Authors of this paper introduce an extreme learning machine (ELM) based efficient coding unit partitioning algorithm for versatile video coding (VVC). The features of the proposed concept are verified by a set of simulations. Particularly, this work is the continuation of authors’ previously works in this field. The content of the article meets with the topics of Information journal.

The basic topic of the article is interesting. However, the article has some parts that need extension and/or better explanation. The article needs more than major revision. Please, see my notes!

Notes:

o Introduction – the differences between this and authors’ previous works should be clearly explained.

o Introduction – the second part of the state-of-the-art (SoA) is elaborated on good level. However, its first part, in my humble opinion, should be slightly improved. In general, there is talked about VCC and HEVC video codecs, but there should be also mentioned the 1) the emerging AV1 video codec and 2) additional field, concretely virtual reality (VR), where these video codecs (and related image codecs) will probably also used in the future (link on such works is missing). Hence, maybe the authors should consider about the extension of SoA with the following works (or other similar ones): “Software and Hardware Encoding of Omnidirectional 8K Video: A Performance Study” and “On the Compression Performance of HEVC, VP9 and AV1 Encoders for Virtual Reality Videos”. The first work deals with the performance study of SW/HW implementation of HEVC video codec for omnidirectional videos in UHD resolution. The second one focuses on the performance study of AV1 video codec with conventional ones on subjective level. I hope that the mentioned works can be helpful in the improvement of the elaboration of SoA (mainly extension the information about the conventional and emerging video codecs and their utilization in the field of VR). Otherwise, work with other papers. Thanks!

o Introduction – it should be “Ultra-High-Definition TV (UHDTV)”.

o Introduction – abbreviation “CU” (second paragraph) is not defined. Next, all the used abbreviations must be clearly defined (explained) in the text.

o Introduction – in some cases, there is missing a white space between the text and link on references (e.g., “et al. [6]proposed”). Sometimes, similar problems (within words) is also occurred in the text. Please, fix this typo!

o Section 2 – first sentence – “VVC (Versatile Video Coding)” – abbreviation “VVC” was defined in the Introduction.

o Section 2 – the beginning of second paragraph – it should be “ELMs”.

o Section 3 – in some cases, it is not clear that on the base what the values for the parameters of algorithm model were selected (e.g., m nodes).

o Section 3 – the “flowchart” of the ELM-enabled CU portioning algorithm should be better described –the whole proposed concept should be better introduced and presented.

o Section 3 – the structure of the used ML is not presented

o Section 4 – “H.266/VVC reference software” – link on this SW (websites) is missing. The settings and using of this SW is not clear.

o Section 4 – Table 1 – “the simulation environment” – this name is not the best for this Table.

o Section 4 – it is not clear that on the base what were the system parameters configurations selected.

o Section 4 – alle the used HW/SW equipment must be presented in detail. Next, for reproducible research, I would suggest the authors make the mathematical model publicly available.

o Section 4 – snapshot of pictures from videos used in this work is missing

o Section 4 – the R-D curve is presented only for two videos. What about next videos?

Section 4 – the obtained results are evaluated on very general level. The results should be evaluated in detail.

Article – the English grammar of the article contains some minor typos – not critical (e.g., “and life An indispensable”). Hence, please, check the whole article carefully once again! Next, the paper abstract should more reflect the main contributions and outputs of the article.

References -- check [11] where the numbering is started with [18]

Author Response

REVIEWER 3

General Comments: Authors of this paper introduce an extreme learning machine (ELM) based efficient coding unit partitioning algorithm for versatile video coding (VVC). The features of the proposed concept are verified by a set of simulations. Particularly, this work is the continuation of authors’ previously works in this field. The content of the article meets with the topics of Information journal.

The basic topic of the article is interesting. However, the article has some parts that need extension and/or better explanation. The article needs more than major revision. Please, see my notes!

Response: Thank you for your feedback on the paper. I understand that you find the topic of the article interesting, but you have identified several areas that require extension and clarification. I appreciate your thorough review and will carefully address your notes to ensure that the manuscript meets the required standards. If you have any specific notes or points you'd like me to assist you with, please feel free to share them, and I'll be more than happy to help.

Comment #1: Introduction – the differences between this and authors’ previous works should be clearly explained.

Response: Thank you for your feedback on the introduction section. We recognize the importance of clearly explaining the differences between our current work and our previous works. As you know, in previous work [4], we use a video salient object detection model via fully convolutional networks to obtain saliency maps in the preprocessing stage of video encoding. Based on the computed saliency values at the CU level, we propose a fast CU partition scheme, including the redetermination of the CU division depth by calculating Scharr operator and variance, as well as the executive decision for intra sub-partitions, to alleviate intra encoding complexity. However, in this work, the CU partitioning problem is modeled as a multi-classification problem, and the extreme learning machine is used as the classifier to decision the CU size.

As you suggested, we revise the introduction to include a focused discussion that highlights the specific differentiators between this work and our previous works in the manuscript on paragraph 1 of Section 1.

Li, W.; Jiang, X.; Jin, J.; Song, T.; Yu, F.R. Saliency-Enabled Coding Unit Partitioning and Quantization Control
for Versatile Video Coding. Information, 2022, 13, 394.

Comment #2: Introduction – the second part of the state-of-the-art (SoA) is elaborated on good level. However, its first part, in my humble opinion, should be slightly improved. In general, there is talked about VCC and HEVC video codecs, but there should be also mentioned the 1) the emerging AV1 video codec and 2) additional field, concretely virtual reality (VR), where these video codecs (and related image codecs) will probably also used in the future (link on such works is missing). Hence, maybe the authors should consider about the extension of SoA with the following works (or other similar ones): “Software and Hardware Encoding of Omnidirectional 8K Video: A Performance Study” and “On the Compression Performance of HEVC, VP9 and AV1 Encoders for Virtual Reality Videos”. The first work deals with the performance study of SW/HW implementation of HEVC video codec for omnidirectional videos in UHD resolution. The second one focuses on the performance study of AV1 video codec with conventional ones on subjective level. I hope that the mentioned works can be helpful in the improvement of the elaboration of SoA (mainly extension the information about the conventional and emerging video codecs and their utilization in the field of VR). Otherwise, work with other papers. Thanks!

Response: Thank you for your detailed feedback regarding the improvement of the first part of the state-of-the-art (SoA) section in the introduction. Your suggestions to include information about the emerging AV1 video codec and its relevance in the context of virtual reality (VR) are valuable. Additionally, you have provided specific works that could enhance the elaboration of the SoA.

We incorporate your suggestions and expand the SoA section to include information about the AV1 video codec and its significance in the field. Furthermore, we reference the works you've mentioned, "Software and Hardware Encoding of Omnidirectional 8K Video: A Performance Study" and "On the Compression Performance of HEVC, VP9 and AV1 Encoders for Virtual Reality Videos," to provide a more comprehensive overview of the current landscape of video codecs, especially in relation to VR applications in the manuscript on paragraph 1 of Section 1.

Polak, L.; Kufa, J.; Kratochvil, T. On the Compression Performance of HEVC, VP9 and AV1 Encoders for
Virtual Reality Videos. In Proceedings of IEEE International Symposium on Broadband Multimedia Systems
and Broadcasting (BMSB), Oct 2022, 1-5.
Kufa, J.; Polak, L.; Simka, M.; Stech, A. Software and Hardware Encoding of Omnidirectional 8K Video: A
Performance Study. In Proceedings of 33rd International Conference Radioelektronika, Apr 2023, 1-5.

Comment #3: Introduction – it should be “Ultra-High-Definition TV (UHDTV)”.

Response: Thank you for your clarification regarding the term "Ultra-High-Definition TV (UHDTV)" in the introduction. As you suggested, we fix it in the manuscript on paragraph 1 of Section 1.

Comment #4: Introduction – abbreviation “CU” (second paragraph) is not defined. Next, all the used abbreviations must be clearly defined (explained) in the text.

Response: Thank you for bringing to our attention the absence of the definition for the abbreviation "CU" in the second paragraph of the introduction.As you suggested, the coding unit (CU) has been defined in the manuscript on paragraph 1 of Section 1.

Comment #5: Introduction – in some cases, there is missing a white space between the text and link on references (e.g., “et al. [6]proposed”). Sometimes, similar problems (within words) is also occurred in the text. Please, fix this typo!

Response: Thank you for noting the absence of white space between the text and references in certain instances, like "et al. [6]proposed," within the introduction. We apologize for these typographical errors and will promptly rectify them to uphold correct formatting and improve readability. Furthermore, we will thoroughly review the text to identify and rectify any similar occurrences within words.

Comment #6: Section 2 – first sentence – “VVC (Versatile Video Coding)” – abbreviation “VVC” was defined in the Introduction.

Response: Thank you for pointing out the redundancy in defining the abbreviation "VVC" as "Versatile Video Coding" again in the first sentence of Section 2. We apologize for the oversight and will ensure that the content is revised to eliminate repetition while preserving clarity in the presentation.

Comment #7: Section 2 – the beginning of second paragraph – it should be “ELMs”.

Response: Thank you for pointing out the redundancy in defining the abbreviation "ELMs" as "Extreme Learning Machines" again in the second paragraph of Section 2. As you suggested, we fix it.

Comment #8: Section 3 – in some cases, it is not clear that on the base what the values for the parameters of algorithm model were selected (e.g., m nodes).

Response: We are sorry for not being clear about the values of the model parameters. As you know, in this work, CU partitioning is modeled as a multi-classification problem, and CU modes are divided into 6 categories. Therefore, the value of the parameter m is set to 6.

As you suggested, the details of the model parameters have been described in the manuscript on paragraph 2 of Section 3.1.

Comment #9: Section 3 – the “flowchart” of the ELM-enabled CU portioning algorithm should be better described –the whole proposed concept should be better introduced and presented.

Response: We are sorry for not being clear about the flowchat of the proposed method. As you suggested, we improve the presentation by providing a detailed explanation of the flowchart and ensuring that the proposed concept is thoroughly introduced in the manuscript on paragraph 2 of Section 3.2.

Comment #10: Section 3 – the structure of the used ML is not presented

Response: Thank you for highlighting the absence of the presentation of the machine learning (ML) structure used in the section. In response to your feedback, we will include a detailed explanation of the ML structure employed in the section, ensuring that its components and operation are thoroughly presented. As you suggested, the ELM-based machine learning structrue is shown in Fig.4.

Comment #11: Section 4 – “H.266/VVC reference software” – link on this SW (websites) is missing. The settings and using of this SW is not clear.
Response: We are sorry for the unclear statements. As you suggested, we appreciate your observation regarding the missing link to the "H.266/VVC reference software" and the unclear explanation of its settings and usage in Section 4.

As you suggested, the website of the reference model is added. Moreover, the details about the settings and using of this reference model are described in the manuscript on paragraph 1 of Section 4.

1 https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM

Comment #12: Section 4 – Table 1 – “the simulation environment” – this name is not the best for this Table.

Response: We are sorry for the unclear statements. As you suggested, “the simulation environment” is modified as “the system parameters configurations” in the revised manuscript.

Comment #13: Section 4 – it is not clear that on the base what were the system parameters configurations selected.

Response: Thank you for highlighting the lack of clarity regarding the selection of system parameter configurations. As you know, in the VVC reference model VTM12.0, the two configuration profiles, including “encoder_randomaccess_vtm.cfg”and“encoder_lowdelay_vtm.cfg”, can be choose to verify algorithm performance. During the experiment, we selected different video sequences to test the encoding performance with different QP value.

As you suggested, the details about the system parameters congfigurations have been described in the manuscript on paragraph 1 of Section 4.

Comment #14: Section 4 – all the used HW/SW equipment must be presented in detail. Next, for reproducible research, I would suggest the authors make the mathematical model publicly available.

Response: We are sorry for the unclear statements about HW/SW equipment. In this work, the operating system is Windows 7, and the processor is Inter(R) Core i3-2310M with 8G memory.The language used is the Python3 version for learning. The C++ version of Pytorch is used to load the model. As you suggested, a comprehensive description of the hardware and software equipment has been added in the manuscript on paragraph 1 of Section 4.

Furthermore, we appreciate your suggestion to make the mathematical model publicly available to promote reproducible research. We recognize the importance of transparency in research, and we commit to making the mathematical model accessible for fellow researchers, thus fostering a collaborative and verifiable research environment.

Comment #15: Section 4 – snapshot of pictures from videos used in this work is missing

Response: Thank you for your question regarding the exclusion of class A sequences (4096x2160) from the testing in our manuscript. We apologize for any confusion caused by this omission. Given the characteristics and size of Class A sequences, they may introduce factors that are not fully relevant to our research focus.Therefore, to ensure the consistency and accuracy of the study, we decided to exclude these sequences.

Moreover, Testing on a large scale requires resources and time.Given the high resolution and complexity of Class A sequences, testing them may require more computational resources and time than our research plan.Therefore, with limited resources and time, we selected test sequences that fit the scope of our study. We give the reason why we did not choose test sequence A in the manuscript on paragraph 1 of Section 4.

Comment #16: Section 4 – the R-D curve is presented only for two videos. What about next videos?

Response: We are sorry for the unclear statements about the R-D curves of the proposed method. As you know, the R-D curves of the proposed algorithm are compared under the best-case and worst-case scenarios of the test sequences. This comparison enables a comprehensive evaluation of the algorithm's performance across different scenarios.

As you suggested, the more details of the R-D curves of the proposed have been descript in the manuscript on paragraph 3 of Section 4.

Comment #17: Section 4 – the obtained results are evaluated on very general level. The results should be evaluated in detail.

Response: We are sorry for the unclear statements of evaluation result of this proposed method. We appreciate your observation regarding the evaluation of the obtained results in Section 4 and the suggestion for a more detailed analysis. To address this, we will augment the evaluation section by offering a thorough examination of the results, incorporating specific performance metrics, detailed comparisons, and insightful observations for each scenario.

As you know, For BasketballDrive sequence, the encoding efficiency loss is 2.17% for RA profile.There are two reasons for this result: (1) Resolutionis one of the important factors that affect the encoding effect. Higher resolution video sequences typically have more detail and information, which can lead to larger data volumes and higher bit rates during the encoding process.(2)The degree of image motion in the video also has a significant impact on the encoding effect. Highly moving image sequences may cause more displacement and motion estimation errors, resulting in reduced coding effect.In contrast, the corresponding video sequence BasketballDrill has a lower resolution and flat image motion, so the encoding efficiency loss of BasketballDrill sequence is only 0.22% for RA profile.

As you suggested, the encoding performances of the best-case and worst-case scenarios of the test sequences have been described in the manuscript on paragraph 2 of Section 4.

Comment #18: Article – the English grammar of the article contains some minor typos – not critical (e.g., “and life An indispensable”). Hence, please, check the whole article carefully once again! Next, the paper abstract should more reflect the main contributions and outputs of the article.

Response: We appreciate your feedback regarding the minor English grammar typos present in the article, such as "and life An indispensable." We apologize for these errors and will conduct a thorough review of the entire article to identify and rectify such typos to ensure linguistic accuracy. Additionally, we acknowledge your suggestion to revise the paper abstract to more accurately reflect the main contributions and outcomes of the article. Your input is invaluable, and we are committed to enhancing the quality and clarity of the manuscript.

Comment #19: References -- check [11] where the numbering is started with [18]

Response: We are sorry for the careless mistake. As you suggested, we fix it in the revised manuscript.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

My previous concerns have been well addressed. Thanks

There are still small language issues with the new version of this paper. It's better to carefully revise it again.

- "machine learning classifier is adopted to decision the CU size." -> "machine learning classifier is adopted to decide on the CU size."

- "traditional methods" -> "knowledge-based methods"

- "Shang et al. use coding information to accelerate the coding process[14], and this method predicts the coding area of the current coding unit by analyzing neighboring CUs, reducing unnecessary splitting modes." -> "Shang et al. use coding information to accelerate the coding process[14]. This method predicts the coding area of the current coding unit by analyzing neighboring CUs, reducing unnecessary splitting modes."

Author Response

REVIEWER 1

General Comments: There are still small language issues with the new version of this paper. It's better to carefully revise it again.

Response: Thank you for your feedback. We apologize for any lingering language issues in the revised version of the paper. We will take your advice seriously and conduct another thorough review to address these remaining language concerns. Your attention to detail is greatly appreciated, and we are committed to ensuring the highest quality of language and presentation in the paper.

Comment #1: "machine learning classifier is adopted to decision the CU size." -> "machine learning classifier is adopted to decide on the CU size."

Response: We are sorry for the careless mistake. As you suggested, “machine learning classifier is adopted to decision the CU size.” is modified as “machine learning classifier is adopted to decide on the CU size.” in the revised manuscript.

Comment #2: "traditional methods" -> "knowledge-based methods"

Response: We are sorry for the unclear statements. As you suggested, “traditional methods” is modified as “knowledge-based methods” in the revised manuscript.

Comment #3: "Shang et al. use coding information to accelerate the coding process[14], and this method predicts the coding area of the current coding unit by analyzing neighboring CUs, reducing unnecessary splitting modes." -> "Shang et al. use coding information to accelerate the coding process[14]. This method predicts the coding area of the current coding unit by analyzing neighboring CUs, reducing unnecessary splitting modes."

Response: We are sorry for the unclear statements. As you suggested, it has been fixed in the revised manuscript.

Author Response File: Author Response.docx

Reviewer 2 Report

Please ask a native English speaker colleague to check your manuscript as there are some expressions that were directly translated from your native language, e.g., “heightened encoding complexity”.
My comment 6). Please check again each equation and add the necessary punctuation after each one. Please note that not all the time a coma (",") is required after each equation.

Please ask a native English speaker colleague to check your manuscript.

Author Response

REVIEWER 2

Comment #1: Please ask a native English speaker colleague to check your manuscript as there are some expressions that were directly translated from your native language, e.g., “heightened encoding complexity”.

Comment #2: My comment 6). Please check again each equation and add the necessary punctuation after each one. Please note that not all the time a coma (",") is required after each equation.

Response: Thank you for your comment regarding the punctuation of equations. We will carefully review each equation and ensure that the appropriate punctuation is added.

Author Response File: Author Response.docx

Reviewer 3 Report

The article has been improved. Many thanks for the explanation letter! After the check of the article, I have the following notes:

o Section 2.2 – the name of this section should be “Extreme Learning Machine (ELM)”

o Section 4 – the visibility of bot curves in Fig. 5 is not the best. I recommend for the authors to use dashed lines for curves “VTM”.

o Section 4 – the R-D curves were obtained only for objective metric PSNR, which is fast but many times, their outputs do not meet well with the subjective scores. Can you complete these results by the using of other objective metrics, e.g., SSIM or better?

Explanation Letter – “Furthermore, we appreciate your suggestion to make the mathematical model publicly available to promote reproducible research. We recognize the importance of transparency in research, and we commit to making the mathematical model accessible for fellow researchers, thus fostering a collaborative and verifiable research environment.” Is it mean that the mathematical models will be available by request? If yes, then this information must be available in the article.

Author Response

REVIEWER 3

The article has been improved. Many thanks for the explanation letter! After the check of the article, I have the following notes:

Comment #1: Section 2.2 – the name of this section should be “Extreme Learning Machine (ELM)”

Response: We are sorry for the unclear statements. As you suggested, it has been fixed in the revised manuscript.

Comment #2: Section 4 – the visibility of bot curves in Fig. 5 is not the best. I recommend for the authors to use dashed lines for curves “VTM”.

Response: Thank you for your feedback regarding the visibility of the curves in Figure 5. We understand the importance of clear visualization in the figures. Following your recommendation, we will use dashed lines to represent the "VTM" curves, which should enhance the visibility and distinguishability of the curves in the plot.

Comment #3: Section 4 – the R-D curves were obtained only for objective metric PSNR, which is fast but many times, their outputs do not meet well with the subjective scores. Can you complete these results by the using of other objective metrics, e.g., SSIM or better?

Response: Thank you for your suggestion to enhance the evaluation of the R-D curves by incorporating additional objective metrics. We acknowledge the limitations of using only PSNR as an objective metric and agree that other metrics such as SSIM can provide a more comprehensive assessment of visual quality. In our revised paper, we have taken your advice and included results using MS-SSIM (Multiscale Structural Similarity Index) as an additional objective metric for the R-D curves. This should provide a more comprehensive and accurate representation of the visual quality of the proposed algorithm compared to other methods.

As you sugguested, the details about MS-SSIM have been described in the manuscript on paragraph 1 and paragraph 3 of Section 4.

Comment #4: Explanation Letter – “Furthermore, we appreciate your suggestion to make the mathematical model publicly available to promote reproducible research. We recognize the importance of transparency in research, and we commit to making the mathematical model accessible for fellow researchers, thus fostering a collaborative and verifiable research environment.” Is it mean that the mathematical models will be available by request? If yes, then this information must be available in the article.

Response: Yes, the mathematical model will indeed be made publicly available for access. To clarify, we apologize for any misunderstanding. As you suggeseted, the statement: “Supplementary Materials: Data available on request due to restrictions eg privacy or ethical”, has been added in the revised manuscript.

Author Response File: Author Response.docx

Round 3

Reviewer 3 Report

The article has been improved. Many thanks for the explanation letter!

Article Menu

Extreme Learning Machine-Enabled Coding Unit Partitioning Algorithm for Versatile Video Coding

Further Information

Guidelines

MDPI Initiatives

Follow MDPI