Article

Fast CU Division Pattern Decision Based on the Combination of Spatio-Temporal Information

College of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(9), 1967; https://doi.org/10.3390/electronics12091967
Submission received: 15 March 2023 / Revised: 14 April 2023 / Accepted: 20 April 2023 / Published: 23 April 2023

Abstract

To satisfy the growing demand for high-quality video, VVC offers markedly more efficient coding performance. Statistical analysis shows that the coding complexity of VVC is roughly ten times that of HEVC, so our main goal is to study methods that reduce the time complexity of VVC. In intra-frame coding, the CU split mode is decided by rate-distortion (RD) cost calculation, and the encoder computes the RD cost for all possible mode combinations, which is a major source of coding complexity. To address this, we first introduce an optimal depth prediction algorithm for Coding Units (CUs) that leverages temporal combination: it collects the depth information of neighboring CUs to predict the coding depth of the current CU block. We then propose a decision-tree-based method for the CU split mode decision. Guided by the predicted depth, this method decides the CU split mode within the obtained split depth, reducing the time complexity of coding. The results demonstrate that our algorithm outperforms state-of-the-art methods in terms of computational complexity and compression quality. Compared with the VVC reference software (VTM), our method saves 53.92% of coding time on average with a BDBR increase of only 1.74%. These findings suggest that our method is highly effective at improving computational efficiency while preserving compression quality.

1. Introduction

Vision is an important way for human beings to perceive the world. With the rapid development of online media in recent years, many emerging technologies, such as short-video platforms, live-streaming platforms, and VR, have appeared, and under the impact of the COVID-19 pandemic these technologies have gradually become an important channel for people to communicate in their daily lives. In 2020, the Science and Technology Department of the State Administration of Radio and Television released a white paper on high-format video and VR technology for 5G, pointing out the market demand for higher-format video, where "high" refers to video combining 4K/8K, 3D, VR/AR/MR, high frame rate (HFR), high dynamic range (HDR), wide color gamut (WCG), and other advanced formats. The capacity of the High Efficiency Video Coding (HEVC) standard is inadequate to fulfill this increasing market demand. To tackle this issue, the JVET group was established through a collaboration between MPEG and VCEG, with its inaugural meeting held in 2015. The main objective of the group was to study the next coding standard and to release the test model JEM to assess the effectiveness of novel coding technologies [1]. At the tenth meeting, held in April 2018, the new video coding standard was officially designated VVC (Versatile Video Coding) and the first test model, VTM1.0, was unveiled; it was anticipated that the final version of the new-generation coding standard would be released in 2020. While exploring new coding tools, VVC maintains the fundamental framework of HEVC, with new coding methods gradually incorporated, and it demonstrates significant advances over HM, the reference model of HEVC. JEM adopts a coding tree unit split structure combining quadtree split (QT) and binary tree split (BT) [2], and the range of intra-frame angular prediction modes has been increased from 35 to 67. At the same time, VTM added many new tools, such as inter-frame local illumination compensation (LIC), bi-directional optical flow (BIO), affine motion compensation (AMC) [3], and the adaptive loop filter (ALF) [4], which significantly improve compression performance but also bring huge computational complexity. In the All-Intra test configuration, the VVC codec can reduce the bit rate by 30% to 50% while maintaining perceptual quality comparable to HEVC, but this benefit comes at the expense of encoding complexity roughly 10 times higher than that of HEVC; the decoding complexity of VVC is also about 1.5 times that of HEVC. It is therefore crucial to investigate fast encoding methods that reduce the complexity of VVC.
Video coding relies heavily on intra-frame prediction, which consists of two interrelated parts: coding unit (CU) splitting and mode decision. For the CU split operation, VTM adds ternary tree splitting on top of JEM [5], so the split choices are quadtree, horizontal binary tree, vertical binary tree, horizontal ternary tree, vertical ternary tree, and no split, which allows CUs to be partitioned more accurately; once a CU takes an MT split, its sub-CUs no longer use QT splitting. For the mode decision part, the VVC angular prediction modes have been expanded from 33 to 65, which together with the DC and planar modes gives 67 mode options in total [6]. The approximate sequence of the CU split process is as follows: the CTU is first partitioned by quadtree, the second level can then choose quadtree splitting or no splitting, and after that a hybrid split mechanism is applied in which all possible split combinations are traversed. After traversal, the Sum of Absolute Transformed Differences (SATD) of the parent CU, computed with the Hadamard transform, is compared with the sum over its sub-CUs to decide whether to split. Finally, the Rate-Distortion (RD) cost is computed to determine the optimal prediction mode. VVC uses a two-step fast rough mode decision (RMD) method to alleviate this burden, and numerous algorithms have been studied with the aim of decreasing the computational complexity of intra-frame prediction. A variety of perspectives and methods have emerged. Some scholars have studied the image content of video sequences; for example, Qian [7] constructed a class activation map network for image groups that can segment the moving regions of related image groups from whole frames by weakly supervised learning. Based on such approaches, different policies can be adopted for the two parts to remove unnecessary redundancy. Other scholars have studied the CU partition structure; for example, Zhao [8] introduced asymmetric convolution kernels and trained a ResNet network to predict CU partition modes and reduce unnecessary RD calculations.
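To make the SATD criterion concrete, the following minimal numpy sketch computes a Hadamard-based SATD for 4 × 4 blocks and compares a parent block against its children; the 4 × 4 size, the function names, and the plain residual inputs are illustrative assumptions, not VTM code.

```python
import numpy as np

# 4x4 Hadamard (Walsh) matrix used for the SATD illustration below.
H4 = np.array([[1,  1,  1,  1],
               [1, -1,  1, -1],
               [1,  1, -1, -1],
               [1, -1, -1,  1]])

def satd_4x4(residual):
    """Sum of absolute transformed differences of a 4x4 residual block."""
    transformed = H4 @ residual @ H4.T
    return np.abs(transformed).sum()

def prefer_split(parent_residual, child_residuals):
    """Toy split check: split if the children's SATD sum beats the parent's."""
    parent_cost = satd_4x4(parent_residual)
    children_cost = sum(satd_4x4(r) for r in child_residuals)
    return children_cost < parent_cost
```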
In this paper, we propose an efficient algorithm specifically designed for the new QTMT partition structure and the expanded set of intra-frame prediction modes of VVC intra coding. To reveal the new characteristics of VVC, we analyze the distribution of intra-frame modes and CU depths. We then present a low-complexity CTU splitting decision approach and a fast intra-frame mode decision approach to reduce the computational load of VVC intra-frame encoding. Our experiments show that the proposed algorithm achieves a good balance between complexity reduction and RD performance across various sequences. The primary contributions of this paper are summarized as follows: (1) a temporal-combination-based CU block depth prediction is proposed. Owing to the strong correlation between adjacent frames, we select the depth information of the current CU block, the adjacent encoded CUs, and the corresponding CU blocks of the previous frame as features, and we build a lightweight CNN to predict the depth of the current CU, which indirectly determines whether a CU split is necessary and significantly reduces redundant RD calculations; (2) we propose a decision-tree-based CU split mode decision, which uses the predicted depth of the CU block and builds a decision tree model on the depth layers that need to be divided to predict the probabilities of the five split types, further decreasing the computational complexity.
The organization of the rest of this paper is presented below. Section 2 offers an overview of the background and related work, while Section 3 explores the statistical distribution of intra-frame coding modes and CU depth and explains our efficient intra-frame coding algorithm. The findings and conclusions are included in Section 4 and Section 5, respectively.

2. Background and Related Works

Existing fast algorithms fall broadly into two categories: data-driven algorithms, which mine and process data to extract information, and heuristic algorithms, among which simulated annealing, genetic algorithms, and neural networks are the most popular.
Machine learning and statistical analysis are the main tools used to exploit the correlation between image texture features. To speed up intra-frame prediction, the literature [9] exploited the correlation between adjacent CU depths to optimize the coding order and introduced adaptive intra candidates whose number adapts to the cost distribution, proposing an adaptive intra-frame mode selection and a bidirectional depth search algorithm. In [10], an intra-depth prediction algorithm for CUs was proposed in which texture complexity is defined using the DCT; the algorithm computes the texture complexity of a CU to establish the depth limit for splitting and thereby determines whether the CU should be divided. When the non-texture cost dominates the total cost, no split operation is necessary, as shown in [11], which examines the relationship between the texture cost and the non-texture cost at the current coding depth. A CU early termination algorithm is proposed in [12]: the distribution of the rate-distortion cost is first analyzed for various quantization parameters to determine the correlation between CU size and rate-distortion cost, and a rate-distortion cost threshold for early split termination is then established for the coding unit. Another algorithm uses statistical analysis to enable fast mode selection for prediction units, reducing the number of fine selection decisions by filtering them into coarse decisions based on the distribution of the best modes and the correlation of the Hadamard transform cost between adjacent angular modes. According to [13], CUs can be divided into two groups, significant and non-significant, based on the correlation between the sum of absolute transformed differences of the rough mode decision and the rate-distortion optimization cost; to reduce computational complexity, the rate-distortion optimization (RDO) process is curtailed for CUs belonging to the significant group. The literature [14] proposed a five-stage support vector machine to accelerate the intra-frame mode decision process. The literature [15] uses a random forest classifier to determine the most likely split mode of each coding block and introduces a risk interval for the classifier decision to limit the coding loss caused by misclassification, achieving a balance between complexity reduction and coding loss by changing the risk interval size. Ref. [16] introduces a fast CU classification algorithm that combines just noticeable distortion (JND) and an SVM: CUs are classified into smooth, normal, and complex categories with the JND model; smooth CUs require no further classification, complex CUs are handled by the SVM, and the original VVC procedure is used for normal CUs. A fast CU split decision algorithm based on texture features is proposed in [17]: the current CU is partitioned into multiple sub-blocks, the differences between the average pixel values of these sub-blocks and that of the current CU are computed as the texture characteristic of the CU, and the result is compared against a threshold to decide whether to terminate the CU splitting early.
In the literature [18], a CU split decision algorithm based on reinforcement learning is suggested to convert the CU split decision into a sequential decision problem. This algorithm uses a Batch-Mode reinforcement learning model to determine the optimal coding strategy for CU split. In the study referenced by [19], an algorithm for making CU split decisions based on support vector machines (SVMs) was proposed. To improve the accuracy of predictions, distinct SVM classifiers were designed for each depth as part of the algorithm. Additionally, the original RDO process was only activated in situations where the output result was uncertain.
Deep learning has been a popular approach in image processing in recent years, and researchers have constructed deep-learning classifiers to achieve fast CU splitting. In the literature [20], a three-level CNN with early termination is designed, and a CU partition map representing the three-level CU split is constructed as the network output. In the literature [21], a CNN with asymmetric kernels is designed; a partition rate is proposed to make the network applicable to all quantization parameters, and the threshold decision on the network output is formulated as a multi-objective optimization problem to balance computational complexity and algorithm performance. In the literature [22,23], ResNet models are proposed to predict the CTU partition of the HEVC standard. In the literature [24], a fast intra-frame mode decision algorithm based on convolutional neural networks is proposed, in which prediction units of different sizes are scaled to CUs of the same size by bilinear interpolation before model learning. In the literature [25], a three-level MSE-CNN is designed whose structure is related to the kernel size and CU size, together with an adaptive cross-entropy loss function that addresses the imbalance between different split cases. The literature [26] used a deep convolutional network to fuse reference features obtained from varying convolution kernels after extracting the spatio-temporal coding features in order to determine the intra-frame coding depth, and then selected candidate split modes at the best coding depth based on a probabilistic model and spatio-temporal coherence. The literature [27] establishes a pre-decision dictionary based on statistical theory and combines it with a convolutional neural network whose pooling layer size is adjusted adaptively, constructing a size-adaptive CNN to decide the CU size split. The literature [28] proposes a global-convolution-network-based intra-frame method that captures the global information of the CU with larger convolution kernels to predict the partition mode in the quadtree plus multi-type tree structure, ranks the partition modes by prediction probability, and discards the low-probability modes to reduce the computational complexity. In the literature [29], a lightweight CNN model is created whose algorithm bypasses the laborious RDO process used in HEVC, resulting in a substantial reduction in coding time; the model also incorporates depth-separable convolution, which further enhances coding performance. In the literature [30], a neural network architecture and a dataset are developed to make decisions about the CU depth; data in both image and vector form, containing information about PU decisions, are used to improve the prediction accuracy of the network. In the study referenced by [31], a multi-level output CNN (MLE-CNN) was employed for CU segmentation decisions; to achieve high RD performance while minimizing complexity, the researchers designed an adaptable loss function and a decision scheme with a variable threshold. In the study referenced by [32], a two-stage fast CU split algorithm is proposed.
The first stage involves using a multi-branch CNN to extract features for two purposes: predicting the QT depth and determining whether TT splitting is needed. In the second stage, the prediction information is used to prune potential combinations of CU splits, which significantly reduces coding complexity; additionally, the CNN's training parameters are reduced by leveraging the MobileNetV2 network structure. In the literature [33], the proposed algorithm utilizes a variable-pooling-layer CNN to make intra-frame adaptive CU split decisions, enabling split decisions for CUs of various sizes through an adaptive pooling layer. In the literature [34], the proposed neural network takes a 64 × 64 CU as input and outputs a vector of 480 probability values corresponding to boundary splits of length 4 within the CU; the CU split type is then determined by comparing these probabilities to a set threshold. In the literature [35], to reduce coding time and bypass the RDO process, a novel algorithm that utilizes both a CNN and a random forest classifier (RFC) has been developed; it predicts the depth and split type of a 32 × 32 CU by combining the strengths of the CNN and the RFC in a fast CU split decision process.

3. The Proposed Fast Intra Coding Algorithm

VVC adheres to the framework of HEVC, the previous-generation video coding standard. VTM utilizes a three-stage fast intra-frame mode decision. During intra-frame prediction coding, the coding tree unit (CTU) is first divided into CUs of varying depths. Then N modes are chosen based on the Hadamard transform loss, with the sum of absolute transformed differences used to evaluate the 67 intra-frame prediction modes. The M modes from the most probable mode (MPM) table are merged with the candidate prediction modes, and the optimal CU split mode is determined by comparing the RD loss of the parent CU with the sum of the RD losses of its sub-CUs, as well as by comparing the RD losses obtained with the various split modes. We design a CNN-based optimal depth prediction for the CU split, which significantly reduces the RD loss calculations over the many depth combinations and thus effectively reduces the computational complexity of coding.

3.1. CU Depth Prediction Based on Spatiotemporal Combination

To simplify the CTU split process and limit complexity, VTM introduces a hybrid CU split mode: once a CU has been partitioned with the multi-type tree (MTT), no further quadtree split is performed inside that CU. From Figure 1, we can see that CUs of size 8 × 8 no longer select the QT split mode, so we only consider depths between size 128 × 128 and size 8 × 8. Therefore, we divide the 128 × 128 CTU into sub-CU blocks of size 8 × 8 and perform the optimal depth prediction for these sub-CUs; this division yields a 16 × 16 grid of 8 × 8 sub-CU blocks.
To predict the split depth of a CU, we extract the depth information of the CU block itself as well as that of the adjacent CU blocks as one group of features. Neural network training needs a sufficient number of features and data. Because adjacent frames of a video are continuous and strongly correlated, and most regions can be considered background, the probability that the split depths of the corresponding CU blocks in two consecutive frames are the same is high. To enhance the accuracy of our model's predictions, we therefore also incorporate the depths of the corresponding encoded CU blocks of the previous frame and of the adjacent encoded CU blocks as features.
To establish the coordinates of each 8 × 8 CU block, we use (x, y, t) to represent the position of the block, where (x, y) indicates its spatial location and t denotes the frame index. The depth information extracted for the block is then given as follows.
$$\mathrm{InforB} = \begin{cases} \mathrm{InforB}(x+\Delta x,\ y+\Delta y,\ t), & \text{if } \Delta x \le 0 \ \text{and}\ \Delta y < 0,\\ \mathrm{InforB}(x+\Delta x,\ y+\Delta y,\ t-1), & \text{otherwise}, \end{cases}$$
where the range of Δ x is [−2, 2] and the range of Δ y is [−2, 2]. In other words, for the CU block of the current frame, we extract the upper and upper-left neighboring blocks of the current block, as well as the corresponding block from the previous frame’s CU. We extract the corresponding block and its left side block, as well as the three blocks below and three blocks at the right side, which form a 5 × 5 size depth block.
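As an illustration of how the 5 × 5 spatio-temporal depth patch can be assembled according to the selection rule above, the following Python sketch gathers depths from the current frame for the already-encoded upper and upper-left neighbours and from the previous frame elsewhere; the clamping at frame borders and the function name are our assumptions.

```python
import numpy as np

def depth_patch(cur_depth, prev_depth, x, y):
    """Build the 5x5 depth patch for the 8x8 block at grid position (x, y).

    cur_depth / prev_depth: 2-D arrays of per-8x8-block depths for the
    current and previous frame (current-frame entries are only valid for
    already-encoded blocks). The upper / upper-left neighbourhood comes
    from the current frame, the rest from the previous frame.
    """
    patch = np.zeros((5, 5), dtype=np.float32)
    h, w = prev_depth.shape
    for dy in range(-2, 3):
        for dx in range(-2, 3):
            yy = min(max(y + dy, 0), h - 1)   # clamp at frame borders
            xx = min(max(x + dx, 0), w - 1)
            if dx <= 0 and dy < 0:            # already-encoded spatial neighbours
                patch[dy + 2, dx + 2] = cur_depth[yy, xx]
            else:                             # temporal (previous-frame) blocks
                patch[dy + 2, dx + 2] = prev_depth[yy, xx]
    return patch
```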
We feed the collected depth information into the CNN for training; the input is a 5 × 5 depth map, and the output is the predicted depth of the current CU block. We design the CNN starting from the simplest possible network in order to keep the computational complexity small.

3.1.1. Network Infrastructure

Our proposed CNN contains two convolutional layers, a flatten layer, and a Dense layer. Figure 2 shows the specific framework of the proposed CNN. First, a depth map of size 5 × 5 is input to the network and enters the feature-extraction channel: a 1 × 1 convolution kernel followed by a ReLU layer, and then a 3 × 3 convolution kernel that extracts the depth features, yielding 12 depth feature maps of size 5 × 5.
The 12 extracted feature maps are then fused: a flatten layer pulls all feature maps into a vector of length 300 to obtain richer depth features, this vector is concatenated with the quantization parameter (QP) value in a fully connected layer, and the result passes through a Dense network with two hidden layers and a SoftMax layer; the resulting depth vector of length 64 is fused with the QP value, and the predicted depth is output through a second fully connected layer and a Dense layer. Since we only consider the depths of CU blocks down to size 8 × 8, the prediction depth range is [1, 6], giving six prediction probabilities, and the depth with the highest probability is chosen as the best predicted depth of the CU.
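A minimal Keras sketch of the network described above is given below; the 1 × 1 and 3 × 3 convolutions, the 12 feature maps, the 300-element flatten, the QP fusion, and the 6-way SoftMax follow the text, while the hidden-layer widths and activation choices are our assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_depth_cnn():
    """Sketch of the depth-prediction CNN of Section 3.1.1 (hedged)."""
    depth_map = layers.Input(shape=(5, 5, 1), name="depth_map")
    qp = layers.Input(shape=(1,), name="qp")

    x = layers.Conv2D(12, 1, activation="relu", padding="same")(depth_map)
    x = layers.Conv2D(12, 3, padding="same")(x)        # 12 maps of size 5x5
    x = layers.Flatten()(x)                            # vector of length 300
    x = layers.Concatenate()([x, qp])                  # fuse the QP value
    x = layers.Dense(64, activation="relu")(x)         # hidden widths assumed
    x = layers.Dense(64, activation="relu")(x)
    out = layers.Dense(6, activation="softmax", name="depth_prob")(x)
    return Model(inputs=[depth_map, qp], outputs=out)
```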

3.1.2. Model Training

We choose the JVET common test conditions (CTC) for training data. Table 1 lists the frame count, frame rate, and bit depth of the video sequences. The dataset is divided into seven classes with different resolutions. Class A1 contains three test sequences with 3840 × 2160 resolution, and Class A2 consists of three test sequences with the same resolution. Class B comprises five test sequences with a resolution of 1920 × 1080, while Class C has four test sequences with a resolution of 832 × 480. Class D contains four test sequences with a resolution of 416 × 240, and Class E has three test sequences with a resolution of 1280 × 720. Finally, Class F includes four test sequences, each with a resolution of either 832 × 480 or 1280 × 720.
We selected four representative QP values (22, 27, 32, 37) and compressed the video sequences with VTM10.0. First, we compress the dataset with the conventional encoder and organize the depth information generated for each CU into depth maps. To enhance the model's ability to generalize, we designated 70% of the data as the training set and the remaining 30% as the test set. We adopted the Adam optimizer with a learning rate of 0.0001. Because the cross-entropy function decreases quickly during learning, has convex optimization properties that help prevent getting stuck in local optima, and is widely used in classification problems with excellent performance, we selected the cross-entropy function as the loss function of the model.
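The training setup can be sketched as follows; the Adam optimizer, the 1e-4 learning rate, the cross-entropy loss, and the 70/30 split follow the text, while the placeholder data, the epoch count, and the batch size are assumptions for illustration only.

```python
import numpy as np
import tensorflow as tf

model = build_depth_cnn()   # from the sketch above
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",   # cross-entropy loss
              metrics=["accuracy"])

# Placeholder arrays standing in for the VTM-derived depth maps and labels;
# real data would come from the compressed CTC sequences.
patches = np.random.rand(1000, 5, 5, 1).astype("float32")
qps = np.random.choice([22, 27, 32, 37], size=(1000, 1)).astype("float32")
labels = np.random.randint(0, 6, size=(1000,))           # depths 1..6 -> classes 0..5

split = int(0.7 * len(labels))                            # 70/30 train/test split
model.fit([patches[:split], qps[:split]], labels[:split],
          validation_data=([patches[split:], qps[split:]], labels[split:]),
          epochs=30, batch_size=256)                      # settings are assumptions
```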
We first trained the model and analyzed the prediction performance on the test set. Table 2 shows the precision, recall, specificity, and accuracy of our proposed model. The performance is excellent: the average precision reaches 88.6%, the recall 88.9%, the specificity 95.4%, and the accuracy 94.3%. In total, 89.32% of depth values are predicted exactly, and the erroneous predictions stay within the range [−1, 1], where an erroneous prediction denotes a predicted value that differs from the actual value, counted as a percentage of the total number of predictions.

3.1.3. The Depth Prediction

The model will inevitably make some wrong predictions, and if these wrong depth predictions are ignored, their accumulation will adversely affect the subsequent RD cost calculation and intra-frame mode prediction. Therefore, to avoid these cases and improve the robustness of the model, we add a noise term to the predicted depth values to compensate for the impact of wrong predictions.
$$\hat{D}(x,y) = \tilde{D}(x,y) + \max\left\{ \frac{1}{K_D}\sum_{k=1}^{K_D}\left(D_k(x,y)-\tilde{D}_k(x,y)\right)+\frac{1}{2},\ 0\right\}$$
where $D_k(x,y)$ represents the true depth value of the CU block, $\tilde{D}_k(x,y)$ represents the predicted depth value, and $K_D$ represents the number of CU blocks in the depth map, with $K_D = 25$; $\hat{D}(x,y)$ represents the predicted depth value after the noise processing. Through this processing, we find that the subsequent effects caused by the accumulation of incorrect predictions are effectively handled and the robustness of the model is greatly enhanced. After adjustment, the optimal prediction depth for an 8 × 8 CU block can be obtained; the depth value is derived from the following equation.
$$D_0 = \max_{(x,y)\in \mathrm{CU}} \left\{\hat{D}(x,y)\right\}$$
After the depth prediction, during the encoding process, only the CU is divided to the optimal depth, and then the RD calculation and subsequent operations are performed, which greatly reduces the amount of computational data and computational complexity, and also ensures the video quality.
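A small Python sketch of the noise adjustment and the optimal depth selection defined by the two formulas above is given below; the function names are ours, and since the true depths are unknown at encoding time, the bias term would in practice be estimated offline from training data.

```python
import numpy as np

def adjusted_depth_map(pred_depths, true_depths):
    """Shift the predicted 5x5 depth map by the non-negative mean prediction
    error plus 1/2 over the K_D = 25 blocks (our reading of the formula)."""
    k_d = pred_depths.size                                # K_D = 25
    bias = (true_depths - pred_depths).sum() / k_d + 0.5
    return pred_depths + max(bias, 0.0)

def optimal_cu_depth(adjusted_depths):
    """The optimal depth D0 of the CU is the maximum adjusted depth."""
    return int(np.max(adjusted_depths))
```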

3.2. CU Split Mode Decision Based on Decision Tree

In the coding process, the CTU is iteratively divided down to the minimum size according to the various possible split combinations, whereas only splitting down to the optimal depth is actually needed. The RD loss is then calculated for all possible split mode combinations of the sub-CUs and the parent CU, and the split mode with the smallest cost is selected as the optimal one by comparing the RD loss of the parent CU with the sum of the RD losses of the corresponding sub-CUs. In VTM, a mixed split mode is added, which includes not only quadtree splitting but also horizontal binary tree, vertical binary tree, horizontal ternary tree, and vertical ternary tree splitting. This greatly increases the number of RD loss calculations and the computational complexity, so it is necessary to predict the CU split modes to reduce redundant calculations.
CU split mode decision has been studied extensively and is usually converted into a classification problem; commonly used classifiers are neural networks, SVMs, decision trees, etc., and different classifiers suit different scenarios according to their internal structures. SVM maps discrete, unordered data to a high-dimensional space through kernel functions and separates categories with hyperplanes; it carries little structural risk and is relatively stable, with less risk of overfitting than other classifiers. When the number of samples is small, many models tend to overfit and fail to produce correct conclusions, and the SVM classifier can then be adopted; however, the computational complexity of SVM increases significantly when the number of samples is large. Neural networks handle this problem better. Convolutional neural networks are often used in classification: they extract features of the input with a learned model and then classify according to the extracted features. However, the pooling layer may lose a large amount of feature information and cannot capture local-to-global correlation, and the gradient-descent training is prone to overfitting. A decision tree is a tree structure for solving classification problems in which the non-leaf nodes are the selected attributes and the leaf nodes are the classification results. It is usually used for discrete data classification; it is not good at handling data sets with missing values and may overfit. However, when a decision tree faces a data set with many candidate features, it can select the most relevant attributes by information gain and thus remove redundant or empty attributes. A decision tree can handle data sets containing both numerical and categorical attributes well, and it scales well to large databases, largely independent of the database size.
There is no doubt that decision trees are the best choice for us when making CU split mode decisions over a large amount of data while keeping the computational overhead small.
Various features, including CU variance, CU gradient kurtosis, CU color number, and color contrast, were examined to choose characteristic features for the new QTMT framework. Additionally, local features and CU size-related features were taken into account to accommodate the multiple tree partition types. At the same time, attention must be paid to the computational cost of feature extraction to avoid additional overhead. Based on these principles, the proposed algorithm ultimately utilizes three categories of features: global texture information, local texture information, and contextual information.
(1) Global texture information: HEVC often uses global texture information to describe the content complexity of CU blocks, and it also performs well in VVC. We choose a total of 8 attributes to join the decision tree model: the CU block size, the normalized gradient $x_{NG}$, the normalized maximum gradient magnitude $x_{MNG}$, the average gradient in the horizontal direction, the average gradient in the vertical direction, the QP value, the variance VarPix of the luminance samples, and the gradient ratio $x_{GR}$. The normalized gradient is composed of the horizontal and vertical gradients and is calculated with the Sobel operator.
$$G_x = \sum_{i=1}^{W}\sum_{j=1}^{H} \left(A * \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}\right)_{i,j}$$
$$G_y = \sum_{i=1}^{W}\sum_{j=1}^{H} \left(A * \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}\right)_{i,j}$$
$$x_{NG} = \frac{|G_x| + |G_y|}{N}$$
where $A$ is the luminance pixel matrix, $G_x$ is the normalized horizontal gradient, $G_y$ is the normalized vertical gradient, and $W$ and $H$ are the width and height of the CU block, respectively.
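The global texture attributes can be computed roughly as in the following sketch; the Sobel kernels and the normalized gradient follow the formulas above, while the exact definitions of the normalized maximum gradient magnitude and the gradient ratio are our assumptions.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
SOBEL_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.float32)

def global_texture_features(luma, qp):
    """Sketch of the global texture attributes; `luma` is the CU's luminance
    sample matrix A, and all feature names are ours."""
    h, w = luma.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):                     # per-pixel Sobel responses
        for j in range(w - 2):
            window = luma[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(window * SOBEL_X)
            gy[i, j] = np.sum(window * SOBEL_Y)
    n = gx.size
    x_ng = (np.abs(gx).sum() + np.abs(gy).sum()) / n       # normalized gradient
    x_mng = max(np.abs(gx).max(), np.abs(gy).max()) / n    # assumed definition
    x_gr = np.abs(gx).sum() / (np.abs(gy).sum() + 1e-6)    # assumed definition
    return {
        "size": (w, h), "qp": qp,
        "x_NG": x_ng, "x_MNG": x_mng, "x_GR": x_gr,
        "avg_grad_h": np.abs(gx).mean(), "avg_grad_v": np.abs(gy).mean(),
        "VarPix": luma.var(),
    }
```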
(2) Local texture information: Since VTM adds non-square CU blocks, their texture properties need to be described with local texture information. We add the following features to our attribute set: the luminance block variance of the sub-CUs; the difference between the upper and lower parts of the CU, $x_{UBD}$; the difference between the left and right parts of the CU, $x_{LRD}$; the mean, variance, and gradient of $x_{UBD}$ and $x_{LRD}$ (6 features); and the luminance-based features of $x_{UBD}$ and $x_{LRD}$ (3 features).
Let the texture feature of the CU be denoted as $f$, with the upper-left quadrant denoted $f_1$, the upper-right $f_2$, the lower-left $f_3$, and the lower-right $f_4$.
$$x_{UBD} = |f_1 - f_3| + |f_2 - f_4|$$
$$x_{LRD} = |f_1 - f_2| + |f_3 - f_4|$$
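A possible computation of these local texture features is sketched below, under the assumption that the quadrant texture features $f_1$ to $f_4$ are taken as quadrant luminance means; the function name and this choice of $f$ are ours.

```python
import numpy as np

def local_texture_features(luma):
    """Split the CU luma block into quadrants f1..f4 and compute the
    upper/lower and left/right differences (quadrant means assumed)."""
    h, w = luma.shape
    f1 = luma[:h // 2, :w // 2].mean()   # upper-left
    f2 = luma[:h // 2, w // 2:].mean()   # upper-right
    f3 = luma[h // 2:, :w // 2].mean()   # lower-left
    f4 = luma[h // 2:, w // 2:].mean()   # lower-right
    x_ubd = abs(f1 - f3) + abs(f2 - f4)  # upper vs. lower parts
    x_lrd = abs(f1 - f2) + abs(f3 - f4)  # left vs. right parts
    return {"x_UBD": x_ubd, "x_LRD": x_lrd}
```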
(3) Contextual information: Owing to the spatial correlation of video content, the partition structure of a CTU is almost identical or highly similar to that of spatially adjacent CUs. Consequently, we integrate two contextual features derived from the adjacent CUs, based on the QTMT depth (QTMTD) of a CU, which combines the quadtree depth (QD) and the multi-type tree depth (MD). The first feature indicates whether the QD of the current CU is greater than the QDs of its neighboring CUs; the neighboring CUs comprise those located above, to the left, below-left, above-right, and above-left of the current CU. The second feature indicates whether the QTMTD of the current CU is greater than those of its adjacent CUs. A larger value of these features means that the current CU is more likely to be divided into smaller sub-CUs, while a smaller value means that the splitting of the current CU is more likely to be terminated.
After the features are selected, the features with little relevance are removed according to their information gain, and the remaining features are used as attributes for the CU split mode decision. Based on our depth-prediction work, we obtain the best predicted depth of the CU block and then start to divide the CTU: the 128 × 128 CTU is first split by quadtree into four 64 × 64 sub-CUs, and our proposed algorithm then determines whether the best predicted depth of each CU is 1 or 2. If the predicted depth equals 1 or 2, no split or only a quadtree split is needed, so the next CU is processed directly. If the depth is greater than 2, the algorithm judges the five split modes of the CU, namely quadtree split, horizontal binary tree split, vertical binary tree split, horizontal ternary tree split, and vertical ternary tree split, iterating down to the best predicted depth. Considering possible wrong predictions, we set a threshold to weigh the split mode obtained by conventional coding against the decision tree prediction, preventing the serious consequences caused by accumulated wrong predictions. We set the confidence level to 0.9: when the prediction probability exceeds 0.9, we choose the model prediction as the current CU split mode; otherwise, the CU re-enters the conventional coding path for the split mode decision. At the same time, to reduce the possibility of overfitting, we prune the decision tree, which lowers the overfitting risk and improves the robustness of the model. A sketch of this stage is given below.
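The decision-tree stage can be sketched with scikit-learn as follows; the entropy criterion corresponds to information-gain-based attribute selection and the 0.9 confidence fallback follows the text, while the depth-based pruning parameter and the label encoding are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Split-mode labels: QT, horizontal/vertical BT, horizontal/vertical TT.
SPLIT_MODES = ["QT", "BT_H", "BT_V", "TT_H", "TT_V"]

def train_split_tree(features, labels, max_depth=8):
    """Train the split-mode decision tree; max_depth acts as simple pruning
    (the exact pruning strategy and hyper-parameters are our assumptions)."""
    tree = DecisionTreeClassifier(criterion="entropy",   # information gain
                                  max_depth=max_depth)
    tree.fit(features, labels)
    return tree

def decide_split_mode(tree, cu_features, confidence=0.9):
    """Return the predicted split mode if the classifier is confident enough,
    otherwise signal a fallback to the conventional RDO-based decision."""
    probs = tree.predict_proba([cu_features])[0]
    best = int(np.argmax(probs))
    if probs[best] >= confidence:
        return SPLIT_MODES[best]
    return None   # None -> run the original VTM split-mode search
```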

3.3. Framework of the Proposed Algorithm

Based on the above work, we combine the two proposed methods for predicting CU split modes to reduce the huge computational complexity of VTM. First, we use a CNN-based probabilistic model to determine the optimal depth of a CU, thereby greatly decreasing the RD cost calculations and avoiding needless and tedious CU split processes during intra-frame prediction. Further, we propose a decision-tree-based CU split mode decision, which, combined with the predicted optimal depth, again reduces the RD cost calculations in the CU split process. Once the depth is predicted, we can already tell whether the CU should be split at the current depth, so the no-split case does not need to be considered in the CU split mode decision and can be inferred from the depth, which further reduces the complexity of the decision tree.
In the coding process, we first need to train the CNN model. We obtain the depth information of the divided CUs from the conventional encoder (the information is available in the generated vtmbmsstats file), and the obtained depth information is then collated and fused into depth maps that are fed to the CNN model. Through model training, we obtain six prediction probabilities corresponding to depths 1 to 6; these probabilities are normalized, and the best prediction depth is determined by selecting the depth with the highest probability. We then make decisions on the CU split modes based on the first algorithm: we train the decision tree model, using the attributes with higher relevance, selected by information gain, as the judgment conditions for the CU split modes. After the decision tree model is trained, the specific process is given in Algorithm 1.
Algorithm 1. The Proposed Algorithm for Fast Decision-Making of CU Split Modes.
Require: The number of valid adjacent CUs, Nv; the validity of the adjacent frame.
Ensure: Optimal predicted depth of the CU, odepth; optimal split mode of the CU, CUsplit mode.
1: Verify the validity of the adjacent CUs and of the adjacent frame; if they are invalid, halt the proposed algorithm;
2: Collect the depth information of the adjacent CUs and of the corresponding CUs in the adjacent frame: InforB;
3: Obtain the predicted depth of the current CU from the trained CNN model: D0;
4: Obtain the optimal predicted depth of the current CU by applying the noise adjustment: odepth;
5: Encode the current CU. Determine whether odepth is greater than 2; if not, encode the next CU;
6: Set the initial depth to 2: Currdepth;
7: Run the trained decision tree (DT) model and obtain the predicted split mode: CUsplit mode;
8: Determine whether Currdepth is equal to odepth; if equal, encode the next CU; otherwise, increase Currdepth by 1 and go to step 7.
Figure 3 is the overall flowchart of the two algorithms, where $P_Q$, $P_{BH}$, $P_{BV}$, $P_{TH}$, and $P_{TV}$ are the probabilities output by the classifiers for the five split modes. First, we determine whether the best depth of the CU equals 1 or 2; if so, we proceed directly to encoding the next CU, otherwise we continue with the next operation. The current depth is then initialized to 2, and the CU split mode is predicted by the trained decision tree model. After the prediction, the confidence level is used to decide whether the model prediction is adopted, and the current depth is compared with the best predicted depth of the CU; if they are not equal, the CU is divided, the split mode is predicted for the corresponding sub-CUs, and these steps are iterated until the best depth is reached.
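The combined control flow of Algorithm 1 and Figure 3 can be summarized in the following sketch, where the encoder-side operations (the CNN depth prediction, feature extraction, the conventional RDO search, and the actual splitting) are passed in as placeholder callables, since they live inside VTM; this is an illustration of the flow, not the VTM implementation.

```python
def encode_ctu_cus(sub_cus, predict_depth, predict_mode, rdo_mode, split_cu):
    """Per-CTU control flow sketch. Callables: predict_depth(cu) -> odepth,
    predict_mode(cu) -> split mode or None (low confidence), rdo_mode(cu)
    -> split mode from the conventional search, split_cu(cu, mode) -> sub-CUs."""
    for cu in sub_cus:                       # the four 64x64 children of the CTU
        odepth = predict_depth(cu)
        if odepth <= 2:                      # depth 1 or 2: encode directly
            continue
        stack = [(cu, 2)]
        while stack:
            node, depth = stack.pop()
            if depth >= odepth:              # reached the predicted optimal depth
                continue
            mode = predict_mode(node)        # decision-tree prediction (>= 0.9 conf)
            if mode is None:
                mode = rdo_mode(node)        # fall back to the conventional search
            for child in split_cu(node, mode):
                stack.append((child, depth + 1))
```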

4. Experimental Results

Our proposed algorithm's effectiveness in reducing the computational complexity of VVC is evaluated in this section through experiments. In Section 4.1, we compare the proposed method with state-of-the-art methods in terms of RD performance and complexity. In Section 4.2, we assess the performance of the two proposed algorithms separately when combined with the conventional method. Our experiments are conducted on the VTM10.0 software running on an 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30 GHz platform in the All-Intra configuration.

4.1. Comparison Results with Other Benchmarks

In this subsection, we use the JVET common test conditions (CTC); the video sequences' frame count, frame rate, and bit depth are listed in Table 1. We assess the performance of each algorithm by calculating the average time saved (ATS) and the Bjøntegaard delta bit rate (BDBR) for all test sequences, where BDBR indicates the bit-rate savings of the different methods at the same objective quality.
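For clarity, ATS is measured relative to the unmodified VTM anchor and averaged over the four tested QPs; the exact form below is the conventional definition and is stated as our assumption rather than quoted from the original.

$$\mathrm{ATS}=\frac{1}{4}\sum_{\mathrm{QP}\in\{22,27,32,37\}}\frac{T_{\mathrm{VTM}}(\mathrm{QP})-T_{\mathrm{prop}}(\mathrm{QP})}{T_{\mathrm{VTM}}(\mathrm{QP})}\times 100\%$$

where $T_{\mathrm{VTM}}$ and $T_{\mathrm{prop}}$ denote the encoding times of the original VTM and of the proposed algorithm, respectively.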
According to the findings, the proposed algorithm significantly decreases the computational complexity, with an average time reduction of 53.92% and a BDBR increase of 1.74%. Our algorithm achieves the largest time saving of 65.4% on the Kimono sequence and the smallest time saving of 38.02% on the BlowingBubbles sequence. Statistical analysis shows that deep partitioning tends to occur in CUs with complex texture, while regions with smooth texture are usually not divided further. The Kimono sequence has smoother texture, so the conventional encoding process performs a large amount of redundant RD cost calculation in its CU division decisions; our algorithm greatly reduces this redundancy, so the ATS of this sequence is higher. In contrast, the BlowingBubbles sequence contains more image detail and more complex texture, and its CUs are divided to deeper levels, so our optimal depth prediction removes relatively less redundant computation and the ATS of this sequence is lower.
We compare our method with the approaches based on ResNet [8] and DenseNet [36] and with the algorithms of Fan [37], Huang [35], and Li [38]. Table 3 presents the results: the ResNet-based algorithm achieves an average time saving of 55.02% with a BDBR increase of 1.83%; the DenseNet-based algorithm saves 51.84% of coding time on average with a BDBR increase of 1.711%; Fan's algorithm saves 48.49% with a BDBR increase of 1.65%; Huang's algorithm saves 30.976% with a BDBR increase of 0.528%; and Li's algorithm saves 23.19% with a BDBR increase of 0.97%. In comparison, our algorithm saves 53.92% of coding time on average with a BDBR increase of 1.74%.
Compared with the first-ranked ResNet-based algorithm, our BDBR is 0.09% lower; compared with the DenseNet-based algorithm, we save 2.08% more coding time; compared with Fan's algorithm, 5.43% more; compared with Huang's algorithm, 22.944% more; and compared with Li's algorithm, 30.732% more. Figure 4 visualizes the ATS and BDBR of all algorithms. We can conclude that our algorithm performs very well against these methods, while its structure is lightweight and easy to understand and implement.

4.2. Additional Analysis

In this subsection, ablation experiments are conducted to assess the performance of the two proposed algorithms individually. First, the DPSC algorithm alone is employed to predict the optimal CU depth, after which the conventional RDO process is used for the CU split. Then the depth prediction is removed and the decision tree alone is used for the CU split mode decision (SDDT). The data are analyzed separately for the three combinations, and Table 4 records the ATS and BDBR in each case.
From Table 4, it can be concluded that the combination of the DPSC algorithm with conventional coding saves 47.32% of coding time with a BDBR increase of only 1.51%, and the combination of the SDDT algorithm with conventional coding saves 44.19% of coding time with a BDBR increase of only 1.62%.
As demonstrated by the results in Figure 5, both algorithms perform better on the higher-resolution Classes A1, A2, and B than on Classes C, D, and E. The DPSC algorithm achieves its largest ATS of 58.65% on the Kimono sequence with only a 1.38% increase in BDBR, and its smallest ATS of 31.59% on the BlowingBubbles sequence with only a 1.26% increase in BDBR. On the BasketballDrive sequence, the SDDT algorithm attains its highest ATS of 55.84% with a BDBR increase of merely 1.80%, while on the BlowingBubbles sequence it records its lowest ATS of 33.39% with a BDBR increase of merely 1.36%.

5. Conclusions

In this paper, we propose a CNN-based probabilistic model to predict the split depth of CU blocks and thereby avoid numerous unnecessary RDO processes. On this basis, we propose a decision-tree-based decision algorithm for the CU split mode, which not only lowers the computational complexity but also preserves the video quality. According to the experimental results, the proposed algorithm reduces the computational complexity by approximately 53.92% and incurs an RD performance loss of only 1.74% compared with VTM10.0. The findings indicate the efficacy of the proposed algorithm. In future work, we will strive to enhance the model's architecture, expand our video sequence database, and establish a repository to further improve the model's performance.

Author Contributions

Conceptualization, C.Z. and W.Y.; methodology, C.Z.; software, W.Y.; validation, C.Z., Q.Z. and W.Y.; formal analysis, W.Y.; investigation, W.Y.; resources, Q.Z.; data curation, W.Y.; writing—original draft, W.Y.; writing—review and editing, C.Z.; visualization, C.Z.; supervision, Q.Z.; project administration, Q.Z.; funding acquisition, Q.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China No.61771432, 61302118, and 61702464, the Basic Research Projects of Education Department of Henan No.21zx003, and No.20A880004, the Key Research and Development Program of Henan No.222102210156, and the Postgraduate Education Reform and Quality Improvement Project of Henan Province YJS2021KC12 and YJS2022AL034, the Science and Technology Research Project of Henan Province No. 222102210096, the Doctoral Research Start-up Fund of Zhengzhou University of Light Industry 2020BSJJ067.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Nguyen, T.; Marpe, D. Future Video Coding Technologies: A Performance Evaluation of av1, jem, vp9, and hm. In Proceedings of the Picture Coding Symposium (PCS), San Francisco, CA, USA, 24–27 June 2018; pp. 31–35. [Google Scholar]
  2. Yousfi, R.; Omor, M.B.; Damak, T.; Ayed, M.A.B.; Masmoudi, N. JEM-post HEVC vs. HM-H265/HEVC Performance and Subjective Quality Comparison Based on QVA Metric. In Proceedings of the 4th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), Sousse, Tunisia, 21–24 March 2018; pp. 1–4. [Google Scholar]
  3. Jin, D.; Lei, J.; Peng, B.; Li, W.; Ling, N.; Huang, Q. Deep affine motion compensation network for inter prediction in VVC. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 3923–3933. [Google Scholar] [CrossRef]
  4. Said, A.; Zhao, X.; Karczewicz, M.; Chen, J.; Zou, F. Position Dependent Prediction Combination for Intra-Frame Video Coding. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 534–538. [Google Scholar]
  5. Mercat, A.; Mäkinen, A.; Sainio, J.; Lemmetti, A.; Viitanen, M.; Vanne, J. Comparative rate-distortion-complexity analysis of VVC and HEVC video codecs. IEEE Access 2021, 9, 67813–67828. [Google Scholar] [CrossRef]
  6. Yang, H.; Shen, L.; Dong, X.; Ding, Q.; An, P.; Jiang, G. Low-complexity CTU partition structure decision and fast intra mode decision for versatile video coding. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 1668–1682. [Google Scholar] [CrossRef]
  7. Qian, X.; Zeng, Y.; Wang, W.; Zhang, Q. Co-saliency detection guided by group weakly supervised learning. IEEE Trans. Multimed. 2022, 24, 1. [Google Scholar] [CrossRef]
  8. Zhao, J.; Wu, A.; Jiang, B.; Zhang, Q. ResNet-Based Fast CU Partition Decision Algorithm for VVC. IEEE Access 2022, 10, 100337–100347. [Google Scholar] [CrossRef]
  9. Gu, J.; Tang, M.; Wen, J.; Han, Y. Adaptive intra candidate selection with early depth decision for fast intra prediction in HEVC. IEEE Signal Process. Lett. 2017, 25, 159–163. [Google Scholar] [CrossRef]
  10. Menon, V.V.; Amirpour, H.; Timmerer, C.; Ghanbari, M. INCEPT: Intra CU Depth Prediction for HEVC. In Proceedings of the IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP), Tampere, Finland, 6–8 October 2021; pp. 1–6. [Google Scholar]
  11. He, S.Q.; Deng, Z.J.; Shi, C. Fast Decision of CU Size Based on Texture Cost and Non-texture Cost for HEVC Intra Prediction. In Proceedings of the IEEE 21st International Conference on Communication Technology (ICCT), Tianjin, China, 13–16 October 2021; pp. 1162–1166. [Google Scholar]
  12. Fengwei, G.; Yong, C.; Shuai, X. Fast Algorithm Design of HEVC Intra Prediction. In Proceedings of the International Conference on Innovations and Development of Information Technologies and Robotics (IDITR), Chengdu, China, 27–29 May 2022; pp. 38–42. [Google Scholar]
  13. Tun, E.E.; Aramvith, S.; Onoye, T. Low complexity mode selection for H. 266/VVC intra coding. ICT Express 2022, 8, 83–90. [Google Scholar] [CrossRef]
  14. Sulochana, V.; Shanthini, B.; Harinath, K. Fast Intraprediction Algorithm for HEVC Based on Machine Learning Classification Technique. In Proceedings of the IEEE International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE), Ballari, India, 23–24 April 2022; pp. 1–8. [Google Scholar]
  15. Amestoy, T.; Mercat, A.; Hamidouche, W.; Menard, D.; Bergeron, C. Tunable VVC frame partitioning based on lightweight machine learning. IEEE Trans. Image Process. 2019, 29, 1313–1328. [Google Scholar] [CrossRef] [PubMed]
  16. Zhao, J.; Wang, Y.; Zhang, Q. Fast CU Size Decision Method Based on Just Noticeable Distortion and Deep Learning. Sci. Program. 2021, 2021, 3813116. [Google Scholar] [CrossRef]
  17. Liu, Y.; Wei, A. A CU Fast Division Decision Algorithm with Low Complexity for HEVC. In Proceedings of the IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chongqing, China, 12–14 June 2020; Volume 1, pp. 1028–1032. [Google Scholar]
  18. Jamali, M.; Coulombe, S.; Sadreazami, H. CU Size Decision for Low Complexity HEVC Intra Coding Based on Deep Reinforcement Learning. In Proceedings of the IEEE 63rd International Midwest Symposium on Circuits and Systems (MWSCAS), Springfield, MA, USA, 9–12 August 2020; pp. 586–591. [Google Scholar]
  19. Heindel, A.; Haubner, T.; Kaup, A. Fast CU split decisions for HEVC inter coding using support vector machines. In Proceedings of the Picture Coding Symposium (PCS), Nuremberg, Germany, 4–7 December 2016; pp. 1–5. [Google Scholar]
  20. Xu, M.; Li, T.; Wang, Z.; Deng, X.; Yang, R.; Guan, Z. Reducing complexity of HEVC: A deep learning approach. IEEE Trans. Image Process. 2018, 27, 5044–5059. [Google Scholar] [CrossRef]
  21. Chen, Z.; Shi, J.; Li, W. Learned fast HEVC intra coding. IEEE Trans. Image Process. 2020, 29, 5431–5446. [Google Scholar] [CrossRef]
  22. Zaki, F.; Mohamed, A.E.; Sayed, S.G. CtuNet: A deep learning-based framework for fast CTU partitioning of H265/HEVC intra-coding. Ain Shams Eng. J. 2021, 12, 1859–1866. [Google Scholar] [CrossRef]
  23. Li, Y.; Li, L.; Zhuang, Z.; Fang, Y.; Yang, Y. ResNet Approach for Coding Unit Fast Splitting Decision of HEVC Intra Coding. In Proceedings of the IEEE Sixth International Conference on Data Science in Cyberspace (DSC), Shenzhen, China, 9–11 October 2021; pp. 130–135. [Google Scholar]
  24. Yingmin, Y.; Zhaoyang, Z.; Yiwei, Y.; Xianghong, X.; Yuxing, L. Fast Intra Mode Decision Algorithm of HEVC Based on Convolutional Neural Network. In Proceedings of the 2nd Asia-Pacific Conference on Communications Technology and Computer Science (ACCTCS), Shenyang, China, 25–27 February 2022; pp. 76–79. [Google Scholar]
  25. Li, T.; Xu, M.; Tang, R.; Chen, Y.; Xing, Q. DeepQTMT: A deep learning approach for fast QTMT-based CU partition of intra-mode VVC. IEEE Trans. Image Process. 2021, 30, 5377–5390. [Google Scholar] [CrossRef]
  26. Zhao, T.; Huang, Y.; Feng, W.; Xu, Y.; Kwong, S. Efficient VVC Intra Prediction Based on Deep Feature Fusion and Probability Estimation. IEEE Trans. Multimed. 2022, 24, 1–11. [Google Scholar] [CrossRef]
  27. Zhao, J.; Dai, P.; Zhang, Q. A complexity reduction method for VVC intra prediction based on statistical analysis and SAE-CNN. Electronics 2021, 10, 3112. [Google Scholar] [CrossRef]
  28. Zhang, S.; Feng, S.; Chen, J.; Zhou, C.; Yang, F. A GCN-based fast CU partition method of intra-mode VVC. J. Vis. Commun. Image Represent. 2022, 88, 103621. [Google Scholar] [CrossRef]
  29. Guo, X.; Wang, Q.; Jiang, J. A Lightweight CNN for Low-Complexity HEVC Intra Encoder. In Proceedings of the IEEE 15th International Conference on Solid-State & Integrated Circuit Technology (ICSICT), Kunming, China, 3–6 November 2020; pp. 1–3. [Google Scholar]
  30. Kim, K.; Ro, W.W. Fast CU depth decision for HEVC using neural networks. IEEE Trans. Circuits Syst. Video 2018, 29, 1462–1473. [Google Scholar] [CrossRef]
  31. Javaid, S.; Rizvi, S.; Ubaid, M.T.; Tariq, A. VVC/H. 266 intra mode QTMT based CU partition using CNN. IEEE Access 2022, 10, 37246–37256. [Google Scholar] [CrossRef]
  32. Fu, P.C.; Yen, C.C.; Yang, N.C.; Wang, J.S. Two-Phase Scheme for Trimming QTMT CU Partition Using Multi-Branch Convolutional Neural Networks. In Proceedings of the IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS), Washington, DC, USA, 6–9 June 2021; pp. 1–6. [Google Scholar]
  33. Tang, G.; Jing, M.; Zeng, X.; Fan, Y. Adaptive CU Split Decision with Pooling-Variable CNN for VVC Intra Encoding. In Proceedings of the IEEE Visual Communications and Image Processing (VCIP), Sydney, NSW, Australia, 1–4 December 2019; pp. 1–4. [Google Scholar]
  34. Tissier, A.; Hamidouche, W.; Vanne, J.; Galpin, F.; Menard, D. CNN Oriented Complexity Reduction of VVC Intra Encoder. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 25–28 October 2020; pp. 3139–3143. [Google Scholar]
  35. Huang, Y.H.; Chen, J.J.; Tsai, Y.H. Speed up H. 266/QTMT intra-coding based on predictions of ResNet and random forest classifier. In Proceedings of the IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 10–12 January 2021; pp. 1–6. [Google Scholar]
  36. Zhang, Q.; Guo, R.; Jiang, B.; Su, R. Fast CU decision-making algorithm based on DenseNet network for VVC. IEEE Access 2021, 9, 119289–119297. [Google Scholar] [CrossRef]
  37. Fan, Y.; Sun, H.; Katto, J.; Ming’E, J. A fast QTMT partition decision strategy for VVC intra prediction. IEEE Access 2020, 8, 107900–107911. [Google Scholar] [CrossRef]
  38. Li, Y.; Luo, F.; Zhu, Y. Temporal Prediction Model-Based Fast Inter CU Partition for Versatile Video Coding. Sensors 2022, 22, 7741. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Analysis of the split mode under different size CU.
Figure 2. CNN structural framework.
Figure 3. The flowchart outlining the proposed intra-frame CU split mode decision of VVC.
Figure 4. Comparison of ATS and BDBR for all algorithms.
Figure 5. The outcomes of the ablation experiments with regards: (a) ATS; (b) BDBR.
Table 1. JVET Common Test Conditions.

Class | Sequence Name | Resolution | Frame Count | Frame Rate | Bit Depth
A1 | Campfire | 3840 × 2160 | 300 | 30 fps | 10
A1 | Tango2 | 3840 × 2160 | 294 | 60 fps | 10
A1 | FoodMarket4 | 3840 × 2160 | 300 | 60 fps | 10
A2 | Catrobot | 3840 × 2160 | 300 | 60 fps | 10
A2 | DaylightRoad2 | 3840 × 2160 | 300 | 60 fps | 10
A2 | ParkRunning3 | 3840 × 2160 | 300 | 50 fps | 10
B | BasketballDrive | 1920 × 1080 | 500 | 50 fps | 8
B | BQTerrace | 1920 × 1080 | 600 | 60 fps | 8
B | Cactus | 1920 × 1080 | 500 | 50 fps | 8
B | Kimono | 1920 × 1080 | 240 | 24 fps | 8
B | ParkScene | 1920 × 1080 | 240 | 24 fps | 8
C | BasketballDrill | 832 × 480 | 500 | 50 fps | 8
C | BQMall | 832 × 480 | 600 | 60 fps | 8
C | PartyScene | 832 × 480 | 500 | 50 fps | 8
C | RaceHorsesC | 832 × 480 | 300 | 30 fps | 8
D | BasketballPass | 416 × 240 | 500 | 50 fps | 8
D | BlowingBubbles | 416 × 240 | 500 | 50 fps | 8
D | BQSquare | 416 × 240 | 600 | 60 fps | 8
D | RaceHorses | 416 × 240 | 300 | 30 fps | 8
E | FourPeople | 1280 × 720 | 600 | 60 fps | 8
E | Johnny | 1280 × 720 | 600 | 60 fps | 8
E | KristenAndSara | 1280 × 720 | 600 | 60 fps | 8
F | BasketballDrillText | 832 × 480 | 500 | 50 fps | 8
F | ChinaSpeed | 1280 × 720 | 500 | 30 fps | 8
F | SlideEditing | 1280 × 720 | 300 | 30 fps | 8
F | SlideShow | 1280 × 720 | 500 | 20 fps | 8
Table 2. The Classification Performance of Our Depth Prediction Model.

Depth | Precision | Recall | Specificity | Accuracy
1 | 0.928 | 0.904 | 0.965 | 0.956
2 | 0.872 | 0.888 | 0.950 | 0.939
3 | 0.881 | 0.866 | 0.953 | 0.938
4 | 0.862 | 0.878 | 0.948 | 0.935
5 | 0.903 | 0.854 | 0.956 | 0.936
6 | 0.868 | 0.942 | 0.955 | 0.953
Average | 0.886 | 0.889 | 0.954 | 0.943
Table 3. Results of the proposed algorithm compared with other algorithms (ATS/BDBR, both in %).

Class | Sequence | ResNet [8] | DenseNet [36] | Fan [37] | Huang [35] | Li [38] | Ours
A1 | Campfire | 62.16/2.07 | 71.34/2.21 | -/- | 51.083/1.87 | 42.08/1.98 | 55.92/1.81
A1 | Tango2 | 54.46/2.32 | 76.53/2.41 | -/- | 69.450/1.258 | 21.06/0.98 | 65.06/2.07
A1 | FoodMarket4 | 60.34/2.25 | 77.05/2.10 | -/- | -/- | 21.10/0.19 | 51.77/1.95
A1 | Average | 58.99/2.21 | 74.97/2.24 | -/- | 60.267/1.564 | 28.08/1.05 | 57.58/1.94
A2 | Catrobot | 56.21/2.51 | 75.11/2.85 | -/- | 44.736/0.853 | 18.85/0.86 | 61.22/2.33
A2 | DaylightRoad2 | 60.54/1.87 | 77.78/1.93 | -/- | 45.583/0.747 | 18.88/1.15 | 61.48/1.77
A2 | ParkRunning3 | 57.34/1.79 | 69.85/0.85 | -/- | -/- | 30.92/0.76 | 57.88/1.71
A2 | Average | 58.03/2.06 | 74.25/1.88 | -/- | 45.16/0.8 | 22.88/0.92 | 60.19/1.94
B | BasketballDrive | 51.15/2.23 | 62.73/2.07 | -/- | 55.091/0.547 | 27.61/1.23 | 58.63/1.92
B | BQTerrace | 55.57/1.51 | 51.61/1.52 | 45.30/1.08 | 33.125/0.554 | 27.91/0.55 | 54.55/1.50
B | Cactus | 52.48/1.84 | 58.98/1.93 | -/- | -/- | 25.42/0.77 | 61.26/1.73
B | Kimono | 56.06/1.53 | 62.49/1.55 | 59.51/1.93 | 53.004/0.405 | -/- | 65.40/1.55
B | ParkScene | 55.73/1.64 | 60.24/1.68 | 51.84/1.26 | 32.094/0.348 | -/- | 57.94/1.65
B | Average | 54.2/1.75 | 59.21/1.75 | 52.22/1.42 | 43.329/0.464 | 26.98/0.85 | 59.56/1.67
C | BasketballDrill | 51.94/1.87 | 34.40/2.28 | 48.48/1.82 | 29.895/1.048 | 30.12/1.30 | 47.27/1.79
C | BQMall | 52.19/1.52 | 36.82/1.58 | -/- | 30.774/0.58 | 25.25/1.35 | 49.5/1.52
C | PartyScene | 54.06/1.06 | 27.34/0.76 | 38.62/0.26 | 28.027/0.132 | 28.77/1.02 | 48.22/1.18
C | RaceHorsesC | 53.73/1.27 | 39.39/1.02 | 49.05/0.88 | 32.281/0.626 | 30.11/1.54 | 48.26/1.35
C | Average | 52.98/1.43 | 34.49/1.41 | 45.38/0.99 | 30.244/0.597 | 28.56/1.30 | 48.31/1.46
D | BasketballPass | 49.15/1.87 | 23.50/0.98 | -/- | 27.031/0.515 | 18.63/0.99 | 41.98/1.76
D | BlowingBubbles | 52.13/1.43 | 16.60/0.419 | 40.35/0.47 | 22.081/0.124 | 21.88/1.07 | 38.02/1.46
D | BQSquare | 53.26/1.33 | 17.94/0.41 | 31.95/0.19 | 19.171/0.212 | 16.02/0.34 | 43.67/1.34
D | RaceHorses | 48.64/1.78 | 39.39/0.66 | 49.05/0.54 | 23.586/0.092 | 25.48/1.63 | 47.21/1.76
D | Average | 50.8/1.6 | 24.36/0.617 | 40.45/0.4 | 22.967/0.236 | 20.50/2.02 | 42.72/1.58
E | FourPeople | 57.87/1.87 | 54.98/2.69 | 57.57/2.70 | 27.994/0.862 | 11.73/0.38 | 56.81/1.77
E | Johnny | 59.22/2.43 | 55.54/3.32 | 56.88/3.22 | 41.054/1.058 | 8.88/0.29 | 57.65/2.30
E | KristenAndSara | 56.31/2.24 | 50.79/2.42 | 55.11/2.78 | 35.702/0.859 | 9.57/0.44 | 56.57/1.98
E | Average | 57.8/2.18 | 53.77/2.81 | 56.52/2.9 | 34.917/0.926 | 10.06/0.37 | 57.01/2.02
Total | Average | 55.02/1.83 | 51.84/1.711 | 48.49/1.65 | 30.976/0.528 | 23.19/0.97 | 53.92/1.74
Table 4. Result of the overall algorithm compared with the single algorithms (ATS/BDBR, both in %).

Class | Sequence | DPSC | SDDT | Overall
A1 | Campfire | 49.17/1.64 | 41.5/1.72 | 55.92/1.81
A1 | Tango2 | 58.63/1.82 | 41.8/1.95 | 65.06/2.07
A1 | FoodMarket4 | 45.02/1.76 | 45.36/1.83 | 51.77/1.95
A2 | Catrobot | 54.79/2.06 | 35.57/2.21 | 61.22/2.33
A2 | DaylightRoad2 | 54.73/1.53 | 52.66/1.66 | 61.48/1.77
A2 | ParkRunning3 | 51.13/1.49 | 44.49/1.61 | 57.88/1.71
B | BasketballDrive | 52.2/1.72 | 55.84/1.80 | 58.63/1.92
B | BQTerrace | 47.8/1.31 | 46.89/1.39 | 54.55/1.50
B | Cactus | 54.83/1.51 | 48.49/1.64 | 61.26/1.73
B | Kimono | 58.65/1.38 | 54.69/1.45 | 65.4/1.55
B | ParkScene | 51.51/1.41 | 48.43/1.53 | 57.94/1.65
C | BasketballDrill | 40.52/1.57 | 39.01/1.67 | 47.27/1.79
C | BQMall | 43.07/1.35 | 47.06/1.42 | 49.5/1.52
C | PartyScene | 41.47/0.92 | 36.55/1.08 | 48.22/1.18
C | RaceHorsesC | 41.83/1.15 | 40.22/1.21 | 48.26/1.35
D | BasketballPass | 35.23/1.53 | 36.67/1.66 | 41.98/1.76
D | BlowingBubbles | 31.59/1.26 | 33.39/1.36 | 38.02/1.46
D | BQSquare | 36.92/1.18 | 37.42/1.25 | 43.67/1.34
D | RaceHorses | 40.78/1.49 | 35/1.61 | 47.21/1.76
E | FourPeople | 50.06/1.53 | 49.3/1.66 | 56.81/1.77
E | Johnny | 51.22/2 | 50.81/2.14 | 57.65/2.30
E | KristenAndSara | 49.82/1.74 | 50.92/1.81 | 56.57/1.98
Total | Average | 47.32/1.51 | 44.19/1.62 | 53.92/1.74
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
