
A Method to Reduce the Intra-Frame Prediction Complexity of HEVC Based on D-CNN

School of Physics and Electronics, Nanning Normal University, Nanning 530100, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(9), 2091; https://doi.org/10.3390/electronics12092091
Submission received: 4 March 2023 / Revised: 26 April 2023 / Accepted: 29 April 2023 / Published: 4 May 2023

Abstract

Among the series of video coding standards jointly developed by ITU-T VCEG and MPEG, high-efficiency video coding (HEVC) is one of the most widely used video coding standards today; therefore, it is still necessary to further reduce its coding complexity. In the HEVC standard, a flexible partitioning procedure called "quad-tree partitioning" significantly improves coding efficiency, but it also leads to high coding complexity. To reduce the coding complexity of intra-frame prediction, this paper proposes a scheme based on a densely connected convolutional neural network (D-CNN) to predict the partition of coding units (CUs). First, a densely connected block is designed to improve the efficiency of the CU partition by fully extracting the pixel features of the coding tree unit (CTU). Then, efficient channel attention (ECA) and an adaptive convolution kernel size are applied to fast CU partitioning for the first time to capture the information of the D-CNN convolution channels. Finally, a threshold optimization strategy is formulated to select the best threshold for each depth to further balance the computational complexity of video coding against the RD performance. The experimental results show that the proposed method reduces the encoding time of HEVC by 60.14%, with a negligible reduction in RD performance, outperforming existing fast partitioning methods.

1. Introduction

To transmit massive amounts of video data efficiently at the best bit rate, video transmission standards have been continually developed, such as advanced video coding (AVC/H.264) [1]; high-efficiency video coding (HEVC/H.265) [2]; and the new-generation versatile video coding standard (VVC/H.266) [3]. All of these video coding standards apply very flexible block coding and introduce many efficient coding tools, which dramatically increases their complexity. Taking HEVC and VVC as examples, HEVC adopts a quad-tree-based recursive partitioning method for the CU, whereas VVC uses not only the quad-tree structure but also binary-tree and ternary-tree partition structures to find the best CU, which makes its intra-frame coding complexity, on average, 18 times higher than that of HEVC [4]. If brute-force search rate-distortion (RD) optimization is adopted, the complexity is even higher. Therefore, the high coding complexity limits the application of the VVC standard in real-time scenarios. Furthermore, the deep learning-based acceleration algorithms applied to encoders usually require the support of graphics processing units (GPUs), and GPU hardware support for HEVC is much more mature and widespread than that for VVC [5]. Therefore, HEVC is much more widely used than VVC in industrial applications [6]. Moreover, both HEVC and VVC adopt block-based encoding structures, differing only in specific technologies, and the CTU partition structure of HEVC is much simpler than that of VVC; since most classical algorithms are currently applied to HEVC, it is easier to verify the feasibility of a proposed algorithm and compare its performance against them. Therefore, without loss of generality, we chose HEVC to verify the effectiveness of the proposed algorithm and, thus, investigate the problem of reducing the complexity of HEVC coding.
Specifically, a more flexible quad-tree partition procedure is used in HEVC to improve the coding efficiency of intra-frame and inter-frame prediction, involving prediction units (PUs), transform units (TUs) [7], and coding units (CUs). In HEVC, each frame of the image is divided into coding tree units (CTUs) [8], and each CTU is then further divided into multiple CUs according to the quad-tree structure. To determine the best way to partition the coding unit, HEVC recursively searches the CUs from 64 × 64 down to 8 × 8 by traversing all CUs of different sizes in the CTU and calculating their rate-distortion (RD) costs [9]. However, this recursive quad-tree-based CU partition method accounts for more than 80% of the coding time [10]. Therefore, it is crucial to optimize the CU partition method and reduce the HEVC coding complexity while ensuring video quality.
In the last few years, many fast methods for HEVC intra-coding have emerged. These methods can be mainly categorized into heuristic techniques and machine learning techniques. Heuristic-based methods [11,12,13,14,15,16] rely on manually specified empirical features: the CTU partition is described by hand-crafted measures of the CTU texture complexity and of the spatial correlation with neighboring CTUs in order to save computational effort. For example, Chen et al. [16] proposed a feature to measure the complexity of video content and established a CTU depth prediction model based on feature analysis; they also proposed a fast mode decision algorithm based on the RDO procedure and the CU partition structure. However, such approaches cannot extract deeper features in the CU partition process and do not predict the CU size decisions accurately enough for non-uniform sequences. This inaccuracy is caused by the sensitivity of the hand-defined descriptive features used.
Machine learning techniques used to optimize the CU partition overcome the drawbacks of heuristic approaches. Machine learning methods, especially those based on support vector machines (SVMs) and convolutional neural networks (CNNs), have been widely used to accelerate the CTU partition. Both SVMs and CNNs aim to reduce the HEVC computational complexity as much as possible while limiting the quality loss of the video content. However, due to the inflexibility of the decision boundary in HEVC video coding, the SVM-based methods [17,18,19] cannot control the complexity during coding. Some scholars have extended single-classification SVMs to multi-classification SVMs to partition the CTU. Liu et al. [20] suggested using three SVM classifiers to predict CTU partitions. Amna et al. [21] proposed a fuzzy support vector machine (FSVM) method that organizes three binary classifiers in a cascade to achieve accurate partitioning when sufficient features are available. The SVM-based methods save encoding time, but they have limited learning capability and their prediction accuracy cannot be guaranteed. CNN-based methods can automatically extract features and offer superior predictive power compared with SVM-based methods. Therefore, CNN-based methods are gradually becoming widely used in the CU partition process and in various other research areas [22,23,24].
Recently, some CNN-based methods [25,26,27,28,29,30,31,32,33,34,35] for predicting the CU partition have been proposed. To enhance the feature representation of images, CNNs are used to automatically extract features related to the CU partition. Liu et al. [25] first used a CNN to optimize HEVC in both hardware and software. Cui et al. [26] first applied a CNN directly to HEVC intra-frame prediction, mainly proposing an IPCNN network to predict the CU partition. Schiopu et al. [27] first applied deep learning-based techniques to intra-frame prediction for lossless video coding, thereby replacing all of the traditional HEVC angular intra-prediction modes. Jin et al. [28] reduced the CU partition complexity by directly predicting the depth range of 32 × 32 blocks for the first time, using the quad-tree plus binary-tree (QTBT) partition ranges as a framework for a multi-classification task. Similarly, Wang et al. [29] designed a QTBT partitioning decision algorithm based on a CNN. These algorithms achieved good results in reducing the coding complexity through more flexible coding unit partition shapes. Moreover, Jamali et al. [30] used deep learning to reduce the coding complexity of HEVC intra-frame coding. Amna et al. [31] proposed a fast intra-frame encoding method for HEVC based on LeNet5; however, its CNN structure is relatively simple, leaving much room for improvement in learning complex CU partition features. In addition, Xu et al. [32] used a hierarchical CU partition map (HCPM) to represent the CU partition of the whole CTU, solving the problem of repeatedly calling classifiers, and proposed an early-terminated hierarchical network (ETH-CNN) to learn to predict the HCPM. Compared with the intra-frame prediction method proposed in [32], the method in [21] increases the diversity of the data sets and fine-tunes the training data to obtain better prediction results. Zhang et al. [33] established a texture threshold model related to the CU depth and quantization parameters (QPs) to design corresponding networks for different CU sizes. Similarly, Galpin et al. [34] used texture information to make a fast CU decision and then used a CNN to further accelerate the partition decision. Zaki et al. [35] used three identical ResNet18 networks to determine the CU partitions of different sizes. Such CNN methods perform predictive analysis for different CU sizes, integrate the corresponding model outputs into a CTU, and then determine the CU partition of the current CTU to achieve better performance. However, building a separate CNN for each depth inevitably complicates the training process and indirectly reduces the coding performance of the network. Although embedding a CNN into HEVC to predict the CU partition adds some overhead, the above CNN-based algorithms can still greatly reduce the overall complexity, owing to the ability of CNNs to automatically extract features instead of relying on early termination or exhaustive search as in traditional CU partition methods, and good experimental results have been obtained. This shows that using CNNs to reduce the complexity is appropriate.
Some scholars have mapped the CNN outputs to a CU partition representation according to the quad-tree structure and achieved end-to-end CU partition prediction. Ren et al. [36] proposed an intra-frame block partition CNN whose output is arranged as a 4 × 4 matrix according to the quad-tree structure, where each matrix element corresponds to the depth value of the corresponding 16 × 16 CU block. Feng et al. [37] used an 8 × 8 matrix to represent the CTU depth map, predicted the depth values with a CNN, and finally determined the CU partition. Moreover, many classical or novel CNN-based methods have been proposed to optimize the CU partition problem. Imen et al. [38] reduced the complexity of the CU partition module by modifying the classical LeNet-5 and AlexNet classifiers. Yao et al. [39] used a dual network to construct a prediction network and a target network based on a CNN; the prediction model of the optimal CU partition mode is trained by a reinforcement learning method based on the RD function, and the optimal RD estimate of the CU partition is obtained. However, the dual CNN structure in [39], while seemingly effective and convincing, does not meet the prediction accuracy requirements for the block texture features to be extracted in the CTU partition and for the correlation between sub-CUs. In fact, the CU partition prediction accuracy of a CNN is closely related to how effectively it learns the extracted pixel features.
In order to fully learn the extracted pixel features of the CTU and decrease the number of network parameters, a D-CNN is proposed in this paper for CU partition prediction in HEVC intra-frames. Moreover, a threshold optimization strategy is proposed to improve the partitioning accuracy and further balance the performance and speed of the encoder. The experimental results show that the proposed method reduces the coding time by 60.14% at the cost of only a 1.81% BD-BR increase compared with the standard partition method of HEVC on the standard test sequences specified by JCT-VC [40].
In brief, our contributions are as follows.
We design a densely connected block to fully extract CTU pixel features, and propose a fast D-CNN based decision model with excellent performance to predict the HEVC coding unit size well, and reduce the complexity of HEVC intra-frame prediction calculation consequently.
We introduce, for the first time, an efficient channel attention (ECA) module and an adaptive convolutional kernel size into the network structure for fast CU partitioning to improve the accuracy of CU partition feature extraction, which enables the network to capture the information of the convolutional channels in the D-CNN.
We propose a CU partition threshold optimization strategy to trade off the complexity of HEVC intra-frame prediction against the RD performance.

2. The Proposed D-CNN of CU Partition Method

Compared with the traditional CU partition process, the proposed method avoids redundant iterations when determining the best CU depth, as shown in Figure 1. Three main decision-making processes are involved: the first is the design of the network framework and the selection of the corresponding parameters of the D-CNN; the second is the design of an appropriate loss function for the D-CNN; and the third is the design of the optimal threshold for the CU partition results obtained from the D-CNN. Each process is discussed in detail in the following subsections.

2.1. Proposed Network Structure

According to the HEVC standard, the CTU supports four different CU sizes corresponding to the depths i = 0, 1, 2, 3, i.e., 64 × 64, 32 × 32, 16 × 16, and 8 × 8, respectively. For each depth, the split flag S_i has only two decision cases, partition and no partition, where S_i = 0 represents no partition and S_i = 1 represents a partition. The CTU level at i = 0 is represented by one label, the level at i = 1 by four labels, and the level at i = 2 by sixteen labels. The CUs at i = 3 do not require any decision label, since they are determined by the i = 0, 1, 2 labels described above. Therefore, each CTU contains 1 + 4 + 16 = 21 decisions about whether to split downwards, and these decision flag bits describe the CU partition structure. According to this discussion, the decision labels are divided into three levels in the proposed D-CNN.
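As an illustration of this 21-flag representation, the following minimal Python sketch derives the three levels of split decisions from a 4 × 4 map of final CU depths (one entry per 16 × 16 sub-block) exported from the HM encoder; the depth-map input and the function name are assumptions made for this example, not the paper's actual data pipeline.

```python
import numpy as np

def ctu_labels_from_depth_map(depth_map):
    """Derive the 1 + 4 + 16 = 21 binary split decisions of one 64x64 CTU
    from a 4x4 map of final CU depths (one entry per 16x16 sub-block)."""
    depth_map = np.asarray(depth_map).reshape(4, 4)

    # Level 0: the 64x64 CTU is split if any sub-block reaches depth >= 1.
    y1 = np.array([int(depth_map.max() >= 1)])

    # Level 1: each 32x32 quadrant (a 2x2 group of 16x16 units) is split
    # if any of its units reaches depth >= 2.
    y2 = np.array([int(depth_map[2 * r:2 * r + 2, 2 * c:2 * c + 2].max() >= 2)
                   for r in range(2) for c in range(2)])

    # Level 2: a 16x16 unit is split into four 8x8 CUs if its depth is 3.
    y3 = np.array([int(d == 3) for d in depth_map.flatten()])

    return y1, y2, y3  # 21 decision flags in total

# Example: only the top-left 32x32 quadrant is split down to 16x16 CUs.
print(ctu_labels_from_depth_map([[2, 2, 0, 0],
                                 [2, 2, 0, 0],
                                 [0, 0, 0, 0],
                                 [0, 0, 0, 0]]))
```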
The D-CNN is mainly composed of three convolution layers, three densely connected blocks, two maximum pooling layers, an ECA module, and two fully connected networks, in which the densely connected block is composed of two convolution blocks. Without loss of generality, this paper only analyzes the luminance information of the video sequence, i.e., only the pixel blocks composed of Y components in the input 64 × 64 CTU are taken. Additionally, the global normalization preprocessing operation is carried out to make the training data more convergent. The overall structure of the proposed model framework is shown in Figure 2.
First, the processed CTU data are passed through a 4 × 4 convolution layer. To match the non-overlapping nature of the CU partition process, we set the stride of this layer equal to the size of the convolution kernel. To learn deeper features, three densely connected blocks are used. Unlike the convolution layers in a traditional CNN, the key characteristic of the densely connected block [41] is that connections are established between different layers, making full use of the feature maps. In addition, the densely connected block uses many convolution kernels of size 1 × 1 to perform low-cost cross-channel feature transformations, which reduces the computational resource consumption and the risk of overfitting. However, the densely connected block is data-heavy and occupies a large amount of memory. To improve efficiency and reduce memory and energy consumption, we adapt the densely connected block to the practical problem of CU partitioning. Unlike the structure in [41], the structure conv (1 × 1) + batch normalization (BN) + ReLU + conv (3 × 3) + BN + ReLU is called a Block in this paper. Note that all feature map sizes must be consistent to avoid errors in the subsequent concatenation. Therefore, we set the stride of the Block convolution layers to 1 to facilitate feature fusion by concatenation. The densely connected block of the D-CNN only connects the input and output of the Block for feature reuse and, thus, makes full use of the CTU texture features, namely:
$$x_l = H_l([x_{l-1}]) + x_{l-1}$$
when l ≥ 2, x_{l−1} = G([x_{l−1}]), where H_l(·) is a nonlinear transformation function comprising a series of conv, BN, and ReLU operations, and G(·) denotes the nonlinear transformation function of conv and max pooling.
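As a rough illustration of the Block described above, the following tf.keras sketch builds the conv (1 × 1) + BN + ReLU + conv (3 × 3) + BN + ReLU sequence with stride 1 and fuses the Block input with its output for feature reuse; the filter counts are placeholder assumptions, since the actual channel widths are fixed by Figure 2 and Table 1.

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, bottleneck_filters=32, growth_filters=32):
    """Sketch of one 'Block': conv(1x1)+BN+ReLU+conv(3x3)+BN+ReLU, stride 1,
    with the block input reconnected to the output for feature reuse."""
    h = layers.Conv2D(bottleneck_filters, 1, strides=1, padding='same')(x)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU()(h)
    h = layers.Conv2D(growth_filters, 3, strides=1, padding='same')(h)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU()(h)
    # Feature reuse: the input and output of the Block are fused
    # (written as a sum in the formula above; the text also mentions concatenation).
    return layers.Concatenate()([x, h])
```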
After passing through the Block, the number of output channels is relatively large. Adding 1 × 1 convolution can reduce the number of input feature maps and the calculation amount. Then, a non-overlapping maximum pooling process with a 2 × 2 size and a step stride of 2 is used to filter the features, which greatly reduces the number of parameters while retaining the main features and reduces the risk of overfitting.
Next, although the convolution operation can perform a multi-scale fusion of spatial features, it ignores the differences in the description of key information between each feature channel [42]. The channel attention mechanism module can significantly reduce the model complexity while maintaining performance through appropriate cross-channel interaction, which makes up for the deficiency of the convolution operation. Therefore, this paper introduces the ECA model [43] for the first time to solve the intra-frame CU partition complexity problem to improve the CU partition predictability of the D-CNN. After channel-level global average pooling without reducing the dimension, an adaptive size convolution layer is used to effectively achieve channel interaction coverage. The relationship between the size of the convolution kernel k and the dimension C of the channel is as follows:
$$k = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{odd}$$
where γ and b are set to 2 and 1, respectively, and |·|_odd denotes taking the nearest odd number. We tested the local information interaction of the feature maps of different layers in the proposed network. The results show that adding the channel attention mechanism after the last densely connected block can effectively improve the feature accuracy without affecting the training time; thus, the network can better adapt to the CU partition of video sequences with different resolutions.
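The following sketch shows how the adaptive kernel size from the formula above and the ECA channel re-weighting could be realized in tf.keras, following the general ECA-Net design [43]; it is a simplified, assumption-based example rather than the authors' exact implementation.

```python
import math
import tensorflow as tf
from tensorflow.keras import layers

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size: k = |log2(C)/gamma + b/gamma| rounded to odd."""
    t = abs(math.log2(channels) / gamma + b / gamma)
    k = int(round(t))
    return k if k % 2 == 1 else k + 1

def eca_module(x):
    """Efficient channel attention: global average pooling per channel,
    a 1D convolution of adaptive size k across channels, and a sigmoid gate
    that re-weights each feature channel."""
    channels = int(x.shape[-1])
    k = eca_kernel_size(channels)
    w = layers.GlobalAveragePooling2D()(x)          # (B, C)
    w = layers.Reshape((channels, 1))(w)            # (B, C, 1): conv slides over channels
    w = layers.Conv1D(1, kernel_size=k, padding='same', use_bias=False)(w)
    w = layers.Activation('sigmoid')(w)
    w = layers.Reshape((1, 1, channels))(w)
    return layers.Multiply()([x, w])
```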
Finally, the extracted fused feature map is transformed into a one-dimensional vector and input to the fully connected layer. The fully connected layer includes three levels. The CU partition probabilities of different sizes are obtained by three levels of the D-CNN outputs, which finally correspond to 21 decision outputs, and then the output layer is activated by the sigmoid function. Additionally, the QP value is added to the fully connected layer as an external feature so that the CU partition can better adapt to the different QP values. In addition, an early termination mechanism is used to reduce the redundant CU division coding process. For example, when the first-level decision label is not divided, the second-level or third-level division decision will not be made, and so on. Table 1 provides detailed parameters of the D-CNN.
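A minimal sketch of this output stage is given below, with the hidden sizes taken from Table 1; where exactly the QP is concatenated, and the fact that the early termination is applied at encoding time outside the network, are assumptions made for this illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def partition_heads(features, qp):
    """Three-level output stage: the fused feature map is flattened, the QP is
    appended as an external feature, and three fully connected branches emit
    1, 4, and 16 sigmoid split probabilities (21 decisions in total).
    `features` is the ECA output, `qp` a (batch, 1) tensor of QP values."""
    flat = layers.Flatten()(features)
    flat = layers.Concatenate()([flat, qp])  # QP as an external feature

    def branch(units1, units2, outputs):
        h = layers.Dense(units1, activation='relu')(flat)
        h = layers.Dense(units2, activation='relu')(h)
        return layers.Dense(outputs, activation='sigmoid')(h)

    y1 = branch(64, 48, 1)     # level 0: split the 64x64 CTU?
    y2 = branch(128, 96, 4)    # level 1: split each 32x32 CU?
    y3 = branch(256, 192, 16)  # level 2: split each 16x16 CU?
    return y1, y2, y3
```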

2.2. Loss Function

In order to better train the D-CNN, we design the target loss function by incorporating the effect of the CU partition depth on the coding performance. Since cross-entropy is usually used to describe the error between true values and predicted values, it can train the network model very well; usually, the higher the cross-entropy, the greater the difference, and vice versa. Therefore, we use the sum of cross-entropies as the loss function of the D-CNN to train the network model designed in this paper. For each sample, the valid D-CNN outputs are summed according to their cross-entropy, and the D-CNN model is then trained by optimizing the loss function over all training samples. Suppose that the predicted labels are defined as $\{\hat{y}_1^{(i)}\}_{i=1}^{1}$, $\{\hat{y}_2^{(i)}\}_{i=1}^{4}$, and $\{\hat{y}_3^{(i)}\}_{i=1}^{16}$, and the ground-truth labels are defined as $\{y_1^{(i)}\}_{i=1}^{1}$, $\{y_2^{(i)}\}_{i=1}^{4}$, and $\{y_3^{(i)}\}_{i=1}^{16}$, respectively. The total number of samples is set as N and an individual sample as n, respectively. Then, the loss function $L_N$ is calculated as follows:
$$L_N = \frac{1}{N} \sum_{n=1}^{N} \left( \sum_{i=1}^{1} H\!\left(y_1^{(i)}, \hat{y}_1^{(i)}\right) + \sum_{i=1}^{4} H\!\left(y_2^{(i)}, \hat{y}_2^{(i)}\right) + \sum_{i=1}^{16} H\!\left(y_3^{(i)}, \hat{y}_3^{(i)}\right) \right)$$
where H(·,·) is the cross-entropy operator between the ground-truth and predicted labels. The Adam [44] optimization algorithm is used to train the D-CNN. Finally, the trained model can predict the partition of an entire CTU in HEVC.
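A possible TensorFlow realization of this loss is sketched below; the tuple-based input format is an assumption made for illustration, and the optimizer would be Adam as stated above.

```python
import tensorflow as tf

def hierarchical_ctu_loss(y_true, y_pred):
    """Sum the binary cross-entropies of all 21 split decisions (1 + 4 + 16)
    per CTU and average over the batch. y_true / y_pred are (level-1, level-2,
    level-3) tuples with shapes (N, 1), (N, 4), and (N, 16)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        ce = tf.keras.backend.binary_crossentropy(t, p)  # per-output cross-entropy
        total += tf.reduce_sum(ce, axis=-1)              # sum over this level's outputs
    return tf.reduce_mean(total)                          # average over the N samples
```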

2.3. Threshold Optimization Decision

As mentioned above, the three levels of binary output labels have decision probabilities $\{\hat{y}_1^{(i)}\}_{i=1}^{1}$, $\{\hat{y}_2^{(i)}\}_{i=1}^{4}$, and $\{\hat{y}_3^{(i)}\}_{i=1}^{16}$. Let ρ_i^r denote the predicted CU partition probability of each output, where r denotes the depth and i denotes the i-th output of the proposed D-CNN. It is critical to select the corresponding threshold β_r for each depth to determine the CU partition decision. For example, when ρ_i^0 > β_0, the CU is predicted to be partitioned, and vice versa. The overall threshold optimization decision algorithm is depicted in Figure 3.
β_r directly affects the computational complexity and RD performance. In [32], bi-thresholds were chosen for the CU partition at three levels; however, our experiments showed that a bi-threshold is not necessary for the intra-frame prediction mode. As β_r increases, the CU partition probability decreases, so the computational complexity decreases and, consequently, the RD performance decreases. Conversely, as β_r decreases, the CU partition probability increases and the RD performance improves, but the computational complexity increases. Therefore, we need to find the best β_r to balance the computational complexity against the RD performance, i.e., to provide better RD performance while keeping the computational complexity low.
To obtain the best CU partition prediction accuracy, β_r is determined using a pre-assignment method. Specifically, the CU partition probabilities output by the D-CNN are compared with the thresholds to obtain the predicted partition of the entire CTU. Here, β_r ∈ (0, 1) is pre-assigned sequentially, and the corresponding average prediction accuracy is calculated each time. The above procedure is repeated for the three depths 0, 1, and 2 and the four QP values {22, 27, 32, 37}. Finally, the relationships between the thresholds for depths 0, 1, and 2 and the prediction accuracy are obtained, as shown in Figure 4a–c. It can be seen from Figure 4 that the prediction accuracy fluctuates as the threshold value changes. The fluctuations arise because the threshold causes errors in the CU partition results predicted by the network, which in turn affects the accuracy of the CU partition prediction of the D-CNN. Therefore, the choice of the threshold value is critical. Due to the special structure of the quad-tree, the CU partition at higher depths may have a greater impact on the computational complexity and RD performance. It is clear from Figure 4 that, with a threshold of 0.5, the overall prediction accuracy reaches larger values. Therefore, we choose the threshold that achieves the maximum prediction accuracy at high depths as the optimal threshold for this experiment.
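The pre-assignment search described above can be sketched as follows; the candidate grid and variable names are illustrative assumptions.

```python
import numpy as np

def best_threshold(probs, labels, candidates=np.arange(0.05, 1.0, 0.05)):
    """Try each candidate beta_r in (0, 1), compare the resulting split
    decisions with the ground-truth labels, and keep the beta_r that gives
    the highest average prediction accuracy. `probs` and `labels` are flat
    arrays of D-CNN outputs and HM decisions for one depth and one QP."""
    accuracies = [np.mean((probs > beta).astype(int) == labels)
                  for beta in candidates]
    best = int(np.argmax(accuracies))
    return candidates[best], accuracies[best]

# Hypothetical usage for depth 0 at one QP:
# beta_0, acc_0 = best_threshold(depth0_probs, depth0_labels)
```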

3. Results

3.1. Experimental Configuration

The experimental configuration and evaluation criteria in this paper are as follows: the experimental platform is the HEVC reference software HM16.5, in which the four QP values {22, 27, 32, 37} are selected to encode the video. The intra-frame mode is evaluated using the configuration file encoder_intra_main.cfg [45]. The hardware configuration and operating system are shown in Table 2. An AMD Ryzen 7 5800H CPU with Radeon Graphics @ 2.40 GHz was used for testing, and an NVIDIA GeForce RTX 3050 Laptop GPU was used to speed up the training of the D-CNN.
The network framework adopts the TensorFlow platform [46] to implement the D-CNN. When training the model, all of the trainable parameters are randomly initialized, the training batch size N is 64, the initial learning rate is set to 0.01, and a total of 1 million iterations are performed.
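For reference, the stated training configuration corresponds roughly to the following settings (a sketch only; the model and data pipeline are not shown and their names are assumptions):

```python
import tensorflow as tf

# Training configuration stated above: Adam optimizer, initial learning
# rate 0.01, batch size 64, and 1 million iterations in total.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
BATCH_SIZE = 64
TOTAL_ITERATIONS = 1_000_000
```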
The proposed network was trained and verified on the CPH-Intra data set [47], an image data set with different resolutions. The database consists of sequences at four different resolutions, including 4K HD, and this diversity of resolutions ensures the accuracy of the CTU partition decisions learned from the training database. Each resolution subset was randomly divided into 85% training, 10% testing, and 5% validation. To obtain the training, testing, and validation databases containing the CU partition labels, all of the images were coded using the HEVC reference software HM 16.5 under the four QP values {22, 27, 32, 37}. In addition, to better show that the D-CNN has good prediction performance for test sequences with different resolutions, verification experiments were performed on the 18 standard sequences with different resolutions in the five classes (A, B, C, D, and E) given by JCT-VC for a fair comparison, and all settings adopt the defaults of the common test conditions (CTC). Meanwhile, ΔT, BD-BR, and BD-PSNR as defined in VCEG-M33 [48] were used as the evaluation criteria for the experimental results. ΔT represents the percentage of coding time saved compared to the HM 16.5 encoder. In addition, BD-BR and BD-PSNR are used to evaluate the RD performance, expressed as the average bit rate difference and the average peak signal-to-noise ratio difference between the optimized method and the original method, respectively, namely:
$$\mathrm{BD\text{-}BR} = \frac{1}{\Delta D} \int_{D_l}^{D_h} \left( r_{proposed} - r_{HM16.5} \right) dD$$
where ΔD = D_h − D_l, D_h and D_l are the high and low ends of the output RD curve range, and r_proposed and r_HM16.5 are the corresponding bit rates of the proposed method and the original method, respectively.
$$\mathrm{BD\text{-}PSNR} = \frac{1}{\Delta r} \int_{r_l}^{r_h} \left( D_{proposed}(r) - D_{HM16.5}(r) \right) dr$$
where Δr = log(R_h) − log(R_l), and r_h = log(R_h) and r_l = log(R_l) are the high and low ends of the output bitrate range of the RD curve; D_proposed(r) and D_HM16.5(r) are the RD curves corresponding to the proposed method and the original method. The smaller the BD-BR increase and the BD-PSNR decrease, the smaller the loss of RD performance.
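For clarity, the BD-PSNR defined above is commonly computed by fitting each RD curve with a cubic polynomial in the logarithmic bitrate domain and averaging the integrated difference, as in the following sketch (an illustrative implementation of the standard Bjøntegaard calculation, not the authors' own script); BD-BR is obtained analogously by swapping the roles of the bitrate and PSNR axes and converting the result to a percentage.

```python
import numpy as np

def bd_psnr(rate_ref, psnr_ref, rate_test, psnr_test):
    """Average PSNR difference (dB) between two RD curves, each sampled at the
    four QP values: fit cubic polynomials of PSNR vs. log-bitrate and average
    the difference of their integrals over the overlapping bitrate range."""
    lr_ref, lr_test = np.log(rate_ref), np.log(rate_test)
    p_ref = np.polyfit(lr_ref, psnr_ref, 3)
    p_test = np.polyfit(lr_test, psnr_test, 3)

    lo = max(lr_ref.min(), lr_test.min())   # overlapping log-bitrate range
    hi = min(lr_ref.max(), lr_test.max())

    int_ref, int_test = np.polyint(p_ref), np.polyint(p_test)
    avg_ref = (np.polyval(int_ref, hi) - np.polyval(int_ref, lo)) / (hi - lo)
    avg_test = (np.polyval(int_test, hi) - np.polyval(int_test, lo)) / (hi - lo)
    return avg_test - avg_ref
```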

3.2. Experimental Results and Discussion

The ΔT, BD-BR, and BD-PSNR obtained by the proposed method on the JCT-VC standard test sequences with the QP values {22, 27, 32, 37} are shown in Table 3. ΔT represents the percentage of coding time saved compared to the HM 16.5 encoder:
$$\Delta T = \frac{T_{Proposed} - T_{HM16.5}}{T_{HM16.5}} \times 100\%$$
where T_HM16.5 represents the coding time consumed by HM 16.5, and T_Proposed represents the coding time consumed by the proposed method.
As can be seen from the experimental results in Table 3, the proposed deep learning method greatly reduces the coding complexity under the four QP values while keeping the video distortion negligible. Compared with the original encoder HM16.5, the coding time is reduced by 54.44%, 59.22%, 61.81%, and 65.15% under the QP values of {22, 27, 32, 37}, respectively, while the BD-BR increases by only 1.81% and the BD-PSNR decreases by only 0.088 dB. Moreover, it can be seen that as the QP value increases, the proposed method saves more HEVC coding time. This is mainly because the proposed D-CNN can make full use of the CTU texture features at smaller depths and achieve more accurate predictions. At the same time, the small BD-BR and BD-PSNR values also indicate that the proposed method has an acceptable effect on the video quality and bitrate.
To validate the high performance of the proposed method, four representative intra-frame coding methods in [16,20,25,39] are selected for the comparison on the 18 video standards given by JCT-VC, as shown in Table 4.
Let ΔT_avg represent the average encoding time saving of a sequence over the four QP configurations:
$$\Delta T_{avg} = \frac{1}{4} \sum_{i=1}^{4} \frac{T_{Proposed}(QP_i) - T_{HM16.5}(QP_i)}{T_{HM16.5}(QP_i)} \times 100\%$$
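Both time metrics reduce to a straightforward computation; the encoding times in the usage comment below are hypothetical example values.

```python
def delta_t(time_proposed, time_hm):
    """Percentage change in encoding time relative to HM 16.5
    (negative values mean time saved)."""
    return (time_proposed - time_hm) / time_hm * 100.0

def delta_t_avg(times_proposed, times_hm):
    """Delta-T averaged over the four QP configurations."""
    return sum(delta_t(tp, th)
               for tp, th in zip(times_proposed, times_hm)) / len(times_proposed)

# Hypothetical encoding times (seconds) at QP = 22, 27, 32, 37:
# delta_t_avg([120, 95, 80, 66], [300, 250, 215, 190])  # approx. -62.5 (%)
```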
The comparison results in Table 4 show that the proposed method reduces the complexity by 60.14%. This exceeds the complexity reductions of 45.23%, 46.23%, and 54.56% in [16,20,39] and is only 0.95% lower than that of [25]. However, the complexity reduction in [25] comes at the cost of a 6.19% BD-BR increase and a 0.32 dB BD-PSNR decrease. It is particularly worth noting that, for the same video quality, the rate performance and complexity cost of the proposed method are better than those of [39]. Meanwhile, although the complexity reduction in [25] is slightly better than that of the proposed method, its RD performance is significantly worse. The experimental results show that the proposed method performs better in predicting the CU partition, which also indicates that the proposed D-CNN can adequately train on and learn the pixel information of the CTU.
Figure 5 shows the comparison between the proposed algorithm and the four algorithms mentioned above in a three-dimensional visual format, which displays the various performance indicators of each algorithm at a glance. The meanings of ΔT_avg (%), BD-BR (%), and BD-PSNR (dB) are as given above. The proposed method is in the best position among the compared methods [16,20,25,39] and outperforms all of them in terms of the overall evaluation criteria. In particular, the trade-off between complexity reduction and RD performance of the proposed method is better than that of the method with the minimum bit rate performance degradation [16] and the method with the maximum complexity reduction [25]. It is noteworthy that both the complexity reduction and the coding efficiency of the proposed method are better than those of the method in [39]. Experiments on the other Class A–E sequences with different resolutions yielded similar results, showing excellent RD performance and good prediction of the CU partition structure by the proposed method.
To analyze the RD performance of the proposed method more intuitively, the RD curves of the BQSquare sequence (minimum performance loss) and the FourPeople sequence (maximum performance loss) are given in Figure 6. As can be seen from Figure 6a, for the BQSquare sequence, the RD curve of the proposed method agrees closely with that of HM16.5. As shown in Figure 6b, even for the FourPeople sequence, which represents the worst case among the experimental results, the RD curve of the proposed method remains close to the original. Similar curves were obtained for other sequences with different video features. The results indicate that the proposed method preserves the RD performance well in terms of bitrate and image quality.
Finally, to assess the objective and subjective quality loss of the proposed method, and since this paper mainly studies the complexity of intra-frame prediction in HEVC, the I-frames [49] of the BasketballPass, BQMall, and Cactus sequences, which have complex backgrounds and intense motion, were selected for comparison. In addition, to facilitate the observation of image loss details, we also provide some zoomed-in areas of 64 × 64 size for subjective quality comparison, as shown in Figure 7. Figure 7b,d,f shows the subjective results of the original encoder HM16.5 and Figure 7a,c,e shows the subjective results of the proposed method. It should be noted that the resolution of the selected frames varies, resulting in different sizes of the 64 × 64 detail boxes marked in the figure. It can be seen that, compared with HM16.5, the PSNR losses of the BasketballPass, BQMall, and Cactus sequences under the proposed method are only 0.0179 dB, 0.0109 dB, and 0.0227 dB, respectively. This loss is not perceptible to the human eye and can be neglected. Similar results were observed for the other JCT-VC test sequences. Therefore, the proposed method performs well in terms of both image quality and coding efficiency.

4. Conclusions

In this paper, a deep learning-based D-CNN is proposed for the fast CU partition decision in the HEVC intra-frame coding mode, which reduces the complexity of HEVC. The main contribution is to use the prior information of the video coding process to design the model, in which the proposed densely connected block realizes parameter reuse so that the model parameters are used effectively. Moreover, the improved channel attention mechanism enhances the texture expression ability of the features and, consequently, strengthens the information interaction between channels. The experimental results show the effectiveness and superiority of the proposed method: compared with the original HM encoder, it reduces the coding complexity by 60.14% at the cost of a 1.81% BD-BR increase and a 0.09 dB BD-PSNR degradation on the JCT-VC test sequences.
For future work, inter-frame prediction will be considered to further reduce the complexity. Additionally, we plan to extend the proposed method to the latest coding standard, VVC/H.266, which has a more complex partitioning structure.

Author Contributions

Conceptualization, T.W., G.W. and H.L.; methodology, T.W., G.W., H.L., T.B. and Q.Z.; software, T.W., G.W. and H.L.; validation, T.W., G.W., H.L. and R.W.; formal analysis, T.W., G.W., and H.L.; investigation, T.W., G.W. and H.L.; resources, data curation, T.W.; writing—original draft preparation, T.W. and G.W.; writing—review and editing, T.W., G.W., T.B., H.L., Q.Z. and R.W.; visualization, T.W. and G.W.; supervision, G.W.; project administration, G.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Guangxi Province (Grants No. 2020GXNSFAA297184, No. 2020GXNSFBA297097), the National Natural Science Foundation of China (Grant. No. 62161031), and the Science and Technology Planning Project of Guangxi Province (Grant. No. AD21238038).

Data Availability Statement

The data that support the findings of this study are available from Dr. Wei at the email address [email protected] upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wiegand, T.; Sullivan, G.J.; Bjontegaard, G.; Luthra, A. Overview of the H.264/AVC video coding standard. IEEE Trans. Circuits Syst. Video Technol. 2003, 13, 560–576. [Google Scholar] [CrossRef]
  2. Sullivan, G.J.; Ohm, J.R.; Han, W.J.; Wiegand, T. Overview of the high efficiency video coding (HEVC) standard. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1649–1668. [Google Scholar] [CrossRef]
  3. Bross, B.; Wang, Y.K.; Ye, Y.; Liu, S.; Chen, J.; Sullivan, G.J.; Ohm, J.R. Overview of the versatile video coding (VVC) standard and its applications. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 3736–3764. [Google Scholar] [CrossRef]
  4. Wu, S.; Shi, J.; Chen, Z. HG-FCN: Hierarchical Grid Fully Convolutional Network for Fast VVC Intra Coding. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 5638–5649. [Google Scholar] [CrossRef]
  5. Li, Y.; Li, L.; Fang, Y.; Peng, H.; Ling, N. Bagged Tree and ResNet-Based Joint End-to-End Fast CTU Partition Decision Algorithm for Video Intra Coding. Electronics 2022, 11, 1264. [Google Scholar] [CrossRef]
  6. Video Developer Report 2021. Available online: https://go.bitmovin.com/video-developer-report (accessed on 12 January 2022).
  7. Kim, I.K.; Min, J.; Lee, T.; Han, W.J.; Park, J. Block Partitioning Structure in the HEVC Standard. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1697–1706. [Google Scholar] [CrossRef]
  8. Guo, H.; Zhu, C.; Xu, M.; Li, S. Inter-Block Dependency-Based CTU Level Rate Control for HEVC. IEEE Trans. Broadcast. 2020, 66, 113–126. [Google Scholar] [CrossRef]
  9. Jamali, M.; Coulombe, S. Fast HEVC Intra Mode Decision Based on RDO Cost Prediction. IEEE Trans. Broadcast. 2019, 65, 109–122. [Google Scholar] [CrossRef]
  10. JCT-VC, Hm Software. Available online: https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/tags/HM-16.5 (accessed on 20 August 2020).
  11. Fang, H.; Chen, H.; Chang, T. Fast intra prediction algorithm and design for High Efficiency Video Coding. In Proceedings of the 2016 IEEE International Symposium on Circuits and Systems (ISCAS), Montreal, QC, Canada, 22–25 May 2016; pp. 1770–1773. [Google Scholar] [CrossRef]
  12. Kim, N.; Jeon, S.; Shim, H.J.; Jeon, B.; Lim, S.; Ko, H. Adaptive keypoint-based CU depth decision for HEVC intra coding. In Proceedings of the 2016 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), Nara, Japan, 1–3 June 2016; pp. 1–3. [Google Scholar] [CrossRef]
  13. Zhang, T.; Sun, M.T.; Zhao, D.; Gao, W. Fast intra-mode and CU size decision for HEVC. IEEE Trans. Circuits Syst. Video Technol. 2017, 27, 1714–1726. [Google Scholar] [CrossRef]
  14. Gu, J.; Tang, M.; Wen, J.; Han, Y. Adaptive intra candidate selection with early depth decision for fast intra prediction in HEVC. IEEE Signal. Process. Lett. 2018, 25, 159–163. [Google Scholar] [CrossRef]
  15. Fu, B.; Zhang, Q.Q.; Hu, J. Fast prediction mode selection and CU partition for hevc intra coding. IET Image Process. 2020, 14, 1892–1900. [Google Scholar] [CrossRef]
  16. Chen, F.; Jin, D.; Peng, Z.; Jiang, G.; Yu, M.; Chen, H. Fast intra coding algorithm for HEVC based on depth range prediction and mode reduction. Multimed. Tools Appl. 2018, 77, 10. [Google Scholar] [CrossRef]
  17. Liu, X.; Li, Y.; Liu, D.; Wang, P.; Yang, L.T. An adaptive CU size decision algorithm for HEVC intra prediction based on complexity classification using machine learning. IEEE Trans. Circuits Syst. Video Technol. 2017, 29, 144–155. [Google Scholar] [CrossRef]
  18. Erabadda, B.; Mallikarachchi, T.; Hewage, C.; Fernando, A. Quality of Experience (QoE)-Aware Fast Coding Unit Size Selection for HEVC Intra-Prediction. Future Internet 2019, 11, 175. [Google Scholar] [CrossRef]
  19. Pakdaman, F.; Yu, L.; Hashemi, M.R.; Ghanbari, M.; Gabbouj, M. SVM based approach for complexity control of HEVC intra coding. Signal. Process. Image Commun. 2021, 93, 116177. [Google Scholar] [CrossRef]
  20. Liu, D.; Liu, X.; Li, Y. Fast CU Size Decisions for HEVC Intra Frame Coding Based on Support Vector Machines. In Proceedings of the 2016 IEEE 14th International Conference on Dependable, Autonomic and Secure Computing, 14th Intl-Conf on Pervasive Intelligence and Computing, 2nd International Conference on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), Auckland, New Zealand, 8–12 August 2016; pp. 594–597. [Google Scholar] [CrossRef]
  21. Amna, M.; Imen, W.; Soulef, B.; Fatma Ezahra, S. Machine Learning-Based approaches to reduce HEVC intra coding unit partition decision complexity. Multimed. Tools Appl. 2022, 81, 2777–2802. [Google Scholar] [CrossRef]
  22. Wang, S.H.; Zhu, Z.; Zhang, Y.-D. PSCNN: PatchShuffle convolutional neural network for COVID-19 explainable diagnosis. Front. Public. Health 2021, 9, 1593. [Google Scholar] [CrossRef]
  23. Wang, S.H.; Wu, K.; Chu, T.; Fernandes, S.L.; Zhou, Q.; Zhang, Y.D.; Sun, J. Sospcnn: Structurally optimized stochastic pooling convolutional neural network for tetralogy of fallot recognition. Wirel. Commun. Mob. Comput. 2021, 2021, 5792975. [Google Scholar] [CrossRef]
  24. Zhang, Y.D.; Satapathy, S.; Zhu, L.Y.; Gorriz, J.M.; Wang, S. A seven-layer convolutional neural network for chest ct based covid-19 diagnosis using stochastic pooling. IEEE Sensors J. 2020, 22, 17573–17582. [Google Scholar] [CrossRef]
  25. Liu, Z.; Yu, X.; Gao, Y.; Chen, S.; Ji, X.; Wang, D. CU partition mode decision for HEVC hardwired intra encoder using convolution neural network. IEEE Trans. Image Process. 2016, 25, 5088–5103. [Google Scholar] [CrossRef] [PubMed]
  26. Cui, W.; Zhang, T.; Zhang, S.; Jiang, F.; Zuo, W.; Zhao, D. Convolutional neural networks based intra prediction for HEVC. In Proceedings of the 2017 Data Compression Conference (DCC), Snowbird, UT, USA, 4–7 April 2017; p. 436. [Google Scholar] [CrossRef]
  27. Schiopu, I.; Huang, H.; Munteanu, A. CNN-based intra-prediction for lossless HEVC. IEEE Trans. Circuits Syst. Video Technol. 2019, 99, 1816–1828. [Google Scholar] [CrossRef]
  28. Jin, Z.; An, P.; Shen, L.; Yang, C. CNN oriented fast QTBT partition algorithm for JVET intra coding. In Proceedings of the 2017 IEEE Visual Communications and Image Processing (VCIP), St. Petersburg, FL, USA, 10–13 December 2017; pp. 1–4. [Google Scholar] [CrossRef]
  29. Wang, Z.; Wang, S.; Zhang, X.; Wang, S.; Ma, S. Fast QTBT partitioning decision for interframe coding with convolution neural network. In Proceedings of the 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 2550–2554. [Google Scholar] [CrossRef]
  30. Jamali, M.; Coulombe, S.; Sadreazami, H. CU Size Decision for Low Complexity HEVC Intra Coding based on Deep Reinforcement Learning. In Proceedings of the 2020 IEEE 63rd International Midwest Symposium on Circuits and Systems (MWSCAS), Springfield, MA, USA, 9–12 August 2020; pp. 586–591. [Google Scholar] [CrossRef]
  31. Amna, M.; Imen, W.; Ezahra, S.F. LeNet5-Based approach for fast intra coding. In Proceedings of the 2020 10th International Symposium on Signal, Image, Video and Communications (ISIVC), Saint-Etienne, France, 7–9 April 2021; pp. 1–4. [Google Scholar] [CrossRef]
  32. Xu, M.; Li, T.; Wang, Z.; Deng, X.; Yang, R.; Guan, Z. Reducing complexity of HEVC: A deep learning approach. IEEE Trans. Image Process. 2018, 27, 5044–5059. [Google Scholar] [CrossRef]
  33. Zhang, Y.; Wang, G.; Tian, R.; Xu, M.; Kuo, C.J. Texture-Classification Accelerated CNN Scheme for Fast Intra CU Partition in HEVC. In Proceedings of the 2019 Data Compression Conference (DCC), Snowbird, UT, USA, 26–29 March 2019; pp. 241–249. [Google Scholar] [CrossRef]
  34. Galpin, F.; Racapé, F.; Jaiswal, S.; Bordes, P.; Le Léannec, F.; François, E. CNN-based driving of block partitioning for intra slices encoding. In Proceedings of the 2019 Data Compression Conference (DCC), Snowbird, UT, USA, 26–29 March 2019; pp. 162–171. [Google Scholar] [CrossRef]
  35. Zaki, F.; Mohamed, A.E.; Sayed, S.G. CtuNet: A Deep Learning-Based Framework for Fast CTU Partitioning of H265/HEVC Intra-coding. Ain Shams Eng. J. 2021, 12, 1859–1866. [Google Scholar] [CrossRef]
  36. Ren, W.; Su, J.; Sun, C.; Shi, Z. An IBP-CNN Based Fast Block Partition For Intra Prediction. In Proceedings of the 2019 Picture Coding Symposium (PCS), Ningbo, China, 12–15 November 2019. [Google Scholar] [CrossRef]
  37. Feng, A.; Gao, C.; Li, L.; Liu, D.; Wu, F. CNN-Based Depth Map Prediction for Fast Block Partitioning in HEVC Intra Coding. In Proceedings of the 2021 IEEE International Conference on Multimedia and Expo (ICME), Shenzhen, China, 5–9 July 2021; pp. 1–6. [Google Scholar] [CrossRef]
  38. Imen, W.; Amna, M.; Fatma, B.; Ezahra, S.F.; Masmoudi, N. Fast HEVC intra-CU decision partition algorithm with modified LeNet-5 and AlexNet. Signal Image Video Process. 2022, 16, 1811–1819. [Google Scholar] [CrossRef]
  39. Yao, C.; Xu, C.; Liu, M. RDNet: Rate–Distortion-Based Coding Unit Partition Network for Intra-Prediction. Electronics 2022, 11, 916. [Google Scholar] [CrossRef]
  40. Ohm, J.R.; Sullivan, G.J.; Schwarz, H.; Tan, T.K.; Wiegand, T. Comparison of the coding efficiency of video coding standards—Including high efficiency video coding (HEVC). IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1669–1684. [Google Scholar] [CrossRef]
  41. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar] [CrossRef]
  42. Jia, K.; Cui, T.; Liu, P.; Liu, C. Fast Prediction Algorithm in High Efficiency Video Coding Intra-mode Based on Deep Feature Learning. J. Electron. Inf. Technol. 2021, 43, 2023–2031. [Google Scholar] [CrossRef]
  43. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 11534–11542. [Google Scholar] [CrossRef]
  44. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015; pp. 1–15. [Google Scholar] [CrossRef]
  45. Xu, M.; Deng, X.; Li, S.; Wang, Z. Region-of-interest based conversational HEVC coding with hierarchical perception model of face. IEEE J. Sel. Top. Signal Process. 2014, 8, 475–489. [Google Scholar] [CrossRef]
  46. Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv 2016, arXiv:1603.04467. [Google Scholar] [CrossRef]
  47. CPH-Intra. Available online: https://github.com/HEVC-Projects/CPH (accessed on 3 October 2018).
  48. Grellert, M.; Bampi, S.; Correa, G.; Zatt, B.; Cruz, L.S. Learning-based complexity reduction and scaling for HEVC encoders. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 1208–1212. [Google Scholar] [CrossRef]
  49. Sze, V.; Budagavi, M.; Sullivan, G.J. High Efficiency Video Coding (HEVC): Algorithms and Architectures; Springer International Publishing: Cham, Switzerland, 2014; pp. 91–112. [Google Scholar] [CrossRef]
Figure 1. Comparison between the conventional flowchart and the proposed flowchart.
Figure 2. The structure of the proposed D-CNN.
Figure 3. The flowchart of the threshold decision algorithm.
Figure 4. Threshold vs. accuracy at different depths and QP values. (a) β_0 vs. accuracy; (b) β_1 vs. accuracy; (c) β_2 vs. accuracy.
Figure 5. Comparison of coding performance (ΔT_avg, BD-BR, and BD-PSNR) with other algorithms on standard test sequences [16,20,25,39].
Figure 6. RD curves of the proposed method and HM16.5. (a) BQSquare in Class D; (b) FourPeople in Class E.
Figure 7. Objective (PSNR) and subjective results (for the zoomed-in areas of the first frames of the BasketballPass, Cactus, and BQMall sequences with QP = 37, respectively) obtained by (b,d,f) the original HEVC intra-frame coding (PSNR = 32.4869 dB, 33.0733 dB, and 31.8188 dB) and (a,c,e) the proposed CNN-based framework (PSNR = 32.4690 dB, 33.0506 dB, and 31.8079 dB).
Table 1. The detailed parameters of the D-CNN.

| Layers | Output Size | Proposed CNN Configuration |
| Convolution | 16 × 16 | 4 × 4 conv, stride 4 |
| Densely Connected Block (1) | 16 × 16 | 1 × 1 conv; 3 × 3 conv |
| Convolution | 16 × 16 | 1 × 1 conv, stride 1 |
| Max pooling | 8 × 8 | 2 × 2 max pooling, stride 2 |
| Densely Connected Block (2) | 8 × 8 | 1 × 1 conv; 3 × 3 conv |
| Convolution | 8 × 8 | 1 × 1 conv, stride 1 |
| Max pooling | 4 × 4 | 2 × 2 max pooling, stride 2 |
| Densely Connected Block (3) | 4 × 4 | 1 × 1 conv; 3 × 3 conv |
| ECA | 4 × 4 | — |
| Flatten | 1280 | — |
| Fully Connected Net (1) | — | FC 1-1: 64; FC 2-1: 128; FC 3-1: 256 |
| Fully Connected Net (2) | — | FC 1-2: 48; FC 2-2: 96; FC 3-2: 192 |
| Output | — | FC 1-3: 1; FC 2-3: 4; FC 3-3: 16 |
Table 2. Experimental hardware configuration and operating system.

| Operating System | Windows 11 |
| CPU | AMD Ryzen 7 5800H with Radeon Graphics @ 2.40 GHz |
| GPU | NVIDIA GeForce RTX 3050 Laptop GPU |
| RAM | 16 GB |
Table 3. Results for the Class A–E sequences of JCT-VC.

| Class | Sequence | BD-BR (%) | BD-PSNR (dB) | ΔT (%), QP = 22 | QP = 27 | QP = 32 | QP = 37 |
| A (2560 × 1600) | PeopleOnStreet | 1.99 | −0.113 | −59.08 | −62.16 | −61.96 | −64.01 |
| A (2560 × 1600) | Traffic | 2.12 | −0.115 | −61.49 | −65.75 | −68.08 | −71.12 |
| B (1920 × 1080) | BasketballDrive | 4.17 | −0.104 | −67.37 | −76.08 | −78.17 | −78.42 |
| B (1920 × 1080) | BQTerrace | 1.12 | −0.073 | −50.83 | −54.53 | −57.21 | −60.07 |
| B (1920 × 1080) | Cactus | 1.89 | −0.072 | −55.27 | −63.57 | −66.44 | −72.92 |
| B (1920 × 1080) | Kimono | 1.52 | −0.055 | −83.17 | −83.56 | −83.87 | −83.05 |
| B (1920 × 1080) | ParkScene | 1.76 | −0.075 | −61.16 | −66.83 | −69.44 | −73.57 |
| C (832 × 480) | BasketballDrill | 2.72 | −0.131 | −39.75 | −51.23 | −58.75 | −66.92 |
| C (832 × 480) | BQMall | 1.12 | −0.070 | −45.09 | −50.29 | −52.96 | −57.36 |
| C (832 × 480) | PartyScene | 0.32 | −0.025 | −45.88 | −47.42 | −49.47 | −53.47 |
| C (832 × 480) | RaceHorses | 1.55 | −0.099 | −49.91 | −52.66 | −56.16 | −61.53 |
| D (416 × 240) | BasketballPass | 2.16 | −0.124 | −48.00 | −55.02 | −59.57 | −63.48 |
| D (416 × 240) | BlowingBubbles | 0.76 | −0.046 | −35.87 | −40.67 | −45.95 | −51.07 |
| D (416 × 240) | BQSquare | 0.23 | −0.019 | −38.11 | −41.65 | −43.88 | −45.91 |
| D (416 × 240) | RaceHorses | 0.92 | −0.064 | −41.03 | −46.32 | −49.65 | −53.35 |
| E (1280 × 720) | FourPeople | 2.56 | −0.149 | −59.62 | −63.91 | −65.36 | −67.86 |
| E (1280 × 720) | Johnny | 3.07 | −0.127 | −70.78 | −73.42 | −74.19 | −75.52 |
| E (1280 × 720) | KristenAndSara | 2.52 | −0.130 | −67.46 | −70.92 | −71.50 | −73.12 |
| Average of Class A–E | | 1.81 | −0.088 | −54.44 | −59.22 | −61.81 | −65.15 |
Table 4. Comparison results with other algorithms for the Class A–E sequences of JCT-VC. Each cell lists BD-BR (%) / BD-PSNR (dB) / ΔT_avg (%).

| Class | Sequence | [16] | [20] | [25] | [39] | Proposed |
| A | PeopleOnStreet | 1.23 / −0.07 / −42.96 | 9.63 / −0.94 / −43.84 | 3.97 / −0.21 / −55.59 | 2.20 / −0.13 / −57.53 | 1.99 / −0.11 / −61.80 |
| A | Traffic | 1.16 / −0.06 / −42.45 | 6.41 / −0.30 / −28.87 | 4.95 / −0.24 / −60.84 | 2.43 / −0.13 / −63.55 | 2.12 / −0.12 / −66.61 |
| | Class A Average | 1.20 / −0.07 / −42.71 | 8.02 / −0.62 / −36.36 | 4.46 / −0.23 / −58.22 | 2.32 / −0.13 / −60.54 | 2.06 / −0.11 / −64.21 |
| B | BasketballDrive | 1.66 / −0.05 / −47.98 | 8.92 / −0.24 / −43.40 | 6.02 / −0.14 / −69.51 | 3.94 / −0.09 / −74.29 | 4.17 / −0.07 / −75.01 |
| B | BQTerrace | 0.94 / −0.04 / −46.74 | 6.63 / −0.30 / −56.62 | 4.82 / −0.27 / −57.89 | 1.19 / −0.08 / −47.96 | 1.12 / −0.07 / −55.66 |
| B | Cactus | 1.20 / −0.04 / −44.70 | 7.53 / −0.25 / −43.51 | 6.02 / −0.21 / −62.98 | 1.95 / −0.08 / −52.72 | 1.89 / −0.06 / −64.55 |
| B | Kimono | 1.74 / −0.06 / −53.33 | 5.12 / −0.17 / −47.80 | 2.38 / −0.08 / −72.72 | 1.40 / −0.05 / −83.53 | 1.52 / −0.08 / −83.16 |
| B | ParkScene | 1.42 / −0.06 / −44.24 | 3.63 / −0.15 / −52.85 | 3.42 / −0.14 / −66.03 | 1.76 / −0.08 / −59.25 | 1.76 / −0.13 / −67.75 |
| | Class B Average | 1.39 / −0.05 / −47.40 | 6.37 / −0.22 / −48.84 | 4.53 / −0.17 / −65.83 | 2.05 / −0.08 / −63.55 | 2.09 / −0.08 / −69.23 |
| C | BasketballDrill | 0.91 / −0.04 / −40.07 | 9.82 / −0.44 / −53.93 | 12.21 / −0.54 / −63.58 | 2.74 / −0.13 / −47.87 | 2.72 / −0.03 / −54.16 |
| C | BQMall | 1.33 / −0.07 / −43.21 | 9.65 / −0.49 / −42.06 | 8.08 / −0.47 / −52.14 | 1.33 / −0.08 / −33.08 | 1.12 / −0.10 / −51.43 |
| C | PartyScene | 1.06 / −0.08 / −44.72 | 7.38 / −0.47 / −43.01 | 9.45 / −0.67 / −58.75 | 0.36 / −0.03 / −33.66 | 0.32 / −0.12 / −49.06 |
| C | RaceHorses | 0.99 / −0.05 / −45.13 | 7.22 / −0.38 / −44.59 | 4.42 / −0.26 / −58.19 | 1.66 / −0.11 / −36.28 | 1.55 / −0.05 / −55.07 |
| | Class C Average | 1.07 / −0.06 / −43.28 | 8.51 / −0.45 / −45.90 | 8.54 / −0.49 / −58.17 | 1.52 / −0.09 / −37.72 | 1.43 / −0.08 / −52.43 |
| D | BasketballPass | 1.28 / −0.07 / −42.33 | 10.05 / −0.55 / −39.72 | 8.40 / −0.46 / −64.02 | 1.85 / −0.11 / −57.06 | 2.16 / −0.06 / −56.52 |
| D | BlowingBubbles | 1.02 / −0.06 / −42.27 | 6.18 / −0.38 / −37.04 | 8.33 / −0.46 / −60.78 | 0.85 / −0.05 / −37.87 | 0.76 / −0.15 / −43.39 |
| D | BQSquare | 1.20 / −0.10 / −44.54 | 12.34 / −0.88 / −57.43 | 2.56 / −0.21 / −46.72 | 0.26 / −0.02 / −38.67 | 0.23 / −0.13 / −42.39 |
| D | RaceHorses | — | 8.84 / −0.49 / −40.23 | 4.95 / −0.32 / −57.29 | 0.98 / −0.07 / −42.99 | 0.92 / −0.13 / −47.59 |
| | Class D Average | 1.17 / −0.08 / −43.05 | 9.35 / −0.58 / −43.61 | 6.06 / −0.36 / −57.20 | 0.99 / −0.06 / −44.15 | 1.02 / −0.06 / −47.47 |
| E | FourPeople | 1.87 / −0.08 / −43.44 | 9.08 / −0.48 / −36.22 | 8.00 / −0.44 / −61.54 | 2.91 / −0.17 / −64.20 | 2.56 / −0.11 / −64.19 |
| E | Johnny | 1.87 / −0.07 / −52.49 | 12.18 / −0.47 / −63.55 | 7.96 / −0.31 / −66.55 | 3.42 / −0.14 / −77.55 | 3.07 / −0.12 / −73.48 |
| E | KristenAndSara | 1.85 / −0.09 / −48.27 | 13.35 / −0.63 / −57.51 | 5.48 / −0.27 / −64.72 | 2.66 / −0.14 / −74.00 | 2.52 / −0.10 / −70.75 |
| | Class E Average | 1.86 / −0.08 / −48.07 | 11.54 / −0.53 / −52.43 | 7.15 / −0.34 / −64.27 | 3.00 / −0.15 / −71.92 | 2.72 / −0.14 / −69.47 |
| | Average of Class A–E | 1.34 / −0.06 / −45.23 | 8.56 / −0.42 / −46.23 | 6.19 / −0.32 / −61.09 | 1.88 / −0.09 / −54.56 | 1.81 / −0.09 / −60.14 |
