Article

Research on Deep Compression Method of Expressway Video Based on Content Value

1 Jiangxi Kingroad Technologies Development Co. Ltd., Nanchang 330000, China
2 School of Electrical and Automation Engineering, East China Jiaotong University, Nanchang 330013, China
* Author to whom correspondence should be addressed.
Electronics 2022, 11(23), 4024; https://doi.org/10.3390/electronics11234024
Submission received: 31 October 2022 / Revised: 29 November 2022 / Accepted: 29 November 2022 / Published: 4 December 2022
(This article belongs to the Section Industrial Electronics)

Abstract

Aiming at the problem that expressway surveillance video occupies large amounts of storage space and network bandwidth due to data redundancy and sparse information, this paper proposes a deep compression method for expressway video based on content value. Firstly, the YOLOv4 algorithm is used to analyze the content value of the original video, extract the video frames containing vehicle information, and eliminate uninformative frames. An improved CNN is then designed by adding a Feature Pyramid and an Inception module to accelerate the extraction and fusion of features at all levels and improve image classification and prediction performance. Finally, the whole model is integrated into an HEVC encoder to compress the preprocessed video. The experimental results show that, at the expense of only a 5.96% increase in BD-BR and a 0.19 dB loss in BD-PSNR, the proposed method achieves a 64% compression ratio and saves 62.82% of coding time compared with other classic deep-learning-based data compression methods.

1. Introduction

With the rapid development of expressway construction in China, smart expressways have become a symbol of modern transportation. Expressway monitoring systems collect, store, and transmit data around the clock, producing an enormous volume of surveillance video that occupies huge amounts of storage space and network bandwidth. Data compression technology has therefore become a research hotspot in the field of expressway monitoring [1].
Video data compression mainly depends on the coding module. Among traditional video compression methods, the Karhunen-Loeve Transform (KLT) is an optimal transform that can effectively remove the correlation of the original data [2]: after the transformation, the energy is highly concentrated and an optimal sparse representation can be formed, enabling efficient compression. Another class of classic video compression methods is based on the Discrete Cosine Transform (DCT), on which well-known ISO/IEC coding standards such as MPEG and JPEG are built [3]; the coding criteria of such standards are fixed in advance. There are also standards developed jointly by ITU-T and ISO/IEC, such as H.262/MPEG-2, H.264, and H.265, among which HEVC/H.265 coding is widely used for its high error resilience and reduced real-time delay [4]. Although these traditional video compression methods have been upgraded many times, their coding remains complicated, and problems such as blocking and ringing artifacts persist at low bitrates.
Nowadays, deep learning has been widely applied to video compression. Video coding based on deep learning can be divided into two categories: inter prediction and intra prediction. For inter prediction, Xia et al. [5] used a multi-scale convolutional neural network to generate weight parameters and construct two reference frames closer to the current frame, thereby obtaining an accurate pixel-level representation of motion. The weight coefficients of the reference frame can also be enhanced through a deep neural network to refine motion compensation, but this increases the workload, and the cost-effectiveness is low [6]. Based on the idea that pixels in adjacent frames do not move much, Chen et al. [7] perturbed the x and y directions, set six displacement estimates for each direction, and compressed the video through the encoder's displacement estimates. Wu et al. [8] proposed a video compression algorithm based on frame interpolation. Lu et al. [9] proposed the inter frame compression algorithm DVC, which first estimates the motion information between the current frame and the reference frame using an optical flow network and then encodes that motion information. Djelouah et al. [10] proposed an inter frame compression algorithm, NIFC, which also uses optical flow to estimate motion information; unlike DVC, it uses the optical flow information of the preceding and following frames to reconstruct the current frame. All of the above works belong to inter prediction algorithms based on deep learning, but they do not exploit inter frame redundancy well, and their performance is not good enough.
For intra prediction, Cui et al. [11] proposed a refined predictor based on a Convolutional Neural Network (CNN) to reduce the prediction residual. Jin et al. [12] used a GAN to predict and generate video frames. Agustsson et al. [13] proposed using a CNN for image compression by adding a convolutional autoencoder in end-to-end learning to generate lossy images; the performance is equivalent to that of JPEG 2000, but only a convolutional autoencoder and entropy coding are needed, so the complexity is greatly reduced. Balle et al. [14] then proposed a hyperprior scheme incorporating side information: an end-to-end image compression model based on a variational autoencoder that improves compression performance while improving image quality. Li et al. [15] designed an importance weight map for the quantization step of image compression; during quantization, the importance map is used as a mask to determine how many bits are reserved for each position. Toderici et al. [16] proposed an image compression algorithm based on a recurrent convolutional neural network that learns the overall structure of the image, making the reconstructed image more accurate. Kumar et al. [17,18,19,20] limited the mode search range of the current prediction unit based on the optimal prediction modes of the left and upper prediction units. The above literature proposes deep-learning-based intra frame compression algorithms that use the learning ability of neural networks to predict the local structure of the image and thus predict video frames better. However, the computational complexity of coding is still very high and consumes a lot of time.
From the discussion above, it can be seen that these methods compress on the premise that the content to be compressed is already determined, and they neglect both the sparsity of the data information and the actual conditions of the expressway [21]. Expressway video containing no vehicles has low utilization value. To address this problem, this paper proposes a deep compression method for expressway video based on content value. Firstly, the YOLOv4 algorithm is used to analyze the content value of the original video, extract the video frames containing vehicle information, and eliminate uninformative frames. An improved CNN is then designed by adding a Feature Pyramid and an Inception module to accelerate the extraction and fusion of features at all levels and improve image classification and prediction performance. Finally, the whole model is integrated into an HEVC encoder to compress the preprocessed video.

2. Deep Compression Method Based on Content Value

The overall flow chart of the proposed deep compression method based on content value is shown in Figure 1. The method has two stages. First, vehicle detection and information extraction are performed on the input original video through the YOLOv4 algorithm, and the processed video is reassembled. Then, the video is compressed with a fast Coding Unit (CU) partition algorithm based on HEVC intra prediction.

2.1. YOLOv4 Target Detection Algorithm

Target detection for video data aims to recognize and extract specific targets from video. At present, deep learning algorithms are widely used in vehicle detection, and the YOLO family is one of the representatives. The YOLO algorithm has been updated through several generations; YOLOv4, based on Darknet, is currently among the most popular detection frameworks. YOLOv4 is upgraded from YOLOv3: it not only guarantees an extremely fast real-time detection speed (FPS), but also meets detection accuracy requirements (mAP), making it well suited to fast, real-time, and accurate detection of passing vehicles on expressways. As shown in Figure 2, YOLOv4 consists of three parts: Backbone (backbone feature extraction network), Neck (feature pyramid), and Head (detection head). It adds Mosaic data augmentation, CmBN, and SAT (Self-Adversarial Training) at the input end [22].
In the Backbone, to achieve the optimal balance among input network resolution, number of convolution layers, number of parameters, and output images, CSPDarknet53, the Mish activation function, DropBlock, and other methods are used to extract model features. CSPDarknet53 is composed of many serial residual network structures, namely one downsampling step followed by multiple stacked residual structures; the deeper backbone network can therefore obtain more layers of feature information and improve detection accuracy. The Mish activation function is adopted in each network structure. Mish is a smooth curve, and a smooth activation function allows information to penetrate deeper into the neural network, yielding better accuracy and generalization. In contrast, the classical ReLU is prone to vanishing gradients, and its accuracy decreases rapidly. The Mish activation function greatly improves training stability, average accuracy, and peak accuracy; its activation equation is shown in Equation (1). The DropBlock method is introduced to overcome the shortcoming of Dropout's random dropping of features. It yields a simpler model and introduces the idea of learning partial network weights in each training iteration to compensate the weight matrix and reduce over-fitting.
$\mathrm{Mish}(x) = x \times \tanh\left(\ln\left(1 + e^{x}\right)\right)$ (1)
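For illustration, Equation (1) maps directly onto code; the following is a minimal PyTorch sketch of the activation (our own example, not from the paper; recent PyTorch releases also ship an equivalent built-in, nn.Mish):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    # Mish(x) = x * tanh(ln(1 + e^x)), Equation (1)
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # F.softplus(x) computes ln(1 + e^x) in a numerically stable way
        return x * torch.tanh(F.softplus(x))

x = torch.linspace(-3.0, 3.0, 7)
print(Mish()(x))  # smooth everywhere, unlike the piecewise-linear ReLU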
The structure of SPP and FPN+PAN, that is, the Feature Pyramid structure of the model, is adopted in the Neck. By applying max pooling to an input feature map of any size, a fixed-length one-dimensional vector is obtained, which significantly enlarges the receptive field and captures more contextual feature information while leaving the detection speed essentially unaffected [23].
The loss function of the YOLO series is roughly divided into three parts: the regression box loss, the confidence loss, and the classification loss. The lower the loss, the better the performance of the target detection model. The calculation of the confidence loss and classification loss is unchanged in YOLOv4; the output side uses the CIOU_Loss and DIOU_NMS operations [24]. These additions enable YOLOv4 to process video and perform image data augmentation, enrich the data sets, and improve the detection speed and accuracy for objects at different locations. The calculation of CIOU is shown in Equations (2)–(4).
$L_{CIOU} = 1 - IOU + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v$ (2)

$v = \frac{4}{\pi^{2}} \left( \arctan \frac{\omega^{gt}}{h^{gt}} - \arctan \frac{\omega}{h} \right)^{2}$ (3)

$\alpha = \frac{v}{(1 - IOU) + v}$ (4)
Here, IOU is the intersection over union of the prediction box and the ground-truth box; $\rho(b, b^{gt})$ is the Euclidean distance between the center points of the prediction box $b$ and the ground-truth box $b^{gt}$, and $c$ is the diagonal length of the smallest box enclosing both, so the second term reflects the actual spatial relationship between the two boxes. $\alpha$ is a weight parameter, and $v$ measures the consistency of the aspect ratios: $\omega^{gt}$ and $h^{gt}$ are the width and height of the ground-truth box, and $\omega$ and $h$ are the width and height of the prediction box.
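To make Equations (2)–(4) concrete, here is a PyTorch sketch of the CIoU loss for axis-aligned boxes in (x1, y1, x2, y2) format (our own illustration of the published CIoU formulation; actual YOLOv4 implementations compute this inside their training loop):

import math
import torch

def ciou_loss(pred: torch.Tensor, gt: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    # IOU: intersection over union of prediction and ground-truth boxes
    ix1, iy1 = torch.max(pred[..., 0], gt[..., 0]), torch.max(pred[..., 1], gt[..., 1])
    ix2, iy2 = torch.min(pred[..., 2], gt[..., 2]), torch.min(pred[..., 3], gt[..., 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_g = (gt[..., 2] - gt[..., 0]) * (gt[..., 3] - gt[..., 1])
    iou = inter / (area_p + area_g - inter + eps)
    # squared center distance rho^2 over squared enclosing-box diagonal c^2
    cpx, cpy = (pred[..., 0] + pred[..., 2]) / 2, (pred[..., 1] + pred[..., 3]) / 2
    cgx, cgy = (gt[..., 0] + gt[..., 2]) / 2, (gt[..., 1] + gt[..., 3]) / 2
    rho2 = (cpx - cgx) ** 2 + (cpy - cgy) ** 2
    cw = torch.max(pred[..., 2], gt[..., 2]) - torch.min(pred[..., 0], gt[..., 0])
    ch = torch.max(pred[..., 3], gt[..., 3]) - torch.min(pred[..., 1], gt[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps
    # aspect-ratio consistency term v and its weight alpha, Equations (3) and (4)
    wp, hp = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    wg, hg = gt[..., 2] - gt[..., 0], gt[..., 3] - gt[..., 1]
    v = (4 / math.pi ** 2) * (torch.atan(wg / (hg + eps)) - torch.atan(wp / (hp + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v  # Equation (2)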
The expressway environment considered in this paper is complex, with many kinds of objects; the most important task is to detect moving vehicles, especially vehicles committing violations. Vehicle detection based on YOLOv4 is shown in Figure 3.
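The content-value preprocessing of this section then reduces to a filtering loop: decode each frame, run the detector, and keep only frames containing vehicles. A minimal OpenCV sketch follows; detect_vehicles is a hypothetical stand-in for the YOLOv4 inference wrapper, and the output container choice is our assumption:

import cv2

def detect_vehicles(frame) -> bool:
    # Placeholder for YOLOv4 inference: return True if any vehicle-class
    # detection (car, truck, bus, ...) exceeds the confidence threshold.
    raise NotImplementedError  # plug in a Darknet/PyTorch YOLOv4 model here

def filter_valuable_frames(src_path: str, dst_path: str) -> None:
    """Keep only frames with vehicle content; drop uninformative frames."""
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if detect_vehicles(frame):  # content value: a vehicle is present
            out.write(frame)
    cap.release()
    out.release()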

2.2. Fast Algorithm of Intra Prediction CU Partition Based on CNN

2.2.1. Overview

According to the HEVC standard, in intra prediction a CTU is first divided into CU blocks in the form of quadtree recursion. However, deciding whether a CU block should be further divided downward requires traversing 35 prediction modes through rough mode decision (RMD), based on a low-complexity cost function, and rate distortion optimization (RDO), based on the full cost function; in the RDO process, three or eight RD cost calculations are required. The final CTU division must select one partition out of 83,522 possible partition patterns to obtain the best prediction mode [25]. This process generates a great deal of computational redundancy, and the coding time is very long. Therefore, the complexity of CU partitioning in the intra prediction module of HEVC coding is an urgent problem to be solved [26].
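The figure of 83,522 follows from the quadtree recursion itself. As a sanity check (our own arithmetic, not spelled out in the paper, but consistent with the number quoted above), let $P(d)$ denote the number of partition patterns of a CU at depth $d$, where a CU either stays whole or splits into four independently recursed sub-CUs:

$P(d) = \begin{cases} 1, & d = 3 \\ 1 + P(d+1)^{4}, & d < 3 \end{cases} \qquad P(2) = 2, \quad P(1) = 1 + 2^{4} = 17, \quad P(0) = 1 + 17^{4} = 83{,}522$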
Based on the above problems, if the shallowest depth of a CU can be predicted in advance during the CU partition phase, some of the intra prediction calculations for that CU can be skipped and the RMD and RDO candidate mode sets can be reduced, avoiding a traversal of all quadtree depths. Research shows that the partition depth of a CU is closely related to its texture complexity. The depth of a CU ranges from 0 to 3; the more complex the texture, the deeper the CU is divided. Therefore, this section proposes a CNN-based fast intra prediction CU partition algorithm that adds a Feature Pyramid model: using top-down pathways and lateral connections, each layer of the neural network can quickly extract and fuse high-level and low-level features. The Inception module is also used, which greatly reduces the number of parameters and computations, increases the network depth and width, and improves the performance of image prediction and classification [27,28,29,30,31].

2.2.2. Algorithm Model

The structure of the CNN-based intra prediction CU fast partition algorithm model proposed in this section is shown in Figure 4. The model consists of an input layer, a Feature Pyramid, a convolution layer, an Inception module, fully connected layers, and an output layer; the numbers after the commas in the figure denote the numbers of convolution kernels. The model is designed to reduce the complexity of CU partitioning in the HEVC intra prediction module, shorten video data compression time, and improve compression performance. The specific contents are as follows:
The input layer is a 64 × 64 CTU pixel block; to facilitate computation, the pixel values are normalized. The texture features of the image are first extracted through the Feature Pyramid network: its shallow convolution kernels extract low-level features, and its deep convolution kernels extract high-level features. Through top-down pathways and lateral connections, low-resolution, semantically strong features are combined with high-resolution, semantically weak features, so that each layer has access to both low-level and high-level features. Finally, the most appropriate layer is selected according to the size of the output CTU to predict the feature map. In Figure 4, the feature maps output by the Feature Pyramid are downsampled to 32 × 32, with 8 maps in total. They are followed by a convolution layer with a 4 × 4 kernel, which connects each output unit to its receptive field in the input feature maps, yielding 24 feature maps of size 8 × 8.
The Inception module is used to sharply reduce the number of parameters and computations, increase the depth and width of the network, and improve image prediction and classification performance. Its core idea is to stack different convolution layers in parallel to form a wider network that aggregates visual information at several scales, which facilitates extracting features of different sizes. Because the CTU partition must be predicted in advance, the sizes of the convolution kernels and pooling layer are adjusted accordingly. The 24 low-level 8 × 8 feature maps are fed into the Inception module and processed on four branches: the first branch applies a 1 × 1 convolution to obtain eight feature maps; the second applies a 1 × 1 convolution followed by a 2 × 2 convolution to obtain eight feature maps; the third applies a 1 × 1 convolution followed by a 4 × 4 convolution; and the fourth applies max pooling followed by a 1 × 1 convolution to obtain eight feature maps. The 1 × 1 convolutions fuse information across the channels of different branches and reduce the number of parameters and calculations, which facilitates training.
The last two hidden layers are fully connected. The 32 advanced 8 × 8 feature maps produced by the Inception module are flattened into a one-dimensional array and connected to the fully connected layers for output training. The first fully connected layer uses the PReLU (Parametric Rectified Linear Unit) activation function, which accelerates convergence during network training. The second fully connected layer consists of ten neurons and uses a Softmax loss; a network trained with this loss performs well in single-label classification tasks, and judging the CTU depth-range category is exactly such a task. Finally, the prediction results are obtained: one CU at depth 1, four CUs at depth 2, and sixteen CUs at depth 3.
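A minimal PyTorch sketch of this architecture follows (our reconstruction from the text: the Feature Pyramid is collapsed into a single downsampling stage, the asymmetric zero-padding that keeps the even-sized 2 × 2 and 4 × 4 kernels at 8 × 8 resolution is our choice, and the width of the first fully connected layer is an assumption):

import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Four parallel branches of 8 channels each -> 32 feature maps of 8 x 8."""
    def __init__(self, in_ch: int = 24):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 8, kernel_size=1)
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, 8, 1),
                                nn.ZeroPad2d((0, 1, 0, 1)),  # keep 8 x 8 after 2 x 2 conv
                                nn.Conv2d(8, 8, 2))
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 8, 1),
                                nn.ZeroPad2d((1, 2, 1, 2)),  # keep 8 x 8 after 4 x 4 conv
                                nn.Conv2d(8, 8, 4))
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 8, 1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

class CUDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        # stand-in for the Feature Pyramid stage: 1 x 64 x 64 CTU -> 8 maps of 32 x 32
        self.pyramid = nn.Sequential(nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.PReLU())
        # 4 x 4 convolution (stride 4 assumed): 8 x 32 x 32 -> 24 maps of 8 x 8
        self.conv = nn.Sequential(nn.Conv2d(8, 24, 4, stride=4), nn.PReLU())
        self.inception = InceptionBlock(24)
        self.fc1 = nn.Sequential(nn.Linear(32 * 8 * 8, 128), nn.PReLU())  # width 128 assumed
        self.fc2 = nn.Linear(128, 10)  # ten units, trained with softmax cross-entropy

    def forward(self, x):
        x = self.conv(self.pyramid(x))
        x = self.inception(x)
        return self.fc2(self.fc1(x.flatten(1)))

logits = CUDepthNet()(torch.randn(1, 1, 64, 64))  # one normalized 64 x 64 CTU
print(logits.shape)  # torch.Size([1, 10])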
The PReLU (Parametric Rectified Linear Unit) activation function used during model training is shown in Equation (5), and the Softmax loss function is shown in Equation (6):
$PReLU(x) = \begin{cases} x, & x > 0 \\ \alpha_{i} x, & x \leq 0 \end{cases}$ (5)
where $x$ represents the input component of the convolution feature map, $\alpha_{i}$ is a learnable parameter between 0 and 1, and $i$ indexes the channels.
$SL = -\frac{1}{m} \sum_{i=1}^{m} \log \frac{e^{w_{y_i}^{T} x_i}}{\sum_{k \in D} e^{w_{k}^{T} x_i}}$ (6)
where SL denotes the Softmax loss, the combination of Softmax and cross-entropy that measures the gap between the model's predicted and true values. $D$ is the set of CU depth category labels, $T$ denotes matrix transposition, $x_i$ is the depth feature vector of the $i$-th sample, $y_i$ is the corresponding depth label, $w_k$ is the weight parameter vector of class $k$, and $m$ is the number of training samples.
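In PyTorch terms, Equations (5) and (6) correspond to standard built-ins; a brief sketch (ours, with illustrative shapes):

import torch
import torch.nn as nn

prelu = nn.PReLU(num_parameters=10, init=0.25)  # one learnable alpha_i per channel, Eq. (5)
criterion = nn.CrossEntropyLoss()               # softmax combined with cross-entropy, Eq. (6)

features = torch.randn(4, 10)        # m = 4 samples, 10 depth-range classes in D
labels = torch.randint(0, 10, (4,))  # depth labels y_i
print(prelu(features).shape, criterion(features, labels))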

2.2.3. Algorithm Flow Chart

Figure 5 shows the flow chart of the CNN-based intra prediction CU fast partition algorithm proposed in this section. The algorithm first extracts the texture features of the CTU to be encoded and predicts the shallowest CU depth in advance; it then recursively calculates the rate distortion cost of all CUs within the predicted depth range and decides whether to continue splitting downward. If the maximum depth of the optimal partition has not been reached, the calculation continues, and finally the optimal partition is obtained.
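In pseudocode, the flow of Figure 5 amounts to pruning the quadtree recursion above the predicted shallowest depth. The following is a sketch under our reading of the text, where predict_min_depth stands for the CNN of Section 2.2.2, and rd_cost and quad_split stand for the HM rate-distortion computation and quadtree split (all three names are hypothetical):

def best_partition(cu, depth, min_depth, max_depth=3):
    """Return (cost, partition) for one CU, skipping RMD/RDO above min_depth."""
    # evaluate 'no split' only at or below the CNN-predicted shallowest depth
    stay_cost = rd_cost(cu, depth) if depth >= min_depth else float("inf")
    if depth == max_depth:
        return stay_cost, [cu]
    split_cost, split_parts = 0.0, []
    for sub in quad_split(cu):  # four equal sub-CUs
        cost, parts = best_partition(sub, depth + 1, min_depth, max_depth)
        split_cost += cost
        split_parts += parts
    if stay_cost <= split_cost:
        return stay_cost, [cu]
    return split_cost, split_parts

# per CTU: cost, parts = best_partition(ctu, 0, predict_min_depth(ctu))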

3. Experimental Results and Analysis

3.1. Performance Experiment of the CNN-Based Intra Prediction CU Fast Partition Algorithm

In this section, the All Intra encoding configuration is adopted for testing in the HEVC official reference model HM16.20. The hardware environment is an Intel Core i5-10200H CPU with a 2.40 GHz main frequency and a 64-bit Windows 10 operating system with 16 GB of memory. The network model is implemented in PyTorch and accelerated with an NVIDIA RTX 2060 GPU. To evaluate the encoding performance of the algorithm presented in Section 2, the BD-BR (Bjontegaard Delta Bit-rate), BD-PSNR (Bjontegaard Delta Peak Signal-to-Noise Ratio), and ΔTime metrics are used. BD-BR represents the bitrate saving at the same objective quality (the smaller the value, the lower the bitrate and the better the coding efficiency); BD-PSNR represents the difference in peak SNR at an equivalent bitrate (the larger the value, the smaller the loss of image quality in the video); and ΔTime represents the encoding time saving, calculated as follows:
$\Delta Time(\%) = \frac{Time_{proposed} - Time_{HM16.20}}{Time_{HM16.20}} \times 100\%$ (7)
where $Time_{proposed}$ is the encoding time of the proposed algorithm and $Time_{HM16.20}$ is the encoding time of HM16.20. The specific coding settings are shown in Table 1.
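As a worked example, Equation (7) as a one-line helper (ours), checked against the overall average of Table 4:

def delta_time(time_proposed: float, time_hm: float) -> float:
    # Equation (7): negative values mean the proposed encoder is faster
    return (time_proposed - time_hm) / time_hm * 100.0

# e.g. an encoder needing 39.18 s where HM16.20 needs 100 s (illustrative numbers)
print(delta_time(39.18, 100.0))  # -60.82, the average DeltaTime reported in Table 4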
The training samples of the neural network, shown in Table 2, come from nine HEVC standard test sequences: Traffic and PeopleOnStreet of Class A (2560 × 1600), BQTerrace and Cactus of Class B (1920 × 1080), RaceHorses and BQMall of Class C (832 × 480), RaceHorses and BasketballPass of Class D (416 × 240), and FourPeople of Class E (1280 × 720). The corresponding test sequences are listed in Table 3.
Table 4 shows the experimental results of the algorithm compared with the HM16.20 encoder. As the table shows, the CNN-based intra prediction CU fast partition algorithm greatly reduces coding time and improves coding efficiency: compared with HM16.20, it reduces coding time by 60.82% on average, while BD-BR increases by only 5.96% and BD-PSNR loses only 0.19 dB, which is acceptable. The saving is largest for sequences with large, simply textured regions: the FourPeople sequence of Class E saves the most time, whereas the RaceHorses sequence of Class D saves the least.

3.2. Video Compression Efficiency Experiment Based on YOLOv4

The hardware configuration of the experimental environment is the same as in Section 3.1. During training, the YOLOv4 network structure is taken as the model, a freeze-unfreeze training strategy is adopted, and a pre-trained model is loaded. The frozen stage occupies the first half of the iterations: the learning rate is 0.001 and the Backbone weights are kept fixed to accelerate training. The unfrozen stage occupies the second half of the iterations: the learning rate is set to 0.001 and the Backbone weights are updated. The momentum coefficient is 0.9, the learning rate is scheduled by cosine annealing, label smoothing is set to 0.005, and multi-scale image training is adopted to give the model a better recognition effect.
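A schematic of that freeze-unfreeze schedule in PyTorch (our sketch: model.backbone, the SGD optimizer, and the epoch split are assumptions about how the YOLOv4 implementation is organized):

import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

def train(model, loader, epochs):
    half = epochs // 2
    for p in model.backbone.parameters():
        p.requires_grad = False                   # frozen stage: Backbone weights fixed
    opt = torch.optim.SGD((p for p in model.parameters() if p.requires_grad),
                          lr=1e-3, momentum=0.9)  # momentum coefficient 0.9
    sched = CosineAnnealingLR(opt, T_max=half)    # cosine annealing learning rate
    for epoch in range(epochs):
        if epoch == half:                         # unfrozen stage: update Backbone too
            for p in model.backbone.parameters():
                p.requires_grad = True
            opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
            sched = CosineAnnealingLR(opt, T_max=epochs - half)
        for images, targets in loader:
            opt.zero_grad()
            loss = model(images, targets)         # YOLOv4 training loss (incl. CIoU term)
            loss.backward()
            opt.step()
        sched.step()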
To verify that preprocessing with the YOLOv4 target detection algorithm to extract valuable video content improves video compression efficiency, MPEG coding and FLV coding are selected for the video compression test. Table 5 shows the comparison of compression effects. The compression ratio is defined in Equation (8):
$CR = \frac{S_O - S_C}{S_O}$ (8)
where $CR$ is the compression ratio, $S_O$ the initial video size, and $S_C$ the compressed video size.
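Equation (8) checked against the second row of Table 5 (a trivial helper, ours):

def compression_ratio(size_original: float, size_compressed: float) -> float:
    # Equation (8): fraction of the original size removed by compression
    return (size_original - size_compressed) / size_original

print(round(compression_ratio(54.23, 36.45), 2))  # 0.33, the 33% of Table 5, row 2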
After the original video is preprocessed with YOLOv4, video coding is performed, which greatly improves compression efficiency. With YOLOv4 extraction and recognition, the compression ratio of FLV coding increases by 11% over the original, and that of MPEG coding increases by 17%; the better the video compression effect, the more obvious the improvement. The results show that video data compression based on YOLOv4 not only retains the valuable video content, but also greatly compresses the data, saving considerable storage space.

3.3. Video Depth Compression Experiment Based on Content Value

To verify the effectiveness of the expressway video compression method based on hybrid deep learning proposed in this paper, real-time online expressway surveillance video data are retrieved and tested with a traditional compression method (H.266, i.e., VVC coding), the reference algorithms, and the method proposed in this paper.
As shown in Table 6, compared with the other three methods, the algorithm proposed in this paper saves 62.82% of the encoding time while BD-BR increases by only 5.96% and BD-PSNR loses only 0.19 dB. This verifies that the proposed algorithm trades a small loss in rate-distortion performance for a large saving in time and reduces intra frame encoding complexity more effectively. After target detection, it loses only 2% of the content retention ratio while achieving a high compression ratio of 64%, which shows that YOLOv4 is an effective method for content-value-based target recognition in expressway surveillance video. Compared with the traditional video compression methods, both the proposed algorithm and the reference algorithms improve video compression performance, which shows that video data compression combined with deep learning has better robustness and adaptability.
To reflect the performance of the proposed algorithm more intuitively, Figure 6 shows the rate distortion curves of the Class C video sequence RaceHorses encoded by the different algorithms. The RD curve of the proposed algorithm almost overlaps those of IPCNN [11] and IPCED [12], as well as those of the traditional VVC and HM16.20 encoders, which again demonstrates that the rate distortion loss of the proposed algorithm is very small and its compression performance is good.

4. Conclusions

Aiming at the problems of expressway surveillance video, such as large storage consumption, sparse information, and the complexity and time cost of traditional video coding, this paper proposes an expressway video data compression method based on hybrid deep learning. The method first uses YOLOv4 to analyze the content value of the original video, extract the video frames containing vehicle information, and eliminate uninformative frames. An improved CNN is then designed by adding a Feature Pyramid and an Inception module to accelerate the extraction and fusion of features at all levels and improve image classification and prediction performance. Finally, the whole model is integrated into an HEVC encoder to compress the preprocessed video. The experimental results show that the proposed method achieves a higher compression ratio and shorter coding time than other deep-learning-based compression methods.
This paper only considers the neural-network-based fast partitioning of intra prediction CUs in video coding and does not address the large amount of computation generated in selecting among the 35 prediction modes. In future work, we will develop fast selection of the 35 prediction modes with a neural network and combine it with the algorithm proposed in this paper to further reduce coding complexity. Furthermore, we will conduct research on neural networks for the inter prediction of video coding.

Author Contributions

Data curation, J.L.; formal analysis, Y.S.; methodology, J.L. and F.D.; project administration, F.D.; software, J.L. and L.Z.; supervision, Y.S.; validation, L.Z.; writing—original draft, Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Natural Science Foundation of China (52167008), Outstanding Youth Fund Project of Jiangxi Natural Science Foundation (20202ACBL214021), Key Research and Development Plan of Jiangxi Province (20202BBGL73098), and Science and Technology Project of Education Department of Jiangxi Province (GJJ210650).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Qingchun, Y. Analysis and Evaluation of Expressway Intelligent Video Monitoring System. Master’s Thesis, Suzhou University, Suzhou, China, November 2014. [Google Scholar]
  2. Hua, Y.; Liu, W. Generalized Karhunen-Loeve transform. IEEE Signal Process. Lett. 1998, 5, 141–142. [Google Scholar]
  3. Bowei, Y. An End-to-End Video Coding Method Based on Convolutional Neural Network. Master’s Thesis, Hebei Normal University, Shijiazhuang, China, May 2022. [Google Scholar]
  4. Hong, R. Research on Scalable Coding in Video Compression Coding. Presented at the Western China Youth Communication Academic Conference, Chengdu, China, 1 December 2008; pp. 341–345. [Google Scholar]
  5. Xia, S.; Yang, W.; Hu, Y.; Liu, J. Deep inter prediction via pixel-wise motion oriented reference generation. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 1710–1774. [Google Scholar]
  6. Huo, S.; Liu, D.; Wu, F.; Li, H. Convolutional neural network-based motion compensation refinement for video coding. In Proceedings of the 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Florence, Italy, 27–30 May 2018; pp. 1–4. [Google Scholar]
  7. Chen, M.; Goodall, T.R.; Patney, A.; Bovik, A.C. Learning to compress videos without computing motion. arXiv 2020, arXiv:2009.14110. [Google Scholar]
  8. Wu, C.Y.; Singhal, N.; Krähenbühl, P. Video compression through image interpolation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018. [Google Scholar]
  9. Lu, G.; Ouyang, W.; Xu, D.; Zhang, X.; Cai, C.; Gao, Z. Dvc: An end-to-end deep video compression framework. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 11006–11015. [Google Scholar]
  10. Djelouah, A.; Campos, J.; Schaub-Meyer, S.; Schroers, C. Neural inter-frame compression for video coding. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6421–6429. [Google Scholar]
  11. Cui, W.; Zhang, T.; Zhang, S.; Jiang, F.; Zuo, W.; Wan, Z.; Zhao, D. Convolutional neural networks based intra prediction for HEVC. In Proceedings of the 2017 Data Compression Conference (DCC), Snowbird, UT, USA, 4–7 April 2017. [Google Scholar]
  12. Jin, Z.; An, P.; Shen, L. Video intra prediction using convolutional encoder decoder network. Neurocomputing 2020, 394, 168–177. [Google Scholar] [CrossRef]
  13. Agustsson, E.; Mentzer, F.; Tschannen, M.; Cavigelli, L.; Timofte, R.; Benini, L.; van Gool, L. Soft-to-hard vector quantization for end-to-end learning compressible representations. In Advances in Neural Information Processing Systems; Morgan Kaufmann Publishers: Burlington, MA, USA, 2017; Volume 30. [Google Scholar]
  14. Balle, J.; Minnen, D.; Singh, S.; Hwang, S.J.; Johnston, N. Variational image compression with a scale hyperprior. arXiv 2018, arXiv:1802.01436. [Google Scholar]
  15. Li, M.; Zuo, W.; Gu, S.; Zhao, D.; Zhang, D. Learning convolutional networks for content-weighted image compression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3214–3223. [Google Scholar]
  16. Toderici, G.; O’Malley, S.M.; Hwang, S.J.; Vincent, D.; Minnen, D.; Baluja, S.; Covell, M.; Sukthankar, R. Variable rate image compression with recurrent neural networks. arXiv 2015, arXiv:1511.06085. [Google Scholar]
  17. Kumar, V.; Govindaraju, H.; Quaid, M.; Eapen, J. Fast intra mode decision based on block orientation in high efficiency video codec (HEVC). In Proceedings of the 2014 International Symposium on Computer, Consumer and Control, Taichung, Taiwan, 10–12 January 2014; pp. 506–511. [Google Scholar]
  18. Zhang, T.; Sun, M.-T.; Zhao, D.; Gao, W. Fast Intra-Mode and CU Size Decision for HEVC. IEEE Trans. Circuits Syst. Video Technol. 2017, 27, 1714–1726. [Google Scholar] [CrossRef]
  19. Liu, Z.; Yu, X.; Gao, Y.; Chen, S.; Ji, X.; Wang, D. CU Partition Mode Decision for HEVC Hardwired Intra Encoder Using Convolution Neural Network. IEEE Trans. Image Process. 2016, 25, 5088–5103. [Google Scholar] [CrossRef] [PubMed]
  20. Lim, L.K.; Lee, J.; Kim, S.; Lee, S. Fast PU Skip and Split Termination Algorithm for HEVC Intra Prediction. IEEE Trans. Circuits Syst. Video Technol. 2015, 25, 1335–1346. [Google Scholar] [CrossRef]
  21. Lv, X.; Yu, H.; Zhou, Y.; Liang, L.; Zhang, T.; Wu, J.; Zhang, S. Video data compression method for marine fishery production management based on content value. Ocean Inf. 2020, 35, 58–64. [Google Scholar] [CrossRef]
  22. Yi, H. Research on Vehicle Detection in Complex Scenes with Improved YOLOv4 Algorithm. Master’s Thesis, Fuyang Normal University, Fuyang, China, June 2022. [Google Scholar]
  23. Chen, Y.; Wang, Y.; Zhang, Y.; Guo, Y. PANet: A Context Based Predicate Association Network for Scene Graph Generation. In Proceedings of the 2019 IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, 8–12 July 2019. [Google Scholar]
  24. Shaoming, L.; Jun, C.; Lu, L.; Xuji, T. Scale estimation based on IOU and center point distance prediction in target tracking. Autom. Equip. 2021, 1–14. [Google Scholar] [CrossRef]
  25. Zeqi, F. Research on HEVC Based Coding Technology with Low Complexity and Strong Network Adaptability. Master’s Thesis, Beijing University of Technology, Beijing, China, May 2019. [Google Scholar]
  26. Jun, M. Design and Implementation of Full HD Video Compression, Storage and Forwarding System. Master’s Thesis, North University of China, Taiyuan, China, June 2021. [Google Scholar]
  27. Yuansong, L. Research on Fast Algorithm of Video Coding. Master’s Thesis, Guangdong University of Technology, Guangzhou, China, May 2022. [Google Scholar]
  28. Yue, L. Research on Intra Prediction Coding Technology Based on Deep Learning. Ph.D. Thesis, University of Science and Technology of China, Hefei, China, May 2019. [Google Scholar]
  29. Xuan, D. Research on Video Coding and Decoding Technology Based on Deep Learning. Master’s Thesis, University of Electronic Science and Technology of China, Chengdu, China, March 2020. [Google Scholar]
  30. Huanchen, Z. Research on Fast Intra frame Algorithm of Video Coding Standard H.266/VVC. Master’s Thesis, Huazhong University of Science and Technology, Wuhan, China, May 2021. [Google Scholar]
  31. Wei, Y. Research on Automatic Extraction Algorithm of Texture Samples Based on Depth Learning. Master’s Thesis, Shenzhen University, Shenzhen, China, May 2020. [Google Scholar]
Figure 1. Flow chart of the expressway video deep compression method based on content value.
Figure 2. YOLOv4 network model.
Figure 3. Vehicle recognition based on YOLOv4.
Figure 4. Structure of the CNN-based intra prediction CU fast partition algorithm model.
Figure 5. Flow chart of the CNN-based intra prediction CU fast partition algorithm.
Figure 6. RD performance curves of the RaceHorses sequence for each algorithm.
Table 1. Coding parameter configurations.

Parameter           | Value
GOP Size            | 1
QPs                 | 22, 27, 32, 37
Max CU Size         | 64 × 64
Min CU Size         | 8 × 8
Max Partition Depth | 3
Min Partition Depth | 0
Table 2. Training samples.

Sequence                   | Frames | CTUs per Frame | Total CTUs per Sequence
PeopleOnStreet, Traffic    | 1      | 1000           | 1000
BQTerrace, Cactus          | 1, 2   | 480            | 960
RaceHorses, BQMall         | 1–10   | 91             | 910
BasketballPass, RaceHorses | 1–50   | 18             | 900
FourPeople                 | 1–5    | 220            | 1100
Table 3. Training sample test sequences.

Class | Resolution  | Sequence                   | Frame Rate
A     | 2560 × 1600 | PeopleOnStreet, Traffic    | 30/30 fps
B     | 1920 × 1080 | BQTerrace, Cactus          | 50/50 fps
C     | 832 × 480   | RaceHorses, BQMall         | 50/50 fps
D     | 416 × 240   | BasketballPass, RaceHorses | 30/50 fps
E     | 1280 × 720  | FourPeople                 | 60 fps
Table 4. Time-saving comparison between this algorithm and HM16.20. ΔTime/% is listed per QP, with the per-sequence average in the last column.

Class | Sequence       | BD-BR/% | BD-PSNR/dB | QP = 22 | QP = 27 | QP = 32 | QP = 37 | Avg. ΔTime/%
A     | PeopleOnStreet | 6.45    | −0.07      | −61.43  | −60.13  | −61.96  | −61.21  | −61.20
A     | Traffic        | 7.04    | −0.10      | −59.98  | −62.12  | −62.44  | −61.79  | −61.60
B     | BQTerrace      | 5.23    | −0.30      | −61.42  | −61.45  | −62.43  | −61.22  | −61.63
B     | Cactus         | 6.87    | −0.46      | −61.76  | −62.12  | −60.23  | −61.70  | −61.50
C     | RaceHorses     | 4.67    | −0.09      | −64.32  | −59.23  | −60.30  | −60.42  | −61.06
C     | BQMall         | 5.87    | −0.21      | −60.12  | −60.99  | −60.21  | −60.69  | −60.49
D     | RaceHorses     | 4.12    | −0.15      | −56.23  | −55.98  | −56.78  | −55.69  | −56.17
D     | BasketballPass | 6.09    | −0.19      | −56.56  | −56.61  | −57.45  | −56.95  | −56.90
E     | FourPeople     | 7.34    | −0.14      | −67.34  | −66.88  | −66.23  | −66.98  | −66.85
      | Average        | 5.96    | −0.19      | −61.01  | −60.61  | −60.89  | −60.73  | −60.82
Table 5. Comparison of compression efficiency of different encoding formats based on YOLOv4.

Initial Video/GB | YOLOv4 Pretreatment | FLV Coding | MPEG Coding | Compressed Size/GB | Compression Ratio
54.23            | NO                  | YES        | NO          | 40.78              | 23%
54.23            | NO                  | NO         | YES         | 36.45              | 33%
54.23            | YES                 | YES        | NO          | 35.86              | 34%
54.23            | YES                 | NO         | YES         | 26.74              | 50%
Table 6. Comparison of video compression performance among the proposed algorithm, the reference algorithms, and VVC coding.

Algorithm | BD-BR/% | BD-PSNR/dB | ΔTime/% | Content Retention Ratio | Compression Ratio
VVC       | 0.99    | −0.12      | −26.56  | 1                       | 35%
IPCNN     | 7.78    | −0.36      | −57.73  | 1                       | 59%
IPCED     | 1.21    | −0.05      | −32.54  | 1                       | 43%
Proposed  | 5.96    | −0.19      | −62.82  | 0.98                    | 64%