1. Introduction
Over the past few years, numerous technological breakthroughs have led to an increase in the creation and consumption of audiovisual multimedia materials. Consumers are heavily exposed to video content through a multitude of social networking platforms, media-sharing Internet sites, and mobile phone applications. According to the most recent report by Cisco [1], there is a notable increase in the popularity of and demand for video applications. The video streaming market is expected to reach around 1.6 billion users by 2027, showing significant growth and a rising global interest in video streaming services. The user penetration rate is anticipated to increase from 18.3% in 2024 to 20.7% by 2027 [2]. It is also anticipated that two-thirds (66%) of TV sets connected to the Internet will have ultrahigh-definition (UltraHD) resolution, compared to a mere 33% in 2018. The term “UltraHD” describes a resolution of 3840 × 2160 pixels, also commonly referred to as 4K. The typical bitrate for 4K encoded video ranges between 15 and 18 Mb/s, which is more than twice the high-definition (HD) video bitrate and nine times the standard-definition (SD) video bitrate [3]. According to Cisco Visual Networking Index forecasts, worldwide IP traffic is expected to grow at a Compound Annual Growth Rate (CAGR) of 23% between 2021 and 2026, reaching 2.3 zettabytes annually by 2026. Video traffic is expected to remain dominant, constituting 87% of global IP traffic by 2026 [4]. The storage and delivery of this immense quantity of data pose significant challenges, necessitating efficient compression methods [5]. As smartphones and social networks have become more popular over the past few years, many streaming services (Netflix, Disney Plus, YouTube TV, Hulu, Apple TV Plus) now stream 4K video online. This surge in video consumption calls for efficient coding methods, especially for UltraHD resolutions such as 4K (3840 × 2160 pixels) and 8K (7680 × 4320 pixels).
The fundamental aim of most digital video coding standards is to reduce the bitrate of video while maintaining visual quality. This means minimizing the bitrate needed to represent video content at a given level of quality or maximizing the quality achievable within a given bitrate. As the successor to H.264, the High Efficiency Video Coding (HEVC) standard (H.265/MPEG-H) was released in 2013 [6]. HEVC development prioritized two main concerns: higher video resolutions and the use of parallel processing architectures. However, the adoption of HEVC has been gradual, mainly due to its higher processing power and other hardware requirements. The HEVC/H.265 video compression standard can effectively compress video content of various resolutions, including 8K.
The HEVC standard was jointly created by the Video Coding Experts Group (VCEG) of the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) and the Moving Picture Experts Group (MPEG) of the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC). The main objective of the HEVC standardization was to achieve a substantial improvement in compression performance over existing standards. HEVC can achieve a bitrate reduction of around 50% compared to H.264 while maintaining an equivalent level of perceptual video quality [4,7]. Additionally, there is evidence to support the superiority of HEVC over VP9 in several aspects [8]. HEVC achieves this compression performance through a range of technical capabilities, such as support for high-resolution video, improved color representation, and a more flexible block partitioning mechanism [9].
HEVC outperforms H.264 due to several key factors. HEVC achieves increased compression efficiency by employing advanced encoding techniques and introducing new coding tools such as larger block sizes [10], improved intra prediction [11], and improved motion estimation [12] and compensation methods [13], leading to superior inter-frame prediction [14] and motion representation [15].
HEVC has better support for high-resolution video, including 4K and 8K resolutions. Additionally, HEVC supports a wider range of bit depths, enabling more accurate color representation and improved visual quality. Finally, HEVC is designed for a wide range of applications and use cases, making it suitable for delivering high-quality video over various networks and platforms [13,16,17,18,19,20]. However, it is important to note that HEVC’s performance gains come with increased computational complexity, which can impact hardware requirements [21].
The latest iteration of HEVC software is represented by the HEVC HM-18.0 reference software [22].
The main contributions of this paper are listed as follows:
A suitable encoding configuration for low-activity video sequences is selected to improve the coding performance. For such sequences, our results show that using the IPPP configuration can improve coding performance by up to 4 dB.
We investigated the impact of motion content on the coding efficiency of HEVC. Our results show that for highly active sequences, IPPP has a negligible performance advantage over periodic-I and periodic-IDR. For such sequences, our results suggest using periodic-I or periodic-IDR rather than IPPP to obtain the benefits of I-frames, namely limiting error propagation and offering random access, without a significant loss in coding performance.
We investigated the impact of coding structure on decoding complexity. Our results show that IPPP has a slightly lower decoding complexity than periodic-I and periodic-IDR.
We propose an adaptive scheme that adjusts the GOP structure and intra coding techniques based on the motion content of the encoded video.
The rest of the paper is arranged as follows. Section 2 describes the structure of the HEVC codec. Section 3 reviews the related work. Section 4 discusses the evaluation methodology and configurations, with an explanation of each phase. Section 5 presents the performance results for the test sequences and evaluates them in terms of bitrate efficiency and video quality. Section 6 discusses the results in a broader context. Section 7 concludes the paper and offers suggestions for future work.
2. HEVC Codec
Overall, the HEVC structure (shown in Figure 1) provides a high degree of flexibility and adaptability, allowing it to optimize coding performance for a wide range of applications and content types.
HEVC divides video frames into a hierarchy of Coding Units (CU), as shown in Figure 2 [23]. The hierarchical structure organizes video frames into progressively smaller and more localized units for compression purposes. A Coding Tree Unit (CTU) comprises a square picture area containing N × N samples of the luma component and its associated chroma components. The encoder can select the CTU size based on its architectural features and the requirements of the application environment, such as memory and latency constraints. The value of N is signaled in the bitstream and can be 64, 32, or 16.
Furthermore, each CTU is partitioned into Coding Tree Blocks (CTBs), which can be further partitioned into multiple coding blocks (CBs), as shown in Figure 3. The CB sizes chosen depend on the complexity of the content being encoded. The smallest CB is 4 × 4 samples, and the largest is 64 × 64 samples.
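The recursive partitioning of a CTU into smaller blocks can be illustrated with a short sketch. It is not the decision process of the HM encoder, which relies on rate–distortion optimization; the variance-based split rule, the threshold value, and the function name `split_ctu` are assumptions introduced here purely for illustration.

```python
from statistics import pvariance


def split_ctu(block, x=0, y=0, size=None, min_size=4, threshold=100.0):
    """Recursively split a square luma block into coding blocks (quadtree).

    Returns a list of (x, y, size) leaves. A block is split into four
    quadrants when its sample variance exceeds `threshold`, which is a
    hypothetical stand-in for a real encoder's rate-distortion decision.
    """
    if size is None:
        size = len(block)
    samples = [block[y + j][x + i] for j in range(size) for i in range(size)]
    if size <= min_size or pvariance(samples) <= threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += split_ctu(block, x + dx, y + dy, half, min_size, threshold)
    return leaves


# A 16 x 16 block that is flat except for a textured top-left quadrant.
ctu = [[0] * 16 for _ in range(16)]
for j in range(8):
    for i in range(8):
        ctu[j][i] = 255 if (i + j) % 2 else 0

leaves = split_ctu(ctu, min_size=4)
for leaf in leaves:
    print(leaf)
```

In this toy input the textured quadrant is split down to the minimum size while the three flat quadrants remain whole, mirroring the idea that detailed regions receive smaller blocks.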
The HEVC standard also includes new coding tools that contribute to its improved coding efficiency. These include a more flexible prediction structure, a more efficient intra prediction scheme, a more powerful transform and quantization process, and a more sophisticated entropy coding scheme.
There are 35 intra prediction modes integrated within the codec. HEVC uses two types of transform coding: the Discrete Cosine Transform of type II (DCT-II) and the Discrete Sine Transform of type VII (DST-VII). Transform block sizes range from 4 × 4 to 32 × 32 [24]. The codec design features two in-loop filters, each targeting a different kind of artifact. The first is the deblocking filter, whose primary purpose is to reduce compression-induced blocking artifacts. The second is the Sample Adaptive Offset (SAO) filter, which reduces artifacts caused by the transform and quantization processes [25]. The HEVC standard uses Context Adaptive Binary Arithmetic Coding (CABAC) as its only entropy coding technique. CABAC can greatly improve compression efficiency through an arithmetic coding approach. Nonetheless, CABAC is intricate to implement and has drawbacks, including reduced processing speed and increased hardware cost [26].
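To make the role of the DCT-II concrete, the following sketch applies a floating-point, orthonormal 2-D DCT-II to a small block. It is an illustration only: HEVC specifies integer approximations of the transform, and the helper names used here (`dct2_matrix`, `dct2_2d`, `idct2_2d`) are not taken from any codec implementation.

```python
import math


def dct2_matrix(n):
    """Orthonormal n x n DCT-II basis matrix (floating point)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2_2d(block):
    """Separable 2-D DCT-II: C * X * C^T."""
    c = dct2_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2_2d(coeffs):
    """Inverse of dct2_2d: C^T * Y * C."""
    c = dct2_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


# A flat 4 x 4 block: all of its energy ends up in the DC coefficient.
flat = [[10.0] * 4 for _ in range(4)]
coeffs = dct2_2d(flat)
print(round(coeffs[0][0], 6))
```

Smooth blocks concentrate their energy in a few low-frequency coefficients, which is what makes the subsequent quantization and entropy coding effective.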
3. Related Work
Video compression and video quality assessment play pivotal roles in enabling efficient storage and transmission of multimedia content. As the demand for multimedia services, especially video, has grown, these areas have become increasingly important. Among video compression standards, HEVC has become a cornerstone, providing better compression efficiency than its predecessors.
Xu et al. [27] performed a thorough evaluation of the H.265/HEVC compression standard, examining the impact of bitrate and Group Of Pictures (GOP) pattern on video quality. The study aimed to provide guidance on bitrate and GOP pattern selection by examining the relationship between video quality and bitrate across different GOP patterns. Video quality was evaluated using objective metrics such as Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Video Quality Metric (VQM). The results demonstrated that increasing the bitrate led to better video quality for the same GOP pattern. Additionally, enlarging the GOP size while keeping the number of B-frames the same increased video quality.
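Since PSNR is the quality metric used throughout the remainder of this paper, a minimal sketch of its computation is given here. It assumes 8-bit samples (peak value 255) passed as flat sequences; the function name `psnr` and the sample values are illustrative.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak Signal-to-Noise Ratio in dB between two equally sized sample lists."""
    if len(reference) != len(distorted):
        raise ValueError("inputs must have the same number of samples")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every sample is off by one level, so the mean squared error is 1.
value = psnr([100, 100, 100, 100], [101, 99, 101, 99])
print(round(value, 2))
```

Because PSNR is logarithmic in the mean squared error, a gain of a few dB corresponds to a substantial reduction in error energy.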
Mackin et al. [28] examined how changes in frame rate affect HEVC video compression. The study demonstrated that higher frame rates, specifically those over 60 fps, can improve perceptual quality, particularly at higher bitrates. The researchers introduced a new approach to measure the degree of content dependency by classifying video sequences into distinct groups according to their motion characteristics. Their research uncovered a nearly linear relationship between the average bitrate and the ideal frame rate, suggesting that the ideal frame rate varies with bitrate. The study emphasizes the significance of taking video content into account when choosing the most suitable frame rates for various video sequences.
Valizadeh et al. [29] introduced a new approach to improve the effectiveness of video coding by using perceptual coding techniques that consider the characteristics of the Human Visual System (HVS). The proposed method incorporated the PSNR-HVS quality metric into the rate–distortion optimization process in order to achieve greater compression efficiency than the standard HEVC encoder. Their methodology demonstrated a bitrate reduction of up to 4.56% for the evaluated video sequences.
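Rate–distortion optimization is conventionally formulated as minimizing a Lagrangian cost J = D + λR over the available coding options. The sketch below shows this generic selection rule; it is not the method of [29], and the candidate modes, their distortion and rate values, and the λ settings are made-up numbers for illustration. A perceptual variant would presumably change how the distortion term D is measured.

```python
def best_mode(candidates, lam):
    """Return the candidate with the smallest Lagrangian cost J = D + lam * R."""
    return min(candidates, key=lambda c: c["distortion"] + lam * c["rate"])


# Hypothetical candidates: distortion in arbitrary units, rate in bits.
modes = [
    {"name": "intra", "distortion": 40.0, "rate": 120},
    {"name": "inter", "distortion": 55.0, "rate": 60},
    {"name": "skip", "distortion": 90.0, "rate": 2},
]

for lam in (0.1, 0.5, 2.0):
    print(lam, best_mode(modes, lam)["name"])
```

A small λ favors low distortion regardless of rate, whereas a large λ pushes the choice towards cheaper modes.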
The study conducted by Ruiz Atencia et al. [30] examined the influence of different HEVC coding configuration parameters on the perceptual quality of reconstructed videos. The study employed diverse metrics, including traditional image quality assessments and Netflix’s Video Multi-Method Assessment Fusion (VMAF), moving beyond conventional PSNR metrics. The methodology involved encoding video sequences under different configurations. The paper underscored the importance of considering perceptual quality metrics and provided valuable insights for configuring video encoders to optimize perceptual rate–distortion performance.
Kobayashi et al. [31] proposed a hybrid architecture that integrates hardware for efficient multi-channel HEVC encoding with software for packaging. The system allows for adaptive bitrate, multi-channel, low-latency encoding and supports multiple HTTP streaming protocols as well as 4K video at 60 frames per second. To address content-aware bitrate control, the authors proposed a technique that modifies the target bitrate according to the complexity of the video scene. This technique ensures that the bitrate ladder relationship is maintained and employs QP control to encode at a lower bitrate without causing noticeable degradation in image quality. Experimental results demonstrated a significant reduction in encoding bitrates.
Hamdoun et al. [32] examined the efficacy of integrating error-protection methods for HEVC video transmission via satellite channels. These methods included systematic network coding and physical-layer turbo coding, emphasizing the advantages of network coding in terms of error protection and resilience. The study specifically examined the network coding attributes in cases where no packets are lost in order to identify the precise qualities that are relevant to the GOP in streaming multimedia. Additionally, the study used the IPPP encoding structure to encode video sequences with the HEVC standard. Their results showed considerable video quality improvement compared to only UDP flow error-protection methods.
Joy and Kounte [33] proposed a novel approach to enhance HEVC compression efficiency using deep learning technology. The proposed deep-depth decision algorithm employed a content-based deep learning approach to training separate chroma and luma components. The algorithm predicted the depth of CTU and converted it into a simplified vector with 16 elements, leading to a reduction in encoding time and an improvement in encoding bitrate.
Pan et al. [34] suggested an algorithm that leverages the features of video content to enhance bit allocation in HEVC. Their algorithm established a correlation between motion activity, texture complexity, and bit allocation, resulting in enhanced rate–distortion performance and coding efficiency.
These research works used different techniques to improve compression efficiency and enhance the perceived video quality. However, none of them considered the effects of motion content and intra coding techniques on coding performance.
5. Results
Firstly, we analyzed the tested sequences to check their motion activity. We used average motion vectors per pixel (MVpp) as an indicator of motion activity.
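One plausible way to compute such an indicator is sketched below. The exact definition of MVpp is not restated here, so the area-weighted average of motion-vector magnitudes used in this sketch, along with the function names and the input layout, should be read as assumptions rather than as the procedure used in our experiments.

```python
import math


def mv_per_pixel(motion_vectors, width, height):
    """Average motion-vector magnitude per pixel for one frame.

    `motion_vectors` is an iterable of (block_w, block_h, dx, dy) tuples.
    Each vector is weighted by the number of pixels its block covers
    (an assumed definition, for illustration only).
    """
    total = sum(w * h * math.hypot(dx, dy) for w, h, dx, dy in motion_vectors)
    return total / (width * height)


def sequence_mvpp(frames, width, height):
    """Mean of the per-frame values over all predicted frames."""
    per_frame = [mv_per_pixel(f, width, height) for f in frames]
    return sum(per_frame) / len(per_frame)


# Toy 16 x 16 frame: the top half moves by (3, 4), the bottom half is static.
frame = [(16, 8, 3, 4), (16, 8, 0, 0)]
print(mv_per_pixel(frame, 16, 16))
```

Weighting by block area prevents a frame with many small blocks from appearing more active than one with a few large blocks that carry the same motion.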
We compressed the test sequences using the configurations in Table 1. We then analyzed the compression performance, taking into consideration the motion activity of the tested sequences.
5.1. Motion Activity
Figure 7 shows the average MVpp for the tested video sequences when using the IPPP coding configuration. As can be seen, Crowd_run and Ducks_take_off are the most active sequences in terms of this indicator. On the other hand, Sunflower and HoneyBee show the least motion activity.
Figure 8 and Figure 9 show a predicted frame of the Crowd_run and HoneyBee test sequences, respectively, with motion vectors shown as white and red lines. As is clear from these figures, the predicted frame of Crowd_run contains more motion vectors (many of them large) than that of the HoneyBee sequence.
5.2. Rate–Distortion Performance
Figure 10, Figure 11 and Figure 12 show the rate–distortion curves for the less active video sequences HoneyBee, Sunflower, and FourPeople, respectively.
For the HoneyBee test sequence at a bitrate of 1 Mbps, the IPPP configuration achieves a gain of about 4 dB over both the periodic-I and periodic-IDR configurations. Periodic-I has a slight coding advantage over the periodic-IDR configuration, as shown in Figure 10.
For the other two sequences in this low-activity group (Sunflower and FourPeople) at a bitrate of 1 Mbps, the IPPP configuration gives a quality improvement of about 1 dB and 1.5 dB, respectively, compared to the periodic-I and periodic-IDR configurations.
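Quality gains at a fixed bitrate, such as those quoted above, can be read off two rate–distortion curves by interpolation. The sketch below interpolates PSNR linearly in the logarithm of the bitrate, which is one common choice and an assumption here. The rate–distortion points are invented placeholders, not measurements from our experiments.

```python
import math


def psnr_at_bitrate(rd_points, target_kbps):
    """Interpolate PSNR (dB) at `target_kbps`, linearly in log10(bitrate).

    `rd_points` is a list of (bitrate_kbps, psnr_db) pairs.
    """
    pts = sorted(rd_points)
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        if r0 <= target_kbps <= r1:
            t = (math.log10(target_kbps) - math.log10(r0)) / (
                math.log10(r1) - math.log10(r0))
            return p0 + t * (p1 - p0)
    raise ValueError("target bitrate outside the measured range")


# Placeholder curves for two configurations (hypothetical values).
config_a = [(500, 36.0), (2000, 40.0)]
config_b = [(500, 33.0), (2000, 37.0)]

gain = psnr_at_bitrate(config_a, 1000) - psnr_at_bitrate(config_b, 1000)
print(round(gain, 2))
```

The function refuses to extrapolate, so a gain is reported only where both curves were actually measured.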
Figure 13 and Figure 14 show the rate–distortion curves for the Mobcal and Shields test sequences, which have intermediate motion activity.
The results show that the IPPP coding configuration can still achieve a reasonable quality improvement over periodic-I and periodic-IDR configurations. At 2 Mbps bitrate, the IPPP can achieve about 1.5 dB better than periodic-I and periodic-IDR for Mobcal. For Shields at 2 Mbps bitrate, the IPPP configuration can achieve about 1 dB better quality than the other two coding configurations.
For all sequences with low and intermediate motion activity, periodic-I shows a slight coding improvement over the periodic-IDR.
Figure 15, Figure 16 and Figure 17 show the rate–distortion curves for the more active sequences YachtRide, Ducks_take_off, and Crowd_run, respectively. The results show that the IPPP configuration has a very small performance advantage over the periodic-I and periodic-IDR configurations. Additionally, the periodic-I and periodic-IDR configurations show negligible coding differences for these sequences.
5.3. Encoding and Decoding Times
To check the complexity of encoding and decoding different video configurations, we encoded the Sunflower, HoneyBee, and FourPeople test sequences using the IPPP, periodic-I, and periodic-IDR coding configurations. The sequences were encoded using QPs of 30, 32, and 27, respectively, to achieve an average PSNR of about 40 dB. The encoding times are shown in Figure 18, while Figure 19 shows the decoding times. Encoding times are nearly unchanged across coding configurations, except for the Sunflower sequence, which takes longer to encode with the IPPP configuration than with the periodic-I and periodic-IDR configurations. Looking at Figure 19, it is clear that the decoding time for the IPPP coding configuration is less than that of the periodic-I and periodic-IDR configurations. The reason is that, for the same video quality, the periodic-I and periodic-IDR configurations use a higher bitrate than IPPP (Figure 10, Figure 11 and Figure 12). A higher bitrate requires more processing, which increases decoding time.
6. Discussion
Choosing the optimal coding configuration for encoding a video sequence is challenging as the motion activity of the coded video has a significant impact on the HEVC coding performance. We analyzed the coding performance of a range of video sequences with different levels of motion activity using different coding configurations.
6.1. Low Motion Activity
For low motion activity test sequences (HoneyBee, Sunflower, and FourPeople), the IPPP configuration can achieve considerably better coding than both periodic-I and periodic-IDR configurations. Periodic-I has a slight coding advantage over periodic-IDR configuration.
6.2. Intermediate Motion Activity
Rate–distortion curves for the test sequences with intermediate motion activity (Mobcal, Shields) show that the IPPP coding configuration can achieve a reasonable quality improvement over periodic-I and periodic-IDR configurations. Additionally, periodic-I shows a slight coding improvement over the periodic-IDR.
6.3. High Motion Activity
The rate–distortion curves for the more active sequences (YachtRide, Ducks_take_off, and Crowd_run) show that the IPPP configuration has a very small performance advantage over the periodic-I and periodic-IDR configurations. Additionally, the periodic-I and periodic-IDR configurations show negligible coding differences for these sequences.
For the same quality video, IPPP uses fewer bits than periodic-I and periodic-IDR configurations. Therefore, the decoding time for the IPPP coding configurations is slightly less than that of the periodic-I and periodic-IDR configurations.
Generally, IPPP coding configuration can achieve a better coding performance by heavily relying on inter-frame prediction. Additionally, IPPP tends to have reduced decoding complexity compared to periodic-I and periodic-IDR structures. However, I-frames can minimize error propagation in error-prone environments and improve the random-access capability of encoded video.
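The trade-off between coding structure and error propagation can be illustrated with a toy model. The sketch assumes that each P-frame references only the frame immediately before it, so that any intra frame stops the propagation of an error; the intra period of 8, the frame count, and the function names are hypothetical choices and do not reflect the configurations used in our experiments.

```python
def frame_types(num_frames, structure, intra_period=8):
    """Frame-type pattern for 'IPPP', 'periodic-I' or 'periodic-IDR'."""
    if structure not in ("IPPP", "periodic-I", "periodic-IDR"):
        raise ValueError("unknown coding structure: " + structure)
    intra_label = "IDR" if structure == "periodic-IDR" else "I"
    types = []
    for n in range(num_frames):
        if n == 0:
            types.append(intra_label)  # a sequence always starts with an intra frame
        elif structure != "IPPP" and n % intra_period == 0:
            types.append(intra_label)  # periodic refresh
        else:
            types.append("P")
    return types


def frames_hit_by_loss(types, lost_index):
    """Count frames corrupted when frame `lost_index` is lost.

    Assumes each P-frame references only the frame right before it,
    so corruption spreads until the next intra frame.
    """
    count = 0
    for n in range(lost_index, len(types)):
        if n > lost_index and types[n] in ("I", "IDR"):
            break
        count += 1
    return count


for structure in ("IPPP", "periodic-I", "periodic-IDR"):
    pattern = frame_types(24, structure)
    print(structure, frames_hit_by_loss(pattern, lost_index=3))
```

Under these assumptions, a single loss in an IPPP stream affects every later frame, whereas periodic intra frames confine the damage to the remainder of the current intra period.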
For sequences with high motion activity, our results show that the coding advantage of IPPP over periodic-I and periodic-IDR is very small. For such sequences, we therefore recommend including I-frames to obtain their advantages without losing any significant coding performance.
On this basis, we propose an enhancement to the HEVC codec that dynamically selects the encoding configuration based on the motion content of the encoded video. This adaptive scheme will offer better coding performance when the encoded video has low motion content and automatically add I-frames when motion activity increases.
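A minimal sketch of how such a selection rule might look is given below. The scheme has not yet been built, so everything in the sketch is an assumption: the per-segment decision, the motion threshold of 1.0, the choice of periodic-I for active segments, and the function names.

```python
def choose_structure(segment_mvpp, threshold=1.0):
    """Pick a coding structure for one segment from its motion activity.

    `threshold` is a hypothetical tuning parameter, not a measured value.
    """
    return "IPPP" if segment_mvpp < threshold else "periodic-I"


def plan_segments(mvpp_per_segment, threshold=1.0):
    """Apply the selection rule to a list of per-segment motion values."""
    return [choose_structure(m, threshold) for m in mvpp_per_segment]


# Two quiet segments followed by a highly active one (made-up values).
plan = plan_segments([0.2, 0.4, 2.5])
print(plan)
```

A practical design would also need to smooth the motion measure over time to avoid switching structures too frequently, which this sketch does not attempt.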
7. Conclusions
IPPP typically achieves a lower bitrate by heavily relying on inter-frame prediction, leveraging previously encoded frames to predict the current frame. Additionally, IPPP tends to have reduced decoding complexity compared to periodic-I and periodic-IDR structures. On the other hand, intra coded frames minimize error propagation from inter-frame prediction and improve the random-access capability of encoded video. However, more frequent I-frames also elevate the bitrate, potentially reducing overall compression efficiency. Additionally, increased decoding complexity, particularly in real-time applications or resource-constrained devices, accompanies frequent I-frames. Hence, it is essential to carefully assess and evaluate the coding configuration in order to select the most appropriate configuration for a specific case and achieve the desired coding performance.
Our results for sequences with low and intermediate motion content show that the IPPP configuration consistently has lower bitrates than the periodic-I and periodic-IDR configurations. This indicates that IPPP compresses efficiently while maintaining visual quality. In contrast, the periodic-I and periodic-IDR configurations incurred additional bits, leading to higher bitrates or lower quality at a specific bitrate. Additionally, periodic-I shows a slight coding improvement over periodic-IDR for these sequences.
The results for tested sequences with high motion content indicate that the IPPP configuration achieved slightly lower bitrates than Periodic-I and Periodic-IDR. Taking into consideration the advantages of including I-frames in error-prone environments and the random access they offer, it may be preferable to use the Periodic-I and Periodic-IDR coding configurations in such scenarios.
Our results show how complicated the trade-offs are between bitrate, visual quality, and encoding methods. These findings emphasize the importance of choosing a suitable encoding configuration according to the motion activity of the encoded video sequence. If the priority is to achieve lower bitrates with acceptable PSNR, a configuration with the IPPP coding structure is preferred. However, for videos with high-motion content, it may be preferable to use the periodic-I and periodic-IDR coding configurations because of the advantages they offer in error-prone environments.
Future work will investigate the effects of losses when these videos are sent over IP networks and compare with these results. Also, building the proposed codec that can dynamically select the encoding configuration based on the motion content of the encoded sequence is another important area for work in the future.