Search Results (69)

Search Parameters:
Keywords = versatile video coding

16 pages, 5544 KB  
Article
Visual Feature Domain Audio Coding for Anomaly Sound Detection Application
by Subin Byun and Jeongil Seo
Algorithms 2025, 18(10), 646; https://doi.org/10.3390/a18100646 - 15 Oct 2025
Viewed by 350
Abstract
Conventional audio and video codecs are designed for human perception, often discarding subtle spectral cues that are essential for machine-based analysis. To overcome this limitation, we propose a machine-oriented compression framework that reinterprets spectrograms as visual objects and applies Feature Coding for Machines (FCM) to anomalous sound detection (ASD). In our approach, audio signals are transformed into log-mel spectrograms, from which intermediate feature maps are extracted, compressed, and reconstructed through the FCM pipeline. For comparison, we implement AAC-LC (Advanced Audio Coding Low Complexity) as a representative perceptual audio codec and VVC (Versatile Video Coding) as a spectrogram-based video codec. Experiments were conducted on the DCASE (Detection and Classification of Acoustic Scenes and Events) 2023 Task 2 dataset, covering four machine types (fan, valve, toycar, slider), with anomaly detection performed using the official Autoencoder baseline model released in DCASE 2024. Detection scores were computed from reconstruction error and Mahalanobis distance. The results show that the proposed FCM-based ACoM (Audio Coding for Machines) achieves comparable or superior performance to AAC at less than half the bitrate, reliably preserving critical features even under ultra-low bitrate conditions (1.3–6.3 kbps). While VVC retains competitive performance only at high bitrates, it degrades sharply at low bitrates. These findings demonstrate that feature-based compression offers a promising direction for next-generation ACoM standardization, enabling efficient and robust ASD in bandwidth-constrained industrial environments. Full article
(This article belongs to the Special Issue Visual Attributes in Computer Vision Applications)
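A minimal sketch of the log-mel front end this abstract describes, i.e., turning an audio clip into a spectrogram that can then be treated as a visual object; the sample rate, FFT size, hop length, and mel-band count are illustrative assumptions, not values taken from the paper:

```python
import numpy as np
import librosa

def audio_to_logmel(path, sr=16000, n_fft=1024, hop=512, n_mels=128):
    """Load audio and convert it to a log-mel spectrogram (assumed parameters)."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop, n_mels=n_mels)
    logmel = librosa.power_to_db(mel, ref=np.max)  # dB scale, max-referenced
    # Normalize to [0, 1] so the spectrogram can be handled like an image
    return (logmel - logmel.min()) / (logmel.max() - logmel.min() + 1e-8)
```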

17 pages, 7292 KB  
Article
QP-Adaptive Dual-Path Residual Integrated Frequency Transformer for Data-Driven In-Loop Filter in VVC
by Cheng-Hsuan Yeh, Chi-Ting Ni, Kuan-Yu Huang, Zheng-Wei Wu, Cheng-Pin Peng and Pei-Yin Chen
Sensors 2025, 25(13), 4234; https://doi.org/10.3390/s25134234 - 7 Jul 2025
Viewed by 654
Abstract
As AI-enabled embedded systems such as smart TVs and edge devices demand efficient video processing, Versatile Video Coding (VVC/H.266) becomes essential for bandwidth-constrained Multimedia Internet of Things (M-IoT) applications. However, its block-based coding often introduces compression artifacts. While CNN-based methods effectively reduce these artifacts, maintaining robust performance across varying quantization parameters (QPs) remains challenging. Recent QP-adaptive designs like QA-Filter show promise but are still limited. This paper proposes DRIFT, a QP-adaptive in-loop filtering network for VVC. DRIFT combines a lightweight frequency fusion CNN (LFFCNN) for local enhancement and a Swin Transformer-based global skip connection for capturing long-range dependencies. LFFCNN leverages octave convolution and introduces a novel residual block (FFRB) that integrates multiscale extraction, QP adaptivity, frequency fusion, and spatial-channel attention. A QP estimator (QPE) is further introduced to mitigate double enhancement in inter-coded frames. Experimental results demonstrate that DRIFT achieves BD rate reductions of 6.56% (intra) and 4.83% (inter), with a gain of up to 10.90% on the BasketballDrill sequence. Additionally, LFFCNN reduces the model size by 32% while slightly improving the coding performance over QA-Filter. Full article
(This article belongs to the Special Issue Multimodal Sensing Technologies for IoT and AI-Enabled Systems)
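A minimal PyTorch sketch of one way to make a residual block QP-adaptive, in the spirit of the FFRB described above: a normalized QP value modulates channel-wise scaling of the residual. The block layout and channel count are assumptions for illustration; the paper's FFRB additionally uses octave convolution, frequency fusion, and spatial-channel attention.

```python
import torch
import torch.nn as nn

class QPAdaptiveResBlock(nn.Module):
    """Illustrative residual block whose output is modulated by the QP value."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))
        # Map a scalar QP to per-channel scale factors (hypothetical design)
        self.qp_scale = nn.Sequential(nn.Linear(1, ch), nn.Sigmoid())

    def forward(self, x, qp):
        # qp: tensor of shape (N, 1), normalized, e.g. qp/63 for VVC
        scale = self.qp_scale(qp).unsqueeze(-1).unsqueeze(-1)  # (N, ch, 1, 1)
        return x + self.body(x) * scale
```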

32 pages, 4311 KB  
Article
DRGNet: Enhanced VVC Reconstructed Frames Using Dual-Path Residual Gating for High-Resolution Video
by Zezhen Gai, Tanni Das and Kiho Choi
Sensors 2025, 25(12), 3744; https://doi.org/10.3390/s25123744 - 15 Jun 2025
Viewed by 800
Abstract
In recent years, with the rapid development of the Internet and mobile devices, the high-resolution video industry has ushered in a booming golden era, making video content the primary driver of Internet traffic. This trend has spurred continuous innovation in efficient video coding technologies, such as Advanced Video Coding/H.264 (AVC), High Efficiency Video Coding/H.265 (HEVC), and Versatile Video Coding/H.266 (VVC), which significantly improve compression efficiency while maintaining high video quality. However, during the encoding process, compression artifacts and the loss of visual details remain unavoidable challenges, particularly in high-resolution video processing, where the massive amount of image data tends to introduce more artifacts and noise, ultimately affecting the user’s viewing experience. Therefore, effectively reducing artifacts, removing noise, and minimizing detail loss have become critical issues in enhancing video quality. To address these challenges, this paper proposes a post-processing method based on a Convolutional Neural Network (CNN) that improves the quality of VVC-reconstructed frames through deep feature extraction and fusion. The proposed method is built upon a high-resolution dual-path residual gating system, which integrates deep features from different convolutional layers and introduces convolutional blocks equipped with gating mechanisms. By combining gating operations with residual connections, the proposed approach ensures smooth gradient flow while enhancing feature selection capabilities. It selectively preserves critical information while effectively removing artifacts. Furthermore, the introduction of residual connections reinforces the retention of original details, achieving high-quality image restoration. Under the same bitrate conditions, the proposed method significantly improves the Peak Signal-to-Noise Ratio (PSNR) value, thereby optimizing video coding quality and providing users with a clearer and more detailed visual experience. Extensive experimental results demonstrate that the proposed method achieves outstanding performance across Random Access (RA), Low Delay B-frame (LDB), and All Intra (AI) configurations, achieving BD-Rate improvements of 6.1%, 7.36%, and 7.1% for the luma component, respectively, due to the remarkable PSNR enhancement. Full article
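A minimal sketch of the gating-plus-residual idea described above: a gating path decides which features from the feature path to keep, while the skip connection preserves the original signal. Channel count and layout are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    """Illustrative gated convolution with a residual connection."""
    def __init__(self, ch=64):
        super().__init__()
        self.feat = nn.Conv2d(ch, ch, 3, padding=1)  # feature path
        self.gate = nn.Conv2d(ch, ch, 3, padding=1)  # gating path

    def forward(self, x):
        # The sigmoid gate selects features; the residual keeps original detail
        return x + self.feat(x) * torch.sigmoid(self.gate(x))
```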

18 pages, 1845 KB  
Article
Fast Intra-Prediction Mode Decision Algorithm for Versatile Video Coding Based on Gradient and Convolutional Neural Network
by Nana Li, Zhenyi Wang, Qiuwen Zhang, Lei He and Weizheng Zhang
Electronics 2025, 14(10), 2031; https://doi.org/10.3390/electronics14102031 - 16 May 2025
Viewed by 1122
Abstract
The latest Versatile Video Coding (H.266/VVC) standard introduces the QTMT structure, enabling more flexible block partitioning and significantly enhancing coding efficiency compared to its predecessor, High-Efficiency Video Coding (H.265/HEVC). However, this new structure results in changes to the size of Coding Units (CUs). To accommodate this, VVC increases the number of intra-prediction modes from 35 to 67, leading to a substantial rise in computational demands. This study presents a fast intra-prediction mode selection algorithm that combines gradient analysis and a CNN. First, the Laplace operator is employed to estimate the texture direction of the current CU block, identifying the most probable prediction direction and skipping over half of the redundant candidate modes, thereby significantly reducing the number of mode searches. Second, to further minimize computational complexity, two efficient neural network models, MIP-NET and ISP-NET, are developed to determine whether to terminate the prediction process for Matrix Intra Prediction (MIP) and Intra Sub-Partitioning (ISP) modes early, avoiding unnecessary calculations. This approach maintains coding performance while significantly lowering the time complexity of intra-prediction mode selection. Experimental results demonstrate that the algorithm achieves a 35.04% reduction in encoding time with only a 0.69% increase in BD-BR, striking a balance between video quality and coding efficiency. Full article
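A sketch of gradient-based candidate-mode pruning of the kind this abstract describes (it uses the Laplace operator; the sketch below uses Sobel gradients as a stand-in). The thresholds, the window size, and the mapping from texture direction to VVC mode indices are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import sobel

def prune_intra_modes(cu, window=8):
    """Pick a reduced set of VVC angular modes from CU gradients (sketch).

    Strong vertical gradients imply horizontal texture, favoring horizontal
    angular modes (around mode 18 in VVC); strong horizontal gradients favor
    vertical modes (around mode 50). Thresholds and window are assumptions.
    """
    gx = np.abs(sobel(cu.astype(np.float64), axis=1)).sum()
    gy = np.abs(sobel(cu.astype(np.float64), axis=0)).sum()
    if gy > 2.0 * gx:
        center = 18   # horizontal texture
    elif gx > 2.0 * gy:
        center = 50   # vertical texture
    else:
        center = 34   # diagonal / mixed texture
    modes = list(range(max(2, center - window), min(66, center + window) + 1))
    return [0, 1] + modes  # always keep Planar (0) and DC (1)
```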

26 pages, 11273 KB  
Article
DREFNet: Deep Residual Enhanced Feature GAN for VVC Compressed Video Quality Improvement
by Tanni Das and Kiho Choi
Mathematics 2025, 13(10), 1609; https://doi.org/10.3390/math13101609 - 14 May 2025
Viewed by 796
Abstract
In recent years, the use of video content has experienced exponential growth. The rapid growth of video content has led to an increased reliance on various video codecs for efficient compression and transmission. However, several challenges are associated with codecs such as H.265/High Efficiency Video Coding and H.266/Versatile Video Coding (VVC) that can impact video quality and performance. One significant challenge is the trade-off between compression efficiency and visual quality. While advanced codecs can significantly reduce file sizes, they introduce artifacts such as blocking, blurring, and color distortion, particularly in high-motion scenes. Different compression tools in modern video codecs are vital for minimizing artifacts that arise during the encoding and decoding processes. While the advanced algorithms used by these modern codecs can effectively decrease file sizes and enhance compression efficiency, they frequently find it challenging to eliminate artifacts entirely. By utilizing advanced techniques such as post-processing after the initial decoding, this method can significantly improve visual clarity and restore details that may have been compromised during compression. In this paper, we introduce a Deep Residual Enhanced Feature Generative Adversarial Network as a post-processing method aimed at further improving the quality of reconstructed frames from the advanced codec VVC. By utilizing the benefits of Deep Residual Blocks and Enhanced Feature Blocks, the generator network aims to make the reconstructed frame as similar as possible to the original frame. The discriminator network, a crucial element of our proposed method, plays a vital role in guiding the generator by evaluating the authenticity of generated frames. By distinguishing between fake and original frames, the discriminator enables the generator to improve the quality of its output. This feedback mechanism ensures that the generator learns to create more realistic frames, ultimately enhancing the overall performance of the model. The proposed method shows significant gain for Random Access (RA) and All Intra (AI) configurations while improving Video Multimethod Assessment Fusion (VMAF) and Multi-Scale Structural Similarity Index Measure (MS-SSIM). Considering VMAF, our proposed method can obtain 13.05% and 11.09% Bjøntegaard Delta Rate (BD-Rate) gain for the RA and AI configurations, respectively. For the luma component MS-SSIM, the RA and AI configurations achieve 5.00% and 5.87% BD-Rate gains, respectively, with the proposed network. Full article
(This article belongs to the Special Issue Intelligent Computing with Applications in Computer Vision)
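A minimal sketch of the feedback mechanism described above: the generator is trained against both a reconstruction term (stay close to the original frame) and an adversarial term (fool the discriminator). The loss weighting and the discriminator interface are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def generator_loss(gen_frame, orig_frame, disc, adv_weight=0.01):
    """Reconstruction + adversarial generator loss (illustrative sketch).

    disc(frame) is assumed to return one realness logit per sample;
    adv_weight is a made-up value for the sketch.
    """
    rec = F.l1_loss(gen_frame, orig_frame)     # stay close to the original
    logits = disc(gen_frame)
    adv = F.binary_cross_entropy_with_logits(  # push toward "real" label
        logits, torch.ones_like(logits))
    return rec + adv_weight * adv
```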

17 pages, 2527 KB  
Article
Three-Stage Multi-Frame Multi-Channel In-Loop Filter of VVC
by Si Li, Honggang Qi, Yundong Zhang and Guoqin Cui
Electronics 2025, 14(5), 1033; https://doi.org/10.3390/electronics14051033 - 5 Mar 2025
Viewed by 1201
Abstract
For the Versatile Video Coding (VVC) standard, extensive research has been conducted on in-loop filtering to improve encoding efficiency. However, most methods use only spatial characteristics without exploiting the content correlation across multiple frames or fully utilizing the inter-channel relational information. In this paper, we introduce a novel three-stage Multi-frame Multi-channel In-loop Filtering (3-MMIF) method for VVC that improves the quality of each encoded frame by harnessing the correlations between adjacent frames and channels. Firstly, we establish a comprehensive database containing pairs of encoded and original frames across various scenes. Then, we select the nearest frames in the decode buffer as the reference frames for enhancing the quality of the current frame. Subsequently, we propose a three-stage in-loop filtering method that leverages spatio-temporal and inter-channel correlations. The three-stage method is grounded in the recently developed Residual Dense Network, benefiting from its enhanced generalization ability and feature reuse mechanism. Experimental results demonstrate that our 3-MMIF method, with the encoder’s standard filter tools activated, achieves 2.78%/4.87%/5.13% Bjøntegaard delta bit-rate (BD-Rate) reductions for the Y, U, and V channels over the VVC 17.0 codec for random access configuration on the standard test set, outperforming other VVC in-loop filter methods. Full article
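A small sketch of the reference-selection step described above, assuming the decode buffer can be modeled as a dictionary keyed by picture order count (POC); the function name and interface are hypothetical.

```python
def nearest_references(decoded_buffer, current_poc, k=2):
    """Pick the k temporally nearest decoded frames as references (sketch).

    decoded_buffer is assumed to map POC -> decoded frame.
    """
    pocs = sorted(decoded_buffer, key=lambda poc: abs(poc - current_poc))
    return [decoded_buffer[poc] for poc in pocs[:k]]
```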

17 pages, 5365 KB  
Article
FR-IBC: Flipping and Rotation Intra Block Copy for Versatile Video Coding
by Heeji Han, Daehyeok Gwon, Jeongil Seo and Haechul Choi
Electronics 2025, 14(2), 221; https://doi.org/10.3390/electronics14020221 - 7 Jan 2025
Viewed by 951
Abstract
Screen content has become increasingly important in multimedia applications owing to the growth of remote desktops, Wi-Fi displays, and cloud computing. However, these applications generate large amounts of data, and their limited bandwidth necessitates efficient video coding. While existing video coding standards have been optimized for natural videos originally captured by cameras, screen content has unique characteristics such as large homogeneous areas and repeated patterns. In this paper, we propose an enhanced intra block copy (IBC) method for screen content coding (SCC) in versatile video coding (VVC) named flipping and rotation intra block copy (FR-IBC). The proposed method improves the prediction accuracy by using flipped and rotated versions of the reference blocks as additional references. To reduce the computational complexity, hash maps of these blocks are constructed on a 4 × 4 block size basis. Moreover, we modify the block vectors and block vector predictor candidates of IBC merge and IBC advanced motion vector prediction to indicate the locations within the available reference area at all times. The experimental results show that our FR-IBC method outperforms existing SCC tools in VVC. Bjøntegaard-Delta rate gains of 0.66% and 2.30% were achieved under the All Intra and Random Access conditions for Class F, respectively, while corresponding values of 0.40% and 2.46% were achieved for Class SCC, respectively. Full article
(This article belongs to the Section Circuit and Signal Processing)
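A numpy sketch of the hash-map idea described above: every 4 × 4 block of the reconstructed area is indexed together with its flipped and rotated variants, so a matching reference can be found by lookup rather than search. The transform set (H/V flip, 90/180/270 rotation) follows the FR-IBC idea; the key scheme and table layout are illustrative assumptions (real encoders use cheaper hashes).

```python
import numpy as np

def block_key(b):
    """Hashable key for a small block (sketch)."""
    return b.tobytes()

def build_fr_hash_map(frame, bs=4):
    """Index 4x4 blocks and their flipped/rotated versions (sketch)."""
    table = {}
    h, w = frame.shape
    for y in range(0, h - bs + 1, bs):
        for x in range(0, w - bs + 1, bs):
            blk = frame[y:y + bs, x:x + bs]
            variants = {
                "identity": blk, "hflip": np.fliplr(blk), "vflip": np.flipud(blk),
                "rot90": np.rot90(blk, 1), "rot180": np.rot90(blk, 2),
                "rot270": np.rot90(blk, 3)}
            for name, v in variants.items():
                table.setdefault(block_key(v), []).append((x, y, name))
    return table
```

Looking up a current block's key in this table returns candidate positions plus the transform that makes them match, which is exactly the extra signaling FR-IBC adds over plain IBC.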

16 pages, 433 KB  
Article
A Fast Coding Unit Partitioning Decision Algorithm for Versatile Video Coding Based on Gradient Feedback Hierarchical Convolutional Neural Network and Light Gradient Boosting Machine Decision Tree
by Fangmei Liu, Jiyuan Wang and Qiuwen Zhang
Electronics 2024, 13(24), 4908; https://doi.org/10.3390/electronics13244908 - 12 Dec 2024
Viewed by 1130
Abstract
Video encoding technology is a foundational component in the advancement of modern technological applications. The latest video coding standard, H.266/VVC, features a quad-tree with nested multi-type tree (QTMT) partitioning structure, which represents an improvement over its predecessor, High-Efficiency Video Coding (H.265/HEVC). This configuration facilitates adaptable block segmentation, albeit at the cost of heightened encoding complexity. In view of these considerations, this paper puts forth a deep learning-based approach to facilitate CU partitioning, with the aim of supplanting the intricate CU partitioning process observed in the Versatile Video Coding Test Model (VTM). We begin by presenting the Gradient Feedback Hierarchical CNN (GFH-CNN) model, an advanced convolutional neural network derived from the ResNet architecture, enabling the extraction of features from 64 × 64 coding unit (CU) blocks. Following this, a hierarchical network diagram (HND) is crafted to depict the delineation of partition boundaries corresponding to the various levels of the CU block’s layered structure. This diagram maps the features extracted by the GFH-CNN model to the partitioning at each level and boundary. Finally, a LightGBM-based decision tree classification model (L-DT) is constructed to predict the corresponding partition structure based on the prediction vector output from the GFH-CNN model. Subsequently, any errors in the partitioning results are corrected in accordance with the encoding constraints specified by the VTM, which ultimately determines the final CU block partitioning. The experimental results demonstrate that, in comparison with VTM-10.0, the proposed algorithm achieves a 48.14% reduction in complexity with only a negligible 0.83% increase in bitrate under the top-three configuration. In comparison, the top-two configuration resulted in a higher complexity reduction of 63.78%, although this was accompanied by a 2.08% increase in bitrate. These results demonstrate that, in comparison to existing solutions, our approach provides an optimal balance between encoding efficiency and computational complexity. Full article
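A sketch of the final classification stage described above: a LightGBM classifier maps a CNN-derived prediction vector to a partition decision, replacing the exhaustive RDO search. The feature dimension and the label scheme below are placeholders for illustration, not the paper's.

```python
import numpy as np
from lightgbm import LGBMClassifier

# X: prediction vectors produced by a CNN for 64x64 CUs (assumed shape);
# y: partition labels, e.g. 0=no-split, 1=QT, 2=BT-H, 3=BT-V, 4=TT-H, 5=TT-V
#    (labeling scheme assumed for illustration).
X = np.random.rand(1000, 32).astype(np.float32)
y = np.random.randint(0, 6, size=1000)

clf = LGBMClassifier(n_estimators=200, num_leaves=31, learning_rate=0.1)
clf.fit(X, y)
partition_mode = clf.predict(X[:1])[0]  # stands in for the RDO-based search
```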

26 pages, 1960 KB  
Article
Fast CU Partition Decision Algorithm Based on Bayesian and Texture Features
by Erlin Tian, Yifan Yang and Qiuwen Zhang
Electronics 2024, 13(20), 4082; https://doi.org/10.3390/electronics13204082 - 17 Oct 2024
Viewed by 1138
Abstract
As internet speeds increase and user demands for video quality grow, video coding standards continue to evolve. H.266/Versatile Video Coding (VVC), as the new generation of video coding standards, further improves compression efficiency but also brings higher computational complexity. Despite the significant advancements VVC has made in compression ratio and video quality, the introduction of new coding techniques and complex coding unit (CU) partitioning methods has also led to increased encoding complexity. This complexity not only extends encoding time but also increases hardware resource consumption, limiting the application of VVC in real-time video processing and low-power devices. To alleviate the encoding complexity of VVC, this paper puts forward a Bayesian and texture-feature-based fast splitting algorithm for intra-frame coding blocks in VVC, which aims to reduce unnecessary computational steps, enhance encoding efficiency, and maintain video quality as much as possible. In the rapid coding stage, video frames are coded by the original VVC test model (VTM), and the Joint Rough Mode Decision (JRMD) evaluation cost is used to update the parameters of the Bayesian algorithm and to set two thresholds that judge whether the current coding block should be split further. Then, for coding blocks larger than those satisfying the above threshold conditions, the predominant direction of the texture within the coding block is ascertained by calculating the standard deviations along both the horizontal and vertical axes, so as to skip some unnecessary split patterns for the current coding block. The findings from our experiments demonstrate that our proposed approach increases the bitrate by only 1.40% on average while reducing encoder execution time by 49.50%. The overall algorithm optimizes VVC intra-frame coding and reduces the coding complexity of VVC. Full article
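A small sketch of the texture-direction test described above, using per-row vs per-column standard deviation; the threshold and the mapping from the statistics to skipped split candidates are illustrative assumptions.

```python
import numpy as np

def split_direction_hint(cu, ratio_thr=1.5):
    """Suggest which split directions to skip from texture statistics (sketch)."""
    cu = cu.astype(np.float64)
    std_rows = np.std(cu, axis=1).mean()  # variation within rows (horizontal)
    std_cols = np.std(cu, axis=0).mean()  # variation within columns (vertical)
    if std_rows > ratio_thr * std_cols:
        return "skip-horizontal-splits"   # texture varies mostly horizontally
    if std_cols > ratio_thr * std_rows:
        return "skip-vertical-splits"
    return "test-all-splits"
```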

24 pages, 6380 KB  
Article
Multi-Type Self-Attention-Based Convolutional-Neural-Network Post-Filtering for AV1 Codec
by Woowoen Gwun, Kiho Choi and Gwang Hoon Park
Mathematics 2024, 12(18), 2874; https://doi.org/10.3390/math12182874 - 15 Sep 2024
Cited by 2 | Viewed by 2035
Abstract
Over the past few years, there has been substantial interest and research activity surrounding the application of Convolutional Neural Networks (CNNs) for post-filtering in video coding. Most current research efforts have focused on using CNNs with various kernel sizes for post-filtering, primarily concentrating on High-Efficiency Video Coding/H.265 (HEVC) and Versatile Video Coding/H.266 (VVC). This narrow focus has limited the exploration and application of these techniques to other video coding standards such as AV1, developed by the Alliance for Open Media, which offers excellent compression efficiency, reducing bandwidth usage and improving video quality, making it highly attractive for modern streaming and media applications. This paper introduces a novel approach that extends beyond traditional CNN methods by integrating three different self-attention layers into the CNN framework. Applied to the AV1 codec, the proposed method significantly improves video quality by incorporating these distinct self-attention layers. This enhancement demonstrates the potential of self-attention mechanisms to revolutionize post-filtering techniques in video coding beyond the limitations of convolution-based methods. The experimental results show that the proposed network achieves an average BD-rate reduction of 10.40% for the Luma component and 19.22% and 16.52% for the Chroma components compared to the AV1 anchor. Visual quality assessments further validated the effectiveness of our approach, showcasing substantial artifact reduction and detail enhancement in videos. Full article
(This article belongs to the Special Issue New Advances and Applications in Image Processing and Computer Vision)
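A minimal PyTorch sketch of a self-attention layer over the spatial positions of a feature map, the basic building block this abstract integrates into a CNN post-filter; the channel sizes and the learned residual weight are illustrative assumptions, and the paper uses three distinct attention variants rather than this single one.

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """Minimal self-attention over spatial positions of a feature map."""
    def __init__(self, ch=64, reduced=8):
        super().__init__()
        self.q = nn.Conv2d(ch, reduced, 1)
        self.k = nn.Conv2d(ch, reduced, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        n, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)   # (N, HW, r)
        k = self.k(x).flatten(2)                   # (N, r, HW)
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)  # (N, HW, HW)
        v = self.v(x).flatten(2).transpose(1, 2)   # (N, HW, C)
        out = (attn @ v).transpose(1, 2).reshape(n, c, h, w)
        return x + self.gamma * out  # attention captures long-range context
```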

26 pages, 7340 KB  
Article
Versatile Video Coding-Post Processing Feature Fusion: A Post-Processing Convolutional Neural Network with Progressive Feature Fusion for Efficient Video Enhancement
by Tanni Das, Xilong Liang and Kiho Choi
Appl. Sci. 2024, 14(18), 8276; https://doi.org/10.3390/app14188276 - 13 Sep 2024
Cited by 3 | Viewed by 3004
Abstract
Advanced video codecs such as High Efficiency Video Coding/H.265 (HEVC) and Versatile Video Coding/H.266 (VVC) are vital for streaming high-quality online video content, as they compress and transmit data efficiently. However, these codecs can occasionally degrade video quality by adding undesirable artifacts such as blockiness, blurriness, and ringing, which can detract from the viewer’s experience. To ensure a seamless and engaging video experience, it is essential to remove these artifacts, which improves viewer comfort and engagement. In this paper, we propose a deep feature fusion-based convolutional neural network (CNN) architecture (VVC-PPFF) as a post-processing approach to further enhance the performance of VVC. The proposed network, VVC-PPFF, harnesses the power of CNNs to enhance decoded frames, significantly improving the coding efficiency of the state-of-the-art VVC video coding standard. By combining deep features from early and later convolution layers, the network learns to extract both low-level and high-level features, resulting in more generalized outputs that adapt to different quantization parameter (QP) values. The proposed VVC-PPFF network achieves outstanding performance, with Bjøntegaard Delta Rate (BD-Rate) improvements of 5.81% and 6.98% for luma components in random access (RA) and low-delay (LD) configurations, respectively, while also boosting peak signal-to-noise ratio (PSNR). Full article
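A compact sketch of the early/late feature-fusion idea described above: shallow features (edges, textures) and deep features are concatenated before the final reconstruction. A luma-only input, the depth, and the channel count are assumptions for illustration.

```python
import torch
import torch.nn as nn

class EarlyLateFusion(nn.Module):
    """Fuse shallow (low-level) and deep (high-level) features (sketch)."""
    def __init__(self, ch=64, depth=8):
        super().__init__()
        self.head = nn.Conv2d(1, ch, 3, padding=1)  # assumes luma-only input
        self.body = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
            for _ in range(depth)])
        self.fuse = nn.Conv2d(2 * ch, 1, 3, padding=1)

    def forward(self, x):
        early = self.head(x)     # low-level features from an early layer
        late = self.body(early)  # high-level features from later layers
        return x + self.fuse(torch.cat([early, late], dim=1))  # residual output
```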

19 pages, 1012 KB  
Article
Rapid CU Partitioning and Joint Intra-Frame Mode Decision Algorithm
by Wenjun Song, Congxian Li and Qiuwen Zhang
Electronics 2024, 13(17), 3465; https://doi.org/10.3390/electronics13173465 - 31 Aug 2024
Cited by 1 | Viewed by 1236
Abstract
H.266/Versatile Video Coding (VVC) introduces new techniques that build upon previous standards, including a quadtree with nested multi-type tree (QTMT) structure. The introduction of this structure significantly enhances video coding efficiency; additionally, the number of directional modes in H.266 has increased by 32 compared to H.265, accommodating a greater variety of texture patterns. However, the changes in the related structures have also led to a significant increase in encoding complexity. To address the issue of excessive computational complexity, this paper proposes a targeted rapid Coding Unit (CU) partitioning approach combined with an intra-frame mode decision algorithm. In the first phase of the algorithm, we extract different features for CU blocks of various sizes and input them into the decision tree model’s classifier for classification processing, determining the CU partitioning mode to prematurely terminate the partitioning, thereby reducing the encoding complexity to some extent. In the second phase of the algorithm, we put forward an intra-frame mode decision strategy grounded in gradient descent with a bidirectional search mode, which brings the result as close as possible to the global optimum, thereby obtaining the optimal intra-frame mode and further reducing the encoding complexity. Experimentation has demonstrated that the algorithm achieves a 54.53% reduction in encoding time, while the BD-BR (Bjøntegaard Delta Bit Rate) only increases by 1.38%, striking an optimal balance between the fidelity of the video and the efficacy of the encoding process. Full article
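A sketch of a bidirectional local search over angular modes of the kind the second phase describes: starting from a coarse best mode, probe both neighbors and step in whichever direction lowers the cost until a local minimum is reached. The cost interface and mode range are assumptions for illustration.

```python
def bidirectional_mode_search(cost, start, lo=2, hi=66):
    """Descend to a locally best angular mode from both sides (sketch).

    cost(mode) is assumed to return a rough RD cost for an intra mode.
    """
    best, best_cost = start, cost(start)
    while True:
        candidates = {m: cost(m) for m in (best - 1, best + 1)
                      if lo <= m <= hi}
        if not candidates:
            return best
        mode, c = min(candidates.items(), key=lambda kv: kv[1])
        if c >= best_cost:
            return best  # local minimum reached
        best, best_cost = mode, c
```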

12 pages, 3185 KB  
Article
Intra-Mode Decision Based on Lagrange Optimization Regarding Chroma Coding
by Wei Li and Caixia Fan
Appl. Sci. 2024, 14(15), 6480; https://doi.org/10.3390/app14156480 - 25 Jul 2024
Cited by 2 | Viewed by 1143
Abstract
The latest-generation versatile video coding (VVC) standard continues to utilize a hybrid coding architecture to further promote compression performance, where the intra-mode decision module selects the optimal mode to balance bitrate and coding distortion. With regard to chroma intra modes, a scheme based on a cross-component linear model (CCLM) exploits the correlation between the luma and chroma components, which can implicitly introduce distortion propagation from luma blocks to subsequent chroma prediction blocks during coding, impacting the result of the Lagrange optimization. This paper presents an improved intra-mode decision based on a modified Lagrange multiplier for chroma components in VVC. The characteristics of chroma intra prediction are examined in depth, and the process of the intra-mode decision is analyzed in detail; then, the coding distortion dependency between luma and chroma is described and incorporated into a Lagrange optimization framework to determine the optimal mode. The proposed method achieves an average bitrate saving of 1.23% compared with the original scheme by using a dependent rate-distortion optimization in an All-Intra configuration. Full article
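The decision criterion behind this abstract is the standard Lagrangian cost J = D + λR. A minimal sketch of mode selection with a modified multiplier is below; the scaling factor standing in for the luma-to-chroma distortion dependency is purely illustrative, not the paper's model.

```python
def best_chroma_mode(modes, distortion, rate, lam, k=0.5):
    """Pick the chroma intra mode minimizing J = D + lambda * R (sketch).

    The multiplier is scaled by an assumed factor (1 + k) standing in for
    the modeled dependency of chroma distortion on previously coded luma.
    """
    lam_mod = lam * (1.0 + k)
    return min(modes, key=lambda m: distortion[m] + lam_mod * rate[m])
```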

15 pages, 4633 KB  
Article
Faster Intra-Prediction of Versatile Video Coding Using a Concatenate-Designed CNN via DCT Coefficients
by Sio-Kei Im and Ka-Hou Chan
Electronics 2024, 13(11), 2214; https://doi.org/10.3390/electronics13112214 - 6 Jun 2024
Cited by 2 | Viewed by 1681
Abstract
As the next-generation video coding standard, Versatile Video Coding (VVC) significantly improves coding efficiency over the current High-Efficiency Video Coding (HEVC) standard. In practice, this improvement comes at the cost of increased pre-processing complexity, which makes VVC encoding time-consuming to implement. This work presents a technique to simplify VVC intra-prediction using Discrete Cosine Transform (DCT) feature analysis and a concatenate-designed CNN. The coefficients of the DCT-transformed CUs reflect the complexity of the original texture, and the proposed CNN employs multiple classifiers to predict whether Coding Units (CUs) of different sizes should be split according to the VVC standard. This helps to simplify the intra-prediction process. The experimental results indicate that our approach can reduce the encoding time by 52.77% with a minimal Bjøntegaard Delta Bit Rate (BDBR) increase of 1.48% compared to the original algorithm, demonstrating a result competitive with other state-of-the-art methods in terms of coding efficiency and video quality. Full article
(This article belongs to the Special Issue Image and Video Processing Based on Deep Learning)
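A small sketch of extracting texture-complexity features from a CU's 2-D DCT, the kind of signal this abstract feeds to its CNN classifiers; the low-frequency corner size and the summary statistics are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dctn

def dct_complexity_features(cu):
    """Summarize a CU's texture complexity from its 2-D DCT (sketch).

    The share of energy outside the low-frequency corner is a simple proxy
    for texture complexity; the corner size is an assumption.
    """
    coeffs = dctn(cu.astype(np.float64), norm='ortho')
    energy = coeffs ** 2
    k = max(1, cu.shape[0] // 4)          # "low-frequency" corner (assumed)
    low = energy[:k, :k].sum()
    total = energy.sum() + 1e-12
    return {"dc": coeffs[0, 0], "high_freq_ratio": 1.0 - low / total}
```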

25 pages, 940 KB  
Article
Fast Versatile Video Coding (VVC) Intra Coding for Power-Constrained Applications
by Lei Chen, Baoping Cheng, Haotian Zhu, Haowen Qin, Lihua Deng and Lei Luo
Electronics 2024, 13(11), 2150; https://doi.org/10.3390/electronics13112150 - 31 May 2024
Cited by 10 | Viewed by 2750
Abstract
Versatile Video Coding (VVC) achieves an impressive coding gain improvement (about 40%) over the preceding High-Efficiency Video Coding (HEVC) technology at the cost of extremely high computational complexity. Such an extreme complexity increase is a great challenge for power-constrained applications, such as the Internet of Video Things. In the case of intra coding, VVC utilizes a brute-force recursive search for both the partition structure of the coding unit (CU), which is based on the quadtree with nested multi-type tree (QTMT), and 67 intra prediction modes, compared to 35 in HEVC. We therefore offer optimization strategies for the CU partition decision and intra coding modes to lessen the computational overhead. Regarding the high complexity of the CU partition process, CUs are first categorized as simple, fuzzy, or complex based on their texture characteristics. Then, we train two random forest classifiers to speed up the RDO-based brute-force recursive search process. One classifier directly predicts the optimal partition modes for simple and complex CUs, while the other determines the early termination of the partition process for fuzzy CUs. Meanwhile, to reduce the complexity of intra mode prediction, a fast hierarchical intra mode search method is designed based on the texture features of CUs, including texture complexity, texture direction, and texture context information. Extensive experimental findings demonstrate that the proposed approach reduces complexity by up to 77% compared to the latest VVC reference software (VTM-23.1). Additionally, an average coding time saving of 70% is achieved with only a 1.65% increase in BDBR. Furthermore, when compared to state-of-the-art methods, the proposed method also achieves the largest time saving with comparable BDBR loss. These findings indicate that our method is superior to other up-to-date methods in terms of lowering VVC intra coding complexity, providing an effective solution for power-constrained applications. Full article
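A sketch of the two-classifier scheme described above: one random forest predicts the partition mode for simple and complex CUs, the other decides early termination for fuzzy CUs. The feature dimensions, label schemes, and routing function are placeholders for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hand-crafted texture features per CU (variance, gradient stats, ...);
# shapes and labels here are placeholders for illustration only.
X_mode = np.random.rand(2000, 10); y_mode = np.random.randint(0, 6, 2000)
X_term = np.random.rand(2000, 10); y_term = np.random.randint(0, 2, 2000)

mode_clf = RandomForestClassifier(n_estimators=100).fit(X_mode, y_mode)
term_clf = RandomForestClassifier(n_estimators=100).fit(X_term, y_term)

def decide(cu_features, texture_class):
    """Route a CU to the right classifier by texture class (sketch)."""
    if texture_class in ("simple", "complex"):
        return ("use-mode", mode_clf.predict([cu_features])[0])
    # Fuzzy CU: let the second classifier decide early termination
    return ("early-terminate", bool(term_clf.predict([cu_features])[0]))
```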
