Article

Efficient Lossy Compression of Video Sequences of Automotive High-Dynamic Range Image Sensors for Advanced Driver-Assistance Systems and Autonomous Vehicles

Division of Signal Processing and Electronic Systems, Institute of Automatic Control and Robotics, Poznan University of Technology, Jana Pawła 24, 60-965 Poznań, Poland
*
Author to whom correspondence should be addressed.
Electronics 2024, 13(18), 3651; https://doi.org/10.3390/electronics13183651
Submission received: 6 August 2024 / Revised: 7 September 2024 / Accepted: 10 September 2024 / Published: 13 September 2024
(This article belongs to the Special Issue Deep Perception in Autonomous Driving)

Abstract

In this paper, we introduce an efficient lossy coding procedure specifically tailored for handling video sequences of automotive high-dynamic range (HDR) image sensors in advanced driver-assistance systems (ADASs) for autonomous vehicles. Nowadays, mainly for safety reasons, lossless compression is used in the automotive industry. However, it offers very low compression rates. To obtain higher compression rates, we suggest using lossy codecs, especially when testing image processing algorithms in software-in-the-loop (SiL) or hardware-in-the-loop (HiL) conditions. Our approach leverages the high-quality VP9 codec, operating in two distinct modes: grayscale image compression for automatic image analysis and color (in RGB format) image compression for manual analysis. In both modes, images are acquired from the automotive-specific RCCC (red, clear, clear, clear) image sensor. The codec is designed to achieve a controlled image quality and state-of-the-art compression ratios while maintaining real-time feasibility. In automotive applications, the inherent data loss poses challenges associated with lossy codecs, particularly in rapidly changing scenes with intricate details. To address this, we propose configuring the lossy codecs in variable bitrate (VBR) mode with a constrained quality (CQ) parameter. By adjusting the quantization parameter, users can tailor the codec behavior to their specific application requirements. In this context, a detailed analysis of the quality of lossy compressed images in terms of the structural similarity index metric (SSIM) and the peak signal-to-noise ratio (PSNR) is presented. From this analysis, we identified the codec parameters that have an important impact on the preservation of video quality and on the compression ratio. The proposed compression settings are very efficient: the compression ratios vary from 51 to 7765 for the grayscale image mode and from 4.51 to 602.6 for the RGB image mode, depending on the specified output image quality settings. We reached 129 frames per second (fps) for compression and 315 fps for decompression in the grayscale mode and 102 fps for compression and 121 fps for decompression in the RGB mode. These results make it possible to achieve a much higher compression ratio than lossless compression while maintaining control over image quality.

1. Introduction

Due to the growing number of sensors exploited in advanced driver-assistance systems (ADASs) and autonomous driving systems (ADSs), the amount of data being collected, processed, and stored is increasing rapidly. Various types of sensors are used to obtain information about the environment: radar, laser (e.g., LIDAR—light detection and ranging), acoustic (e.g., ultrasonic sensors), and, above all, optical (cameras) [1,2]. The latter are used most often [3,4,5,6,7,8]. Cameras can acquire images in visible light or in infrared, or they can act as thermal detectors [4]. This article primarily considers images from visible light cameras, but the ideas and concepts contained herein can also be applied to a broader area.
In automotive applications, video sensors do not need to have a very high resolution (a high number of megapixels), but they must provide a high dynamic range (HDR) and as little noise as possible. This outcome is attained through the specialized design of the image sensor, as detailed in Section 2, which ensures the acquisition of high-quality images. Without low-noise HDR, the required high image quality cannot be guaranteed in various lighting conditions, e.g., in very low light at night, in full sunlight, or in the dark with strong reflections [7].
In autonomous vehicles and ADASs, vision data can be processed and collected in various ways. Below, we list the most important of them:
  • All data are processed in real time in the vehicle. In this solution, data can be processed all the time or only after being triggered by other sensors, e.g., during an accident (overload detected by accelerometers) or proximity sensors. Analogously to processing, data can be also collected continuously or incidentally.
  • All data (raw or pre-processed) are sent to cloud systems for further processing. In this case, the data are usually not collected in the vehicle.
  • Some of the data are processed in the vehicle, and some are sent to cloud systems. This is a hybrid solution and allows for a reduction in the complexity and costs of systems installed in the vehicle, while, at the same time, offering greater vehicle autonomy, e.g., in the case of loss of connection between the vehicle and remote systems.
  • Driving data are only collected in a vehicle. They are mainly used to conduct research on improving control algorithms, e.g., for autonomous vehicles. This case is used for the laboratory testing of image processing algorithms in software-in-the-loop (SiL) or hardware-in-the-loop (HiL) conditions. In off-vehicle processing, some processing steps can be performed or supervised by humans. Although artificial intelligence-based solutions are becoming more common, in critical areas such as road safety, one must be careful [9]. Some analyses are still best performed by humans.
The data streams from the cameras are very large; moreover, a modern car is equipped with several cameras, so the amount of data grows quickly. Even a single HDR (12 bpp, bits per pixel), 4K (4096×2160 pixels), 30 fps (frames per second) camera produces ca. 3.2 Gb/s. Consequently, in all of the above cases, there is a very large amount of data to be processed, stored, or transmitted. This is associated with an increase in the complexity of data processing and transmission systems, disk capacity, and the consumption of resources such as electricity. Given the current state of technology, not all solutions are fully implementable (e.g., wireless transmission at a speed of Gb/s), and some of them will certainly generate costs that are unacceptable to the consumer [3].
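To illustrate the scale of the problem, the raw data rate of a single camera stream follows directly from its resolution, bit depth, and frame rate. The short sketch below reproduces the figure quoted above (the numbers are those from the text; the helper function itself is ours):

```python
# Estimate the raw (uncompressed) data rate of a camera stream.
def raw_bitrate_gbps(width: int, height: int, bpp: int, fps: int) -> float:
    """Raw stream rate in Gb/s (1 Gb = 1e9 bits)."""
    return width * height * bpp * fps / 1e9

# Single 4K HDR camera from the text: ca. 3.2 Gb/s
print(raw_bitrate_gbps(4096, 2160, 12, 30))  # ~3.19

# The RCCC test sequences used in Section 5 (1280x969, 12 bpp, 36 fps)
print(raw_bitrate_gbps(1280, 969, 12, 36))   # ~0.54
```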
This is why we want to reduce the amount of data as much as possible. Nowadays, mainly for safety reasons, lossless compression is used in the automotive industry. Lossless compression ensures that no data will be changed or deleted, nor will artifacts resulting from imperfections in the compression algorithms be introduced. However, it offers very low compression rates [10].
In our previous work, we have already tested lossless compression for the automotive area. In [6], we tested 12 lossless codecs on automotive recordings (motorway recordings). The video data were produced by stereovision color cameras with 1024×768 resolution at 20 fps, with color, i.e., red, green, blue (RGB), components stored without Bayer filter interpolation. The obtained compression ratios ranged from 2.0 to 3.5. In another work [10], we showed that the compression ratio obtained by one of the best lossless compression codecs (FFV1), after optimization for recordings in one of the color formats specific to automotive, ranges from 1.5 to 2.2. Such low values are unfortunately typical of lossless image compression and are mainly caused by two phenomena: images are noisy (especially in difficult lighting conditions) and, particularly in automotive conditions, change rapidly (the vehicle is usually in motion). Therefore, the data have relatively high entropy, resulting in a low lossless compression ratio.
It should be noted that in automotive applications, compression must be performed mainly in real time, with limited computational resources, which does not allow the use of complex compression algorithms [3,11]. Built-in automotive hardware codecs in cameras prefer to preserve quality rather than achieve high compression ratios [12]. In fact, it is easier to achieve a large throughput in this mode.
In the target application of compressed video material, i.e., to support innovative technologies in the field of active safety, which are used in ADASs and autonomous driving systems, the dominant (if not the only) image analysis technique is automatic processing [1,4]. The lossy codecs developed so far, intended mainly for a human recipient, are optimized to obtain the highest possible, but subjectively assessed, quality. The “lossiness” of these codecs is very often understood as “perceived lossiness”, so changes in the image that are not noticed by a human are not taken into account [13]. This particularly concerns fine details, rapidly changing scenes, and objects moving at high speed, which a human is unable to see. In automotive systems, these errors, although invisible to humans, may prove critical. When measuring image quality, it is therefore important to examine it in terms of all content, not just that perceived by humans.
When compressing an image for further automatic processing, transmission, or storage, it is therefore important to properly configure the image codec and to evaluate the quality of the compressed and decompressed results with respect to all content. In the literature, this issue is not widely analyzed. Below, we present selected works on this subject.
The authors of [11] present an overview of applications in which the automatic analysis of compressed images is performed (with particular emphasis on H.264 and H.265 codecs). In the paper [14], a method of increasing the compression ratio through a bit allocation and rate control strategy while maintaining a constant automatic object detection rate was presented.
In [3], the authors compared video codecs for real-time video transmission inside a car and defined metrics to evaluate the performance of algorithms in wired and wireless networks. Video streams in ADASs impose strict latency and quality requirements on the underlying network system, which affects the hardware implementations of IP cameras with hardware-based video codecs.
The video codec requirements for ADASs differ significantly from the typical consumer video codec offerings currently available. These differences are illustrated through ten detailed issues concerning the H.264 codec, as presented in the study [13]. An interesting solution is presented in [8]: a two-stage H.264-based video compression framework, dedicated to the compression of automotive-context videos with two different compression ratios: one for the region of interest and another for the region outside it. In [15], the authors present a new approach that augments existing codecs with a small, content-adaptive super-resolution model that significantly boosts video quality. Some ideas of adaptive fuzzy video compression control for ADASs and transmission inside the vehicle are presented in [5]. In [16], the influence of JPEG compression on the effectiveness of automatic object detection by artificial neural networks and SVM (support vector machine) classifiers was discussed. The authors noticed that as the compression ratio increases, the precision decreases relatively slowly.
This all means that lossy compression can be considered for automatic detection applications. Our detailed contribution described in this paper is as follows:
  • We propose original procedures for the lossy compression of automotive-type video content with efficient, real-time, multithreaded processing and a high compression ratio.
  • We propose appropriate settings for the lossy codec to achieve the proper, constant quality in various cases, which is required in automotive applications for safety.
  • We prove the possibility of using lossy compression for automatic ADAS image analyses through the detailed image quality verification.
The remaining sections of the paper are structured as follows: The next section provides an overview of image sensors, filters, and transformations utilized in the automotive domain. Section 3 delves into the analysis of lossy compression and codecs. Section 4 details the proposed methodologies for applying lossy codecs to data in one of the automotive-specific image formats. In Section 5, we present the experimental results obtained with the proposed codecs using a representative ADAS dataset and perform a comprehensive quality assessment of the lossy-compressed images. The paper concludes with a summary of key findings in the final section.

2. Image Sensors, Filters, and Transformations Used in an Automotive Area

As mentioned in the introduction, the technical requirements for image sensors built into cameras for the automotive industry are very high. They concern, as in most other applications, the high quality of recorded images.

2.1. High Quality Image Sensors

The most important parameters of an image sensor include the number of pixels, which, in combination with the lens optics, determines the spatial resolution of the image sensor; the size of a single pixel and its quantum efficiency, which are related to the minimum level of lighting that can be recorded (it must also be above the dark noise level of the sensor); and the maximum level of lighting that can be converted into an electrical signal without saturating the sensor. In practice, these parameters translate into the minimum sensitivity in very low light conditions, e.g., at night; the maximum lighting that can be handled, e.g., on sunny days; or operation in the dark with strong light reflections.
The collected photons are converted into electrons, which, collected in a well, create electric charge in CMOS sensors (currently the most popular ones). This charge is then converted into a digital form. If the digital word length is greater than eight bits, we are talking about a high-dynamic-range (HDR) image.
The HDR images are produced using different techniques including multiple exposures, split pixel technology supported by tone mapping, and automatic HDR image processing, e.g., inside the imaging chip.
In all cases where there is rapid movement (e.g., in automotive applications), it is best if the HDR image is recorded using a single acquisition time (single exposure). Cheaper solutions for obtaining an HDR image use a typical (i.e., standard dynamic range (SDR)) sensor and two (or more) exposure times. Longer exposure times are intended to properly capture dark areas, while shorter exposure times are intended to capture bright areas. Then, using the appropriate hardware, the streams are sent in parallel or combined into one.
A broader description of automotive image sensors is presented in [10].

2.2. Color Filter Arrays

A typical image sensor is manufactured to exhibit the maximum and most flat sensitivity curve over the desired range of electromagnetic wavelengths, e.g., for visible light or infrared. The image from such a transducer will be monochromatic and will be the sum of the color components within the sensitivity range. In order to obtain a color image, selected sensors must be sensitized to the appropriate wavelength (color) using color optical filters. This brings us to the concept of a color filter array.
Image sensors used in the automotive area differ from typical cameras not only in their dynamic range, but also in their color mosaic (also called the optical color filter array (CFA)).
The most popular CFA is the so-called Bayer mask, consisting of an R filter, a B filter, and two G filters (RGGB, Figure 1f). This system allows for the simplest and most accurate color reproduction and, at the same time, is matched to the highest sensitivity of human vision in the green color range. For the most commonly used color image format, RGB, all three components are present for all pixels, so for each pixel, the remaining two components must be interpolated. The Bayer CFA allows simple interpolation from two or four adjacent sensors. The biggest disadvantage of this CFA is its low sensitivity (all filters are relatively dark).
Since the color reproduction quality is not crucial in automatic vision systems, other CFAs are used. Here, the selection criterion is most often maximizing sensitivity, i.e., the brightest possible optical filters. We therefore reach the other extreme—no filters. This solution, i.e., CCCC (clear, clear, clear, clear) CFA, unfortunately, will not allow us to distinguish colors.
The red color is very important in road traffic control, e.g., prohibition road signs, stop lights at intersections, and vehicle brake lights are red. Therefore, the RCCC (red, clear, clear, clear) CFA (Figure 1b) is very popular in the automotive area [10,17,18,19]. RCCC is the lightest type of colored CFA (3/4 of the sensors have no filters); it also allows for a quick and easy transition to the monochrome format: only 1/4 of the pixels (those at the R positions) need to be calculated by interpolation. The disadvantage of this CFA is that it cannot be converted exactly to the RGB format, since it has no blue filter. However, there are known solutions that use mathematical transformations to convert an image from the RCCC format to an RGB-like format [18].
The remaining CFAs (Figure 1c–e) are less popular in ADASs and, at the cost of reduced optical filter brightness, offer the possibility of reproducing a wider range of colors (including RGB). However, it is worth paying attention to one of the newest CFAs: RYYc (Figure 1e). This CFA was introduced to distinguish between white and yellow road lines and to maintain good performance in low light [12].
Taking into account the above properties, we took RCCC CFA for further research presented in this article.
RCCC-format data are more difficult to directly compress than typical video data for the following reasons:
  • Compressing an RCCC image as a monochrome image will not give good results because the R component is spatially subsampled (consecutive R values do not occur in the immediate vicinity) and the horizontal neighborhood of the C components is closer than the vertical neighborhood. This reduces the accuracy of prediction based on the neighboring pixels used in compression, so prediction using typical prediction masks is less accurate.
  • Compressing the RCCC format mapped directly to RGB (with three components) will also not give good results. The color components are compressed separately, and here, the similarity of neighboring C components will not be taken into account, which will significantly reduce the compression ratio [10].
In this work, we therefore decided to convert the RCCC format into two formats, monochrome and color (RGB), and further compress them like typical images.

2.3. Transformations of Color Components

It should also be noted that in the case of RGB color image compression, the RGB components are not compressed directly; instead, the three components are transformed into three other components: typically, one representing brightness in monochromatic mode (the so-called luminance, Y) and two chroma components, i.e., Cb and Cr (the blue-difference and red-difference chroma components). Together, they form the YCbCr technical color space.
The transformation is reversible and offers three important, practical advantages. The first is the ability to obtain a monochrome image directly from the luminance component. The second is the fact that the luminance and the two color components are less correlated with each other than the RGB components, so the obtained compression ratio of e.g., YCbCr will be higher than for RGB. The third is the fact that the sensitivity of human vision is lower for color components than for the brightness component. Therefore, in practical solutions, the spatial subsampling of color components is performed, e.g., in 4:2:2 or 4:2:0 format. Additionally, in color spaces other than RGB (e.g., YCbCr, HSV (hue, saturation, value)), it is easier to perform signal processing and analysis [20,21]. The description of these techniques is widely known in the image processing area and will not be presented in detail in this article [20,22,23].
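For reference, a commonly used variant of this transformation is the BT.601 full-range (JPEG-style) conversion given below; other standardized variants (e.g., limited-range or BT.709 coefficients) differ only in the constants, and we do not claim the codec uses exactly this one:

$$\begin{aligned} Y &= 0.299R + 0.587G + 0.114B,\\ C_b &= 0.564\,(B - Y) + 128,\\ C_r &= 0.713\,(R - Y) + 128. \end{aligned}$$

The inverse transform recovers RGB exactly (up to rounding), which is the reversibility property mentioned above.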

3. Lossy Compression

Lossy compression techniques aim to reduce the file size of audio-visual data by discarding certain data elements while ensuring that the perceived loss of quality to the recipient, typically the human eye, remains minimal.
The lossy compression process involves various techniques such as special transforms like discrete cosine transform (DCT), quantization, motion prediction, and entropy coding. The increasing demands for higher resolutions, better image representation (e.g., HDR), and 3D imagery necessitate advanced algorithms to effectively process and compress video signals.

3.1. Bitrate Modes

In lossy compression, there is always a trade-off between the output quality and the compression ratio. Two primary modes are available: constant bitrate (CBR), which maintains a consistent output stream size but variable quality, and variable bitrate (VBR), which maintains consistent quality but a variable output stream size [24].
The quantization parameter (QP) controls the degree of compression for each macroblock in a video frame. High QP values indicate coarse quantization, leading to greater compression and lower image quality. Setting the codec to a constant QP value (constant quality, Q) results in an absolutely constant quality of the output stream, regardless of video content variability (Figure 2-Q). However, this setup can lead to highly variable bitrates depending on scene complexity, making it inefficient for encoding the input video signal. Consequently, the Q mode is not recommended except for precise laboratory applications.
A more effective approach is constrained quality (CQ), which allows setting a target quality level with a limited output bitrate. In consequence, the perceptual quality level is constant as long as the bitrate stays below a specified upper bound [25]. The actual quality achieved may vary slightly with image changes, but the fluctuations in stream size are less pronounced than in Q mode (Figure 2-CQ). In CBR mode, the codec attempts to stabilize the stream size without significantly exceeding the preset maximum values. However, this comes at the cost of quality reductions during high variability or scenes with many details (Figure 2-CBR). This mode is frequently employed for real-time video transmission over networks [26].
In VBR mode, both the stream size and the quality are permitted to fluctuate. Typically, the stream size greatly varies depending on the image content (Figure 2-VBR).
In summary, for ADAS, the Q mode may offer the best quality assurance but can result in substantial stream fluctuations. The CQ mode, permitting minor quality variations, would be much more stable in terms of data output. Reduced stream variability undoubtedly mitigates technical challenges, such as transfer speed requirements.

3.2. Codecs

Various codecs (coder and decoder combined) are designed to handle different types of audio-visual content. Their popular representatives are listed in Table 1 and Table 2.
H.264/AVC (Advanced Video Coding) and H.265/HEVC (High-Efficiency Video Coding) were developed by International Telecommunication Union—Telecommunication Standardization Sector (ITU-T) and the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) [27]. Key innovations of H.264 include variable block sizes for motion estimation and compensation, integer-based DCT for consistency between the encoder and decoder, and advanced entropy coding methods like context-adaptive variable-length coding (CAVLC) and context-adaptive binary arithmetic coding (CABAC). The block-based approach allows for adaptive segmentation of the video frame, improving compression efficiency by better handling areas with different levels of detail [28].
H.265 retains the fundamental structure of H.264 but introduces several enhancements such as larger block sizes up to 64 × 64 pixels, more directional modes for intra prediction, and improved support for HDR content. These improvements result in significantly better compression efficiency but at the cost of increased computational complexity [24].
VP9 is an open video codec designed and developed by the WebM Project as a successor to VP8, with a focus on high compression efficiency. VP9 employs a block-based hybrid coding scheme similar to HEVC, with block sizes up to 64 × 64 pixels. It includes various prediction modes (DC, template matching (TM), angular) and utilizes transforms like discrete cosine transform (DCT), asymmetric discrete sine transform (ADST), and Walsh–Hadamard transform (WHT) for efficient encoding. VP9’s entropy coding updates probability models in the frame header, enabling better adaptation to different content characteristics [29].
Versatile Video Coding (VVC), alternatively referred to as H.266 or MPEG-I Part 3, represents the latest advancement in video codec technology emanating from the MPEG family. As the successor to HEVC, VVC was conceived in 2020 through the collaborative efforts of the Joint Video Experts Team (JVET) [30].
AOMedia Video 1 (AV1) is a more recent codec developed by the Alliance for Open Media, aimed at providing superior compression performance and royalty-free licensing. AV1 improves upon VP9 with more advanced techniques for motion compensation, prediction, and transform coding, making it highly efficient for modern high-resolution video formats [31].
Recently, deep-learning-based codecs have emerged as a promising approach for video compression. These methods leverage the power of deep neural networks (DNNs) to learn compact representations of video data. Two primary categories exist: End-to-End Compression Frameworks and Adaptive Image Compression.
End-to-End Compression Frameworks, such as the Deep Video Compression (DVC) framework, utilize convolutional neural networks (CNNs) to encode and decode video frames, optimizing for compression efficiency through learned representations [32].
Adaptive Image Compression techniques, e.g., real-time Adaptive Image Compression, use DNNs to dynamically adjust the compression parameters based on the content of the video, offering improved compression rates and better visual quality [33].
These neural-network-based methods have shown significant promise, but they often require substantial computational resources, making them less practical for real-time applications compared to traditional codecs.
In our previous paper [6], we collected and analyzed comparative surveys and additionally performed our own experiments using videos from the automotive area in order to compare three state-of-the-art lossy codecs: AV1, VP9, and H.265/HEVC. The analysis revealed that the AV1 codec delivers the highest quality among all tested sequences, but it suffers from extremely low performance, making it impractical for real-time applications [34]. The H.265 codec, especially in high-quality presets, outperformed VP9 in some sequences but generally required more time to process, which is a significant drawback for real-time usage. The VP9 codec presented slightly lower quality, providing a balanced compromise between quality and speed.
Our additional experiments highlighted the performance and compression capabilities of H.264, H.265, and VP9 codecs using a controlled testing environment. The findings indicated that although H.264 and VP9 offer similar compression speeds, VP9 excels in image quality. Moreover, VP9 was found to be more efficient than H.265 in terms of CPU utilization and overall processing speed. Various settings for each codec, such as quantization parameters and performance presets, demonstrated that faster processing resulted in lower compression ratios. For example, the “placebo” preset, which provides minimal picture distortion, significantly increased compression time for H.265. In contrast, VP9’s “-row-mt 1” setting allowed speed optimization without substantial quality loss.
All these characteristics of the tested codecs suggest that VP9 offers the best balance between efficiency, quality, and speed, making it the best candidate for real-time applications like ADAS.
Moreover, VP9 is only a baseline, and other codecs could also be used.

3.3. Group of Pictures (GOP)

In video encoding, a Group of Pictures (GOP) is a sequence of frames with a specific order within the encoded video stream. Within a GOP, frames can be intra-coded (known as I-frames) or inter-predicted (P-frames for predictive coded pictures and B-frames for bi-predictive coded pictures). A GOP always starts with an I-frame, followed by P-frames or combinations of P- and B-frames. The appearance of the next I-frame indicates the beginning of a new GOP. Each encoded video stream consists of consecutive GOPs [35].
The selection of the length and structure of a GOP is crucial in video compression. Typically, I-frames achieve the lowest compression ratio, P-frames a higher compression ratio, and B-frames the highest compression ratio. Therefore, the longer the GOP that allows effective P- and B-frame prediction, the higher the average compression ratio achieved. However, long GOPs have significant drawbacks. Since the only reference frame is the initial I-frame, subsequent frames are decoded based on it, making it impossible to start decoding “in the middle” of a GOP. While this is not significant during continuous playback, it is a clear limitation when playback is initiated from a selected point or during rewinding or repeating segments (such as in image analysis) [24]. For example, with a video stream speed of 50 fps and a GOP length of 100 frames, playback can only start at points 2 s apart.
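The seek granularity implied by the GOP length follows directly from the frame rate, as in the example above. A minimal sketch of this arithmetic (the 36 fps, GOP = 50 case reflects the settings used later in Section 5):

```python
def seek_granularity_s(fps: float, gop_length: int) -> float:
    """Minimum spacing (in seconds) between valid playback start points."""
    return gop_length / fps

print(seek_granularity_s(50, 100))  # 2.0 s, the example from the text
print(seek_granularity_s(36, 50))   # ~1.39 s, the settings used in Section 5
```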
Another drawback is that long GOPs are more susceptible to errors. If an error occurs during encoding, transmission, or in the file containing the video stream, error propagation continues until the next I-frame.
Excessive GOP length can also cause the predicted frames’ information to become so dissimilar (in cases of sudden scene changes, such as changes in lighting when entering or exiting a tunnel in automotive applications) that the compression ratio drastically decreases [35].
These errors and their propagation within the GOP can be noticeable, especially at very high compression ratios, where artifacts resulting from encoding errors become apparent. These artifacts significantly expand until the beginning of the next GOP, where they abruptly disappear. In systems sensitive to image quality, such as active security algorithms, this is a significant issue as it may cause errors in image analysis, including the detection of “moving pseudo-objects”, which are merely emerging and disappearing image distortions.
Some codecs, such as Motion JPEG (M-JPEG), only use intra-frame coding, generating only I-frames (in this case, the GOP length is 1). Codecs of this type are relatively simple, achieve low compression ratios, and eliminate error propagation.
Advanced video codecs, to improve compression ratios, use significantly longer GOPs [35]. Additionally, in adaptive codecs, the form or length of the GOP can vary. For instance, in the case of a scene change in the video sequence, inter-prediction becomes highly inaccurate, making it more efficient to end the current GOP and start a new one. However, such behavior requires implementing additional algorithms for scene change detection and transmitting information about the GOP modifications. In this project, addressing these issues is unnecessary, as vehicle-installed cameras record very long shots, lasting from when the vehicle is started until it is turned off.
In conclusion, the selection of the GOP length depends primarily on the compression goal (which determines the relative importance of the compression ratio, artifacts, and error proneness) and should be made as carefully as possible.

4. Lossy Codec Procedures for RCCC Format

While the lossless codec, presented in [4], works in one RCCC mode only, the lossy video codec of RCCC format, presented in this paper, is designed to work in two modes. In one mode, it produces and then compresses grayscale images. In the second mode, the codec compresses RGB images. In both modes, the images to compress are obtained from RCCC source images.
The grayscale image mode is intended for automatic image analysis performed by algorithms that allow some loss of video quality, e.g., deep neural networks (DNNs). Notice that many image processing algorithms operate on grayscale images, so by removing the color information, a further improvement in the compression ratio is possible.
The lossy compression is preceded by reducing the pixel depth to eight bits. This is because the automotive camera from which the video sequences in the RCCC format were acquired produced pixels in a 12-bit HDR format, and for most analyses, the full HDR representation is not required, so the pixel representation was reduced to 8 bits.
Then, the grayscale image is created by replacing the R component in the RCCC image with an interpolated value calculated from the four neighboring C components (c.f. Figure 1a,b). After the grayscale image is prepared, it is compressed by the lossy codec and then written to the video container. Writing into the video container is performed in a separate thread to increase efficiency. The procedure is presented in Figure 3, and a conversion sketch is given below.
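The following NumPy sketch illustrates the conversion (our illustration, not the production code): it assumes the R sample occupies the top-left position of every 2×2 tile (the exact phase depends on the sensor layout in Figure 1b) and, for brevity, ignores the image border; the bit-depth reduction described above is included as the final step.

```python
import numpy as np

def rccc_to_gray(rccc: np.ndarray) -> np.ndarray:
    """Convert a 12-bit RCCC frame to an 8-bit grayscale frame.

    Assumes R samples at even rows and even columns; all other
    positions are C (clear) samples.
    """
    img = rccc.astype(np.uint32)
    # Replace each interior R pixel by the average of its four C
    # neighbors (left, right, up, down).
    img[2:-2:2, 2:-2:2] = (img[2:-2:2, 1:-3:2] + img[2:-2:2, 3:-1:2] +
                           img[1:-3:2, 2:-2:2] + img[3:-1:2, 2:-2:2]) // 4
    # Reduce 12-bit pixels to 8 bits by dropping the four
    # least-significant bits (the simple method described in Section 5.2).
    return (img >> 4).astype(np.uint8)
```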
The RGB image mode (Figure 3) is designed mainly for manual image-analysis purposes, although, like the monochrome mode, it can be used, for example, when processing with DNNs. The 24-bit RGB (8 bits for each component) image is artificially prepared from the RCCC image to enhance the quality of manual image analysis [12]. If people perform the video analysis, a color image, even with some color artifacts, is better and more natural to watch than images in the other (e.g., RCCC or grayscale) formats. Unfortunately, in the RCCC format, there is no information about the relationship between the green and blue components; thus, to calculate all three RGB components, this relationship is taken from statistical data of real RGB recordings.
The RGB image is created from the RCCC image before the compression process, as presented in Figure 3. RCCC-to-RGB conversion is computationally expensive, and to increase its speed, a special buffer is used as follows. Eight consecutive images are simultaneously converted into the RGB space and kept in a buffer for further processing. After the RCCC-to-RGB conversion, the images are read from the buffer and then compressed with the VP9 codec. Finally, a separate thread is created for writing the compressed data to the output file. This procedure is shown in Figure 3.
As mentioned before, for lossy compression, we selected the VP9 video codec to achieve real-time performance on one CPU together with high image quality. This codec is a royalty-free alternative to the High Efficiency Video Coding (HEVC) standard and offers image quality comparable to that of the x265 codec.
For both codec modes, the lossy compression should ensure a constant quality of the compressed images. In fact, in many applications (e.g., in streaming or limited-throughput media), lossy codecs are often set to achieve a constant bitrate. This could result in a huge data loss, especially in quickly changing scenes with many details. However, in automotive applications, such quality loss is unacceptable. Thus, the lossy codec should be set to the VBR mode with the Q or CQ option. By setting the value of the quantization parameter, it is possible to significantly affect the behavior of the codec. We expose this setting to the user, so the quality may be adjusted to the application needs. We also examine this parameter in our experiments.

5. Experiments

To validate the high compression ratio and real-time performance of the proposed codec procedures, a series of experiments was conducted. An RCCC video database was curated specifically for these experiments. The experimental protocol involved evaluating both the compression ratio and the performance metrics of the proposed codec algorithms. Additionally, the quality of the compressed videos and the correlation between all tested parameters were assessed.

5.1. RCCC Video Database

In all experiments, 11 uncompressed video sequences in the RCCC format were utilized. These sequences were recorded using an RCCC camera mounted on the front of a vehicle. The selected sequences are representative of ADAS scenarios, encompassing a diverse range of road scenes, road types, environments, and weather conditions. The average sequence length is approximately 1800 frames (50 s). Each sequence has 12 bpp, 36 fps, and a resolution of 1280×969 pixels. Detailed information regarding these sequences is provided in Table 3.
Sequences numbered 5, 7, 8, 9, 10, and 14 were recorded under nighttime conditions. The recordings contain both static scenes and dynamic ones with detailed elements (e.g., vehicle license plates, informational signs) suitable for evaluating compression quality. Generally, the test sequences were recorded in a continuous manner, without cuts or abrupt scene changes. Apart from the continuous movement in the image resulting from the movement of the camera vehicle, there are no rapidly moving objects relative to the camera. Additionally, much of the image involves relatively uniform areas (e.g., road surfaces, sky), thereby enabling the lossy encoder to achieve high compression rates.

5.2. Testbed and Experimental Setup

We assumed that the preferred processing platform to use in a vehicle is similar in performance to a standard computer with a single CPU. For all tests, we used a Dell Precision T1500 (manufactured in Łódź, Poland) with a mid-class Intel i7-3770 processor.
The lossy codec was implemented in the C/C++ programming language based on the VP9 codec from the FFMPEG v.4.0 library. As introduced in Section 4, we prepared the codec to work in two modes: an 8-bit grayscale mode and a 24-bit RGB mode. Both modes accept up to 16-bit RCCC format as the input (in the experiments, we used the 12-bit format).
In the grayscale mode, the RCCC image is converted to an 8 bpp image directly using the C1, C2, and C3 components, while the R component is replaced by the averaged value from the four neighboring C components (left, right, up, down). The grayscale image is then compressed with VP9 codec (for details, see Section 3).
The conversion of a 12-bit image to an 8-bit image was achieved by eliminating the least-significant bits. This is the simplest and fastest method that does not require any calculations. It additionally removes noise from the image, as typically the least-significant bits are the noisiest.
Nonetheless, for certain applications, it may be advantageous to employ tone mapping to retain details in both the darker and lighter regions of the image. In prior work, we developed two techniques specifically for ADAS applications, with additional results for other techniques documented in [7].
In fact, the VP9 codec does not support a grayscale mode for compression. Instead, the YUV 4:2:0 mode is used, with Y as the grayscale image container and the color components U and V set to 0. The YUV 4:2:0 mode has no spatial subsampling for Y (no loss of data) and the highest subsampling of U and V, minimizing the size of the unused U and V components.
In the RGB mode, the RCCC image is converted to the RGB format by the special conversion function. The conversion is performed in parallel manner for eight consecutive RCCC images, which are stored in a buffer. Then, the RGB images are compressed with VP9 codec.

5.3. Methodology

To assess the efficacy of the proposed video codec procedures for the RCCC format, the following metrics were utilized: compression ratio, performance, and operational speed [36].
Compression ratio ($CR$):
$$CR = \frac{\text{uncompressed stream}}{\text{compressed stream}}$$
Compression throughput of the input stream ($Thr$):
$$Thr = \frac{\text{uncompressed stream}}{\text{compression time}} \left[\mathrm{MB/s}\right]$$
Decompression throughput of the output stream ($Thr_{out}$):
$$Thr_{out} = \frac{\text{decompressed stream}}{\text{decompression time}} \left[\mathrm{MB/s}\right]$$
The separation of input and output throughput facilitates the analysis of compression and decompression speeds, independent of the compression ratio. The proposed RCCC codecs are designed to process raw RCCC images with bit depths of up to 16 bpp. Consequently, whether the RCCC image is converted to an 8 bpp grayscale image or a 24 bpp RGB image prior to lossy compression, the $CR$ for lossy compression is determined by the size of the uncompressed RCCC input images. In our experiments, we thus referred to the size of the 12-bit RCCC sequences (initially, 12 bits are used for storing information, and the others remain zeros).
Regarding the throughput metrics $Thr$ and $Thr_{out}$, which encompass the overall performance of both the compression and decompression processes, it is essential to consider the resulting bit depth of the output images. Therefore, for the proposed lossy codec, we have the 8 bpp grayscale format with throughputs $Thr^{gray}$ and $Thr_{out}^{gray}$ or the 24 bpp RGB format with throughputs $Thr^{RGB}$ and $Thr_{out}^{RGB}$:
$$Thr = \frac{1.5 \cdot height \cdot width / 2^{20}}{\text{compression time}} \left[\mathrm{MB/s}\right]$$
$$Thr_{out} = \frac{1.5 \cdot height \cdot width / 2^{20}}{\text{decompression time}} \left[\mathrm{MB/s}\right]$$
$$Thr^{gray} = \frac{height \cdot width / 2^{20}}{\text{compression time}} \left[\mathrm{MB/s}\right]$$
$$Thr_{out}^{gray} = \frac{height \cdot width / 2^{20}}{\text{decompression time}} \left[\mathrm{MB/s}\right]$$
$$Thr^{RGB} = \frac{3 \cdot height \cdot width / 2^{20}}{\text{compression time}} \left[\mathrm{MB/s}\right]$$
$$Thr_{out}^{RGB} = \frac{3 \cdot height \cdot width / 2^{20}}{\text{decompression time}} \left[\mathrm{MB/s}\right]$$
Furthermore, we determine the throughputs for both compression and decompression by calculating the fps.
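As an illustration of how these quantities are obtained in practice, the following helper computes $CR$, throughput, and fps from measured sizes and times (the sequence size matches the database from Section 5.1; the compression time and output size are purely illustrative):

```python
def compression_metrics(in_bytes: int, out_bytes: int,
                        comp_time_s: float, n_frames: int):
    cr = in_bytes / out_bytes             # compression ratio
    thr = in_bytes / 2**20 / comp_time_s  # throughput in MB/s
    fps = n_frames / comp_time_s          # frames per second
    return cr, thr, fps

# Example: an 1800-frame, 12 bpp (1.5 bytes/pixel) RCCC sequence of
# 1280x969 pixels, with an assumed output size and compression time.
in_bytes = int(1800 * 1280 * 969 * 1.5)
print(compression_metrics(in_bytes, 50_000_000, 18.0, 1800))
```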
For lossy compression, in addition to assessing the compression ratio and throughput, it is crucial to evaluate the quality of the compressed images. Typically, image quality evaluation employs both objective (numerical) and subjective (observer-based, such as the MOS (mean opinion score)) techniques. In this study, we utilize two objective metrics: the peak signal-to-noise ratio (PSNR) and the structural similarity index metric (SSIM).
The PSNR can be computed for the entire video stream, referred to as the overall PSNR, or averaged across the frames, known as the frame-averaged PSNR. The fundamental PSNR, expressed in [dB], is directly derived from the mean square error (MSE), where $MAX_I$ represents the maximum pixel value in the dataset under analysis:
$$PSNR = 20 \log_{10} \frac{MAX_I}{\sqrt{MSE}} \left[\mathrm{dB}\right]$$
The SSIM, introduced in 2004 [37], serves as a measure of image quality. It quantifies the similarity between an image post-compression and its original pre-compressed version. In assessing overall similarity, SSIM considers three key components: luminance similarity, contrast similarity, and structural similarity.
Unlike traditional metrics such as MSE or PSNR, which estimate absolute errors, SSIM is a perception-based model. It assesses image degradation by considering perceived changes in structural information while also taking into account important perceptual phenomena, including both luminance masking and contrast masking. Structural similarity comparison acknowledges the fact that image pixel values exhibit strong dependencies with neighboring pixels, which carry significant information about the structure of objects within the visual scene. Luminance masking describes the phenomenon where image distortions are less noticeable in brighter regions, whereas contrast masking indicates that distortions are less detectable in areas with high brightness variance or the presence of textures.
The numerical computation of SSIM involves deriving a score for each pixel using a window composed of neighboring pixels. The SSIM metric ranges from −1 to 1, with a value of 1 indicating identical images.
$$SSIM(x,y) = \frac{\left(2\mu_x \mu_y + C_1\right)\left(2\sigma_{xy} + C_2\right)}{\left(\mu_x^2 + \mu_y^2 + C_1\right)\left(\sigma_x^2 + \sigma_y^2 + C_2\right)}$$
where $\mu_x$, $\sigma_x$, and $\sigma_{xy}$ are defined as:
$$\mu_x = \sum_{i=1}^{N} \omega_i x_i$$
$$\sigma_x = \left(\sum_{i=1}^{N} \omega_i \left(x_i - \mu_x\right)^2\right)^{1/2}$$
$$\sigma_{xy} = \sum_{i=1}^{N} \omega_i \left(x_i - \mu_x\right)\left(y_i - \mu_y\right)$$
and $C_1 = (K_1 L)^2$, $C_2 = (K_2 L)^2$, where $L$ represents the dynamic range of pixel values (for example, 255 for an 8-bit representation) and $K_1, K_2 \ll 1$ [37].
A commonly used metric for evaluating color video sequences is $SSIM_{YUV}$. This metric is calculated as a weighted average of the SSIM values computed for each component—luminance (Y) and the chrominance components (U and V)—as follows:
$$SSIM_{YUV} = \frac{4 \cdot SSIM_Y + SSIM_U + SSIM_V}{6}$$
One can notice a greater importance given to luminance than to color components.
In the study [37], the correlation between the objective metrics (PSNR and MSSIM, where MSSIM is the SSIM averaged over a single image) and the subjective evaluation metric MOS was estimated for 344 test images subjected to JPEG and JPEG2000 compression. The analysis revealed that to achieve an image with minimal perceptible distortions (a relatively high subjective quality rating for a still image, such as a paused video sequence), the PSNR metric should exceed 35 dB, and the SSIM metric should exceed 0.95. These values can be used as limits when setting the quality of a lossy compressed image.
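In practice, both metrics are available in common image processing libraries. A minimal sketch using scikit-image (one possible implementation; the threshold defaults follow [37], and the function name is ours) could look as follows:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def frame_quality_ok(ref: np.ndarray, dec: np.ndarray,
                     psnr_min: float = 35.0, ssim_min: float = 0.95) -> bool:
    """Check an 8-bit grayscale decoded frame `dec` against its
    original `ref` using the thresholds suggested in [37]."""
    psnr = peak_signal_noise_ratio(ref, dec, data_range=255)
    ssim = structural_similarity(ref, dec, data_range=255)
    # For color sequences, SSIM could instead be computed per YUV
    # channel and combined with the (4, 1, 1)/6 weighting given above.
    return psnr > psnr_min and ssim > ssim_min
```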
While PSNR and SSIM are widely used metrics for evaluating image quality, they have notable limitations. PSNR, which is derived from the mean square error (MSE), primarily measures the pixel-wise differences between the original and compressed images. Although a higher PSNR value generally indicates better image quality, it does not account for human visual perception. This means that two images with similar PSNR values might be perceived very differently by human observers [37]. SSIM, on the other hand, is well-suited for comparing perceptual quality as it considers luminance, contrast, and structural similarities [38]. This makes SSIM a more reliable metric for assessing image quality in ADAS applications, where perceptual quality is crucial. However, even SSIM can fall short in capturing the perceptual quality of images in dynamic and complex scenes typical of ADAS environments. For instance, high SSIM scores might not accurately reflect the visibility of critical details like road signs or pedestrians, which are essential for the efficacy of ADAS [39].
Moreover, the efficacy of downstream tasks such as object detection, lane keeping, and obstacle avoidance in ADAS is not always correlated with high PSNR and SSIM scores. These metrics do not consider the specific requirements of these tasks, such as the need for high contrast and clear edges to detect objects accurately. For example, an image with a high PSNR might still have artifacts that interfere with object-detection algorithms, leading to poor performance in real-world scenarios [38]. Therefore, while SSIM is a valuable metric for assessing perceptual quality, additional evaluations need to be performed for each ADAS task to ensure comprehensive assessment. To provide a more rounded understanding of image quality, it would be beneficial to include user studies that evaluate perceptual quality and task-specific performance evaluations. These could involve testing the performance of ADAS algorithms on compressed images to see how well they detect and respond to various objects and conditions. Such evaluations would ensure that the metrics used align more closely with the system’s operational needs, providing a more practical understanding of image quality in the context of ADAS.
To sum up, the issue of image quality assessment for the automotive area has unfortunately not yet been sufficiently solved, scientifically or practically, and its full treatment is beyond the scope of this article.

5.4. Results

As mentioned in Section 3, lossy codecs typically have many parameters that control their compression ratio, speed of compression, and the produced image quality. In the experiments, the following internal settings of the VP9 codec were used for both modes (i.e., the grayscale and RGB modes) to achieve real-time performance: the speed was set to 8, the quality was set to “realtime”, and the GOP was set to 50. To achieve constant quality, no constraints on the bitrate had to be introduced. Instead, the constant rate factor (CRF) was used to control the overall image quality (the smallest value, CRF = 0, means the highest image quality, and the largest, CRF = 63, the lowest). In the experiments, two sample values of CRF were selected: CRF = 15 for high quality and CRF = 55 for reduced quality. The first value was selected as the minimum value that allows the lossy VP9 codec to achieve a higher compression ratio than the lossless FFV1 codec presented in [10] for both modes. Moreover, the codec with the CRF = 15 setting guarantees nearly the same perceptual image quality as the source image. During the tests, we also noticed that CRF values below 15 contributed to very low compression ratios, with no noticeable improvement in image quality. The second CRF value (55) was subjectively selected as a value for which the manual analysis of the compressed images is still possible without significant artifacts in the image.
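Our implementation calls the FFMPEG C API directly, but a roughly equivalent configuration can be expressed through the ffmpeg command-line front end of the libvpx-vp9 encoder. The sketch below (our illustration; the raw-input file name and format flags are assumptions, not the actual pipeline) shows the grayscale-mode settings:

```python
import subprocess

# Approximate CLI equivalent of the VP9 settings used in the experiments:
# speed 8, "realtime" quality, GOP 50, CRF-based rate control without a
# bitrate constraint (-b:v 0), grayscale data carried in the Y plane.
subprocess.run([
    "ffmpeg",
    "-f", "rawvideo", "-pix_fmt", "gray",   # raw 8-bit grayscale input
    "-s", "1280x969", "-r", "36",           # resolution and frame rate
    "-i", "input_gray.raw",                 # hypothetical input file
    "-c:v", "libvpx-vp9",
    "-pix_fmt", "yuv420p",                  # Y holds the image; U, V unused
    "-quality", "realtime", "-speed", "8",  # encoder preset
    "-row-mt", "1",                         # row-based multithreading
    "-g", "50",                             # GOP length
    "-crf", "15", "-b:v", "0",              # constant-quality rate control
    "output.webm",
], check=True)
```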

5.5. Quantitative Analysis in Lossy Compressed Images

The results of the lossy compression and decompression experiments are presented in Table 4 (for the grayscale mode) and Table 5 (for the RGB mode). The tests were performed for 11 test sequences and two quality settings (CRF = 15 and 55). The results show that in the grayscale mode, the achieved CR ranged from 51 to 456 for high image quality (CRF = 15) and from 916 to 7765 for reduced image quality (CRF = 55). In the RGB mode, the CR was significantly smaller and ranged from 4.51 to 7.93 for high image quality and from 120.3 to 602.6 for reduced image quality. The CR for both modes was calculated in the same manner, i.e., as the overall ratio between the sizes of the input and output streams. However, in the RGB mode, the image was first converted to the RGB space, which actually tripled the size of the stream entering the compressor, finally resulting in the lower compression ratio.
Similar results were obtained in the performance analysis. In the grayscale mode, the proposed codec procedure achieved high efficiency and real-time performance for both compression (up to 130 fps) and decompression (up to 315 fps). The efficiency varied depending on the quality and the content of the video sequence, but never fell below 69 fps during compression and 126 fps during decompression.
In the RGB mode, due to the bigger size of the stream being compressed, the achieved efficiency was reduced, but it was still very high (up to 102 fps for compression and 121 fps for decompression). For the high image quality settings, the efficiency was significantly lower (ranging from 36 to 45 fps for compression and from 24 to 34 fps for decompression) than for the reduced quality. Notice that these values are still near real-time processing (the input video sequences had 36 fps). The performance results could easily be increased by using a more efficient or advanced CPU or GPU.
Besides the analysis of the performance, in the case of lossy compression, an analysis of the image quality is very important, especially in applications like ADAS. Looking at the results presented in Table 4 and Table 5, we can notice that the differences between the PSNR and SSIM values calculated for the various video sequences, for a given quality setting and mode, are relatively small. Reducing the image quality by changing the CRF from 15 to 55 results in a reduction in the PSNR (e.g., from 51.06 to 42.75 for sequence number 5 in the grayscale mode). The SSIM parameter is less sensitive; in the same case, it only changes from 0.993 to 0.978.

5.6. Qualitative Analysis in Lossy Compressed Images

As mentioned before, the quality of lossy compressed images dedicated to ADASs and autonomous vehicles should be constant or at least confined. Besides the CRF parameter, the quality of compressed images also depends on the GOP size. The GOP size specifies how often the keyframes (intra-frame compressed images) occur. These frames offer the highest image quality but the lowest compression ratio. For the other frame types, only the inter-frame prediction error is compressed (inter-frame compression is performed). Inter-frame compression offers a much higher compression ratio but at the expense of lower image quality.
To show how the GOP size influences the quality of each frame in a sequence, we calculated the PSNR and SSIM for all frames of sequence number 3. The results are presented in Figure 4 and in Table 6.
The relatively small variability of the PSNR and SSIM (especially for high quality compression, CRF = 15) confirms the correct behavior of the encoder, i.e., maintaining a constant perceptual image quality (CQ mode of the codec). Please note that there are other modes of compression (c.f. Section 3), but only the CQ mode offers high stability of the image quality in the video sequence.
However, with an increased GOP size, fluctuations in quality also increase. They are visible within a single GOP (as presented in Figure 4), especially for reduced image quality (a PSNR loss of up to 8 dB and an SSIM loss from 0.95 to 0.91). The farther from the keyframe, the lower the quality. For GOP = 1, this phenomenon does not occur, and the quality is very stable. We also noticed that the fluctuations do not occur for still scenes at reduced image quality (CRF = 55, the first 70 frames in this sequence, when the car with the camera is not moving). In general, only keyframes (i.e., the first frames of a GOP) offer higher quality. If the GOP equals 1, there are only keyframes, so the quality is the highest possible.
We should still remember that a small GOP size significantly reduces the CR. For example, if we change the GOP from 50 to 1 and still want to obtain high-quality images, the CR falls drastically from 51.18 to 8.81.
We can notice the same trend in the case of the performance and throughput of the codec. For example, if we change the GOP from 50 to 1, the compression throughput decreases from 100 fps to 61 fps, while the decompression throughput decreases even more, i.e., from 126 fps to 38 fps.
Despite the advantage of a high CR, long GOPs are not recommended for another reason. In the automatic or manual analyses of video sequences, which require video scrolling, fast forwarding, or rewinding the compressed video, decoding of all preceding frames within the GOP (even those skipped) is required. This increases the requirements for the computational efficiency of the decoder.
In order to more accurately illustrate the impact of the lossy compression settings on image details, an additional experiment was performed on a selected image: the 196th frame of sequence number 7. The source RCCC image was transformed into the RGB color space and then compressed and decompressed. The decompressed image is shown in Figure 5. At first glance, we see unnatural colors (except red, which is the only color component in the source RCCC image). This is due to the specificity of the RCCC source format (see Section 2) and, in fact, is not a very important aspect.
During the compression, the GOP size was set to 50; thus, the tested image became the 46th image in its Group of Pictures (almost at the end, so with relatively low quality). Two car license plates were analyzed in detail: the first on the car standing in the middle (with an original resolution of 184×40 pixels) and the second on a moving car (with an original resolution of 80×22 pixels). Table 7 shows decompressed, zoomed parts of the image with the license plates for various compression settings. In the case of a still object, the quality is maintained, even for very high compression ratios. The plate is still readable, even for a very high CR of 1481. This CR value means that, on average, 1 bit of the compressed image stores information about 123 input 12-bit pixels, which is impressive.
In the case of moving parts of the picture, the loss of quality is visible at much lower CRs. For CRs below 100, the digits on the plate are readable, but for higher compression ratios, they are not. Taking into account the specifics of ADAS recordings (almost constant motion in the video sequences), this problem will certainly arise and should be considered at the design stage. It results from the specificity of the codec, which is optimized for human viewers (see Section 3).
To further visualize the effect of the GOP length on the image quality, we present the results for the same two license plates as in the previous example (Table 8) and for three road signs (Table 9). For the still car, the license plate's initial resolution was (213, 65) pixels, and for the moving car, it was (70, 30) pixels. The three road signs' initial resolutions are similar: (113, 106), (112, 119), and (113, 147) pixels.
The tests were carried out for a GOP length of 50 and for two compression parameters: high quality (CRF = 15) and low quality (CRF = 55). Three frames were selected for visualization: the 50th frame (the last frame of the x-th GOP), i_GOP(x) = 50, and the first and the fifth frames of the next GOP (i_GOP(x+1) = 1 and i_GOP(x+1) = 5). This allowed for minimizing differences in the source image, especially for moving objects (the measurement conditions were as similar as possible).
As can be seen in the previous analyses (cf. Figure 4), the highest PSNR and SSIM values occurred for the first, I-type keyframe in the GOP. The farther from the keyframe, the more the quality decreased. This is also confirmed by the numerical results determined for entire frames, presented in Table 8 and Table 9 (second column).
If the GOP length were reduced to 1, all frames would have the highest possible quality; with a GOP length of 5, no frame's quality would be worse than that of the 5th frame shown.
Visually, the results differ depending on the speed of the objects and the amount of detail. For example, for the license plate on the still car and for signs with a small amount of detail, the perceptible differences are very small, even at high compression levels (CRF = 55). For the fast-moving car, the results are not as good: when the image fragment has low resolution and the speed is high, the details are blurred. Lowering the compression quality strengthens this effect even more.
Larger GOP sizes increase the compression ratio but lead to larger quality fluctuations within the video sequences. Smaller GOP sizes stabilize the quality but reduce the compression ratios and throughput. However, the GOP length affects the compression quality much less than the CRF parameter.
Based on the above analysis, codec presets can be prepared for various driving scenarios. In urban areas, speeds are lower, but there is a lot of detail (e.g., many road signs). In highway driving, speeds are much higher, but the signs are large and the amount of detail in the images is lower. Determining detailed codec settings, however, requires further analysis that takes into account the context and the ADAS application. One solution would be an adaptive system, but it would require considerably more computing resources to work in real time.
In summary, the conducted research shows that lossy compression has great potential for use in automotive applications. Unfortunately, there is still a lack of good metrics for assessing image quality for automatic analysis. Global (whole-frame) indicators such as PSNR and SSIM are not very sensitive to local changes that concern a small part of the image. There are very large differences in the quality of moving and stationary objects, which has very unfavorable consequences in the automotive industry. For example, a fast-moving, relatively small, but potentially dangerous object (a bird, a stone) flying towards a car will be reproduced with much worse quality than a stationary object, which will certainly reduce the probability of correct recognition.
Despite its disadvantages, lossy compression offers further advantages over lossless compression beyond the significantly higher compression ratio. The block effect resulting from the compression method reduces pixel noise. It is precisely the image sensor noise, especially visible in difficult lighting conditions on roads, that makes lossless compression and automatic image analysis difficult [10]. Additionally, the reduction of brightness and color variability brings the appearance of symbolic objects (road signs) closer to their originals, which has a positive effect on the quality of automatic recognition [40].

6. Conclusions

This paper presents an efficient lossy coding procedure tailored for red, clear, clear, clear (RCCC) sensors and designed for advanced driver-assistance systems (ADASs). The proposed approach leverages the high-quality VP9 codec in two distinct operational modes: grayscale for automatic image analysis and RGB for manual analysis.
By adopting the VP9 codec, the procedure achieves high compression ratios while maintaining real-time processing capabilities, which is essential for automotive applications. It allows for significantly higher compression ratios than lossless compression techniques while maintaining control over image quality. These ratios vary from 51 to 7765 for the grayscale image mode and from 4.51 to 602.6 for the RGB image mode, depending on the specified output image quality settings. The achieved throughput is 129 fps for compression and 315 fps for decompression in grayscale mode, and 102 fps for compression and 121 fps for decompression in RGB mode.
Our study demonstrates that by leveraging the lossy codec in the VBR mode with a CQ parameter, users can balance compression efficiency and image quality according to their specific requirements, ensuring an optimal trade-off between the compression ratio and image quality. This adaptability is important for different ADAS scenarios, ranging from automatic detection systems to manual image analyses.
With the experimental analysis, critical codec parameters were established for ADAS applications, ensuring stable video quality without significant degradation of image detail. The analysis confirmed that maintaining a high PSNR (above 35 dB) and SSIM (above 0.95) is vital for achieving minimal perceptible distortions, ensuring that the compressed image quality meets the stringent requirements of ADAS applications.
Our study emphasizes the influence of the GOP size on image quality: larger GOP sizes, while increasing compression ratios, lead to greater quality fluctuations within video sequences, whereas smaller GOP sizes stabilize the quality but reduce the compression ratios and throughput.
The above conclusions, although obtained on data specific to the automotive area, can be extended to other areas of image compression for further processing, both manual and based on artificial intelligence.

Author Contributions

Conceptualization, P.P. and K.P.; methodology, P.P. and K.P.; software, K.P. and P.P.; validation, K.P. and P.P.; formal analysis, P.P. and K.P.; investigation, P.P. and K.P.; writing—original draft preparation, P.P. and K.P.; writing—review and editing, P.P. and K.P.; visualization, P.P. and K.P.; supervision, P.P. and K.P. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was prepared with the research subsidy 0211/SBAD/0224 from the financial means of Poznan University of Technology.

Data Availability Statement

Third-party data. Restrictions apply to the availability of these data. The data are not publicly available.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Nguyen, T.-T.-N.; Phan, T.-D.; Duong, M.-T.; Nguyen, C.-T.; Ly, H.-P.; Le, M.-H. Sensor Fusion of Camera and 2D LiDAR for Self-Driving Automobile in Obstacle Avoidance Scenarios. In Proceedings of the 2022 International Workshop on Intelligent Systems (IWIS), Ulsan, Republic of Korea, 17–19 August 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–7. [Google Scholar]
  2. Yeong, D.J.; Barry, J.; Walsh, J. A Review of Multi-Sensor Fusion System for Large Heavy Vehicles Off Road in Industrial Environments. In Proceedings of the 2020 31st Irish Signals and Systems Conference (ISSC), Letterkenny, Ireland, 11–12 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6. [Google Scholar]
  3. Rahmani, M.; Kloess, H.; Hintermaier, W.; Steinbach, E. Real-Time Video Compression for Driver Assistance Camera Systems. In Proceedings of the First Annual International Symposium on Vehicular Computing Systems, Dublin, Ireland, 21–25 July 2008; ICST: Saitama, Japan, 2008. [Google Scholar]
  4. Piniarski, K.; Pawłowski, P. Efficient pedestrian detection with enhanced object segmentation in far IR night vision. In Proceedings of the 2017 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), Poznan, Poland, 20–22 September 2017; pp. 160–165. [Google Scholar]
  5. Kromer, P.; Prauzek, M.; Stankus, M.; Konecny, J. Adaptive Fuzzy Video Compression Control for Advanced Driver Assistance Systems. In Proceedings of the 2018 26th International Conference on Systems Engineering (ICSEng), Sydney, Australia, 18–20 December 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–9. [Google Scholar]
  6. Pawłowski, P.; Piniarski, K.; Dąbrowski, A. Selection and tests of lossless and lossy video codecs for advanced driver-assistance systems. In Proceedings of the 2018 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), Poznan, Poland, 19–21 September 2018; pp. 344–349. [Google Scholar]
  7. Piniarski, K.; Pawłowski, P.; Dąbrowski, A. Efficient HDR tone-mapping for ADAS applications. In Proceedings of the 2019 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), Poznan, Poland, 18–20 September 2019; pp. 325–330. [Google Scholar]
  8. Wang, Y.; Chan, P.H.; Donzella, V. A Two-stage H.264 based Video Compression Method for Automotive Cameras. In Proceedings of the 2022 IEEE 5th International Conference on Industrial Cyber-Physical Systems (ICPS), Coventry, UK, 24–26 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6. [Google Scholar]
  9. Ku, B.; Kim, K.; Jeong, J. Real-Time ISR-YOLOv4 Based Small Object Detection for Safe Shop Floor in Smart Factories. Electronics 2022, 11, 2348. [Google Scholar] [CrossRef]
  10. Pawłowski, P.; Piniarski, K.; Dąbrowski, A. Highly Efficient Lossless Coding for High Dynamic Range Red, Clear, Clear, Clear Image Sensors. Sensors 2021, 21, 653. [Google Scholar] [CrossRef] [PubMed]
  11. Babu, R.V.; Tom, M.; Wadekar, P. A survey on compressed domain video analysis techniques. Multimed. Tools Appl. 2016, 75, 1043–1078. [Google Scholar] [CrossRef]
  12. Orlaco Vision Systems for All Types of Vehicles-Stoneridge. Available online: https://stoneridge-orlaco.com/en/vehicles (accessed on 6 August 2024).
  13. 10 Reasons Your H.264 Codec Isn’t Good Enough for Automotive|LinkedIn. Available online: https://www.linkedin.com/pulse/10-reasons-your-h264-codec-isnt-good-enough-marco-jacobs/ (accessed on 5 August 2024).
  14. Choi, H.; Bajic, I.V. High Efficiency Compression for Object Detection. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1792–1796. [Google Scholar]
  15. Khani, M.; Sivaraman, V.; Alizadeh, M. Efficient Video Compression via Content-Adaptive Super-Resolution. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 4501–4510. [Google Scholar]
  16. Marsetic, A.; Kokalj, Z.; Ostir, K. The Effect of Lossy Image Compression on Object Based Image Classification—Worldview-2 Case Study. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2012, 38, 187–192. [Google Scholar] [CrossRef]
  17. Huang, H.-W.; Lee, C.-R.; Lin, H.-P. Nighttime vehicle detection and tracking base on spatiotemporal analysis using RCCC sensor. In Proceedings of the 2017 IEEE 9th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM), Manila, Philippines, 1–3 December 2017; pp. 1–5. [Google Scholar]
  18. Lelowicz, K.; Jasinski, M.; Pilat, A.K. Discussion of Novel Filters and Models for Color Space Conversion. IEEE Sens. J. 2022, 22, 14165–14176. [Google Scholar] [CrossRef]
  19. Karanam, G. Interfacing Red/Clear Sensors to ADSP-BF609® Blackfin Processors (EE-358); Engineer-to-Engineer Note; Analog Devices, Inc.: Wilmington, MA, USA, 2013. [Google Scholar]
  20. Kim, D.-M.; Yoon, Y.-S.; Ban, Y.; Suh, J.-W. Prex-Net: Progressive Exploration Network Using Efficient Channel Fusion for Light Field Reconstruction. Electronics 2023, 12, 4661. [Google Scholar] [CrossRef]
  21. Kang, H.-C.; Han, H.-N.; Bae, H.-C.; Kim, M.-G.; Son, J.-Y.; Kim, Y.-K. HSV Color-Space-Based Automated Object Localization for Robot Grasping without Prior Knowledge. Appl. Sci. 2021, 11, 7593. [Google Scholar] [CrossRef]
  22. Jin, X.; Yin, S.; Li, X.; Zhao, G.; Tian, Z.; Sun, N.; Zhu, S. Color image encryption in YCbCr space. In Proceedings of the 2016 8th International Conference on Wireless Communications & Signal Processing (WCSP), Yangzhou, China, 13–15 October 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–5. [Google Scholar]
  23. Chen, Y.; Wen, C.; Liu, W.; He, W. DBENet: Dual-Branch Brightness Enhancement Fusion Network for Low-Light Image Enhancement. Electronics 2023, 12, 3907. [Google Scholar] [CrossRef]
  24. Sullivan, G.J.; Ohm, J.-R.; Han, W.-J.; Wiegand, T. Overview of the High Efficiency Video Coding (HEVC) Standard. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1649–1668. [Google Scholar] [CrossRef]
  25. VP9 Bitrate Modes in Detail. Available online: https://developers.google.com/media/vp9/bitrate-modes (accessed on 5 August 2024).
  26. Wiegand, T.; Schwarz, H.; Joch, A.; Kossentini, F.; Sullivan, G.J. Rate-constrained coder control and comparison of video coding standards. IEEE Trans. Circuits Syst. Video Technol. 2003, 13, 688–703. [Google Scholar] [CrossRef]
  27. Wiegand, T.; Sullivan, G.J.; Bjontegaard, G.; Luthra, A. Overview of the H.264/AVC Video Coding Standard. IEEE Trans. Circuits Syst. Video Technol. 2003, 13, 560–576. [Google Scholar] [CrossRef]
  28. The H.264 Advanced Video Compression Standard, 2nd Edition|Wiley. Available online: https://www.wiley.com/en-ie/The+H.264+Advanced+Video+Compression+Standard%2C+2nd+Edition-p-9780470516928 (accessed on 5 August 2024).
  29. The Latest Open-Source Video Codec VP9—An Overview and Preliminary Results. Available online: https://research.google/pubs/the-latest-open-source-video-codec-vp9-an-overview-and-preliminary-results/ (accessed on 5 August 2024).
  30. Uhrina, M.; Sevcik, L.; Bienik, J.; Smatanova, L. Performance Comparison of VVC, AV1, HEVC, and AVC for High Resolutions. Electronics 2024, 13, 953. [Google Scholar] [CrossRef]
  31. Chen, Y.; Mukherjee, D.; Han, J.; Grange, A.; Xu, Y.; Liu, Z.; Parker, S.; Chen, C.; Su, H.; Joshi, U.; et al. An Overview of Core Coding Tools in the AV1 Video Codec. In Proceedings of the 2018 Picture Coding Symposium (PCS), San Francisco, CA, USA, 24–27 June 2018; pp. 41–45. [Google Scholar]
  32. Lu, G.; Ouyang, W.; Xu, D.; Zhang, X.; Cai, C.; Gao, Z. DVC: An End-to-End Deep Video Compression Framework. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar] [CrossRef]
  33. Rippel, O.; Bourdev, L. Real-Time Adaptive Image Compression. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017. [Google Scholar] [CrossRef]
  34. Chen, L.; Cheng, B.; Zhu, H.; Qin, H.; Deng, L.; Luo, L. Fast Versatile Video Coding (VVC) Intra Coding for Power-Constrained Applications. Electronics 2024, 13, 2150. [Google Scholar] [CrossRef]
  35. Xu, J.; Zhou, B.; Zhang, C.; Ke, N.; Jin, W.; Hao, S. The impact of bitrate and GOP pattern on the video quality of H.265/HEVC compression standard. In Proceedings of the 2018 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), Qingdao, China, 14–16 September 2018; pp. 1–5. [Google Scholar]
  36. Daede, T.; Norkin, A.; Brailovskiy, I. Video Codec Testing and Quality Measurement; Internet Engineering Task Force: Fremont, CA, USA, 2018. [Google Scholar]
  37. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
  38. Zhang, L.; Zhang, L.; Mou, X.; Zhang, D. FSIM: A Feature Similarity Index for Image Quality Assessment. IEEE Trans. Image Process. 2011, 20, 2378–2386. [Google Scholar] [CrossRef] [PubMed]
  39. Wang, Y.; Chan, P.H.; Donzella, V. Semantic-Aware Video Compression for Automotive Cameras. IEEE Trans. Intell. Veh. 2023, 8, 3712–3722. [Google Scholar] [CrossRef]
  40. Pawłowski, P.; Prószyński, D.; Dąbrowski, A. Recognition of road signs from video. In Proceedings of the IEEE NTAV/SPA 2008, New Trends in Audio and Video Signal Processing, Algorithms, Architectures, Arrangements and Applications, Poznan, Poland, 25–27 September 2008. [Google Scholar]
Figure 1. Most popular color filter arrays (CFAs) in automotive sensors: (a) monochrome (CCCC), (b) RCCC, (c) RCCB, (d) RGCB, (e) RYYc, (f) RGGB (C—clear; R—red; B—blue; G—green; Y—yellow; c—cyan).
Figure 2. The illustrative example of the variability of the output stream (bitrate) and image quality across Q, CQ, CBR, and VBR modes for the VP9 codec [25].
Figure 3. Lossy compression process scheme used for RCCC images with conversion to monochrome or RGB images.
Figure 4. PSNR and SSIM within sequence 3 for various GOP sizes and quality settings (CRF = 15 for high quality and CRF = 55 for reduced quality).
Figure 5. One image (the 196th frame) from sequence 7, presented in the RGB color space.
Table 1. A list of codec groups with examples.

Codec Group | Example Codecs
Early Digital Codecs | MPEG-1, MPEG-2, H.261
Standard Definition | MPEG-4 Part 2, H.263
High Definition | H.264 (AVC), VP8
4K and UHD | H.265 (HEVC), VP9
New Technologies | VVC (H.266), VP10 (in development)
Table 2. A list of codecs with key features.

Codec | Organization | Introduction Year | Key Features
H.264/AVC | ITU-T, ISO/IEC | 2003 | Hybrid video coding, variable block sizes, intra and inter-frame coding, integer-based DCT, CAVLC, CABAC
H.265/HEVC | ITU-T, ISO/IEC | 2013 | Larger block sizes (up to 64 × 64), more directional modes, improved HDR support, better compression efficiency
VP9 | Google | 2013 | Open standard, block sizes up to 64 × 64, various prediction modes, DCT, ADST, WHT, adaptive entropy coding
AV1 | AOMedia | 2018 | Advanced motion compensation, improved prediction and transform coding, royalty-free
Table 3. Tested sequences of RCCC format.

Sequence Number | Number of Frames | Description
1 | 1855 | urban, sunny
2 | 1875 | urban, sunny
3 | 515 | test ride in laboratory
4 | 2011 | urban, sunny
5 | 2051 | road crossing, winter
6 | 2114 | suburban, cloudy
7 | 1761 | road crossing, sunny
8 | 1707 | suburban, evening
9 | 1909 | suburban, sunny
10 | 1897 | departure from the property
11 | 2149 | road crossing, cloudy
Total frames | 19,844 |
Table 4. Compression ratios, throughputs, and qualities for lossy compression and decompression in grayscale mode for various sequences and quality settings (Thr_in: compression throughput; Thr_out: decompression throughput).

(a) CRF = 15, GOP = 50

Seq. | CR | Thr_in [MB/s] | Thr_in [fps] | Thr_out [MB/s] | Thr_out [fps] | PSNR [dB] | SSIM
1 | 60.55 | 89.9 | 76 | 164.437 | 139 | 47.26 | 0.987
2 | 73.38 | 88.7 | 75 | 197.561 | 167 | 48.49 | 0.992
3 | 51.19 | 118.3 | 100 | 149.058 | 126 | 45.85 | 0.986
4 | 233.05 | 97.0 | 82 | 165.62 | 140 | 50.89 | 0.994
5 | 232.37 | 93.5 | 79 | 165.62 | 140 | 51.06 | 0.993
6 | 243.6 | 94.6 | 80 | 182.182 | 154 | 51.01 | 0.993
7 | 105.84 | 101.7 | 86 | 196.378 | 166 | 48.42 | 0.992
8 | 51.46 | 81.6 | 69 | 152.607 | 129 | 46.91 | 0.987
9 | 73.42 | 110.0 | 93 | 178.633 | 151 | 48.28 | 0.991
10 | 94.28 | 91.1 | 77 | 170.352 | 144 | 48.27 | 0.991
11 | 456.39 | 89.9 | 76 | 165.62 | 140 | 51.31 | 0.993

(b) CRF = 55, GOP = 50

Seq. | CR | Thr_in [MB/s] | Thr_in [fps] | Thr_out [MB/s] | Thr_out [fps] | PSNR [dB] | SSIM
1 | 1091.78 | 120.7 | 102 | 229.502 | 194 | 37.87 | 0.945
2 | 1112.12 | 120.7 | 102 | 184.548 | 156 | 38.32 | 0.951
3 | 998.13 | 153.8 | 130 | 372.645 | 315 | 36.59 | 0.941
4 | 2987.8 | 136.0 | 115 | 182.182 | 154 | 42.05 | 0.974
5 | 3343.58 | 134.9 | 114 | 203.476 | 172 | 42.75 | 0.978
6 | 3039.57 | 133.7 | 113 | 185.731 | 157 | 42.19 | 0.976
7 | 1870.34 | 131.3 | 111 | 220.038 | 186 | 39.31 | 0.956
8 | 916.62 | 124.2 | 105 | 210.574 | 178 | 37.67 | 0.943
9 | 1513.58 | 120.7 | 102 | 211.757 | 179 | 38.69 | 0.948
10 | 2471.67 | 101.7 | 86 | 204.659 | 173 | 39.39 | 0.957
11 | 7765.87 | 121.8 | 103 | 210.574 | 178 | 43.74 | 0.979
Table 5. Compression ratios, throughputs, and qualities for lossy compression and decompression in RGB mode for various sequences and quality settings (Thr_in: compression throughput; Thr_out: decompression throughput).

(a) CRF = 15, GOP = 50

Seq. | CR | Thr_in [MB/s] | Thr_in [fps] | Thr_out [MB/s] | Thr_out [fps] | PSNR [dB] | SSIM
1 | 5.88 | 142.0 | 40 | 95.8 | 27 | 46.24 | 0.983
2 | 4.79 | 127.8 | 36 | 88.7 | 25 | 46.07 | 0.983
3 | 5.97 | 145.5 | 41 | 95.8 | 27 | 46.05 | 0.984
4 | 5.41 | 142.0 | 40 | 88.7 | 25 | 46.06 | 0.982
5 | 5.79 | 142.0 | 40 | 95.8 | 27 | 46.11 | 0.98
6 | 5.56 | 138.4 | 39 | 88.7 | 25 | 45.93 | 0.98
7 | 8.19 | 156.2 | 44 | 120.7 | 34 | 46.61 | 0.983
8 | 6.57 | 138.4 | 39 | 102.9 | 29 | 46.48 | 0.982
9 | 4.51 | 138.4 | 39 | 85.2 | 24 | 45.94 | 0.982
10 | 4.94 | 156.2 | 44 | 85.2 | 24 | 46.07 | 0.984
11 | 7.93 | 159.7 | 45 | 117.1 | 33 | 46.09 | 0.979

(b) CRF = 55, GOP = 50

Seq. | CR | Thr_in [MB/s] | Thr_in [fps] | Thr_out [MB/s] | Thr_out [fps] | PSNR [dB] | SSIM
1 | 203.09 | 269.7 | 76 | 347.8 | 98 | 36.95 | 0.916
2 | 120.34 | 283.9 | 80 | 390.4 | 110 | 35.08 | 0.896
3 | 187.09 | 294.6 | 83 | 408.1 | 115 | 36.14 | 0.926
4 | 230.01 | 323.0 | 91 | 408.1 | 115 | 36.67 | 0.91
5 | 234.81 | 312.3 | 88 | 429.4 | 121 | 37.52 | 0.918
6 | 227.83 | 301.7 | 85 | 422.3 | 119 | 37.81 | 0.915
7 | 342.67 | 340.7 | 96 | 401.0 | 113 | 38.51 | 0.935
8 | 204.67 | 301.7 | 85 | 411.7 | 116 | 37.64 | 0.929
9 | 125.99 | 283.9 | 80 | 390.4 | 110 | 34.94 | 0.891
10 | 170.24 | 291.0 | 82 | 404.6 | 114 | 35.13 | 0.895
11 | 602.62 | 362.0 | 102 | 429.4 | 121 | 39.26 | 0.937
Table 6. CRs, throughputs, PSNR, and SSIM for mono mode in sequence 3 with various values of GOP and CRF (Thr_in: compression throughput; Thr_out: decompression throughput).

CRF | GOP | CR | Thr_in [MB/s] | Thr_in [fps] | Thr_out [MB/s] | Thr_out [fps] | PSNR [dB] | SSIM
15 | 1 | 8.81 | 72.2 | 61 | 45.0 | 38 | 53.75 | 0.995
15 | 10 | 39.03 | 121.8 | 103 | 130.1 | 110 | 46.07 | 0.987
15 | 25 | 45.81 | 125.4 | 106 | 136.0 | 115 | 46.01 | 0.987
15 | 50 | 51.19 | 118.3 | 100 | 149.1 | 126 | 45.85 | 0.986
55 | 1 | 70.82 | 73.3 | 62 | 152.6 | 129 | 39.21 | 0.959
55 | 10 | 513.17 | 184.5 | 156 | 365.5 | 309 | 37.42 | 0.949
55 | 25 | 863.78 | 178.6 | 151 | 321.8 | 272 | 36.87 | 0.943
55 | 50 | 998.1 | 153.8 | 130 | 372.6 | 315 | 36.59 | 0.941
Table 7. Two car plates (extracted from stationary and moving cars in Figure 5) compressed with various CRF settings.

CRF | CR | Still Object | Moving Object
1 | 2.4 | (image) | (image)
10 | 7.7 | (image) | (image)
25 | 28.5 | (image) | (image)
35 | 92.1 | (image) | (image)
45 | 299 | (image) | (image)
55 | 718 | (image) | (image)
63 | 1481 | (image) | (image)
Table 8. Compression results of two car plates (extracted from still and moving cars from Figure 5) with various GOP and CRF settings.

CRF (CR, GOP) | Frame | PSNR [dB] | SSIM | Still Object | Moving Object
15 (CR = 8.2, GOP = 50) | i_GOP(x) = 50 | 45.82 | 0.985 | (image) | (image)
15 (CR = 8.2, GOP = 50) | i_GOP(x+1) = 1 | 50.87 | 0.995 | (image) | (image)
15 (CR = 8.2, GOP = 50) | i_GOP(x+1) = 5 | 47.42 | 0.991 | (image) | (image)
55 (CR = 341, GOP = 50) | i_GOP(x) = 50 | 36.47 | 0.937 | (image) | (image)
55 (CR = 341, GOP = 50) | i_GOP(x+1) = 1 | 38.12 | 0.948 | (image) | (image)
55 (CR = 341, GOP = 50) | i_GOP(x+1) = 5 | 37.21 | 0.944 | (image) | (image)
Table 9. Compression results of road signs (extracted from sequence 7) with various GOP and CRF settings. All three road signs are moving objects.

CRF (CR, GOP) | Frame | PSNR [dB] | SSIM | Road Sign 1 | Road Sign 2 | Road Sign 3
15 (CR = 8.2, GOP = 50) | i_GOP(x) = 50 | 43.75 | 0.978 | (image) | (image) | (image)
15 (CR = 8.2, GOP = 50) | i_GOP(x+1) = 1 | 50.72 | 0.994 | (image) | (image) | (image)
15 (CR = 8.2, GOP = 50) | i_GOP(x+1) = 5 | 46.44 | 0.987 | (image) | (image) | (image)
55 (CR = 341, GOP = 50) | i_GOP(x) = 50 | 34.56 | 0.911 | (image) | (image) | (image)
55 (CR = 341, GOP = 50) | i_GOP(x+1) = 1 | 37.85 | 0.947 | (image) | (image) | (image)
55 (CR = 341, GOP = 50) | i_GOP(x+1) = 5 | 36.31 | 0.925 | (image) | (image) | (image)