1. Introduction
Many advanced cameras use color filter arrays (CFAs) based on the Bayer pattern to capture photons. Each sensor unit (pixel) in these cameras records only one of the three primary colors, so the raw image output by the camera is referred to as CFA data. To represent colors and brightness as humans perceive them on a digital display, the raw image data captured by the camera must be processed. This conversion consists of several processing routines, such as white balance, demosaicing, and tone mapping, that together constitute the image processing pipeline.
Figure 1 displays an image captured by a Nikon Z7. The raw data were downloaded and transformed into a 14-bit RGB color image using the proposed image processing pipeline. As depicted in
Figure 1c, RGB color images exhibit rich details in the darker regions. The contrast in these areas was enhanced using the TMO2 within our pipeline. Without such processing, the photograph, captured under identical conditions, would resemble
Figure 1a, where the details, such as those of the ladder shown in
Figure 1b, are significantly less distinct compared to
Figure 1d. This example underscores the importance of our image processing pipeline in preserving raw image information, which is crucial for applications like image compression and post-processing [
1,
2].
Certain image processing applications, including industrial surface detection, require the three channels of the raw image without nonlinear tone mapping. Storing or transmitting raw images without compression greatly increases the demand for storage space and transmission capacity compared with compressed formats such as sRGB-JPEG. To address this, studies [
3,
4,
5,
6] have explored reversing the conversion from sRGB-JPEG back to raw images. However, this process is typically lossy and prone to errors, often rendering it unsuitable for CFA data. For CFA images from micro imaging cameras, FPGA implementations have been used to balance maximizing image quality against minimizing resource usage [
7,
8]. Additionally, Docker/Singularity containers and scalable, modular, open-source image processing pipelines such as Nextflow and Galaxy are utilized for medical imaging [
9]. A specially designed three-stage pipeline enhances the accuracy and robustness of feature-based visual odometry (VO) algorithms, with each stage addressing a specific performance-related issue.
The RGB color images produced by the camera are pivotal for subsequent image post-processing research, including enhancement, filtering, and detection tasks. The quality of images processed in the initial stages profoundly influences the later stages of image processing. Thus, the efficiency and effectiveness of the initial image processing stages are critical to ensure that the entire workflow is optimized to meet the output requirements of post-processed images. In this context, the work of [
10] with the Cascaded RAW and RGB Restoration Network (CR3Net) marks a significant advancement by leveraging both RGB images and their paired RAW counterparts, essential for tasks requiring high-fidelity RAW data preservation, such as reflection removal. Furthermore, ref. [
11] discusses using deep learning to reconstruct High Dynamic Range (HDR) images from Low Dynamic Range (LDR) images, highlighting the complexity and the dependence on diverse, high-quality training data to ensure robust model generalization.
In the realm of image processing, numerous research topics continue to evolve, particularly concerning specific pipeline modules. For instance, while several denoising algorithms [
12,
13] primarily focus on the effects on RGB color images, camera noise reduction is typically applied directly to the output CFA [
14]. Moreover, certain demosaicing algorithms [
15,
16,
17] derive CFA data from downsampled RGB images, resulting in CFA values that do not accurately reflect the linear relationship with light intensity inherent to true camera outputs. Similarly, while Tone Mapping Operators (TMOs) [
18,
19,
20,
21] traditionally convert nonlinear HDR images into lower dynamic ranges, our image processing pipeline requires a TMO capable of adapting linear HDR images to nonlinear outputs. Often the experimental conditions and subjects used to test these algorithms do not align with their real-world applications. These modules, designed and tested in isolation, do not always consider their collective impact on the final image quality, leading to deviations from practical utility.
The challenges in pipeline design can be summarized as three key issues: preservation of image information, module independence, and error reduction in the inverse process of the image processing pipeline. Our contributions, which address these challenges, are three-fold:
We have designed a nearly lossless image pipeline that effectively recovers dynamic range values in the R, G, and B channels. This approach ensures high fidelity and reversibility in the conversion process from RAW to RGB, which is crucial for high-quality image processing such as in professional photography post-processing.
We conducted a detailed analysis of the relationship between image quality and the cascade modules in the pipeline. Our findings contribute to the understanding of how each module independently affects the overall imaging results, emphasizing the importance of module independence.
The implementation of a two-stage Tone Mapping Operator (TMO) optimizes the dynamic range and color representation more finely to match human visual perception. This structure not only aligns with human visual systems by adjusting the dynamic range through linear stretching but also preserves sensitive information during numerical rounding, which traditional HDR reconstruction processes often overlook.
Additionally, the introduction of an independent highlight processing module uniquely preserves highlight information, preventing the loss of detail that often occurs in conventional HDR processes. This feature is pivotal in maintaining the integrity of high-light areas in images, further enhancing the detailed preservation across all lighting conditions.
2. Related Work
The transformation of RAW data to RGB color images by cameras typically involves a CFA and a bespoke image processing pipeline. Previous investigations, such as those described by [
14], have systematically analyzed the various components and their sequential operations within such pipelines. In contrast, as depicted in
Figure 2, the architecture and essential elements of our newly proposed image processing pipeline exhibit substantial advancements over the configurations reported in [
14].
At present, the industry lacks a unified standard for image processing pipelines, with each camera manufacturer often developing proprietary systems tailored to their specific technological frameworks and market needs. Prominent software platforms like Adobe Photoshop, DCRAW, and LibRaw implement these pipelines to varying extents and capabilities. Specifically, Adobe Photoshop leverages the Adobe SDK to support its extensive suite of image manipulation tools. DCRAW serves as a pivotal open-source resource, detailed in [
1] and accessible as per [
22], which incorporates fundamental modules crucial for high-fidelity image processing, such as white balance, demosaicing, and Tone Mapping Operators (TMO).
LibRaw, another influential open-source library, provides robust capabilities for reading and processing RAW files from a broad spectrum of digital cameras. Its utility is particularly noted for its adaptability in managing diverse RAW formats, which are frequently updated and expanded by camera manufacturers. This adaptability makes LibRaw an indispensable tool for developers and researchers who demand precise control over raw image data to push the boundaries of image quality and analytical potential. The versatility of LibRaw in handling these formats exemplifies the critical need for flexible tools in the rapidly evolving domain of digital imaging.
HDR imaging presents significant challenges in maintaining the fidelity of rendered images to human visual perception, particularly when converting these to lower dynamic ranges. In response to these challenges, several methodologies have been proposed to optimize tone mapping techniques that facilitate the transition from high to low dynamic range images without sacrificing image quality [
23,
24,
25]. Specifically, ref. [
26] introduced a local, adaptive linear filter that approximates the performance of traditional image processing pipelines, yielding RGB outputs that closely replicate those produced directly by digital cameras.
Furthermore, the study by [
22] introduces an advanced image processing pipeline employing down-sampled RGB images as inputs, rather than utilizing raw image data directly. This approach, while innovative, often results in information loss or the generation of undesirable color artifacts—outcomes that are suboptimal for both camera manufacturers and image-processing professionals. To address these issues, our research has developed a two-stage tone mapping process designed to preserve more information, particularly in image highlights, thereby reducing the likelihood of highlight clipping. This enhanced tone mapping capability ensures that even subtle details in the raw image data are retained, enhancing the quality of images available for subsequent post-processing tasks.
Our pipeline is configured to handle raw images stored in the Adobe Digital Negative (DNG) format, which encapsulates both the raw image data and the essential camera parameters required for sophisticated processing tasks. Introduced by Adobe in 2004, the DNG format, alongside the Adobe DNG Converter, has been pivotal in standardizing the storage and accessibility of raw data across various camera manufacturers’ formats, including those proprietary to Fuji (.RAF), Nikon (.NEF), Sony (.ARW), and Canon (.CR2, .CR3). The DNG format’s compatibility with diverse CFA storage formats significantly broadens its applicability, making it an invaluable resource in contemporary digital image processing.
3. Method
3.1. Pre-Processing
In digital image processing, preliminary modules are essential for enhancing the quality of the captured images by addressing inherent noise and defects in the sensor data. One critical component of this stage is the black level subtraction, which mitigates the effects of dark current—a phenomenon where heat induces charge generation in the sensor elements, even in the absence of light. These charges, indistinguishable from those generated by actual scene illumination, manifest as noise in the captured image.
The process of black-level subtraction involves deducting a reference value, known as the black level, from the CFA output. This reference is typically determined by averaging the outputs of pixel sensors that are shielded from light and thus only subject to dark current. By correcting for this baseline noise, the fidelity of the image data to the real scene is substantially improved.
Concurrently, the pipeline addresses pixel defects through a mechanism known as bad point concealment. During the manufacturing process, certain sensors are identified as defective; their responses are either non-functional or aberrantly high. These defects are cataloged in a bad point table, which is then utilized by the image processing pipeline to automatically correct or interpolate data from surrounding pixels during image reconstruction, thus ensuring consistency and quality in the final image output.
This redesigned pipeline no longer necessitates manual intervention for bad point concealment in subsequent processing stages, streamlining operations and enhancing overall processing efficiency. This approach not only improves the visual quality of the images but also significantly reduces computational overhead by preemptively correcting systemic sensor errors.
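For concreteness, the following Python sketch illustrates these two pre-processing steps; the neighbour offsets, the data layout, and the function name are illustrative assumptions rather than the pipeline's exact implementation.

```python
import numpy as np

def preprocess_cfa(cfa, black_level, bad_points):
    """Illustrative pre-processing sketch (not the pipeline's exact code):
    subtract the black level and conceal catalogued defective pixels."""
    cfa = cfa.astype(np.float64) - black_level        # remove the dark-current offset
    cfa = np.clip(cfa, 0, None)                       # keep values non-negative
    for r, c in bad_points:                           # bad_points: (row, col) from a factory table
        # Replace the defective sensel with the mean of its same-colour
        # neighbours two pixels away (same colour plane in a Bayer layout).
        neighbours = []
        for dr, dc in ((-2, 0), (2, 0), (0, -2), (0, 2)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < cfa.shape[0] and 0 <= cc < cfa.shape[1]:
                neighbours.append(cfa[rr, cc])
        cfa[r, c] = float(np.mean(neighbours))
    return cfa
```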
3.2. White Balance
Color stability is a critical feature of the human visual system (HVS), defined as the ability to perceive consistent colors under varying lighting conditions [
27,
28]. This phenomenon ensures that objects retain their perceived colors despite changes in the spectral composition of the ambient light. For instance, a white wall will appear consistently white to the HVS whether illuminated by the broad spectrum of sunlight or the narrower spectral emissions of fluorescent lighting, despite the significant differences in color temperature and spectral output [
29].
Digital camera sensors, however, do not inherently possess this capability of color stability. Their outputs are linearly related to the intensity of incoming light, leading to color deviations when the lighting conditions change. For example, images captured under a high-color temperature light source appear bluer, whereas those under low-color temperature conditions appear redder. To correct these deviations and more closely mimic HVS perception, digital cameras employ a process known as white balance. This process adjusts the color output of the camera to compensate for color temperature variations, thereby maintaining the perceived color consistency across different lighting conditions.
The technical mechanism underlying the white balance involves adjusting the gain values for each color channel based on the color temperature of the light source. These gain values are often derived from the camera’s image parameters stored within Digital Negative (DNG) files. Despite the proprietary nature of many camera systems and the undisclosed specifics of their internal white balance algorithms, the efficacy of these systems in producing high-quality images suggests that the calculated white balance gains are both precise and effective.
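The gain application itself is straightforward; the sketch below multiplies each Bayer position by its channel gain, assuming an RGGB layout and gains read from the DNG metadata (the numeric values in the usage comment are placeholders, not camera data).

```python
import numpy as np

def apply_white_balance(cfa, gains, pattern="RGGB"):
    """Multiply each Bayer position by its channel gain (R, G, B).
    Sketch only: the Bayer pattern and gain source are assumptions."""
    out = cfa.astype(np.float64).copy()
    gain_of = {"R": gains[0], "G": gains[1], "B": gains[2]}
    layout = [[pattern[0], pattern[1]], [pattern[2], pattern[3]]]
    for i in range(2):
        for j in range(2):
            out[i::2, j::2] *= gain_of[layout[i][j]]   # strided view covers one Bayer position
    return out

# Placeholder usage: wb_cfa = apply_white_balance(cfa, (1.95, 1.0, 1.40))
```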
3.3. High-Light Processing
In digital imaging, the manipulation of highlight values is a critical concern, particularly when addressing the retention of HDR details. As demonstrated in
Figure 3, our system employs a 14-bit quantization, achieving a dynamic range from 0 to 16,383.
Table 1 delineates the parameters for the CFA, where $V_{max}$ represents the maximum achievable value. Specifically, the maximum values for the R, G, and B channels are 16,183, 16,383, and 16,188, respectively. Post-application of white balance gains, the modified dynamic ranges for the R, G, and B channels are computed as follows:

$D_{new} = D_{org} \times g_{wb}$

where $D_{org}$ defines the initial dynamic range, $g_{wb}$ represents the gain applied during white balance, and $D_{new}$ is the new dynamic range of the CFA data. To quantify the impact of white balance adjustments on clipping in different color channels, we utilize:

$P_{clip} = \frac{D_{new} - T_{sat}}{D_{new}}$

Here, $P_{clip}$ denotes the proportion of the dynamic range that is clipped, with $T_{sat}$ representing the saturation threshold of 15,120. Analysis reveals significant clipping in the R and B channels, 52.06% and 31.85%, respectively, which poses challenges for image post-processing where the preservation of highlight details is crucial. To mitigate this loss, our pipeline introduces a processing scheme that focuses on retaining highlight information rather than merely truncating it at high values.
Figure 3.
Example output images from different image processing pipelines. (a) The raw image converted to an RGB color image by DCRAW with the unclipped-highlight mode; (b) the raw image converted to an RGB color image by DCRAW with the clipped-highlight mode; (c) the raw image converted to an RGB color image by Adobe Photoshop 2020; (d) the raw image converted to an RGB color image by the image processing pipeline designed in this paper.
Table 1 provides a summary of the dynamic ranges across the color channels, highlighting the extent of data retention and loss due to processing:
Table 1.
Dynamic range information for the image in
Figure 3.
Channel | $V_{max}$ | Black Level | $g_{wb}$ | $T_{sat}$ | $D_{new}$ | $P_{clip}$
---|---|---|---|---|---|---
R | 16,183 | 400 | 1.9885 | 15,120 | 0–31,542 | 52.06%
G | 16,383 | 400 | 1 | 15,120 | 0–15,983 | 0%
B | 16,188 | 400 | 1.4053 | 15,120 | 0–22,187 | 31.85%
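To make the relationship between these quantities concrete, the following sketch evaluates the two equations above for one channel; the function name and symbols are ours, treating the initial dynamic range as the maximum value minus the black level is our assumption, and the reconstructed R-channel range comes out close to, though not exactly equal to, the tabulated value (presumably a rounding effect in the reported gain).

```python
def wb_dynamic_range(v_max, black_level, g_wb, t_sat=15120):
    """Dynamic range after white balance and the clipped proportion,
    following D_new = D_org * g_wb and P_clip = (D_new - T_sat) / D_new."""
    d_org = v_max - black_level          # assumed: initial range after black level subtraction
    d_new = d_org * g_wb                 # dynamic range after the white balance gain
    p_clip = max(d_new - t_sat, 0.0) / d_new
    return d_new, p_clip

# G channel: (16383 - 400) * 1.0    -> 15,983, 0% clipped (matches Table 1)
# B channel: (16188 - 400) * 1.4053 -> ~22,187, ~31.9% clipped (matches Table 1)
# R channel: (16183 - 400) * 1.9885 -> ~31,385, ~51.8% clipped (close to Table 1)
```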
As [
30] discusses, the white balance gain is typically calculated using the white point of the image as a reference, ensuring that the values across the R, G, and B channels are balanced at highlight points. However, this approach can lead to discrepancies in dynamic range among the channels; in particular, the G channel may exhibit a reduced dynamic range compared with R and B owing to the differential gain adjustments. To handle highlight overflow, DCRAW adopts a strategy in which the maximum value of the G channel serves as the reference point, truncating excessive values in the R and B channels to prevent oversaturation.
To enhance the preservation of high dynamic values in the R and B channels that are often clipped in standard processing, our methodology incorporates a modified highlight overflow processing module. This new framework is depicted in
Figure 4, where the symbol × denotes the element-wise (dot) product of matrices. The figure also shows the revised CFA matrix after the weighted white balance adjustment, illustrating how the color dynamics are handled.
Further refining our image processing approach, we employ polynomial coefficients similar to those discussed in [
31]. This technique enables an effective no-reference assessment of image quality, particularly useful in our white balance adjustment process where highlight values are critical. Additionally, leveraging the reduced Gerschgorin circle method outlined in [
31] aids in robustly modeling the image focus, ensuring the effectiveness of our methodology under various noise conditions.
To reduce the computational complexity while processing image data, we consider only the pixels within the surrounding 3 × 3 subblock. The inter-channel correlation within this subblock is utilized to compute the values for the G channel as follows:

$G_{2} = \frac{P_{t} + P_{b} + P_{l} + P_{r}}{4}$

$G_{1}$ represents the numerical value of the G channel in the Bayer format, obtained by downsampling the Bayer array to extract the G channel. $G_{2}$ represents the matrix of the G channel calculated by averaging the surrounding R and B channels, providing a new estimated value for the G channel. The terms t, b, l, and r refer to the top, bottom, left, and right pixel positions closest to each pixel in the G channel, respectively; $P_{t}$, $P_{b}$, $P_{l}$, and $P_{r}$ are the corresponding adjacent R and B samples, so $G_{2}$ is derived by recalculating the values from the adjacent R and B channels.
Downsampling the G component. In the 2 × 2 sub-block of the Bayer filter, there are two green (G) components. The computation of one of these components is illustrated in
Figure 4, while the other is derived using an analogous procedure. The recalibration of the white balance saturation level involves the following set of formulas:

$WL_{new} = s \times WL_{org}$

$Mask = [\,G_{1} \geq WL_{new}\,], \qquad iMask = 1 - Mask$

$G_{1}' = iMask \times G_{1} + Mask \times G_{2}$

Here, s denotes the saturation value drop control parameter, set by default to 0.99; its optimization is elaborated in the subsequent section.
Table 2 lists some white level (WL) values, where Mask is a matrix marking the regions above the white level and iMask is its inverse, marking the regions below this threshold. The resulting template matrices are combined with $G_{1}$ and $G_{2}$ through element-wise (dot) products.
The redesigned highlight processing module retains the higher dynamic range of the R and B channels. To address the highlight overflow caused by the increased dynamic ranges of R and B, the dynamic range of G is expanded, allowing the highlight information of the image to be preserved. When the saturation value reaches the maximum quantization value (as shown in
Table 2, a condition that premium cameras generally meet), the highlight processing module ensures that the R and B components are completely reversible, and 99% of the dynamic range of the G component is likewise reversible.
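A minimal sketch of one plausible reading of this module is given below: G pixels at or above the reduced white level are replaced by the estimate recomputed from the neighbouring R and B values, so the G plane can follow the extended range of R and B. The function and parameter names are illustrative, not the paper's implementation.

```python
import numpy as np

def recover_g_highlights(g1, g2, white_level, s=0.99):
    """Sketch of the highlight module under stated assumptions:
    g1 is the G plane taken from the Bayer data, g2 the G estimate
    recomputed from the white-balanced R/B neighbours."""
    wl_new = s * white_level                    # saturation value dropped by (1 - s)
    mask = (g1 >= wl_new).astype(np.float64)    # regions above the new white level
    imask = 1.0 - mask                          # regions below it
    return imask * g1 + mask * g2               # element-wise (dot) products
```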
3.4. TMO
Standard displays typically render images in an 8-bit format, offering a dynamic range from 0 to 255. This range falls short of the broader dynamic capabilities of modern cameras; for example, the Canon EOS R5 outputs images with a dynamic range from 0 to 16,383. This discrepancy makes rendering HDR images on LDR displays a challenge, commonly addressed through a Tone Mapping Operator (TMO). The primary objective of tone mapping is to compress the dynamic range without sacrificing image quality. A direct linear scaling from 0–16,383 down to 0–255 is straightforward but results in significant information loss, which not only diminishes the quality of the displayed image but also leads to substantial deviations from the real scene as perceived by the HVS. The color scale mapping algorithm is therefore crucial: it allows displays to reveal details in both the dark and bright areas of images with extended dynamic ranges, so that HDR content can be adapted to conventional display technologies without compromising quality.
Moreover, important details captured by the camera, such as subtle nuances in shadowed regions, may remain concealed in images such as
Figure 1b. These details, although not immediately visible, are preserved in the raw image data and can be emphasized through enhanced light contrast during post-processing. Image compression techniques exploit this raw data by selectively discarding redundant information according to the specific needs of the compression algorithm, achieving more efficient compression while aiding the reconstruction of images that closely resemble real-world scenes. Consequently, the initial phase of information loss is critical, as it influences the overall quality and realism of post-processed images, underscoring the need for detailed raw data in various image enhancement scenarios.
To preserve the original image data effectively, a two-stage tone mapping process is utilized to transition HDR images to LDR images suitable for display on standard monitors. This tone-mapping process is represented mathematically as follows:

$L = f_{2}\left(f_{1}(H)\right)$

In this formula, $f_{1}$ denotes the initial stage of tone mapping, referred to as TMO1 in
Figure 2, which handles the primary adjustments. The subsequent function, $f_{2}$, represents the second stage of adjustments, labeled as TMO2. TMO2 is specifically designed to transform linear HDR images, where pixel values are directly proportional to light intensity, into nonlinear HDR representations as outlined in the literature [
32]. Unlike their LDR counterparts, nonlinear HDR images retain a more comprehensive range of image data. These images mimic the nonlinear response of the HVS to light, preserving detailed information from both the dark and bright areas of the scene, although some data may remain hidden until required for display. In the process of adapting to a monitor’s dynamic range, some details are inevitably lost. TMO1 applies the default tone mapping protocol from JPEG-XT [
23], in which r = 2.2, A is the mean logarithmic value of the HDR image y, H is the HDR pixel value, and L is the corresponding LDR pixel value. This tone mapping technique includes a response function that simulates the HVS response to varying light intensities, facilitating both gamma correction [
24] and effective color scale mapping. The resulting image, which aligns with comfortable HVS perception, can be readily displayed on a monitor.
However, not all color scale mappings implemented by JPEG-XT’s default tone mapping achieve optimal perceptual quality. Occasionally, after applying the default TMO of JPEG-XT to some HDR images, the output dynamic range may display minimum values significantly higher than 1, or maximum values much lower than 255. As shown in
Figure 5a, reliance solely on JPEG-XT’s default TMO can lead to substantial deviations from the real scene as perceived by the HVS. To address this issue, further adjustments to the dynamic range and enhancements to the brightness contrast in specific areas are essential. The application of TMO2 in the second stage, as illustrated in
Figure 5b,c, effectively stretches specified image areas to reveal more detailed information. This adjustment brings the displayed image closer to the actual scene, enhancing the visual experience as compared to the initial mapping shown in
Figure 5a.
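The two-stage structure can be sketched as follows. The first stage is a Reinhard-style stand-in for the JPEG-XT default operator (the exact JPEG-XT formula is not reproduced here), and the second stage is a simple percentile-based linear stretch; all parameter values and function names are chosen only for illustration.

```python
import numpy as np

def tmo1(hdr, r=2.2):
    """Stage 1 stand-in: global compression plus gamma 1/r, in the spirit of
    the JPEG-XT default TMO (A is the mean logarithmic value of the image)."""
    a = np.exp(np.mean(np.log(hdr + 1e-6)))     # log-average value A
    scaled = hdr / a
    return 255.0 * (scaled / (1.0 + scaled)) ** (1.0 / r)

def tmo2(ldr, low_pct=0.5, high_pct=99.5):
    """Stage 2 sketch: linear stretch so the occupied range covers 0-255,
    revealing detail when stage 1 leaves the minimum far above 0 or the
    maximum far below 255. Percentile limits are illustrative."""
    lo, hi = np.percentile(ldr, [low_pct, high_pct])
    return np.clip(255.0 * (ldr - lo) / max(hi - lo, 1e-6), 0.0, 255.0)

# Composition L = f2(f1(H)) from this section:  ldr = tmo2(tmo1(hdr))
```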
3.5. Demosaic
The conversion of an image from a Bayer pattern to an RGB color image is termed demosaicing, a process often synonymous with image interpolation. This process involves estimating the values of the two absent color channels at each pixel based on the known value from the image’s single channel and the values of its surrounding pixels. Given the objectives of the pipeline discussed in this paper—specifically, to preserve the integrity of the raw image data and to ensure the approximate reversibility of the process—the designed image processing pipeline is meticulously developed to ensure that both the inputs and outputs of the demosaicing phase are represented as real numbers.
To ensure both high quality and manageable complexity in the output image during demosaicing, several schemes were rigorously evaluated and subsequently integrated into the image processing pipeline. The demosaicing algorithm described in [
15] is renowned for its ability to deliver high-quality results within short processing times. Additionally, the Multi-Level Residual Interpolation (MLRI) method [
16], which operates based on the Laplacian energy distribution model, and the Adaptive Residual Interpolation (ARI) [
17], which builds upon the foundational Residual Interpolation (RI) method [
33], have been particularly effective in minimizing artifacts. These methodologies not only enhance the visual fidelity of the RGB images but also streamline the processing workflow, thereby balancing efficiency with output quality.
As depicted in
Figure 6, discerning differences between output images demosaiced using the methods from [
15], MLRI, and ARI proves challenging when viewing the images as a whole. However, in the detailed ruler area of the image, color blocks generated by these different algorithms exhibit slight variations; notably, images processed with the method from [
15] display more incorrect colors. In
Figure 6e, pronounced quality degradation is evident, marked by clear color inaccuracies in the white stripe and ringing artifacts within the color ring. These issues are attributable to numerical errors that occur during the demosaicing process. Although the ringing artifacts in
Figure 6g are less severe than in
Figure 6f,
Figure 6g also shows some color errors and a pronounced zipper effect that blurs the boundaries between colors. These findings align with those reported in [
17], indicating that the overall quality of the MLRI algorithm is slightly inferior to that of ARI.
Furthermore, the practicality of the ARI algorithm for real-time applications is limited due to its processing time; demosaicing an 8192 × 5464 image with ARI takes about 30 min, which is impractical for real-time camera image processing pipelines or for use in end terminals. MLRI, on the other hand, offers a balance between quality and processing time, typically requiring 200 s to process an image of the same resolution. To enhance efficiency, we modified the implementation of the MLRI algorithm. Our statistical analysis revealed that the correlation between image pixel values follows a Gaussian distribution, suggesting that pixels in closer proximity exhibit a higher correlation. Consequently, pixels further from the demosaicing block exert only a minor influence on the block, which informed our approach to optimize the algorithm for faster processing while minimizing the impact on image quality.
To mitigate discontinuities at block edges that could introduce false colors, an overlapping blocking technique is employed. The entire image is divided into 256 × 256 subblocks with overlapping edges. Each subblock is then processed using the Multi-Level Residual Interpolation (MLRI) algorithm before being reassembled to form the complete image.
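A sketch of this overlapping-block scheme is shown below; the overlap margin and the interface of the demosaicing routine (passed in as demosaic_fn, e.g., an MLRI implementation) are illustrative assumptions.

```python
import numpy as np

def demosaic_in_blocks(cfa, demosaic_fn, block=256, overlap=16):
    """Split the CFA into block x block tiles with overlapping edges, demosaic
    each tile, and keep only the central region when reassembling so tile
    borders do not introduce false colours. `overlap` must be even to keep
    the Bayer phase; its value here is only illustrative."""
    h, w = cfa.shape
    out = np.zeros((h, w, 3), dtype=np.float64)
    for y in range(0, h, block):
        for x in range(0, w, block):
            y0, x0 = max(y - overlap, 0), max(x - overlap, 0)
            y1, x1 = min(y + block + overlap, h), min(x + block + overlap, w)
            rgb = demosaic_fn(cfa[y0:y1, x0:x1])           # (y1-y0, x1-x0, 3) tile
            bh, bw = min(block, h - y), min(block, w - x)  # central region size
            out[y:y + bh, x:x + bw] = rgb[y - y0:y - y0 + bh, x - x0:x - x0 + bw]
    return out
```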
A comparison of
Figure 6c,d indicates that the HVS cannot detect any differences between the whole image processed solely with the MLRI algorithm and the segmented image processed in the same manner. To substantiate this observation, we enlarged the same area of the image as shown in
Figure 6b. As illustrated in
Figure 6g,h, no significant quality degradation or imaging differences are evident at the block edges, apart from a minor discrepancy in values. Moreover, the processing time for MLRI is significantly reduced to approximately 70 s by employing the overlapping block scheme, a substantial improvement from the earlier 200 s. Consequently, the more time-intensive Adaptive Residual Interpolation (ARI) algorithm is also adapted to utilize this overlapping blocking technique. This method not only ensures maintenance of image quality but also addresses the needs of practical applications effectively.
3.6. Post-Processing
In the post-processing stage, to meet the final output requirements for color saturation and adjustment of the original image information, we apply a saturation adjustment controlled by a factor r: the larger the control factor, the deeper the saturation. Because the matrix multiplication used for saturation adjustment can push values beyond the dynamic range, out-of-range values must be cut off, making the color saturation adjustment irreversible. Therefore, if pipeline reversibility is required, color saturation adjustment is not performed. The images in this paper are not subjected to color saturation adjustment, but, considering the needs of other users, we provide it as an optional operator. Regarding the output color space, the imaging pipeline designed in this paper outputs the camera’s default color space and does not provide color space transformations in post-processing.
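Because the exact saturation matrix is not reproduced above, the following sketch uses a common luminance-blend formulation purely to illustrate the role of the control factor r and why the final clipping step makes the adjustment irreversible; the luminance weights and function name are assumptions.

```python
import numpy as np

def adjust_saturation(rgb, r=1.2, max_value=255.0):
    """Illustrative saturation adjustment (not the paper's exact matrix):
    blend each pixel with its luminance; r > 1 deepens saturation and
    r = 1 leaves the image unchanged. Out-of-range values are clipped,
    which is what makes the operation irreversible."""
    w = np.array([0.299, 0.587, 0.114])            # assumed luminance weights
    luma = rgb @ w                                 # per-pixel luminance
    out = luma[..., None] + r * (rgb - luma[..., None])
    return np.clip(out, 0.0, max_value)
```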
4. Discussion
The image processing pipeline developed in this study aims to achieve module independence, preserve as much raw image data as possible, and ensure reversibility. An essential aspect of this development is the verification of output image quality from the designed pipeline. This paper conducts a comparative analysis between the internal pipeline of a camera, DCRAW, Adobe Photoshop, and the pipeline designed in this research.
The experiment primarily focuses on comparing the detail retention in the output images from each pipeline, with a specific emphasis on maintaining similar overall brightness and color. Where distinct object details are more visible through one pipeline, that pipeline has better preserved the raw image data. To evaluate the richness of detail in output images from various image processing pipelines, it is crucial to analyze details in both bright and dark shadow areas. Experimentally, it has been observed that details in highlighted areas tend to be suppressed.
As illustrated in
Figure 7, taken by the Fujifilm GFX 50S at an ISO setting of 12,800, the left side of the figure displays the full output image, while the right side shows an enlarged section of the image. This enlargement allows for a detailed comparison of the image processing outcomes. The default JPEG output from the camera, which is processed through the camera’s internal pipeline, along with outputs from DCRAW and Adobe Photoshop, are compared. It is difficult to discern differences in image details from the full picture; hence, the analysis focuses on a bright region of the image. As shown in
Figure 7b,d,f,g, the billboards, which appear predominantly white in the outputs of the camera JPEG, DCRAW, and Adobe Photoshop, lose crucial text and red color information.
In a previous analysis of the DCRAW image processing pipeline during stage 4, it was noted that DCRAW might truncate part of the information in highlighted areas to prevent overflow in the output image, leading to a loss of highlight detail. Experimental results indicate that both JPEG and Adobe Photoshop exhibit a similar reduction in highlight information, as seen with DCRAW.
In the dark shadow area, as depicted in
Figure 8, a Canon EOS R5 was utilized for shooting at an ISO setting of 100. To better analyze the information within the dark shadow region, we enhanced the brightness contrast of this area using the TMO2 algorithm to bring the shadows to approximately the same brightness level as the rest of the image.
From the initial comparison, it becomes evident that the output from DCRAW exhibits a white balance imbalance, introducing a magenta tint into the image. This issue likely stems from DCRAW’s requirement for specific white balance gains for each camera brand, rather than leveraging the camera’s built-in white balance settings which are typically selected by the photographer or determined via automatic white balance. Automatic white balance adjusts the gain for the R and B channels based on each specific image. If DCRAW fails to properly account for camera-specific parameters, it defaults to using generic white balance parameters, which may not be suitable for all scenarios. Consequently, continuous updates to the DCRAW library are essential to support new camera models. However, the quality of some RGB images output by DCRAW, as shown in
Figure 8, is subpar and deemed unacceptable for certain photographic studies. In contrast, the image processing pipeline designed in this study utilizes the white balance parameters provided by the camera, retaining them in the output Color Filter Array (CFA) file format. This strategy obviates the need for frequent library updates, as it only requires reading parameters from the CFA file.
Upon zooming into the dark shadow region, detailed imaging becomes apparent. In the fourth row of the comparison, the JPEG output exhibits pseudo-colors in the dark shadow areas and noticeable color noise on the car’s sheet metal, and some details are obscured in the Adobe Photoshop output. The fifth row of the comparison highlights color inaccuracies on the wheels and unrealistic imaging of the ground in the outputs from both JPEG and Adobe Photoshop. These observations underscore that the image processing pipeline developed in this paper is more effective at preserving raw image information in the dark shadow areas.
The pixel saturation value, as indicated in
Table 2, is typically equal to or slightly less than the maximum value. “Max” in
Table 2 represents the highest value achievable in the CFA data. When the exposure reaches saturation, the output value usually equals the maximum quantization level minus one. WL(org) denotes the original saturation value provided by the camera, while WL(new) refers to the recalculated value after adjusting for the non-clipping black level. For instance, the Fujifilm GFX 100, which operates with 14-bit quantization, has a possible maximum output value of 16,383, which is also its saturation value. Highlighted areas in the output image, particularly after applying white balance gain, often contain numerous pixels that exceed this saturation threshold.
Figure 9 illustrates this condition, where the light areas in the CFA data reach saturation, signaling a need to manage highlight overflow in the image output. DCRAW handles this overflow by clipping the excess pixel values. In contrast, the pipeline in this paper reduces the saturation value by 1% to establish a new saturation level. For specifics, refer to stage 4.
Figure 9 displays the output images after adjustments of 0% (no adjustment), 0.5%, and 1% to the saturation value. Observations show that without any reduction, the highlighted area turns magenta, as seen in
Figure 9a, indicating highlight overflow. A reduction of 0.5% renders the highlight section normal, whereas a 1% reduction ensures adequate buffer to prevent any overflow.
The roughly 1% adjustment applied to the G channel during white balance slightly affects the reversibility of the image processing pipeline. The output of the inverse process is not the original CFA produced by the camera but the CFA data with the black level subtracted. Since the black level represents noise that carries no significant image information, the black-level-subtracted CFA data are closer to the real scene data.
Figure 10b shows the statistics after this inverse process, highlighting the effect of the 1% reduction. The horizontal axis depicts the pixel numerical error ratio resulting from the inverse process, and the vertical axis shows the ratio of erroneous pixels to the total pixel count in the image. The maximum error ratio ranges from 0.48% to 0.5%: if the saturation value reaches the maximum, the deviation is 1%, and after the inverse process the clipped value is restored to the median, corresponding to an error of 0.5%. Moreover, from the perspective of pixel count, the total erroneous-pixel ratio reaches 0.8%, concentrated in the low error-value range. Typically, if pixels in an image reach the saturation point, the R, G, and B channels are all likely to be saturated. After channel gain, the maximum CFA pixel values in the R channel are similar; the dynamic range of the CFA is primarily determined by the R and B channels. The pipeline in this study does not clip values in the R and B channels, thus fully retaining the dynamic range of the CFA after gain adjustment. Consequently, the dynamic range of the final output color image remains essentially intact.
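The statistics plotted in Figure 10b can be computed along the following lines; the relative-error definition and function name below are our illustrative choices, with errors measured against the black-level-subtracted CFA.

```python
import numpy as np

def inverse_error_statistics(cfa_reference, cfa_recovered):
    """Per-pixel error ratio between the black-level-subtracted CFA and the
    CFA recovered by inverting the pipeline, plus the fraction of pixels
    that differ at all (sketch of a Figure 10b style evaluation)."""
    ref = cfa_reference.astype(np.float64)
    rec = cfa_recovered.astype(np.float64)
    ratio = np.abs(rec - ref) / np.maximum(ref, 1.0)   # pixel numerical error ratio
    erroneous = ratio > 0.0
    return ratio[erroneous], float(erroneous.mean())   # error values, error-pixel fraction
```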
In terms of evaluating the quality of output images, we employed several algorithms: the Tenengrad algorithm, the Gaussian Parade evaluation algorithm, and the LogGabor evaluation algorithm, all referenced in [
34]. The Tenengrad evaluation algorithm assesses image quality by analyzing the gradient of surrounding pixels in an image. Utilizing the difference measurement of structural degradation in both spatial and wavelet domains, the global clarity difference is measured through self-similarity between resolutions. The Gaussian Parade evaluation algorithm characterizes the local spatial structure of an image through a visual pattern that is based on directional selectivity. The LogGabor evaluation algorithm exploits the differential response of LogGabor filters to extract features that enhance clarity perception.
Figure 11 uses objective image quality assessment, where Tenengrad represents the Tenengrad algorithm in [
35], Gauss-para represents the Gaussian paragraph image quality assessment algorithm in [
34], and LogGabor represents the LogGabor filtered image quality assessment algorithm in [
34]. From the reported values, the image processing pipeline designed in this article produces output image quality similar to that of Photoshop, while the output image quality score of DCRAW is relatively low. Both the DCRAW and Photoshop pipelines clip the highlight areas of the image, so their output RGB color images cannot be reversed back to the CFA image. Consequently, when hidden image information needs to be displayed, Photoshop and DCRAW cannot recover CFA images through inverse processing and then re-render the output image based on that hidden information.
Figure 12 displays CFA image information for various highlight scenes. “Max” denotes the maximum value in the CFA data, “WL” represents the saturation value, and “Pro” indicates the proportion of the remaining dynamic range. The proportion of the dynamic range lying above the saturation value ranges between 40% and 50% and contains detailed highlight information. In the imaging pipeline designed in this article, this portion is almost entirely preserved, with only 1% being irreversible and unable to restore the original detail, far less than the 40–50% that would otherwise be lost. This demonstrates that our imaging pipeline can efficiently preserve highlight information.