Article

Layer Decomposition Learning Based on Gaussian Convolution Model and Residual Deblurring for Inverse Halftoning

Department of Software Convergence Engineering, Kunsan National University, Gunsan-si 54150, Korea
Appl. Sci. 2021, 11(15), 7006; https://doi.org/10.3390/app11157006
Submission received: 22 June 2021 / Revised: 19 July 2021 / Accepted: 26 July 2021 / Published: 29 July 2021
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology Ⅲ)

Abstract
Layer decomposition to separate an input image into base and detail layers has been steadily used for image restoration. Existing residual networks based on an additive model require residual layers with a small output range for fast convergence and visual quality improvement. However, in inverse halftoning, homogeneous dot patterns hinder a small output range from the residual layers. Therefore, a new layer decomposition network based on the Gaussian convolution model (GCM) and a structure-aware deblurring strategy is presented to achieve residual learning for both the base and detail layers. For the base layer, a new GCM-based residual subnetwork is presented. The GCM utilizes a statistical distribution, in which the image difference between a blurred continuous-tone image and a blurred halftoned image with a Gaussian filter can result in a narrow output range. Subsequently, the GCM-based residual subnetwork uses a Gaussian-filtered halftoned image as the input, and outputs the image difference as a residual, thereby generating the base layer, i.e., the Gaussian-blurred continuous-tone image. For the detail layer, a new structure-aware residual deblurring subnetwork (SARDS) is presented. To remove the Gaussian blurring of the base layer, the SARDS uses the predicted base layer as the input, and outputs the deblurred version. To more effectively restore image structures such as lines and text, a new image structure map predictor is incorporated into the deblurring network to induce structure-adaptive learning. This paper provides a method to realize the residual learning of both the base and detail layers based on the GCM and SARDS. In addition, it is verified that the proposed method surpasses state-of-the-art methods based on U-Net, direct deblurring networks, and progressively residual networks.

1. Introduction

Printers and copiers are bilevel output devices that reproduce images on paper by generating homogeneous dot patterns using inks or toners. The printed images are in fact bilevel; however, the human visual system, which behaves as a low-pass filter, allows the printed image to be perceived as a continuous-tone image. Digital halftoning is needed to create a halftoned image with uniform dot patterns from a continuous-tone image with discrete gray levels (e.g., 256 gray levels) [1]. The halftoned image determines the spatial position of the inks to be deposited on a paper or controls a laser beam to form a latent image on a photoconductor drum. Digital halftoning has been used in many applications, including animated GIF generation from videos [2], removal of contour artifacts in displays [3], video processing in electronic papers [4], and data hiding [5]. The typically used digital halftoning techniques are dithering, error diffusion, and direct binary search [6].
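To make error diffusion concrete (it is the halftoning method used for the experiments in Section 3.1), the following is a minimal numpy sketch of raster-scan Floyd–Steinberg error diffusion. This is an illustrative implementation under standard assumptions, not the exact code used in this study; the function name is ours.

```python
import numpy as np

def floyd_steinberg_halftone(gray):
    """Raster-scan Floyd-Steinberg error diffusion: quantize each pixel to
    0 or 255 and push the quantization error onto unprocessed neighbors
    with the classic weights 7/16, 3/16, 5/16, and 1/16."""
    img = gray.astype(np.float64)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = 255.0 if old >= 128 else 0.0
            img[y, x] = new
            err = old - new
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1, x - 1] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16
    return img.astype(np.uint8)
```

The error propagation spreads the quantization error of each pixel to its right and lower neighbors, which is what produces the homogeneous dot patterns described above.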
In inverse halftoning, a continuous-tone image with 256 gray levels or more is reconstructed from its halftoned version [7]. In other words, inverse halftoning is the reverse of digital halftoning. Inverse halftoning is required in several practical applications, such as bilevel data compression [8], watermarking [9,10], digital reconstruction of color comics [11], and high dynamic range imaging [12]. Inverse halftoning is an ill-posed problem with many possible solutions because digital halftoning is a many-to-one mapping. Many studies have been conducted over the last several decades, and various approaches have been introduced based on look-up tables [13], adaptive low-pass filtering [14], maximum-a-posterior estimation [15], local polynomial approximation and intersection of confidence intervals [16], and deconvolution [17]. Recently, machine learning approaches have been actively considered based on dictionary learning [18,19,20] and deep convolutional neural networks (DCNNs) [21,22,23,24,25].

1.1. Image Decomposition in Deep Learning Frameworks

Image decomposition, which is also known as layer separation in other fields, has been steadily used for image restoration [26], image enhancement [27], and image fusion [28]. Image decomposition is an approach for separating an input image into two or more layers with different gradient and illumination characteristics. Traditional image decomposition has been realized based on image transformations (e.g., wavelets) [29] and image pyramids [30] to achieve multiple resolutions. In addition, sparse representation [31], the Gaussian mixture model [32], and adaptive filtering methods such as bilateral [33] and guided-image filtering [34] have been used for two-layer separation, i.e., base and detail layers. In this study, the base layer corresponds to a layer whose brightness changes smoothly, resembling a low-pass-filtered image, whereas the detail layer refers to a high-pass filtered image whose brightness changes rapidly. The definition of the base and detail layers may vary based on the application field.
Recently, image decomposition approaches have been incorporated into deep learning frameworks. U-net [35], Laplacian-net [36], residual networks (RNs) [37,38], and progressive residual networks (PRNs) [23,25] are representative deep learning models that apply the concept of image decomposition. U-net and Laplacian-net primarily aim to realize multiple resolutions, whereas RNs and PRNs focus on predicting residual layers. In particular, the key factor for improving image quality and accelerating convergence in an RN is that the brightness range of the residual layer should be narrow. In other words, by narrowing the output range in which the solution exists, RNs can obtain the optimal solution more easily. Therefore, it is critical to design a residual layer with a narrow brightness range.

1.2. Residual Layer Design for Residual Learning

In an end-to-end manner, RNs and PRNs are trained to map an input image to a residual layer with a narrow output range. For image restoration, the difference image between the original and input images is considered as the residual layer. Residual learning is formulated as follows:
$x^{(r)} = f_{\theta_{RN}}(x_i) \approx x_o - x_i$ (1)
where $x_i$, $x_o$, and $x^{(r)}$ denote the input image, original image, and predicted residual layer, respectively. Herein, parentheses in superscripts indicate predicted values, and bold italic lowercase letters indicate vectors. $f_{\theta_{RN}}$ indicates the DCNN with parameters $\theta$ for estimating the residual layer. As shown in Equation (1), the output of the network is the residual, which differs from that of conventional DCNNs that directly transform the input image $x_i$ into the original image $x_o$ over a relatively wide output range. In addition, the residual layer is designed as the difference image between $x_o$ and $x_i$ because the measured input images can be modeled physically as the addition of original images and residual layers:
$x_i = x_o + x_r$ (2)
where $x_i$ indicates the measured input image; for example, captured noisy images and rain images are measured images. $x_r$ is the residual layer that contains artifacts such as noise and rain streaks. The residual layer $x_o - x_i$ in Equation (1) is derived from the additive model of Equation (2).
Previous studies [37,38] showed that using the difference image as the residual layer can effectively improve visual quality and increase convergence speed. For example, in image denoising, the noise layer, which corresponds to the difference image between the original and noisy images, is used as the residual layer. In general, noise is assumed to exhibit a Gaussian distribution, which implies that most of the pixels in the noise layer are close to zero; therefore, the output range of the noise layer is narrow. In rain removal, the rain layer including only rain streaks is used as the residual layer, and it is obtained by subtracting the original image from the input rain image. Because the rain layer includes only rain streaks, a narrow output range can be guaranteed in the residual layer.
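The following minimal PyTorch sketch illustrates residual learning under the additive model of Equations (1) and (2) for a denoising-style task. It is our illustration, not the architectures of [37,38]; the small network and noise level are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in residual network: trained to output x_o - x_i (Equation (1))
# rather than x_o itself.
net = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(64, 1, 3, padding=1))

x_o = torch.rand(8, 1, 32, 32)              # clean patches
x_i = x_o + 0.1 * torch.randn_like(x_o)     # additive model: x_i = x_o + x_r
target = x_o - x_i                          # narrow-range residual layer

pred = net(x_i)
loss = F.mse_loss(pred, target)             # train on the residual only
loss.backward()

restored = x_i + pred.detach()              # x_o is recovered additively
```

Because `target` concentrates near zero, the network searches a much smaller output range than a network mapping `x_i` directly to `x_o`.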

1.3. Residual Learning Problems for Inverse Halftoning

Digital halftoning is a nonlinear system that includes binary quantization. Therefore, the additive model of Equation (2) is no longer valid for digital halftoning, that is, $x_i \neq x_o + x_r$. This means that residual learning, as shown in Equation (1), cannot be directly applied to inverse halftoning. More specifically, the halftoned image is a bilevel image composed of black and white dot patterns. If the residual layer is defined as the difference image between the original image and the input halftoned image, similar black-and-white dot patterns appear in the residual layer, which is therefore inevitably accompanied by sudden changes in brightness. Hence, merely creating a residual layer based on image differencing is not suitable for inverse halftoning.

1.4. Progressively Residual Learning Problems for Inverse Halftoning

Progressively residual learning (PRL) [23,25] can be an alternative for handling the sudden brightness changes mentioned in the previous subsection. In PRL, the base layer, whose brightness changes smoothly, is recovered first; subsequently, the remaining detail layer is predicted.
$x^{(d)} = f_{\theta_{PRL\_r}}(x^{(b)}, x_i) \approx x_o - x^{(b)}, \quad \text{where} \ x^{(b)} = f_{\theta_{PRL\_b}}(x_i)$ (3)
$x^{(b)}$ and $x^{(d)}$ indicate the predicted base and detail layers, respectively. For inverse halftoning, the input halftoned image $x_i$ cannot be used as the base layer. Instead, in PRL, the input halftoned image $x_i$ is first converted into the base layer $x^{(b)}$ through the pretrained DCNN $f_{\theta_{PRL\_b}}$. The generated base layer resembles a low-pass-filtered image and can be considered an approximation of the original image. If the detail layer, defined as $x_o - x^{(b)}$, is used as the residual layer, then a narrow brightness range can be guaranteed, which implies that residual learning, $f_{\theta_{PRL\_r}}$, is possible. The additive model of Equation (2) can thus be used reasonably with PRL for inverse halftoning. For reference, the input halftoned image $x_i$ can be used together with the base layer $x^{(b)}$, as shown in Equation (3), to estimate the detail layer, thereby compensating for information loss in the predicted base layer.
However, the PRL [23,25] applied to inverse halftoning does not present a new deep learning model from the viewpoint of creating base and detail layers. In PRL, $f_{\theta_{PRL\_b}}$ is trained to generate the base layer. However, the output images of $f_{\theta_{PRL\_b}}$ cannot really be regarded as a base layer; they correspond to final reconstructed images because they already appear similar to the original images. Moreover, these predicted base layers look better visually than images reconstructed using traditional inverse halftoning methods based on dictionary learning [19] and look-up tables [13]. If the image quality of the base layers decreases to the level of Gaussian blurring of the original images, then conventional PRL cannot yield satisfactory results. In summary, the PRL hitherto developed for inverse halftoning merely applies inverse halftoning twice in succession.

1.5. Contributions

This paper presents three major points. In particular, a new method for creating base and detail layers based on the proposed structure-aware layer decomposition learning (SALDL) is introduced.
  • First, to design the base layer, it is shown that the image difference between a Gaussian-blurred continuous-tone image and a Gaussian-blurred halftoned image follows a statistical distribution with a narrow output range. Based on this observation, the base layer is reconstructed using a new GCM-based residual subnetwork that predicts the difference between the blurred continuous-tone image and the blurred halftoned image; this method differs completely from the existing PRL [23,25], which uses an initial restored image from a DCNN for base layer generation.
  • Second, the detail layer is generated based on structure-aware residual learning that predicts the difference image between the predicted base layer and the original image. To more effectively enhance image structures such as edges and textures, an image structure map predictor, which was introduced in a previous study [24], is incorporated into the residual detail layer learning, resulting in structure-enhancing learning. In addition, the predicted base layer is the low-pass-filtered version of the original image. Therefore, the proposed residual detail learning should be used to deblur the base layer, i.e., to remove the blurring of the base layer. This implies that the deblurring strategy is adopted in the proposed residual detail learning, unlike the existing PRL.
  • Third, it is demonstrated that SALDL can be used to recover high-quality images from predicted base layers whose quality is poor in terms of edge and texture representation, whereas the existing PRL [23,25] cannot yield satisfactory results from the same base layers. This reveals that the existing PRL is not suitable for low-quality base layers. By contrast, the proposed structure-aware residual learning method is more effective for describing image structures. To the best of our knowledge, this is the first study to perform the abovementioned comparison, and the experimental results confirmed the feasibility of the proposed SALDL as a new PRL for inverse halftoning that surpasses state-of-the-art methods such as PRL, U-net, and DCNN.

2. Proposed SALDL Based on GCM

2.1. Motivations

Image decomposition is an approach for analyzing and reconstructing images. Image transformation (e.g., wavelet transformation), structure-adaptive filtering, and sparse coding have been considered as effective tools for realizing image decomposition. However, DCNNs have recently demonstrated excellent performance in image enhancement and restoration. Therefore, this study focuses on incorporating image decomposition into a deep learning framework for inverse halftoning. In particular, a new deep learning model to enable the residual learning of both the base and detail layers is introduced. As discussed in the Introduction, residual learning that directly maps an input image into the residual layer is not applicable to inverse halftoning because the additive model is no longer valid. Moreover, the output range of the residual layer cannot be narrowed, owing to the black-and-white dot patterns. PRL can be considered as an alternative for realizing image decomposition. However, the PRL that has hitherto been developed for inverse halftoning merely applies inverse halftoning twice in succession, since the quality level of the restored base layer is similar to that of the original image. In addition, the PRL merely uses initially reconstructed images through a DCNN for base layer generation; hence, the design of the base layer lacks novelty. Furthermore, existing PRL cannot recover textures and fine details from low-quality base layers. Hence, a new SALDL based on GCM is proposed herein.
Figure 1 shows the concept of image decomposition based on the proposed SALDL for inverse halftoning. Unlike traditional approaches such as wavelet transform and image pyramids, residual-learning-based image decomposition is proposed. In particular, novel GCM-based residual learning and structure-aware residual deblurring are introduced for base and detail layer generation, respectively. By adding the predicted base and detail layers, a continuous-tone image can be reconstructed from the input halftoned image. Details regarding the generation of the base and detail layers are provided below.

2.2. Residual Layer Design for Base Layer Generation

Unlike the residual layer design based on the additive model of Equation (1), a new GCM is proposed herein to generate the residual of the base layer.
$x_{rb} = x_o * k_g - x_i * k_g = (x_o - x_i) * k_g$ (4)
where $x_{rb}$ denotes the residual layer corresponding to the base layer. Herein, the base layer is defined as the Gaussian blurring of the input halftoned image, $x_i * k_g$. Here, $*$ denotes the convolution operation, and $k_g$ indicates the Gaussian smoothing filter. Therefore, Equation (4) indicates that the residual layer corresponding to the base layer is defined as the image difference between the blurred original image and the blurred halftoned image after Gaussian filtering. Compared with Equation (1), the proposed residual layer is the filtered version of $x_o - x_i$. Hereinafter, the proposed model expressed in Equation (4) is referred to as the GCM to differentiate it from the additive model of Equation (2).
The main objective of residual learning is to narrow the output range. Whether the residual layer generated based on the GCM yields a narrow output range is yet to be elucidated. The histogram distribution for one sample image was analyzed to verify this. Figure 2 shows four images for generating two types of residual layers. The original, halftoned, blurred original, and blurred halftoned images are shown from left to right. Figure 3 shows a comparison of the histogram distributions for the two types of residual layers. One is the residual layer generated using the additive model, which subtracts the original image from the halftoned image. The other is the residual layer generated using the proposed GCM, which subtracts the blurred original image from the blurred halftoned image. As shown in the histogram distributions, the residual layer generated using the proposed GCM yielded a narrow output range compared with the conventional additive model, which yielded a wider output range. This is because the residual layer generated based on the additive model tends to exhibit textures that resemble dot patterns. Meanwhile, the proposed GCM utilizes Gaussian filtering to smooth out sudden changes that appear in halftoned images, thereby enabling the output range of the residual layer to be narrow.
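The effect shown in Figure 3 can be reproduced numerically. The following sketch compares the two residual layers on a synthetic image/halftone pair, reusing the `floyd_steinberg_halftone` function from the Introduction; scipy's Gaussian window differs slightly from the paper's 5 × 5 kernel, so the numbers are only indicative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Synthetic smooth continuous-tone image x_o and its halftone x_i.
rng = np.random.default_rng(0)
x_o = 255.0 * gaussian_filter(rng.random((256, 256)), sigma=8)
x_i = floyd_steinberg_halftone(x_o.astype(np.uint8)).astype(np.float64)

res_additive = x_o - x_i                                         # Equation (1)
res_gcm = gaussian_filter(x_o, 1.0) - gaussian_filter(x_i, 1.0)  # Equation (4)

# The GCM residual occupies a far narrower range than the additive one,
# because Gaussian filtering smooths out the black-and-white dot patterns.
print("additive range:", res_additive.min(), res_additive.max())
print("GCM range:     ", res_gcm.min(), res_gcm.max())
```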

2.3. GCM-Based Residual Subnetwork for Base Layer Generation

To realize the proposed GCM for base layer generation, a GCM-based residual subnetwork was designed, as shown in Figure 4. To implement the proposed GCM, as shown in Equation (4), Gaussian filtering was first applied to the input halftoned image. In existing deep learning tools, it can be easily implemented through a convolution layer where the convolution filter is fixed as a Gaussian filter. The Gaussian-filtered halftoned image was passed through the GCM-based residual subnetwork to output the residual layer.
$x^{(rb)} = f_{\theta_{GCM}}(x_i * k_g)$ (5)
where $x^{(rb)}$ is the predicted residual layer for base layer generation, and $f_{\theta_{GCM}}$ denotes the GCM-based residual subnetwork to be trained. Herein, parentheses in superscripts indicate predicted values. The standard deviation of the Gaussian filter $k_g$ was set to 1, and the filter size was $5 \times 5$.
To train $f_{\theta_{GCM}}$, the loss function is defined as follows:
$L = \frac{1}{M}\sum_{i=1}^{M} \left\| x_i^{(rb)} - x_i^{rb} \right\|^2$ (6)
where $i$ denotes a training sample, $M$ is the batch size, and $\|\cdot\|$ is the $l_2$-norm. Compared with the additive model, the proposed GCM-based residual subnetwork narrows the output range of the residual layer.
For the pretrained GCM-based residual subnetwork, the base layer was generated as follows:
$x^{(b)} = x^{(rb)} + x_i * k_g$ (7)
where $x^{(rb)}$ is the output of the pretrained GCM-based residual subnetwork $f_{\theta_{GCM}}$, and $x^{(b)}$ is the predicted base layer. This equation indicates that the base layer is the sum of the Gaussian-filtered halftoned image and the residual layer predicted by the GCM-based residual subnetwork. For reference, the entire architecture shown in Figure 4 was not trained end-to-end; heuristic experiments showed that training the entire architecture did not yield good results.
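The paper implemented its networks in MatConvNet (Section 3); purely as an illustration, the following PyTorch sketch wires up the fixed Gaussian convolution layer and Equations (5) and (7), with filter counts taken from Table 1. The layer names and the toy input are ours, and the plain stack of convolution blocks is our reading of Figure 4, not the released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_kernel(size=5, sigma=1.0):
    """Fixed 5 x 5 Gaussian filter k_g with standard deviation 1 (Sec. 2.3)."""
    ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return (k / k.sum()).view(1, 1, size, size)

K_G = gaussian_kernel()  # frozen weights: this layer is never trained

def blur(x):
    """x * k_g implemented as a convolution layer fixed to a Gaussian filter."""
    return F.conv2d(x, K_G, padding=2)

# Stand-in for f_theta_GCM: input conv block, 16 conv blocks, final conv.
layers = [nn.Conv2d(1, 64, 5, padding=2), nn.ReLU()]
for _ in range(16):
    layers += [nn.Conv2d(64, 64, 5, padding=2), nn.ReLU()]
layers += [nn.Conv2d(64, 1, 5, padding=2)]
gcm_subnet = nn.Sequential(*layers)

x_i = torch.rand(1, 1, 64, 64).round()   # toy halftone input
blurred = blur(x_i)                      # x_i * k_g
residual = gcm_subnet(blurred)           # Equation (5)
base_layer = residual + blurred          # Equation (7)
```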

2.4. Detail Layer Design

The predicted base layer is an approximation of the Gaussian-filtered original image:
$x^{(b)} \approx x_o * k_g$ (8)
As shown in Figure 4, details such as textures and edges are absent in the predicted base layer; however, it contains the low-frequency components of the original image. Therefore, the detail layer to be predicted was designed based on the difference between the original image and the predicted base layer:
$x_d = x_o - x_o * k_g \approx x_o - x^{(b)}$ (9)
where the predicted base layer $x^{(b)}$ is regarded as an approximation of the Gaussian-filtered original image $x_o * k_g$. This implies that the detail layer $x_d$ contains textures and edges with small pixel values; hence, the brightness range of the detail layer is narrow. With this detail layer design based on the proposed GCM, residual learning can also be performed for the detail layer.

2.5. Direct Deblurring Approach

The predicted base layer is the approximation of the Gaussian-filtered original image, as shown in Equation (8). Therefore, conventional image deblurring methods can be considered to directly reconstruct the original image from the predicted base layer. Conventional image deblurring methods can restore missing details by removing the Gaussian blurring of the predicted base layer. Image deblurring problems [28] can be formulated as follows:
$\arg\min_{x_o} \left\| x^{(b)} - x_o * k_g \right\|^2 + \lambda \sum_{j=1}^{2} \left\| x_o * k_{h,j} \right\|_{\alpha}$ (10)
where $k_{h,j}$ indicates high-pass filters such as horizontal and vertical derivative filters, $\alpha$ controls the sparsity, and $\lambda$ is a constant that weights the regularization term [28]. In general, the blur kernel in Equation (10) is unknown; however, based on the proposed GCM, the Gaussian filter $k_g$ can be used as the blur kernel. Alternatively, the kernel can be estimated directly from the base layer, which corresponds to blind image deblurring. It appears that conventional image deblurring can yield good results; however, some issues exist. A comparison between Figure 2 and Figure 4 shows that the predicted base layer differs from the blurred original image: textures and edges are missing, and noise is introduced. In addition, this noise differs from the Gaussian random noise typically assumed in image deblurring. Therefore, conventional image deblurring methods are not suitable for restoring the original image from the predicted base layer.
In another image deblurring approach, deep learning tools are used. More specifically, the DCNN can be trained to transform the predicted base layer to the original continuous-tone image [39].
$x^{(o)} = f_{\theta_{DDN}}(x^{(b)})$ (11)
where $f_{\theta_{DDN}}$ denotes the direct deblurring network (DDN), and $x^{(o)}$ is the reconstructed continuous-tone image. Because the predicted base layer $x^{(b)}$ is the Gaussian-blurred version of the original image, $f_{\theta_{DDN}}$ is regarded as a deblurring network. Because the predicted base layer has already lost some texture and sharpness, the input halftoned image $x_i$ can be used as additional information.

2.6. Proposed Layer Decomposition Learning

In addition to the DDN of Equation (11), a residual deblurring strategy can be adopted. It is noteworthy that both the DDN and the residual deblurring network (RDN) are derived from the proposed GCM; in other words, both are deep learning architectures proposed herein. The RDN estimates the detail layer from two types of images, i.e., the input halftoned image and the predicted base layer, via residual learning. The RDN appears similar to the conventional PRL [23,25]; however, the significant difference is that the RDN adopts a deblurring strategy: the predicted base layer is the Gaussian-filtered version of the original image, and the base layer is designed based on the GCM proposed for residual learning. The RDN can provide better performance than the DDN, owing to the effect of residual learning. However, the RDN remains restricted in terms of recovering image structures clearly. Hence, a new subnetwork known as the image structure map predictor is incorporated into the proposed SALDL.
Figure 5 shows the entire architecture of the proposed SALDL, which comprises two subnetworks. One is the image structure map predictor (ISMP), and the other is the SARDS. The ISMP transforms the input halftoned image into a Laplacian map, which refers to an image obtained by convolving the original image and the Laplacian filter. An example of the predicted Laplacian map is shown on the right side of Figure 5. Even though the predicted base layer can be input to the image structure map predictor, in this case, the detailed representation is not satisfactorily restored because the predicted base layer has already lost some texture information. As shown in Figure 5, the input halftoned image contains more texture information than the predicted base layer.
The ISMP includes a pretrained subnetwork known as the initial reconstruction subnetwork (IRS). This subnetwork generates the initial reconstructed image from the input halftoned image. Because the input halftoned image is quantized, it is preferable to predict the image structures from the initial reconstructed image than from the halftoned image. In fact, the Laplacian map is the filtered version of the original image, which implies that the Laplacian map can be predicted by convolving the Laplacian filter with the initial reconstructed image. However, the initial reconstructed image differs from the original image; hence, more convolution and ReLU layers are required at the back of the IRS. Based on the experiments, it was confirmed that the accuracy of the Laplacian map decreased when the IRS was not adopted, rendering the predicted detail layer less accurate. Therefore, the IRS is key for increasing the accuracy of the ISMP. As shown in Figure 5, the initial reconstructed image was changed to increase the performance of the ISMP while learning the entire network.
The SARDS requires three input images: the predicted base layer, the Laplacian map, and the input halftoned image. The predicted Laplacian map was stacked with the input halftoned image and the predicted base layer via a concatenation layer, and the result was input into the SARDS to estimate the detail layer.
$x^{(d)} = f_{\theta_{SARDS}}(x^{(b)}, x^{(l)}, x_i), \quad x^{(l)} = f_{\theta_{ISMP}}(x_i)$ (12)
where $f_{\theta_{SARDS}}$ and $f_{\theta_{ISMP}}$ denote the proposed SARDS and ISMP, respectively; $x^{(l)}$ denotes the predicted Laplacian map, and $x^{(d)}$ denotes the predicted detail layer. In Equation (12), the Laplacian map is predicted from the input halftoned image, not the base layer; experiments showed that the Laplacian map could not be estimated accurately from the base layer because of its missing information. The Laplacian map provides the subnetwork $f_{\theta_{SARDS}}$ with spatial information regarding areas that are flat, lined, or textured. This information enables the entire network to be trained adaptively to local image structures. Consequently, the texture representation of the detail layer can be improved, and noisy dot patterns in flat areas can be effectively removed. The ISMP can be regarded as a type of attention network, and the predicted Laplacian map is in fact a spatial attention feature map.
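A sketch of this three-channel concatenation and Equation (12), continuing the PyTorch illustration from Section 2.3. Flattening the ISMP into a single 22-block stack is our simplification; the real ISMP contains the pretrained IRS (16 blocks) followed by six additional convolution blocks, as described above.

```python
import torch
import torch.nn as nn

def conv_stack(in_c, n_blocks):
    """Input conv block + n_blocks conv blocks + final conv, per Table 1."""
    layers = [nn.Conv2d(in_c, 64, 5, padding=2), nn.ReLU()]
    for _ in range(n_blocks):
        layers += [nn.Conv2d(64, 64, 5, padding=2), nn.ReLU()]
    layers += [nn.Conv2d(64, 1, 5, padding=2)]
    return nn.Sequential(*layers)

ismp = conv_stack(in_c=1, n_blocks=22)    # IRS (16 blocks) + 6 extra blocks
sards = conv_stack(in_c=3, n_blocks=16)   # takes the 3-channel concatenation

x_i = torch.rand(1, 1, 64, 64).round()    # halftoned input
x_b = torch.rand(1, 1, 64, 64)            # predicted base layer (Section 2.3)

x_l = ismp(x_i)                                   # predicted Laplacian map
x_d = sards(torch.cat([x_b, x_l, x_i], dim=1))    # Equation (12)
x_o = x_b + x_d                                   # final reconstruction (Sec. 2.6)
```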
A multiloss function was used to learn $f_{\theta_{SARDS}}$ and $f_{\theta_{ISMP}}$; it is expressed as
$L = \frac{1}{M}\sum_{i=1}^{M} \left( \omega_1 \left\| x_i^{(d)} - x_i^{d} \right\|^2 + \omega_2 \left\| x_i^{(l)} - x_i^{l} \right\|^2 \right)$ (13)
where $i$ denotes a training sample, $M$ is the batch size, and $\omega_1$ and $\omega_2$ weight the losses of the two subnetworks. As shown in Equation (12), the accuracy of $f_{\theta_{ISMP}}$ affects the accuracy of $f_{\theta_{SARDS}}$; therefore, in this study, both $\omega_1$ and $\omega_2$ were set to 1.
For the trained $f_{\theta_{SARDS}}$ and $f_{\theta_{ISMP}}$, the final continuous-tone image was generated based on the additive model, i.e., $x^{(o)} = x^{(d)} + x^{(b)}$. As mentioned in the Introduction, the additive model is not directly suitable for inverse halftoning. However, by first generating a Gaussian-blurred version of the original image, layer decomposition learning based on the GCM and SARDS can be applied to inverse halftoning.

3. Experimental Results

The proposed SALDL for inverse halftoning was implemented using MatConvNet [40] and trained with two 2080Ti GPUs on a Windows operating system. To evaluate the proposed method, it was compared with state-of-the-art deep learning methods based on the DCNN [37], DDN [39], U-Net [35], and PRL [23,25]. In this study, a Gaussian-blurred halftoned image was used as the base layer in both the DDN and PRL methods to implement Equations (11) and (3), respectively; in other words, the same base layer was used for a paired comparison, which reveals the effectiveness of the proposed method in recovering image structures compared with the DDN and PRL. For performance evaluation, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [41] were used to measure the inverse of the MSE in log space and the structural similarity between two images, respectively. For both the PSNR and SSIM, a higher value indicates higher quality. The source code of the proposed SALDL can be downloaded at https://github.com/cvmllab (accessed on 29 July 2021).
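For reference, PSNR reduces to a one-line formula; the sketch below uses the standard definition (not code from this study), assuming 8-bit images. SSIM is more involved and is specified in [41].

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """PSNR in dB: the inverse of the MSE expressed in log space."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```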

3.1. Training Data Collection

For training, public datasets [36] including General 100, Urban 100, BSDS100, and BSDS200 were used to prepare continuous-tone color images; the total number of continuous-tone color images was 500. General 100, Urban 100, and BSDS200 were used for training, whereas BSDS100 was used for validation. The same training and validation sets were used to train all the deep-learning-based methods: the proposed SALDL, PRL, U-net, DDN, and DCNN. The three subnetworks, i.e., the GCM-based residual subnetwork, IRS, and SARDS, used the same training and validation datasets. For digital halftoning, the continuous-tone color images were converted into grayscale images; subsequently, error diffusion [42] with the Floyd–Steinberg filter [1,42] was used to transform the grayscale images into halftoned images. The Laplacian operator was applied to the grayscale images to obtain the Laplacian maps. Three types of patches of size 32 × 32 were extracted randomly from the grayscale original images, Laplacian maps, and halftoned images. In this study, grayscale patches were used for training because error diffusion can be easily applied to them. To apply the trained network to color images in the test phase, the color image was first separated into R, G, and B planes, and the proposed network was applied to each plane independently.
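A sketch of this data preparation, reusing the `floyd_steinberg_halftone` function from the Introduction. The grayscale luma weights and the helper name are assumptions; the paper only states that color images were converted to grayscale.

```python
import numpy as np
from scipy.ndimage import laplace

def make_training_triple(rgb, rng, patch=32):
    """Build one random 32 x 32 training triple from a continuous-tone RGB
    image: (grayscale patch, Laplacian-map patch, halftoned patch)."""
    # Assumed ITU-R BT.601 luma weights for the grayscale conversion.
    gray = rgb[..., :3].astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    lap = laplace(gray)                                      # Laplacian map target
    half = floyd_steinberg_halftone(gray.astype(np.uint8))   # error diffusion
    y = rng.integers(0, gray.shape[0] - patch + 1)
    x = rng.integers(0, gray.shape[1] - patch + 1)
    s = np.s_[y:y + patch, x:x + patch]
    return gray[s], lap[s], half[s].astype(np.float64)

# At test time, a color image is split into R, G, and B planes and the
# trained network is applied to each plane independently.
```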

3.2. Network Training

All the subnetworks, including the GCM-based residual subnetwork, ISMP, IRS, and SARDS, comprised convolution and ReLU layers. Hereinafter, a pair comprising a convolution layer and a ReLU layer is referred to as a convolution block. In the subnetworks, $m$ filters measuring $5 \times 5 \times c$ were used in the convolutional layers, where $c$ represents the number of input channels. Table 1 shows the number of filters and channels used in the convolutional layers. In the input layer of the SARDS, $c$ was set to 3 because three input channels (the base layer, the Laplacian map, and the halftoned image) were fed to the input layer. The filters were initialized using a random number generator. The number of convolution blocks used in each of the GCM-based residual subnetwork, IRS, and SARDS was set to 16; the ISMP uses six convolution blocks in addition to those of the IRS. To update the convolution filters, the mini-batch gradient descent algorithm [43] was used. The epoch number was 200, and the batch size was 64. Each epoch involved 1000 backpropagation iterations. The learning rate began at $10^{-5}$ and decreased linearly every 50 epochs, down to $10^{-6}$. All loss functions were modeled using the $l_2$-norm.
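A rough illustration of that schedule is sketched below. This is an assumed skeleton: `next_batch` is a hypothetical loader, `ismp`/`sards` refer to the sketches in Section 2.6, and the stepwise decay is our reading of "decreased linearly every 50 epochs".

```python
import torch
import torch.nn.functional as F

# 200 epochs x 1000 iterations, batch size 64, l2 losses, mini-batch SGD.
opt = torch.optim.SGD(list(ismp.parameters()) + list(sards.parameters()),
                      lr=1e-5)

for epoch in range(200):
    for g in opt.param_groups:   # assumed steps: 1e-5, 7e-6, 4e-6, 1e-6
        g["lr"] = 1e-5 - (1e-5 - 1e-6) * (epoch // 50) / 3
    for _ in range(1000):
        x_i, x_b, lap_t, det_t = next_batch()   # hypothetical 64-patch batch
        x_l = ismp(x_i)
        x_d = sards(torch.cat([x_b, x_l, x_i], dim=1))
        loss = F.mse_loss(x_d, det_t) + F.mse_loss(x_l, lap_t)  # Eq. (13), w = 1
        opt.zero_grad()
        loss.backward()
        opt.step()
```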

3.3. Visual Quality Evaluation

In this study, two datasets were tested: a small texture dataset, as shown in Figure 6, and BSDS100. Because the proposed method is strong at expressing image structures owing to the SARDS, the small texture dataset was prepared to contain various types of image structures, including lines, curves, and regular patterns. The BSDS100 dataset was also tested to verify whether the proposed SALDL could improve detail representation and dot elimination. None of the test images were included in the training dataset. Figure 7 shows the experimental results for the small texture dataset. As shown in the red boxes, the proposed method describes the image structures more accurately. In addition, the overall sharpness of the images was better. In particular, the lines of the pants were restored in more detail and were sharper (first row) when using the proposed method. The second row shows more clearly expressed cactus thorns. The third row shows the textures on the palm and the hair accessory in more detail. As shown in the fourth, fifth, sixth, and seventh rows, structures including the license plate text, rip outline, straw, and Gogh's eyes, respectively, were restored more clearly. Moreover, as shown in the blue box in the fifth row, the proposed method suppressed noisy dots in flat areas, unlike the conventional DCNN [37] and U-Net [35] methods. The blue box in the last row shows that the proposed method can reproduce smooth skin tones in the face areas, whereas the face areas reconstructed using the other methods appeared rougher and noisier. Figure 8 shows further experimental results for the BSDS100 dataset, where similar effects were observed: in the red box, sharper curves were restored using the proposed method.
By comparing the proposed method with the DDN/PRL methods, it was verified that the additional use of the ISMP can improve performance for detailed representation and dot elimination. The DDN directly predicts the continuous-tone images from the base layers, as shown in Equation (11). Because the base layers are predicted, some information may be lost. Hence, the flat areas of the reconstructed images appeared slightly noisy, and their sharpness can be further improved. The PRL method uses input halftone images to increase the amount of information for residual learning, as shown in Equation (3). Therefore, the PRL method can provide results with improved image quality, as compared to the DDN method. However, the PRL method lacks image structure representation. In addition, the existing PRL cannot produce satisfactory results from the same base layers. This reveals that the architecture of the existing PRL is not suitable for low-quality base layers. Hence, the proposed SALDL uses the ISMP to identify Laplacian maps from the input halftoned images. Figure 9 shows the Laplacian maps predicted by the ISMP. In this figure, texture lines are detected well, which means that the predicted Laplacian map provides the SARDS with spatial information regarding areas that are flat, lined, or textured. This information enables the proposed SALDL to be adaptive to local image structures. Consequently, the texture representation of the detail layer can be improved, and noisy dot patterns on flat areas can be effectively suppressed. The ISMP can be regarded as a type of attention network, and the predicted Laplacian map is a spatial attention feature map.
Based on Equations (3) and (11), the DDN and PRL methods use the base layers generated using the proposed GCM-based residual subnetwork. The DDN is itself one of the deep learning architectures proposed herein for inverse halftoning because it was derived from the GCM proposed to predict Gaussian-blurred images. In the existing PRL methods, no specific models exist for the residual learning of the base layer. In addition, the existing PRL cannot produce satisfactory results from low-quality base layers. To the best of our knowledge, this study is the first to perform the abovementioned comparison, and the experimental results confirmed that the proposed SALDL can be used as a new deep learning model for inverse halftoning that enables residual learning for both the base and detail layers by incorporating image decomposition into the deep learning framework.
Table 2 and Table 3 show the results of the PSNR and SSIM evaluations for the small texture and BSDS100 datasets, respectively. As expected, the proposed SALDL method demonstrated the best performance among all the methods, and it surpassed the state-of-the-art inverse halftoning methods based on deep learning. This indicates that the proposed image decomposition model is effective in obtaining high-quality continuous-tone images from halftone images. The proposed base layer design, based on the GCM, enables residual learning by narrowing the output brightness range. The structure-aware residual deblurring strategy can remove the blurring of the predicted base layer and restore the image structures effectively. The proposed SALDL is thus a new PRL for inverse halftoning. By contrast, the PSNR and SSIM of the DDN and PRL were lower than those of the proposed method, confirming that the DDN and PRL are restricted in terms of restoring the original images from low-quality base layers. Table 2 and Table 3 also show that the average PSNR of the U-net was slightly better than that of the PRL. This implies that the U-net is an extremely effective model for inverse halftoning; in other words, decomposing input halftoned images into multiple resolutions is an extremely effective approach. If the SARDS and GCM-based residual subnetworks were built similarly to the U-net, the performance of the proposed method might be further improved.

4. Conclusions

A new SALDL method for inverse halftoning was proposed. First, a new residual learning method based on the Gaussian convolution model was introduced for base layer generation. Compared to the additive model, which has been used for image denoising and rain removal, this Gaussian convolution model utilizes a statistical distribution in which the image difference between the blurred original image and blurred halftone image with a Gaussian filter can possess a narrow brightness range. Second, a structure-aware residual deblurring strategy was presented. To remove the Gaussian blurring of the base layer and recover the image structures effectively, an image structure map predictor was designed to estimate the image structures from halftone patterns. This image structure map predictor enabled the entire network to be trained adaptively to local image structures; hence, noisy dot patterns on flat areas were suppressed and local image structures such as lines and text were described precisely. The experimental results confirmed that the proposed method surpassed state-of-the-art inverse halftoning methods based on deep learning, such as U-net, DCNN, DDN, and PRL. In addition, it was verified that the proposed image decomposition model was extremely effective in obtaining high-quality continuous-tone images from input halftone images.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A2C1010405).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The BSD datasets can be found here https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/ (accessed on 29 July 2021).

Conflicts of Interest

The author declares no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Donghui, L.; Takuma, K.; Takahiko, H.; Midori, T.; Kaku, S. Texture-aware error diffusion algorithm for multi-level digital halftoning. J. Imaging Sci. Technol. 2020, 64, 50410-1–50410-9.
  2. Wang, Y.; Huang, H.; Wang, C.; He, T.; Wang, J.; Nguyen, M.H. GIF2Video: Color dequantization and temporal interpolation of GIF images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1419–1428.
  3. Do, H.C.; Cho, B.G.; Chien, S.I.; Tae, H.S. Improvement of low gray-level linearity using perceived luminance of human visual system in PDP-TV. IEEE Trans. Consum. Electron. 2005, 51, 204–209.
  4. Kao, W.-C.; Liu, C.-H.; Liou, S.-C.; Tsai, J.-C.; Hou, G.-H. Towards video display on electronic papers. J. Display Technol. 2016, 12, 129–135.
  5. Son, C.-H.; Choo, H. Watermark detection of clustered halftoned images via learned dictionary. Signal Process. 2014, 102, 77–84.
  6. Lieberman, D.J.; Allebach, J.P. A dual interpretation for direct binary search and its implications for tone reproduction and texture quality. IEEE Trans. Image Process. 2000, 9, 1950–1963.
  7. Son, C.-H. Inverse halftoning based on sparse representation. Opt. Lett. 2012, 37, 2352–2354.
  8. Guo, J.-M.; Prasetyo, H.; Wong, K. Halftoning-based block truncation coding image restoration. J. Vis. Commun. Image Represent. 2016, 35, 193–197.
  9. Guo, J.-M. Watermarking in dithered halftone images with embeddable cells selection and inverse halftoning. Signal Process. 2008, 88, 1496–1510.
  10. Son, C.-H.; Lee, K.; Choo, H. Inverse color to black-and-white halftone conversion via dictionary learning and color mapping. Inf. Sci. 2015, 299, 1–19.
  11. Kopf, J.; Lischinski, D. Digital reconstruction of halftoned color comics. ACM Trans. Graph. 2012, 31, 140.
  12. Remez, T.; Litany, O.; Bronstein, A. A picture is worth a billion bits: Real-time image reconstruction from dense binary threshold pixels. In Proceedings of the IEEE International Conference on Computational Photography, Evanston, IL, USA, 13–15 May 2016; pp. 1–9.
  13. Zhang, E.; Zhang, Y.; Duan, J. Color inverse halftoning method with the correlation of multi-color components based on extreme learning machine. Appl. Sci. 2019, 9, 841.
  14. Kite, T.D.; Damera-Venkata, N.; Evans, B.L.; Bovik, A.C. A fast, high-quality inverse halftoning algorithm for error diffused halftones. IEEE Trans. Image Process. 2000, 9, 1583–1592.
  15. Stevenson, R. Inverse halftoning via MAP estimation. IEEE Trans. Image Process. 1997, 6, 574–583.
  16. Foi, A.; Katkovnik, V.; Egiazarian, K.; Astola, J. Inverse halftoning based on the anisotropic LPA-ICI deconvolution. In Proceedings of the International TICSP Workshop on Spectral Methods and Multirate Signal Processing, Vienna, Austria, 11–12 September 2004; pp. 49–56.
  17. Son, C.-H.; Choo, H. Iterative inverse halftoning based on texture-enhancing deconvolution and error-compensating feedback. Signal Process. 2013, 93, 1126–1140.
  18. Freitas, P.G.; Farias, M.C.Q.; Araújo, A.P.F. Enhancing inverse halftoning via coupled dictionary training. Signal Process. Image Commun. 2016, 49, 1–8.
  19. Son, C.-H.; Choo, H. Local learned dictionaries optimized to edge orientation for inverse halftoning. IEEE Trans. Image Process. 2014, 23, 2542–2556.
  20. Zhang, Y.; Zhang, E.; Chen, W.; Chen, Y.; Duan, J. Sparsity-based inverse halftoning via semi-coupled multi-dictionary learning and structural clustering. Eng. Appl. Artif. Intell. 2018, 72, 43–53.
  21. Jimenez, F.P.; Miyatake, M.N.; Medina, K.T.; Perez, G.S.; Meana, H.P. An inverse halftoning algorithms based on neural networks and atomic functions. IEEE Latin Am. Trans. 2017, 15, 488–495.
  22. Hou, X.; Qiu, G. Image companding and inverse halftoning using deep convolutional neural networks. arXiv 2017, arXiv:1707.00116.
  23. Xia, M.; Wong, T.-T. Deep inverse halftoning via progressively residual learning. In Proceedings of the Asian Conference on Computer Vision, Perth, Australia, 2–6 December 2018; pp. 523–539.
  24. Son, C.-H. Inverse halftoning through structure-aware deep convolutional neural networks. Signal Process. 2020, 173, 1–7.
  25. Yuan, J.; Pan, C.; Zheng, Y.; Zhu, X.; Qin, Z.; Xiao, Y. Gradient-guided residual learning for inverse halftoning and image expanding. IEEE Access 2020, 8, 50995–51007.
  26. Kang, L.-W.; Lin, C.-W.; Fu, Y.-H. Automatic single-image-based rain streaks removal via image decomposition. IEEE Trans. Image Process. 2012, 21, 1742–1755.
  27. Lim, J.; Heo, M.; Lee, C.; Kim, C.-S. Contrast enhancement of noisy low-light images based on structure-texture-noise decomposition. J. Vis. Commun. Image Represent. 2017, 45, 107–121.
  28. Son, C.-H.; Zhang, X.-P. Layer-based approach for image pair fusion. IEEE Trans. Image Process. 2016, 25, 2866–2881.
  29. Starck, J.-L.; Fadili, J.; Murtagh, F. The undecimated wavelet decomposition and its reconstruction. IEEE Trans. Image Process. 2007, 16, 297–309.
  30. Paris, S.; Hasinoff, S.W.; Kautz, J. Local Laplacian filters: Edge-aware image processing with a Laplacian pyramid. Commun. ACM 2015, 53, 81–91.
  31. Starck, J.-L.; Elad, M.; Donoho, D.L. Image decomposition via the combination of sparse representation and a variational approach. IEEE Trans. Image Process. 2005, 14, 2675–2681.
  32. Li, Y.; Tan, R.T.; Guo, X.; Lu, J.; Brown, M.S. Single image rain streak decomposition using layer priors. IEEE Trans. Image Process. 2017, 26, 3874–3885.
  33. Tomasi, C.; Manduchi, R. Bilateral filtering for gray and color images. In Proceedings of the IEEE International Conference on Computer Vision, Bombay, India, 4–7 January 1998; pp. 839–846.
  34. Li, S.; Kang, X.; Hu, J. Image fusion with guided filtering. IEEE Trans. Image Process. 2013, 27, 2864–2875.
  35. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. arXiv 2015, arXiv:1505.04597.
  36. Lai, W.; Huang, J.; Ahuja, N.; Yang, M. Deep Laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 624–632.
  37. Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155.
  38. Fu, X.; Huang, J.; Zeng, D.; Huang, Y.; Ding, X.; Paisley, J. Removing rain from single images via a deep detail network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1–9.
  39. Hradiš, M.; Kotera, J.; Zemčík, P.; Šroubek, F. Convolutional neural networks for direct text deblurring. In Proceedings of the British Machine Vision Conference, Swansea, UK, 7–10 September 2015; pp. 6.1–6.13.
  40. Vedaldi, A.; Lenc, K. MatConvNet: Convolutional neural networks for MATLAB. In Proceedings of the ACM International Conference on Multimedia, Brisbane, Australia, 26–30 October 2015; pp. 689–692.
  41. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
  42. Kwon, J.-H.; Son, C.-H.; Cho, Y.-H.; Ha, Y.-H. Text-enhanced error diffusion using multiplicative parameters and error scaling factor. J. Imaging Sci. Technol. 2006, 50, 437–447.
  43. Ruder, S. An overview of gradient descent optimization algorithms. arXiv 2016, arXiv:1609.04747.
Figure 1. Concept of image decomposition based on the proposed SALDL for inverse halftoning.
Figure 2. Original, halftoned, blurred original, and blurred halftoned images (left to right).
Figure 3. Histogram comparison of two types of residual layers generated using the conventional additive model and the proposed GCM.
Figure 4. GCM-based residual subnetwork for base layer generation.
Figure 5. Proposed SALDL for inverse halftoning.
Figure 6. Small texture dataset.
Figure 7. Experimental results for the small texture dataset: halftoned images, images reconstructed using DCNN [37], U-net [35], DDN [39], PRL [23,25], the proposed SALDL method, and original images (left to right).
Figure 8. Experimental results for BSDS100.
Figure 9. Laplacian maps predicted using ISMP.
Table 1. Number of filters and channels used in the convolutional layers.

Subnetworks | Input Layer | Last Convolution Layer | Convolution Block (Number)
GCM-based residual subnetwork | c = 1, m = 64 | c = 64, m = 1 | c = 64, m = 64 (16)
IRS | c = 1, m = 64 | c = 64, m = 1 | c = 64, m = 64 (16)
ISMP including IRS | c = 1, m = 64 | c = 64, m = 1 | c = 64, m = 64 (22)
SARDS | c = 3, m = 64 | c = 64, m = 1 | c = 64, m = 64 (16)
Table 2. Performance evaluation for the small texture dataset (PSNR / SSIM).

Test Image | Proposed Method | U-Net [35] | DCNN [37] | DDN [39] | PRL [23,25]
1 | 25.943 / 0.835 | 25.563 / 0.815 | 25.181 / 0.808 | 24.747 / 0.783 | 25.639 / 0.818
2 | 25.916 / 0.913 | 25.590 / 0.904 | 25.395 / 0.900 | 25.139 / 0.894 | 25.632 / 0.905
3 | 26.013 / 0.878 | 25.247 / 0.857 | 24.810 / 0.846 | 24.308 / 0.826 | 25.379 / 0.859
4 | 29.951 / 0.901 | 29.262 / 0.873 | 28.608 / 0.854 | 28.997 / 0.864 | 28.843 / 0.866
5 | 31.974 / 0.981 | 31.901 / 0.979 | 31.818 / 0.979 | 31.064 / 0.978 | 31.488 / 0.979
6 | 26.373 / 0.909 | 25.820 / 0.899 | 25.370 / 0.890 | 24.974 / 0.88 | 25.814 / 0.896
7 | 31.601 / 0.981 | 31.248 / 0.979 | 31.084 / 0.979 | 30.522 / 0.977 | 31.069 / 0.979
8 | 28.659 / 0.969 | 27.992 / 0.966 | 27.275 / 0.959 | 26.698 / 0.953 | 27.823 / 0.963
9 | 31.145 / 0.953 | 30.539 / 0.948 | 30.237 / 0.949 | 29.517 / 0.933 | 30.449 / 0.942
10 | 30.281 / 0.939 | 29.601 / 0.930 | 29.214 / 0.928 | 28.721 / 0.914 | 29.581 / 0.929
11 | 24.853 / 0.859 | 24.098 / 0.832 | 23.738 / 0.828 | 23.388 / 0.805 | 24.258 / 0.839
12 | 25.654 / 0.816 | 24.718 / 0.751 | 24.441 / 0.739 | 24.274 / 0.741 | 24.904 / 0.771
13 | 33.381 / 0.966 | 33.302 / 0.964 | 33.282 / 0.964 | 32.426 / 0.959 | 32.777 / 0.961
14 | 29.901 / 0.846 | 29.631 / 0.840 | 29.753 / 0.832 | 29.253 / 0.822 | 29.645 / 0.833
15 | 27.119 / 0.904 | 26.878 / 0.897 | 26.755 / 0.894 | 26.422 / 0.89 | 26.841 / 0.898
AVG. | 28.584 / 0.910 | 28.093 / 0.896 | 27.797 / 0.890 | 27.363 / 0.881 | 28.009 / 0.896
Table 3. Performance evaluation for the BSDS100 dataset (average scores).

Metric | Proposed Method | U-Net [35] | DCNN [37] | DDN [39] | PRL [23,25]
PSNR | 28.651 | 28.262 | 28.029 | 27.518 | 28.206
SSIM | 0.896 | 0.886 | 0.884 | 0.868 | 0.885

