#### **1. Introduction**

As realistic media become widespread in various image processing areas, image compression is one of the key technologies for enabling real-time applications over limited network bandwidth. Although image compression techniques such as joint photographic experts group (JPEG) [1], web picture (WebP) [2], and high-efficiency video coding main still picture [3] achieve significant compression performance for efficient image transmission and storage [4], they introduce undesired compression artifacts because quantization makes the coding lossy. These artifacts generally degrade the performance of image restoration methods, such as super-resolution [5–10], contrast enhancement [11–14], and edge detection [15–17].

**Citation:** Lee, Y.; Park, S.-h.; Rhee, E.; Kim, B.-G.; Jun, D. Reduction of Compression Artifacts Using a Densely Cascading Image Restoration Network. *Appl. Sci.* **2021**, *11*, 7803. https://doi.org/10.3390/app11177803

Academic Editor: Oscar Reinoso García

Received: 30 June 2021 Accepted: 23 August 2021 Published: 25 August 2021

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Methods for reducing compression artifacts were initially studied by developing a specific filter inside the compression process [18]. Although these approaches can efficiently remove ringing artifacts [19], their improvement is limited in image regions with high-frequency content. Examples of such approaches include deblocking-oriented approaches [20,21], wavelet transforms [22,23], and shape-adaptive discrete cosine transforms [24]. Recently, artifact reduction (AR) networks using deep learning have been developed with various deep neural networks (DNNs), such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), long short-term memory (LSTM), and generative adversarial networks (GANs). Because CNNs [25] can efficiently extract feature maps with deep and cascading structures, CNN-based AR methods can achieve visual enhancement in terms of peak signal-to-noise ratio (PSNR) [26], PSNR including blocking effects (PSNR-B) [27,28], and the structural similarity index measure (SSIM) [29].
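To make the PSNR metric mentioned above concrete, the following minimal sketch computes it from the standard definition, PSNR = 10 log10(MAX² / MSE). It is an illustration only, not the evaluation code used in this paper; the function name `psnr`, the list-of-rows image layout, and the 8-bit peak value of 255 are assumptions.

```python
import math


def psnr(reference, degraded, max_value=255.0):
    """Return the PSNR in dB between two equally sized grayscale images.

    Each image is a list of rows of pixel intensities (hypothetical layout).
    """
    if len(reference) != len(degraded) or any(
        len(r) != len(d) for r, d in zip(reference, degraded)
    ):
        raise ValueError("images must have the same shape")

    squared_error = 0.0
    count = 0
    for ref_row, deg_row in zip(reference, degraded):
        for r, d in zip(ref_row, deg_row):
            squared_error += (r - d) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")

    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images: no distortion
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 2 x 2 example: two pixels differ by 2 intensity levels, so MSE = 2.
original = [[52, 55], [61, 59]]
decoded = [[54, 55], [61, 57]]
print(f"PSNR: {psnr(original, decoded):.2f} dB")
```

A higher PSNR means the decoded image is closer to the original. PSNR-B and SSIM additionally account for blocking effects and structural similarity, respectively, and are not covered by this sketch.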

Despite these developments in AR, most CNN-based approaches adopt heavy network architectures by increasing the number of network parameters and operations. Because such heavy models are difficult to deploy on hand-held devices operating in low-complexity environments, lightweight AR networks are needed. In this paper, we propose a lightweight CNN-based artifact reduction model that reduces both memory consumption and the number of network parameters. The main contributions of this study are summarized as follows:


The remainder of this paper is organized as follows: in Section 2, we review previous studies related to CNN-based artifact reduction methods. In Section 3, we describe the proposed method. Finally, in Sections 4 and 5, we present the experimental results and conclusions, respectively.

#### **2. Related Works**

Owing to advancements in deep learning technologies, research on low-level computer vision tasks, such as super-resolution (SR) and image denoising, has adopted a variety of CNN architectures to achieve higher restoration quality than conventional image processing. Dong et al. proposed an artifact reduction convolutional neural network (ARCNN) [30], which consists of four convolutional layers and learns an end-to-end mapping from a compressed image to a reconstructed image. After the advent of ARCNN, Mao et al. [31] proposed a residual encoder–decoder network, which performs encoding and decoding with symmetric skip connections between stacked convolutional and deconvolutional layers. Chen et al. [32] proposed a trainable nonlinear reaction diffusion model, in which all parameters, including the filters and influence functions, are learned simultaneously from training data through a loss-based approach. Zhang et al. [33] proposed a denoising convolutional neural network (DnCNN), which combines 17 convolutional layers with rectified linear unit (ReLU) [34] activation functions and batch normalization to remove white Gaussian noise. Cavigelli et al. [35] proposed a deep CNN for image compression artifact suppression, which consists of 12 convolutional layers with hierarchical skip connections and a multi-scale loss function.
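As a rough illustration of the four-layer end-to-end mapping described for ARCNN, the sketch below stacks four single-channel convolutional layers in plain Python. It is not the ARCNN implementation: the function names, the 3 × 3 kernel size, the hand-picked kernels (standing in for learned weights), and the placement of ReLU after every layer except the last are all assumptions made for illustration.

```python
def conv2d(image, kernel, bias=0.0):
    """'Same'-size 2D convolution (cross-correlation) with zero padding."""
    height, width = len(image), len(image[0])
    k = len(kernel)  # square, odd-sized kernel assumed
    pad = k // 2
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = bias
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - pad, x + j - pad
                    if 0 <= yy < height and 0 <= xx < width:
                        total += image[yy][xx] * kernel[i][j]
            output[y][x] = total
    return output


def relu(image):
    """Element-wise rectified linear unit."""
    return [[max(0.0, v) for v in row] for row in image]


def four_layer_cnn(image, kernels):
    """Map a compressed image to a restored one through four conv layers."""
    if len(kernels) != 4:
        raise ValueError("expected exactly four kernels")
    features = image
    for index, kernel in enumerate(kernels):
        features = conv2d(features, kernel)
        if index < len(kernels) - 1:  # assumed: no activation on the output layer
            features = relu(features)
    return features


# Hand-picked kernels for illustration; a trained network learns these values.
IDENTITY = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
BOX_BLUR = [[1.0 / 9.0] * 3 for _ in range(3)]

compressed = [
    [10.0, 200.0, 10.0],
    [200.0, 10.0, 200.0],
    [10.0, 200.0, 10.0],
]
restored = four_layer_cnn(compressed, [BOX_BLUR, IDENTITY, IDENTITY, IDENTITY])
print(restored[1][1])
```

In a real AR network each layer holds many multi-channel filters whose weights are trained on pairs of compressed and original images; the smoothing kernel here only mimics the effect of suppressing blocky high-contrast patterns.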

Guo et al. [36] proposed a one-to-many network, which is composed of many stacked residual units, with each branch containing five residual units and the aggregation subnetwork comprising 10 residual units. Each residual unit applies batch normalization, a ReLU activation function, and a convolutional layer twice. This residual unit architecture is found to improve the recovery quality. Tai et al. [37] proposed a very deep persistent memory network with a densely recursive residual architecture-based memory block that adaptively learns the different weights of various memories. Dai et al. [38] proposed a variable-filter-size residual-learning CNN, which contains six convolutional layers and concatenates variable-filter-size convolutional layers. Zhang et al. [39] proposed a dual-domain multi-scale CNN with an auto-encoder, dilated convolution, and discrete cosine


Although most of the aforementioned methods improve AR performance, they tend to have complicated network structures with a large number of network parameters and heavy memory consumption. Table 1 lists the properties of the various AR networks and compares their advantages and disadvantages.


**Table 1.** Properties of the artifact reduction networks.

For the network components, a residual network [45] introduces shortcut connections that perform identity mapping, and their outputs are added to the outputs of the stacked layers. A densely connected convolutional network [46] directly connects all layers that have equivalent feature map sizes with one another. The squeeze-and-excitation (SE) network [47] is composed of global average pooling and a 1 × 1 convolutional layer. It computes weights from the previous feature maps and applies these weights to the same feature maps to generate the output of the SE block, which is then provided to subsequent layers of the network. In this study, we propose an AR network that combines these networks [45–47] to achieve better image restoration performance than previous methods.
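The three building blocks described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the proposed network: feature maps are represented as nested lists (channels, rows, columns), all function and parameter names are hypothetical, and the sigmoid gate in the SE block is an assumption borrowed from the common SE design, since the text above specifies only global average pooling and a 1 × 1 convolution.

```python
import math


def residual_add(inputs, stacked_outputs):
    """Shortcut connection: add the block input to the stacked-layer output."""
    return [
        [[a + b for a, b in zip(row_in, row_out)]
         for row_in, row_out in zip(ch_in, ch_out)]
        for ch_in, ch_out in zip(inputs, stacked_outputs)
    ]


def dense_concat(previous_outputs):
    """Dense connection: concatenate earlier feature maps along the channel axis."""
    return [channel for features in previous_outputs for channel in features]


def squeeze_excite(features, weights, biases):
    """Rescale each channel by a weight computed from all channels."""
    # Squeeze: global average pooling yields one value per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]
    # A 1 x 1 convolution on a 1 x 1 map is a matrix-vector product.
    logits = [
        sum(w * p for w, p in zip(weight_row, pooled)) + b
        for weight_row, b in zip(weights, biases)
    ]
    # Assumed gating function: sigmoid maps each logit to a weight in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # Excite: apply each channel weight to the corresponding feature map.
    return [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(features, gates)
    ]


# Two 2 x 2 channels with illustrative values and hand-set 1 x 1 conv weights.
features = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]
weights = [[0.5, 0.0], [0.0, -0.5]]
biases = [0.0, 0.0]

reweighted = squeeze_excite(features, weights, biases)
combined = residual_add(features, reweighted)
stacked = dense_concat([features, combined])
print(len(stacked))  # number of channels after dense concatenation
```

The residual addition keeps the channel count fixed, whereas the dense concatenation grows it with every connected layer, which is one reason densely connected designs need care to stay lightweight.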
