**3. Proposed Method**

#### *3.1. Overall Architecture of DCRN*

Figure 1 shows the overall architecture of the proposed DCRN for removing the artifacts caused by JPEG compression. The DCRN consists of an input layer, a densely cascading feature extractor, a channel attention block, and an output layer. In particular, the densely cascading feature extractor contains three densely cascading blocks to exploit the intermediate feature maps within sequential dense networks. In Figure 1, *W* × *H* and *C* denote the two-dimensional spatial filter size and the number of channels, respectively. The convolution operation of the *i*-th layer is denoted as *Hi*; it calculates the output feature maps (*Fi*) from the previous feature maps (*Fi*−1), as shown in Equation (1):

$$F_i = H_i(F_{i-1}) = \delta(W_i * F_{i-1} + B_i),\tag{1}$$

where *δ*, *Wi*, *Bi*, and ∗ denote the parametric ReLU activation function, the filter weights, the biases, and the convolution operation, respectively.
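
To make Equation (1) concrete, the following is a minimal PyTorch sketch of a single convolutional layer with a parametric ReLU activation; the 3 × 3 kernel size and 64 channels are illustrative assumptions, since the actual *W* × *H* and *C* vary per layer in Figure 1.

```python
import torch.nn as nn

class ConvPReLU(nn.Module):
    """One layer of Equation (1): F_i = delta(W_i * F_{i-1} + B_i),
    where delta is a parametric ReLU. The 3x3 kernel and 64 channels
    are illustrative assumptions, not values fixed by the text."""

    def __init__(self, in_channels=64, out_channels=64, kernel_size=3):
        super().__init__()
        # W_i * F_{i-1} + B_i: convolution with a learnable bias
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              padding=kernel_size // 2)
        # delta(.): parametric ReLU with one learnable slope per channel
        self.act = nn.PReLU(out_channels)

    def forward(self, f_prev):
        return self.act(self.conv(f_prev))
```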

After extracting the feature maps of the input layer, the densely cascading feature extractor generates *F*5, as expressed in Equations (2) and (3). As shown in Figure 2, a densely cascading (DC) block has two convolutional layers, five dense layers, and a bottleneck layer. To train the network effectively and reduce overfitting, we designed the dense layers with a variable number of channels: dense layers 1 to 4 consist of 16 channels, and the final dense layer consists of 64 channels. The operation of the *i*-th DC block, $H_i^{DC}$, is presented in Equation (2):


$$F_3 = H_3^{DC}(F_2) = H_3^{DC}\big(H_2^{DC}(H_1^{DC}(F_0))\big).\tag{2}$$
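
The following is a hedged PyTorch sketch of one DC block under the channel counts stated above (16 channels for dense layers 1–4, 64 for the final dense layer). The dense-connectivity wiring and the 1 × 1 bottleneck are assumptions based on Figure 2, and the placement of the block's two extra convolutional layers is not specified in the text, so they are omitted here.

```python
import torch
import torch.nn as nn

class DCBlock(nn.Module):
    """Hedged sketch of one DC block (Figure 2). Channel counts follow the
    text (16 channels for dense layers 1-4, 64 for the final dense layer);
    the dense-connectivity wiring and the 1x1 bottleneck are assumptions,
    and the block's two extra convolutional layers are omitted."""

    def __init__(self, channels=64, growth=16):
        super().__init__()
        layers, in_ch = [], channels
        for _ in range(4):  # dense layers 1-4: 16 output channels each
            layers.append(nn.Sequential(
                nn.Conv2d(in_ch, growth, 3, padding=1), nn.PReLU(growth)))
            in_ch += growth
        # final dense layer: 64 output channels
        layers.append(nn.Sequential(
            nn.Conv2d(in_ch, channels, 3, padding=1), nn.PReLU(channels)))
        self.dense = nn.ModuleList(layers)
        # bottleneck: reduce all concatenated features back to `channels`
        self.bottleneck = nn.Conv2d(in_ch + channels, channels, 1)

    def forward(self, x):
        feats = [x]
        for layer in self.dense:  # each layer sees all previous feature maps
            feats.append(layer(torch.cat(feats, dim=1)))
        return self.bottleneck(torch.cat(feats, dim=1))
```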

**Figure 1.** Overall architecture of the proposed DCRN. Symbol '+' indicates the element-wise sum.

**Figure 2.** The architecture of a DC block.

Then, the output of each DC block is concatenated with the output feature maps of the input layer. From this concatenated tensor *F*4, which gathers the output feature maps of all DC blocks and the input layer, the bottleneck layer calculates *F*5 to reduce the number of channels, as in Equation (3):

$$F_5 = H_5(F_4) = H_5([F_3, F_2, F_1, F_0]).\tag{3}$$
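
Equations (2) and (3) together can be sketched as follows, reusing the `DCBlock` sketch above; the concatenation order [*F*3, *F*2, *F*1, *F*0] follows Equation (3), and the 64-channel width remains an assumption.

```python
import torch
import torch.nn as nn

class DenselyCascadingExtractor(nn.Module):
    """Sketch of Equations (2) and (3): three cascaded DC blocks whose
    outputs are concatenated with F_0, then reduced by the bottleneck
    layer H_5. Reuses the DCBlock sketch above; 64 channels is assumed."""

    def __init__(self, channels=64):
        super().__init__()
        self.blocks = nn.ModuleList(DCBlock(channels) for _ in range(3))
        self.h5 = nn.Conv2d(4 * channels, channels, 1)  # H_5 over [F3, F2, F1, F0]

    def forward(self, f0):
        feats, f = [f0], f0
        for block in self.blocks:           # Equation (2): F_i = H_i^DC(F_{i-1})
            f = block(f)
            feats.append(f)
        f4 = torch.cat(feats[::-1], dim=1)  # F_4 = [F_3, F_2, F_1, F_0]
        return self.h5(f4)                  # Equation (3): F_5 = H_5(F_4)
```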

As shown in Figure 3, after receiving the output of the densely cascading feature extractor, a channel attention (CA) block performs global average pooling (GAP) followed by two convolutional layers and the sigmoid function. The CA block discriminates the more important feature maps, assigning a different weight to each feature map to adapt the feature responses. After *F*6 is generated through the CA block, the output image is produced from the element-wise sum of the skip connection (*F*0) and the feature maps (*F*6).
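
A minimal sketch of the CA block follows, assuming a squeeze-and-excitation-style channel reduction; the reduction ratio (here 16) and the activation between the two convolutional layers are assumptions not given in the text.

```python
import torch.nn as nn

class CABlock(nn.Module):
    """Sketch of the CA block (Figure 3): GAP, two convolutional layers,
    a sigmoid gate, and a channel-wise product. The reduction ratio r=16
    and the PReLU between the two convolutions are assumptions."""

    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)  # global average pooling per channel
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.PReLU(channels // reduction),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid())  # sigma(.)

    def forward(self, f5):
        return f5 * self.gate(self.gap(f5))  # channel-wise product
```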

**Figure 3.** The architecture of a CA block. '*σ*' and '⊗' indicate the sigmoid function and the channel-wise product, respectively.
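
Putting the sketches together, one possible end-to-end composition reads as follows; the single-channel (luminance) input/output and the 3 × 3 input/output convolutions are assumptions, and applying the output layer after the element-wise sum is inferred from the description of Figure 1 rather than stated explicitly.

```python
import torch
import torch.nn as nn

class DCRN(nn.Module):
    """End-to-end composition of the sketches above (Figure 1). The
    single-channel input/output and the 3x3 input/output convolutions
    are assumptions; applying the output layer after the element-wise
    sum is inferred from the description of Figure 1."""

    def __init__(self, channels=64):
        super().__init__()
        self.input_layer = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.PReLU(channels))
        self.extractor = DenselyCascadingExtractor(channels)
        self.ca = CABlock(channels)
        self.output_layer = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, x):
        f0 = self.input_layer(x)   # input layer
        f5 = self.extractor(f0)    # Equations (2) and (3)
        f6 = self.ca(f5)           # CA block
        return self.output_layer(f0 + f6)  # '+' in Figure 1: element-wise sum

# quick shape check on a dummy decoded-JPEG luminance patch
x = torch.randn(1, 1, 64, 64)
assert DCRN()(x).shape == x.shape
```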
