**Yongchae Kim and Hiroyuki Kudo \***

Department of Computer Science, Graduate School of Systems and Information Engineering, University of Tsukuba, Tennoudai 1-1-1, Tsukuba 305-8573, Japan; acevip12@hotmail.com

**\*** Correspondence: kudo@cs.tsukuba.ac.jp

Received: 16 May 2020; Accepted: 18 June 2020; Published: 20 June 2020

**Abstract:** We propose a new class of nonlocal Total Variation (TV) in which the first and second derivatives are mixed. Since most existing TV formulations consider only the first-order derivative, they suffer from problems such as staircase artifacts and loss of smooth intensity changes in textures and low-contrast objects, which is a major limitation on improving image quality. The proposed nonlocal TV combines the first- and second-order derivatives to preserve smooth intensity changes well. Furthermore, to accelerate the iterative algorithm that minimizes the cost function containing the proposed nonlocal TV, we propose a proximal splitting based on Passty's framework. Through simulation studies using a brain CT image displayed with a very narrow contrast range, for which it is rather difficult to preserve smooth intensity changes, we demonstrate that the proposed nonlocal TV method achieves adequate image quality in both sparse-view CT and low-dose CT.

**Keywords:** image reconstruction; computed tomography; compressed sensing; nonlocal total variation; sparse-view CT; low-dose CT; proximal splitting; row-action; brain CT image

### **1. Introduction**

Nonlocal Total Variation (TV) [1–6] was proposed as an improved version of ordinary TV. Nonlocal TV can use a weighting function (e.g., the weight of the nonlocal means filter) that takes the intensity difference between pixel pairs into account, and can achieve higher image quality than ordinary TV, which uses only pairs of adjacent pixels.

Since G. Gilboa and S. Osher (2009) [5] proposed nonlocal operators, nonlocal TV has been widely applied to image reconstruction problems in sparse-view CT and low-dose CT [1–4].

H. Kim et al. (2016) [2] applied nonlocal TV to sparse-view CT and showed that nonlocal TV improves image quality over ordinary TV, and that incorporating the reweighted L1 norm into nonlocal TV further improves tissue contrast and structural details. Following this, K. Kim et al. (2017) [3] applied nonlocal TV to low-dose CT and showed that nonlocal TV is effective against low-dose noise (Poisson noise).

D. Lv et al. (2019) [4] proposed a hybrid prior distribution (called the NLTG prior) that combines nonlocal TV with a Gaussian distribution. Additionally, they showed the possibility of applying the NLTG prior to a large class of image reconstruction problems, especially when reference images are available.

However, most existing nonlocal TV studies [2–4] are based on the first-order derivative and still contain the staircase-artifact problem as a potential drawback. Since the first-order derivative is too sensitive to pixel values, even linear intensity changes are detected as false edges, which leads to staircase artifacts in the same way as local TV [7–9].

On the other hand, higher-order derivatives (second order or more) possess a potential risk: as the order of differentiation increases, their ability to enhance image edges decreases, leading to image blurring.

In previous studies, to overcome the disadvantage of using only the first order or only the second order, Bredies et al. (2010) [10] proposed Total Generalized Variation (TGV), which involves and balances higher-order derivatives, and showed impressive results for the denoising problem. Following that work, Ranftl et al. (2014) [11] proposed a nonlocal version of TGV.

Our proposed method also combines the first- and second-order derivatives. We define the regularization term as a weighted sum of two terms, where the first term is the ordinary nonlocal TV based on the first-order derivative and the second term is based on the second-order derivative. Additionally, in this paper, we introduce a new idea called nonlocal Total K-split Variation, which applies nonlocal regularization to the second-order derivative. This idea allows us to preserve smooth intensity changes well.

Furthermore, to accelerate the iterative algorithm associated with the proposed nonlocal TV, we propose a specially designed proximal splitting based on Passty's framework. In this proximal splitting, the more finely the cost function is divided into small subfunctions, the faster the convergence [12–14]. The final iterative algorithm is of row-action type with respect to both the data-fidelity term and the regularization term [15–17], and converges to a minimizer very quickly [18–22].

Finally, to demonstrate the performance of combining the first- and second-order derivatives in reconstructed images, we use a brain CT image displayed with a very narrow contrast range, for which it is very difficult to preserve smooth intensity changes. Simulation studies are performed for both sparse-view CT and low-dose CT. We demonstrate that the proposed nonlocal TV method achieves adequate image quality within a small number of iterations.

### **2. Methodology**

### *2.1. Problem Definition*

We define the following unconstrained cost function:

$$\operatorname*{argmin}_{\vec{x}} J(\vec{x}) = f(\vec{x}) + u(\vec{x}) = \left\| A\vec{x} - \vec{b} \right\|_2^2 + \beta \omega \left\| W\vec{x} \right\|_1, \tag{1}$$

where $f(\vec{x}) = \| A\vec{x} - \vec{b} \|_2^2$ is the data-fidelity term, $u(\vec{x}) = \beta\omega \| W\vec{x} \|_1$ is the regularization term, $A = [a_{ij}]$ is the $I \times J$ system matrix, $\beta$ is the hyper-parameter controlling the regularization strength, $\omega$ is the weight of the regularization term, and $W$ is the sparsifying transform that makes $W\vec{x}$ sparse. Image reconstruction is an inverse problem that recovers the image of attenuation coefficients $\vec{x} = (x_1, x_2, \ldots, x_J)^T$ from the measured projection data $\vec{b} = (b_1, b_2, \ldots, b_I)^T$.

In sparse-view CT [23,24], projection data corresponding to fewer than 100 directions are used (conventional CT uses 1000–2000 projections), so the equation $A\vec{x} = \vec{b}$ becomes severely underdetermined, i.e., the dimension $J$ of the unknowns $\vec{x}$ is larger than the dimension $I$ of the measurements $\vec{b}$ ($I < J$). In this case, the regularization term avoids the ill-posedness by introducing the prior knowledge that most components of the vector $W\vec{x}$ are close to 0.

On the other hand, in low-dose CT, the equation $A\vec{x} = \vec{b}$ becomes inconsistent due to Poisson noise $\vec{n}$ ($A\vec{x} - \vec{b} = \vec{n}$). In this case, the regularization term helps reduce the effect of the Poisson noise $\vec{n}$ by smoothing.

First, we begin with the modified anisotropic nonlocal TV based on the first-order derivative, expressed as

$$\text{Nonlocal TV} = u^{TV}(\vec{x}) = \beta t \sum_{j=1}^{J} \sum_{j' \in \Omega} \omega_{jj'} \left| x_j - x_{j'} \right|, \tag{2}$$

where $x_j$ is the intensity of pixel $j$, $x_{j'}$ is the intensity of a distant pixel $j'$, and $\omega_{jj'}$ is the smoothing weight assigned to each pixel pair $(x_j, x_{j'})$.
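As a concrete illustration, the double sum in Equation (2) can be evaluated directly. The sketch below is a minimal Python rendering, not the authors' implementation; the array layout, the pair list, and the function names are our own assumptions.

```python
import numpy as np

def nonlocal_tv(x, weights, pairs, beta=1.0, t=1.0):
    """Evaluate the anisotropic nonlocal TV term of Equation (2).

    x       : 1-D array of pixel intensities.
    pairs   : list of (j, j2) index pairs, j2 drawn from the search window Omega.
    weights : dict mapping (j, j2) -> smoothing weight w_{jj'}.
    Names and data layout are illustrative; the paper does not fix an API.
    """
    total = 0.0
    for (j, j2) in pairs:
        total += weights[(j, j2)] * abs(x[j] - x[j2])
    return beta * t * total

# Tiny example: a 1-D "image" with one edge.
x = np.array([0.0, 0.0, 1.0, 1.0])
pairs = [(0, 1), (1, 2), (2, 3), (0, 2)]
w = {p: 1.0 for p in pairs}
print(nonlocal_tv(x, w, pairs))  # contributions 0 + 1 + 0 + 1 = 2.0
```

Only the pairs crossing the edge contribute, which is why minimizing this term flattens noise while keeping the edge.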

Next, we describe a new regularization term based on the second-order derivative, called nonlocal Total K-split Variation (TKV). The main idea is to consider more derivatives around $x_j$ and $x_{j'}$, as

$$\text{Nonlocal TKV} = u^{TKV}(\vec{x}) = \frac{\beta(1-t)}{8} \sum_{j=1}^{J} \sum_{j' \in \Omega} \sum_{k=1}^{8} \omega_{jj'} \left| \left( x_j - x_{j_k} \right) - \left( x_{j'} - x_{j'_k} \right) \right|, \tag{3}$$

where $x_{j_k}$ is the $k$-th adjacent pixel value of $x_j$ and $x_{j'_k}$ is the $k$-th adjacent pixel value of $x_{j'}$. We remark that the TKV term is divided into a sum of eight terms depending on the direction in which the pixel difference is taken, as shown in Figure 1.

**Figure 1.** Definition of the pixel location in the proposed regularization term corresponding to *k* = 1, 2, 3, ··· , 7, 8.
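The second-order differences of Equation (3) can be sketched the same way. The code below evaluates the TKV contribution of a single pixel pair on a 2-D image; the eight offsets follow the directions of Figure 1, while the function name and the per-pair scope are illustrative assumptions. A linear intensity ramp has identical first derivatives everywhere, so its TKV contribution vanishes, which is exactly the property that suppresses staircase artifacts.

```python
import numpy as np

# The eight neighbor offsets (k = 1..8) around a pixel, as in Figure 1.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def nonlocal_tkv(img, j, j2, w, beta=1.0, t=0.5):
    """Evaluate the TKV contribution of one pixel pair (j, j') in Equation (3).

    img : 2-D array; j, j2 : (row, col) tuples strictly inside the image;
    w   : the weight w_{jj'}. Names and the per-pair scope are illustrative.
    """
    s = 0.0
    for dr, dc in OFFSETS:
        jk = (j[0] + dr, j[1] + dc)      # k-th neighbor of j
        j2k = (j2[0] + dr, j2[1] + dc)   # k-th neighbor of j'
        # second-order difference: first derivative at j minus that at j'
        s += abs((img[j] - img[jk]) - (img[j2] - img[j2k]))
    return beta * (1.0 - t) / 8.0 * w * s

# A linear ramp has identical first derivatives everywhere, so TKV is zero.
ramp = np.tile(np.arange(5.0), (5, 1))
print(nonlocal_tkv(ramp, (2, 1), (2, 3), w=1.0))  # 0.0
```

By contrast, ordinary first-order TV penalizes this ramp, which is the origin of the staircase artifact.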

In the proposed method, we assume that the reconstructed image $\vec{x}$ is close to a piecewise first-order polynomial. Under this assumption, our proposed regularization term is designed to include both the first- and second-order derivatives as

$$u(\vec{x}) = \beta \sum_{j=1}^{J} \sum_{j' \in \Omega} \omega_{jj'} \left[ t \left| x_j - x_{j'} \right| + \frac{(1-t)}{8} \sum_{k=1}^{8} \left| \left( x_j - x_{j_k} \right) - \left( x_{j'} - x_{j'_k} \right) \right| \right], \quad (0 < t < 1), \tag{4}$$

where the proposed regularization term is a combination of the two terms ($u(\vec{x}) = u^{TV}(\vec{x}) + u^{TKV}(\vec{x})$), and $t$ is the trade-off parameter between nonlocal TV and TKV. If $t$ is large, the reconstructed image becomes closer to that of nonlocal TV; if $t$ is small, it becomes closer to that of nonlocal TKV. To match the strength of the first term to that of the second, we divide the second-order-derivative-based nonlocal TKV into 8 directions. Figure 1 shows the location of the pixels appearing in $u^{TV}(\vec{x})$ and $u^{TKV}(\vec{x})$ for $k = 1, 2, 3, \cdots, 7, 8$.

### *2.2. Accelerated Algorithm Using the Proximal Splitting with Passty's Framework*

To accelerate the iterative algorithm that minimizes the cost function, we propose a specially designed proximal splitting based on Passty's framework. Ordinary proximal splitting can minimize a cost function consisting of a sum of only two component terms, applying the proximity operator of each subfunction alternately [12,13]. By contrast, Passty's framework, which is not well known in the image reconstruction community, allows a cost function to be divided into a number of much simpler subfunctions [14]. This brings considerable benefits to the optimization problems appearing in CT reconstruction. By applying the proximity operator of each subfunction in turn, the proposed algorithm takes a row-action form [15–17], which converges to a minimizer very quickly [18–22].

We begin with a brief review of proximal splitting in Passty's framework. Consider the convex minimization problem

$$\operatorname*{argmin}_{\vec{x}} J(\vec{x}), \tag{5}$$

where $J(\vec{x})$ is a lower semi-continuous (lsc) convex function.

The proximity operator corresponding to the function $J(\vec{x})$ is defined as

$$\vec{x} = \operatorname{prox}_{\alpha J}(\vec{z}) \equiv \operatorname*{argmin}_{\vec{x}} \left( J(\vec{x}) + \frac{1}{2\alpha} \left\| \vec{x} - \vec{z} \right\|_2^2 \right), \tag{6}$$

where $\alpha$ is a parameter called the step size. We note that $J(\vec{x})$ may be non-differentiable, such as the component terms of TV or nonlocal TV. The proximity operator is a non-expansive mapping whose fixed points $\vec{x}$ satisfying $\vec{x} = \operatorname{prox}_{\alpha J}(\vec{x})$ coincide with minimizers of $J(\vec{x})$ for any $\alpha > 0$. Therefore, the minimization of $J(\vec{x})$ can be solved by the iterative formula $\vec{x}^{(n+1)} = \operatorname{prox}_{\alpha J}(\vec{x}^{(n)})$ (i.e., the proximal algorithm). Next, we explain proximal splitting in Passty's framework.
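For a concrete feel of Equation (6) and the proximal algorithm, the sketch below uses the scalar function $J(x) = |x|$, whose proximity operator has the well-known soft-thresholding closed form. This toy choice is ours, not the paper's.

```python
import numpy as np

def prox_abs(z, alpha):
    """Proximity operator of J(x) = |x| (a scalar instance of Equation (6)):
    soft-thresholding, i.e., shrink z toward 0 by alpha, clipping at 0."""
    return np.sign(z) * max(abs(z) - alpha, 0.0)

# Proximal algorithm x^{(n+1)} = prox_{alpha J}(x^{(n)}): iterates reach the
# minimizer x* = 0 of |x| and stay there (a fixed point for any alpha > 0).
x = 5.0
for _ in range(20):
    x = prox_abs(x, alpha=0.5)
print(x)  # 0.0
```

Once the iterate hits the minimizer it no longer moves, illustrating the fixed-point property stated above.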

**[Passty's framework]** Consider the case where $J(\vec{x})$ can be divided into a sum of subfunctions:

$$J(\vec{x}) = \sum_{i=1}^{I} J_i(\vec{x}). \tag{7}$$

The iterative algorithm is constructed by applying the proximity operator of each subfunction $J_i(\vec{x})$ $(i = 1, 2, \ldots, I)$ in turn:

$$\vec{x}^{(n+1)} = \operatorname{prox}_{\alpha^{(n)} J_I} \cdots \operatorname{prox}_{\alpha^{(n)} J_2} \cdot \operatorname{prox}_{\alpha^{(n)} J_1} \left( \vec{x}^{(n)} \right). \tag{8}$$
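Equation (8) can be illustrated with a toy splitting. In the sketch below, $J(x) = \sum_i (x - c_i)^2$ is split into quadratic subfunctions, each with a closed-form prox, and the sweep uses a diminishing step size to avoid a limit cycle. The targets $c_i$ and the decay schedule are illustrative assumptions.

```python
def prox_quad(z, c, alpha):
    """Prox of J_i(x) = (x - c)^2: argmin (x - c)^2 + (1/(2*alpha))(x - z)^2,
    which has the closed form (2*alpha*c + z) / (1 + 2*alpha)."""
    return (2.0 * alpha * c + z) / (1.0 + 2.0 * alpha)

# Passty's framework, Equation (8): sweep the prox of each subfunction in
# turn.  J(x) = sum_i (x - c_i)^2 is minimized by the mean of the c_i.
targets = [1.0, 2.0, 6.0]        # illustrative data; minimizer is x* = 3.0
x = 0.0
for n in range(300):
    alpha = 1.0 / (1 + n)        # diminishing step-size rule
    for c in targets:
        x = prox_quad(x, c, alpha)
print(x)  # converges close to 3.0, the mean of the targets
```

With a fixed step size the iterates would settle into a small limit cycle around the minimizer; letting $\alpha^{(n)} \to 0$ shrinks that cycle away, which is the role of the diminishing step-size rule discussed later.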

Furthermore, consider the case where each $J_i(\vec{x})$ is itself a sum of two subfunctions, as in our cost function with its data-fidelity and regularization terms:

$$J_i(\vec{x}) = f_i(\vec{x}) + u(\vec{x}). \tag{9}$$

The update is constructed by applying the two proximity operators alternately:

$$\vec{x}^{(n+1)} = \operatorname{prox}_{\alpha^{(n)} u} \cdot \operatorname{prox}_{\alpha^{(n)} f_i} \left( \vec{x}^{(n)} \right). \tag{10}$$

Finally, we show in Algorithm 1 the optimization model applied in this paper:


In Passty's framework, the more finely the cost function is divided into smaller subfunctions, the better the convergence that can be expected. In this paper, this division is performed as follows. First, the data-fidelity term is divided as in Equation (7) such that each subfunction $f_i(\vec{x})$ contains only the single term corresponding to projection datum $b_i$. Therefore, the final algorithm can be designed as a row-action algorithm such as the ART method.

We note that $u(\vec{x})$ can also be divided into a sum of many subfunctions. In this paper, we divide $u(\vec{x})$ as finely as possible, similarly to the data-fidelity term. This significantly simplifies the processing of the nonlocal TV+TKV term and improves the convergence speed. The division of the regularization term $u(\vec{x})$ is described in Section 2.3.2.

### *2.3. Optimization*

In this section, we focus on how to divide the cost function and how to derive the resulting iterative algorithm.

### 2.3.1. Update the Data-Fidelity Term

The data-fidelity term can be divided into $I$ subfunctions as

$$f(\vec{x}) = \left\| A\vec{x} - \vec{b} \right\|_2^2 = \sum_{i=1}^{I} f_i(\vec{x}) = \sum_{i=1}^{I} \left( \vec{a}_i^T \vec{x} - b_i \right)^2, \tag{11}$$

where $I$ is the number of projection data, $\vec{a}_i$ is the $i$-th row vector of the system matrix $A$, and $b_i$ is the $i$-th component of the projection data. Each $f_i(\vec{x})$ is a subfunction of the data-fidelity term.

The proximity-operator update for the data-fidelity term can be defined as

$$\vec{x}^{(n,i+1)} = \operatorname{prox}_{\alpha^{(n)} f_i} \left( \vec{x}^{(n,i)} \right) = \operatorname*{argmin}_{\vec{x}} \left\{ \left( \vec{a}_i^T \vec{x} - b_i \right)^2 + \frac{1}{2\alpha^{(n)}} \left\| \vec{x} - \vec{x}^{(n,i)} \right\|_2^2 \right\}. \tag{12}$$

By introducing a slack variable $z$, the above minimization problem for each subfunction $f_i(\vec{x})$ can be converted into the constrained minimization

$$\min_{\vec{x},\, z}\; (z - b_i)^2 + \frac{1}{2\alpha^{(n)}} \left\| \vec{x} - \vec{x}^{(n,i)} \right\|_2^2 \quad \text{s.t. } z = \vec{a}_i^T \vec{x}. \tag{13}$$

The Lagrange function can be defined by

$$L_{f_i}(\vec{x}, z, \lambda) = (z - b_i)^2 + \frac{1}{2\alpha^{(n)}} \left\| \vec{x} - \vec{x}^{(n,i)} \right\|_2^2 + \lambda \left( z - \vec{a}_i^T \vec{x} \right), \tag{14}$$

where λ is the Lagrange multiplier called the dual variable.

Finally, by optimizing the Lagrange function, we obtain the following row-action iteration:

$$\vec{x}^{(n,i+1)} = \vec{x}^{(n,i)} + \alpha^{(n)} \frac{b_i - \vec{a}_i^T \vec{x}^{(n,i)}}{1/2 + \alpha^{(n)} \left\| \vec{a}_i \right\|_2^2} \vec{a}_i, \quad \alpha^{(n)} = \frac{\alpha_0}{1 + \varepsilon n}, \quad i \in R_{[I]}, \tag{15}$$

($\alpha_0$: initial value of the step size, $\varepsilon$: deceleration rate of the step size, $R_{[I]}$: ordered subsets of the projection data),

where $\alpha^{(n)}$ is the step-size parameter controlling the convergence and $n$ is the number of main iterations. After all elements of the projection data are updated, $n$ is increased by 1. The step size $\alpha^{(n)}$ is diminished gradually toward zero as the iteration proceeds (i.e., the diminishing step-size rule). In Passty's framework, this diminishing step size is known to ensure exact convergence to a minimizer, thereby avoiding the so-called limit-cycle problem. Furthermore, we introduce a random access order $R_{[I]}$ of the projection data to enable fast convergence within 20–30 iterations [19,20]. The mathematical details of deriving the update formula in Equation (15) are described in our previous studies [19–22].
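A minimal numerical sketch of the update in Equation (15) is given below, applied to a tiny consistent system instead of real projection data; the matrix, the step-size constants, and the function name are illustrative assumptions.

```python
import numpy as np

def row_action_sweep(A, b, x, alpha, rng):
    """One sweep of the row-action update in Equation (15), visiting the
    rows of A in a random-access order R_[I]."""
    for i in rng.permutation(len(b)):
        a = A[i]
        x = x + alpha * (b[i] - a @ x) / (0.5 + alpha * (a @ a)) * a
    return x

rng = np.random.default_rng(0)
A = np.array([[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]])
x_true = np.array([1.0, 2.0])
b = A @ x_true                      # consistent data: an exact solution exists

x = np.zeros(2)
alpha0, eps = 1.0, 0.1              # initial step size and deceleration rate
for n in range(50):                 # main iterations
    x = row_action_sweep(A, b, x, alpha0 / (1 + eps * n), rng)
print(np.round(x, 3))               # ≈ [1. 2.]
```

Each update touches only one row of the system matrix, which is what makes the row-action structure cheap per step and fast to converge in practice.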

### 2.3.2. Update the Regularization Term

We now describe how to divide the regularization term. The modified anisotropic nonlocal TV possesses the following structure, called L1-based group LASSO here. The L1-based group LASSO has a very simple structure in which each absolute-value term can be considered a group element.

$$u^{TV}(\vec{x}) = \beta t \sum_{j=1}^{J} \sum_{j' \in \Omega} \omega_{jj'} \left| x_j - x_{j'} \right| = \beta t \sum_{g=1}^{G} \left[ \omega_{jj'} \left| x_j - x_{j'} \right| \right]_g = \beta t \sum_{g=1}^{G} u_g^{TV}(\vec{x}), \quad (j = g), \tag{16}$$

where $g$ is the group index, which corresponds to the pixel index $j$, so the TV term is divided into $G$ groups ($G$ subfunctions). Each group itself becomes a subfunction $u_g^{TV}(\vec{x})$.

In the case of the TKV term, it can be divided into (*G* × 8) subfunctions as below

$$u^{TKV}(\vec{x}) = \frac{\beta(1-t)}{8} \sum_{k=1}^{8} \sum_{g=1}^{G} \left[ \omega_{jj'} \left| \left( x_j - x_{j_k} \right) - \left( x_{j'} - x_{j'_k} \right) \right| \right]_g = \frac{\beta(1-t)}{8} \sum_{k=1}^{8} \sum_{g=1}^{G} u_g^{TKV}(\vec{x}). \tag{17}$$

The detailed structure of the TV term is shown in Figure 2. Within a group, the pixel $j$ is common, while the distant pixel $j'$ differs from element to element.

**Figure 2.** Raster scanning during the update (*j* = 1, 2, 3, ... ).

The sequential update is related to raster scanning.

In the TV term, assuming $x_j$ and $x_{j'}$ are updated simultaneously, $x_j$ is updated sequentially $J$ times, and each $x_{j'}$ is updated once.

In the TKV term, assuming $x_j$, $x_{j_k}$, $x_{j'}$, and $x_{j'_k}$ are updated simultaneously, $x_j$ is updated sequentially $(J \times 8)$ times and $x_{j'}$ is updated sequentially eight times. The adjacent pixels $x_{j_k}$ and $x_{j'_k}$ are each updated once.

**[Update the TV term]** First, we consider updating the pixel $j$. The proximity operator for a subfunction can be defined as follows:

$$\operatorname*{argmin}_{x_j} \left( \omega_{jj'} \left| x_j - x_{j'}^{(n)} \right| + \frac{1}{2\alpha^{(n)}} \left| x_j - x_j^{(n,j')} \right|^2 \right), \tag{18}$$

where $x_j^{(n,j')}$ is the current updated solution, treated as a constant, which is the image updated from the data-fidelity term. The above L1-norm minimization problem can be solved with the following soft-thresholding:

$$S_{\nabla}\left(x_j^{(n,j')}\right) = x_j^{(n,j')} - \begin{cases} \nabla, & \tau_{TV} > \nabla \\ -\nabla, & \tau_{TV} < -\nabla \\ \tau_{TV}, & (\text{otherwise}) \end{cases} \qquad \tau_{TV} = x_j^{(n,j')} - x_{j'}^{(n)}, \quad \nabla = \alpha^{(n)} \beta t \omega_{jj'}, \tag{19}$$

where $S_{\nabla}(x_j^{(n,j')})$ is the soft-thresholding function. In the "otherwise" case, the solution $S_{\nabla}(x_j^{(n,j')})$ is simply $x_{j'}^{(n)}$.
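The thresholding in Equation (19) can be written in a few lines of Python; the function name and argument layout are our own. Note the two regimes: a large pixel difference (an edge) is shrunk only by $\nabla$, while a small difference is removed entirely.

```python
def soft_threshold_tv(xj, xj2, alpha, beta, t, w):
    """Soft-thresholding update of pixel j in Equation (19).

    tau = x_j - x_{j'} is the residual and nabla = alpha*beta*t*w_{jj'} the
    threshold; when |tau| <= nabla, the 'otherwise' branch returns x_{j'}."""
    tau = xj - xj2
    nabla = alpha * beta * t * w
    if tau > nabla:
        return xj - nabla
    if tau < -nabla:
        return xj + nabla
    return xj - tau        # = x_{j'}: the pair is merged

# Large difference: shrink by nabla only (the edge is preserved).
print(soft_threshold_tv(10.0, 0.0, alpha=0.5, beta=2.0, t=1.0, w=1.0))  # 9.0
# Small difference: the pixel snaps to its partner (noise is removed).
print(soft_threshold_tv(0.4, 0.0, alpha=0.5, beta=2.0, t=1.0, w=1.0))   # 0.0
```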

Although it is possible to update only the pixel $j$, for better convergence we update the pixels $j$ and $j'$ simultaneously.

*Sensors* **2020**, *20*, 3494

To update the pixels $j$ and $j'$, we further divide the subfunction $u_g^{TV}(\vec{x})$ into two subfunctions as

$$\begin{aligned} u^{TV}(\vec{x}) &= \beta t \sum_{g=1}^{G} u_g^{TV}(\vec{x}) = \beta t \sum_{g=1}^{G} \left[ \omega_{jj'} \left| x_j - x_{j'} \right| \right]_g \\ &= \beta t \sum_{g=1}^{G} \left[ \left[ \omega_{jj'} \left| x_j - x_{j'} \right| \right]_{x_j} + \left[ \omega_{jj'} \left| x_j - x_{j'} \right| \right]_{x_{j'}} \right]_g \\ &= \beta t \sum_{g=1}^{G} \left[ p(x_j) + p(x_{j'}) \right]_g, \end{aligned} \tag{20}$$

where $p(x_j) = \left[ \omega_{jj'} \left| x_j - x_{j'} \right| \right]_{x_j}$ denotes the part of the subfunction assigned to the variable $x_j$, and $p(x_{j'}) = \left[ \omega_{jj'} \left| x_j - x_{j'} \right| \right]_{x_{j'}}$ the part assigned to $x_{j'}$.

The proximity operators for each subfunction, $\operatorname{prox}_{\alpha^{(n)} p}(x_j^{(n,j')})$ and $\operatorname{prox}_{\alpha^{(n)} p}(x_{j'}^{(n)})$, can be defined as follows:

$$\begin{aligned} &\operatorname*{argmin}_{x_j} \left( \omega_{jj'} \left| x_j - \frac{x_j^{(n,j')} + x_{j'}^{(n)}}{2} \right| + \frac{1}{2\alpha^{(n)}} \left| x_j - x_j^{(n,j')} \right|^2 \right), \\ &\operatorname*{argmin}_{x_{j'}} \left( \omega_{jj'} \left| -x_{j'} + \frac{x_j^{(n,j')} + x_{j'}^{(n)}}{2} \right| + \frac{1}{2\alpha^{(n)}} \left| x_{j'} - x_{j'}^{(n)} \right|^2 \right), \end{aligned} \tag{21}$$

where $x_j^{(n,j')}$ and $x_{j'}^{(n)}$ are the current updated solutions, treated as constants, of the image updated from the data-fidelity term.

Finally, the above L1 norm minimization problems can be solved with the following soft-thresholding.

$$\begin{aligned} S_{\nabla}\left(x_j^{(n,j')}\right) &= x_j^{(n,j')} - \begin{cases} \nabla, & \tau_{TV} > \nabla \\ -\nabla, & \tau_{TV} < -\nabla \\ \tau_{TV}, & (\text{otherwise}) \end{cases} \\ S_{\nabla}\left(x_{j'}^{(n)}\right) &= x_{j'}^{(n)} + \begin{cases} \nabla, & \tau_{TV} > \nabla \\ -\nabla, & \tau_{TV} < -\nabla \\ \tau_{TV}, & (\text{otherwise}) \end{cases} \\ \tau_{TV} &= \left( x_j^{(n,j')} - x_{j'}^{(n)} \right)/2, \quad \nabla = \alpha^{(n)} \beta t \omega_{jj'}, \end{aligned} \tag{22}$$

where $S_{\nabla}(x_j^{(n,j')})$ and $S_{\nabla}(x_{j'}^{(n)})$ are soft-thresholding functions. By dividing the subfunction and updating each variable as in Equations (20)–(22), the solutions in the "otherwise" case become the average $(x_j^{(n,j')} + x_{j'}^{(n)})/2$. Compared with Equation (19), this weak averaging in the "otherwise" case reduces the convergence error and improves the stability of convergence.
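A sketch of the symmetric update of Equation (22) follows; the names are illustrative. Unlike Equation (19), both pixels move, and in the "otherwise" branch they meet at their average.

```python
def paired_soft_threshold(xj, xj2, alpha, beta, t, w):
    """Symmetric update of the pair (x_j, x_{j'}) in Equation (22).
    Both pixels move toward each other by the same amount; in the
    'otherwise' branch they meet at their average."""
    tau = (xj - xj2) / 2.0
    nabla = alpha * beta * t * w
    if tau > nabla:
        step = nabla
    elif tau < -nabla:
        step = -nabla
    else:
        step = tau
    return xj - step, xj2 + step

# Small difference: both pixels land on the average (x_j + x_{j'}) / 2.
print(paired_soft_threshold(1.0, 0.0, alpha=0.5, beta=2.0, t=1.0, w=1.0))  # (0.5, 0.5)
```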

**[Update the TKV term]** We update the pixels $j$, $j_k$, $j'$, and $j'_k$ simultaneously. To do so, we further divide the subfunction $u_g^{TKV}(\vec{x})$ into four subfunctions as

$$\begin{aligned} u^{TKV}(\vec{x}) &= \frac{\beta(1-t)}{8} \sum_{k=1}^{8} \sum_{g=1}^{G} u_g^{TKV}(\vec{x}) = \frac{\beta(1-t)}{8} \sum_{k=1}^{8} \sum_{g=1}^{G} \left[ \omega_{jj'} \left| \left( x_j - x_{j_k} \right) - \left( x_{j'} - x_{j'_k} \right) \right| \right]_g \\ &= \frac{\beta(1-t)}{8} \sum_{k=1}^{8} \sum_{g=1}^{G} \left[ \left[ \omega_{jj'} \left| \tfrac{ ( x_j - x_{j_k} ) - ( x_{j'} - x_{j'_k} ) }{4} \right| \right]_{x_j} + \left[ \omega_{jj'} \left| \tfrac{ ( x_j - x_{j_k} ) - ( x_{j'} - x_{j'_k} ) }{4} \right| \right]_{x_{j_k}} + \left[ \omega_{jj'} \left| \tfrac{ ( x_j - x_{j_k} ) - ( x_{j'} - x_{j'_k} ) }{4} \right| \right]_{x_{j'}} + \left[ \omega_{jj'} \left| \tfrac{ ( x_j - x_{j_k} ) - ( x_{j'} - x_{j'_k} ) }{4} \right| \right]_{x_{j'_k}} \right]_g \\ &= \frac{\beta(1-t)}{8} \sum_{k=1}^{8} \sum_{g=1}^{G} \left[ q(x_j) + q(x_{j_k}) + q(x_{j'}) + q(x_{j'_k}) \right]_g, \end{aligned} \tag{23}$$

where $q(x_j)$, $q(x_{j_k})$, $q(x_{j'})$, and $q(x_{j'_k})$ denote the parts of the subfunction assigned to the variables $x_j$, $x_{j_k}$, $x_{j'}$, and $x_{j'_k}$, respectively.

The proximity operators for each subfunction ($\operatorname{prox}_{\alpha^{(n)} q}(x_j^{(n,j',k)})$, $\operatorname{prox}_{\alpha^{(n)} q}(x_{j_k}^{(n)})$, $\operatorname{prox}_{\alpha^{(n)} q}(x_{j'}^{(n,k)})$, $\operatorname{prox}_{\alpha^{(n)} q}(x_{j'_k}^{(n)})$) can be defined as follows:

$$\begin{aligned} &\operatorname*{argmin}_{x_j} \left( \omega_{jj'} \left| x_j - \frac{3x_j^{(n,j',k)} + x_{j_k}^{(n)} + x_{j'}^{(n,k)} - x_{j'_k}^{(n)}}{4} \right| + \frac{1}{2\alpha^{(n)}} \left| x_j - x_j^{(n,j',k)} \right|^2 \right), \\ &\operatorname*{argmin}_{x_{j_k}} \left( \omega_{jj'} \left| -x_{j_k} + \frac{x_j^{(n,j',k)} + 3x_{j_k}^{(n)} - x_{j'}^{(n,k)} + x_{j'_k}^{(n)}}{4} \right| + \frac{1}{2\alpha^{(n)}} \left| x_{j_k} - x_{j_k}^{(n)} \right|^2 \right), \\ &\operatorname*{argmin}_{x_{j'}} \left( \omega_{jj'} \left| -x_{j'} + \frac{x_j^{(n,j',k)} - x_{j_k}^{(n)} + 3x_{j'}^{(n,k)} + x_{j'_k}^{(n)}}{4} \right| + \frac{1}{2\alpha^{(n)}} \left| x_{j'} - x_{j'}^{(n,k)} \right|^2 \right), \\ &\operatorname*{argmin}_{x_{j'_k}} \left( \omega_{jj'} \left| x_{j'_k} - \frac{-x_j^{(n,j',k)} + x_{j_k}^{(n)} + x_{j'}^{(n,k)} + 3x_{j'_k}^{(n)}}{4} \right| + \frac{1}{2\alpha^{(n)}} \left| x_{j'_k} - x_{j'_k}^{(n)} \right|^2 \right), \end{aligned} \tag{24}$$

where $x_j^{(n,j',k)}$, $x_{j_k}^{(n)}$, $x_{j'}^{(n,k)}$, and $x_{j'_k}^{(n)}$ are the current updated solutions, treated as constants, of the image updated from the TV term.

Finally, the above L1 norm minimization problems can be solved with the following soft-thresholding.

$$\begin{aligned} S_{\nabla}\left(x_j^{(n,j',k)}\right) &= x_j^{(n,j',k)} - \begin{cases} \nabla, & \tau_{TKV} > \nabla \\ -\nabla, & \tau_{TKV} < -\nabla \\ \tau_{TKV}, & (\text{otherwise}) \end{cases} \\ S_{\nabla}\left(x_{j_k}^{(n)}\right) &= x_{j_k}^{(n)} + \begin{cases} \nabla, & \tau_{TKV} > \nabla \\ -\nabla, & \tau_{TKV} < -\nabla \\ \tau_{TKV}, & (\text{otherwise}) \end{cases} \\ S_{\nabla}\left(x_{j'}^{(n,k)}\right) &= x_{j'}^{(n,k)} + \begin{cases} \nabla, & \tau_{TKV} > \nabla \\ -\nabla, & \tau_{TKV} < -\nabla \\ \tau_{TKV}, & (\text{otherwise}) \end{cases} \\ S_{\nabla}\left(x_{j'_k}^{(n)}\right) &= x_{j'_k}^{(n)} - \begin{cases} \nabla, & \tau_{TKV} > \nabla \\ -\nabla, & \tau_{TKV} < -\nabla \\ \tau_{TKV}, & (\text{otherwise}) \end{cases} \\ \tau_{TKV} &= \left( x_j^{(n,j',k)} - x_{j_k}^{(n)} - x_{j'}^{(n,k)} + x_{j'_k}^{(n)} \right)/4, \quad \nabla = \alpha^{(n)} \beta (1-t) \omega_{jj'}/8, \end{aligned} \tag{25}$$

where $S_{\nabla}(x_j^{(n,j',k)})$, $S_{\nabla}(x_{j_k}^{(n)})$, $S_{\nabla}(x_{j'}^{(n,k)})$, and $S_{\nabla}(x_{j'_k}^{(n)})$ are soft-thresholding functions.

### 2.3.3. The Weight

In this paper, we use the weight of the nonlocal means filter [25]:

$$\omega_{jj'} = \frac{\exp\left(-\max\left(\left\| B(x_j) - B(x_{j'}) \right\|_2^2 - 2\sigma^2,\ 0\right)/h^2\right)}{\sum_{j' \in \Omega} \exp\left(-\max\left(\left\| B(x_j) - B(x_{j'}) \right\|_2^2 - 2\sigma^2,\ 0\right)/h^2\right)}, \tag{26}$$

where $\left\| B(x_j) - B(x_{j'}) \right\|_2^2$ denotes the average Euclidean distance between the patches $B(x_j)$ and $B(x_{j'})$ centered at the pixel of interest $x_j$ and at a distant pixel $x_{j'}$.
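The weight of Equation (26) can be sketched as below for one center pixel; the API (window list, patch half-width) is our own simplification, and the parameter names $\sigma$ and $h$ follow the equation.

```python
import numpy as np

def nlm_weights(img, j, window, patch=1, sigma=0.0, h=1.0):
    """Nonlocal-means weights of Equation (26) for one center pixel j.

    img    : 2-D array; j : (row, col) center; window : list of candidate
    pixels j' from the search window Omega; patch : half-width of the
    comparison patch.  The API itself is an illustrative assumption.
    """
    r, c = j
    Bj = img[r - patch:r + patch + 1, c - patch:c + patch + 1]
    raw = []
    for (r2, c2) in window:
        Bj2 = img[r2 - patch:r2 + patch + 1, c2 - patch:c2 + patch + 1]
        d2 = np.mean((Bj - Bj2) ** 2)          # average patch distance
        raw.append(np.exp(-max(d2 - 2 * sigma ** 2, 0.0) / h ** 2))
    raw = np.array(raw)
    return raw / raw.sum()                     # weights sum to 1 over Omega

img = np.zeros((7, 7)); img[:, 4:] = 1.0       # vertical edge
w = nlm_weights(img, (3, 1), window=[(3, 2), (3, 5)])
print(w)  # the similar patch at (3, 2) gets the larger weight
```

Pixels whose surrounding patches resemble the center patch receive large weights, so smoothing happens along structures rather than across them.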

Theoretically, the weight should be fixed as a hyper-parameter of the regularization term. However, previous studies have shown that reweighting at each iteration contributes to better image quality [2,9]. Additionally, the larger the search window $\Omega$ of the weight, the better the performance in removing artifacts or noise. As long as the computer can handle the processing, we recommend increasing the window size. However, if it is too large, the computational cost grows enormously compared with the image-quality improvement. Therefore, it is important to choose the window size with attention to this cost-performance trade-off, considering the level of artifacts or noise. Figure 3 shows the cost performance of changing the window size.


**Figure 3.** Cost performance of changing the size of weight. (**a**) Image quality improvement (PSNR), (**b**) Computation time [ms].

Next, we describe the proper estimation of the weight. If the ground truth image is known, the weight can be calculated from the ground truth image. However, in the image reconstruction, only the projection data is given and the reconstructed image and the weight must be estimated simultaneously from the projection data. Therefore, we construct the optimization by alternating the estimation of the reconstructed image and the weight.

We show the reweighting process of optimization including the data-fidelity term and regularization term:

$$\begin{cases} \text{For } i = 1, 2, 3, \dots, I \ \ (I \text{ is the number of projection data}) \\ \quad \text{(1) Update the data-fidelity term by Equation (15): } \vec{x}^{(n,i+1)} = \operatorname{prox}\_{\alpha^{(n)}f\_{i}}(\vec{x}^{(n,i)}); \\ \quad \text{if } (i \bmod S) == 0 \\ \qquad \text{(2) Calculate the weight } \omega\_{jj'} \text{ from } \vec{x}^{(n,i+1)}; \\ \qquad \text{(3) Update the TV term by Equation (22);} \\ \qquad \text{(4) Update the TKV term by Equation (25);} \\ \vec{x}^{(n+1,1)} \leftarrow \vec{x}^{(n,I+1)} \end{cases} \tag{27}$$

where the weight is calculated once and shared between the nonlocal TV and TKV terms, and it is computed from the image updated by the data-fidelity term. The span parameter *S* determines how often the regularization is performed within each main iteration *n*. If *S* is small, many regularization updates are performed per iteration. Theoretically, the smaller the *S*, the more accurate the convergence. However, since nonlocal TV has a large computational complexity, it is desirable to choose an appropriate value of *S*.
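The control flow of the reweighting scheme in Equation (27) can be sketched as follows. The four operators are placeholders supplied by the caller, standing in for the paper's Equations (15), (26), (22) and (25); only the loop structure is shown:

```python
def reconstruct(x0, n_rows, prox_data, compute_weight, prox_tv, prox_tkv,
                n_iter, span):
    """Schematic row-action loop of Equation (27).

    prox_data(x, i)   -- data-fidelity proximal update for row i (Eq. 15)
    compute_weight(x) -- nonlocal-means weight from the current image (Eq. 26)
    prox_tv(x, w)     -- nonlocal TV update (Eq. 22)
    prox_tkv(x, w)    -- nonlocal TKV update (Eq. 25)
    span              -- the span parameter S
    """
    x = x0
    for n in range(n_iter):                 # main iterations
        for i in range(1, n_rows + 1):      # one pass over the projection data
            x = prox_data(x, i)             # data-fidelity update
            if i % span == 0:               # every `span` rows ...
                w = compute_weight(x)       # shared weight for both terms
                x = prox_tv(x, w)           # nonlocal TV update
                x = prox_tkv(x, w)          # nonlocal TKV update
    return x
```

With `span = n_rows` the weight and regularization are updated once per main iteration; smaller values interleave more regularization updates at higher cost.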

Figure 4 shows how the size of the weight window (Ω) influences image quality and computation time.

**Figure 4.** Demonstration of how the size of weight (Ω) influences image quality and computation time.

### **3. Experimental Results**

We performed simulation studies using a brain CT image. The reason for using a brain image is as follows. In brain CT imaging, the reconstructed images are displayed with a gray-scale range compressed much more than in other CT imaging, such as chest or abdominal CT. Therefore, preserving smooth intensity changes and avoiding staircase artifacts are much more important in the brain case. Simulation studies were performed for both sparse-view CT (64 projection data) and low-dose CT (3 × 10<sup>6</sup> photons). The reconstructed image consisted of 512 × 512 pixels, where the pixel size was 0.0585 (cm<sup>2</sup>). We compressed the display range of the reconstructed images to [7.82, 62.30] HU, where this contrast range was determined based on the contrast range used in clinical brain CT imaging. To evaluate image quality, the standard RMSE, PSNR, and SSIM values were used as metrics. The number of iterations was 20 for nonlocal TV, TKV, and TV+TKV, determined by the fact that changes in the image were sufficiently small with this iteration number. We also show the images reconstructed by the standard Filtered Back-Projection (FBP), and differences in image quality when changing the value of the hyper-parameter *t* (i.e., the trade-off parameter between the nonlocal TV (first-derivative) term and the TKV (second-derivative) term). The ground truth image and the FBP reconstructions are shown in Figure 5. The reconstructed images in the case of sparse-view CT are shown in Figure 6. In Figure 7, we show the brain CT image with three display gray-scale ranges, from which we observe that the staircase artifacts are severe when the display gray-scale range is narrow. The reconstructed images in the case of low-dose CT are shown in Figure 8.
In Figures 9–11, we show convergence properties of our iterative algorithm based on Passty's proximal splitting framework. In Figures 12 and 13, to show the effect of acceleration by Passty's proximal splitting, we incorporated the TV+TKV term into SIRT (simultaneous iterative reconstruction technique), which is a non-row-action method (a type of standard iterative algorithm), and compared it with the row-action method based on our proposed nonlocal TV+TKV. From these figures, it can be observed that our algorithm converged very quickly. It is well known that standard iterative algorithms such as Chambolle–Pock [26] and proximal gradient algorithms require several hundred iterations to converge. The benefit of our iterative algorithm mainly originates from the fact that it is of row-action type.

**Figure 5.** (**a**) Ground truth, (**b**) FBP with 64 projection data with no noise, (**c**) FBP with 256 projection data with the number of photon counts 3 × 10<sup>6</sup>. All images are displayed with the same window of [7.82, 62.30] HU.

**Figure 6.** The reconstructed images of sparse-view CT (64 projection data with no noise). (**a**) Ground truth, (**b**) Nonlocal TV (*t* = 1.0), (**c**) Nonlocal TKV (*t* = 0.0), (**d**) Nonlocal TV+TKV (*t* = 0.3) were compared. All images are displayed with the same window of [7.82, 62.30] HU.

**Figure 7.** Demonstration of appearance of the staircase artifacts with various gray-scale ranges in displaying the brain CT image. (**a**) Window Width [−346.27, 416.40] HU, (**b**) Window Width [−128.36, 198.49] HU, (**c**) Window Width [7.82, 62.30] HU.

**Figure 8.** The reconstructed images of low-dose CT (256 projection data and the number of photon counts 3 × 10<sup>6</sup>). (**a**) Ground truth, (**b**) Nonlocal TV (*t* = 1.0), (**c**) Nonlocal TKV (*t* = 0.0), (**d**) Nonlocal TV+TKV (*t* = 0.3) were compared. All images are displayed with the same window of [7.82, 62.30] HU.


**Figure 9.** RMSE for each iteration. (**a**) Sparse-view CT, (**b**) Low-dose CT.


**Figure 10.** PSNR for each iteration. (**a**) Sparse-view CT, (**b**) Low-dose CT.


**Figure 11.** SSIM for each iteration. (**a**) Sparse-view CT, (**b**) Low-dose CT.

**Figure 12.** The reconstructed images of sparse-view CT (64 projection data with no noise). SIRT nonlocal TV+TKV and row-action accelerated nonlocal TV+TKV (our proposed method) were compared. All images are displayed with the same window of [7.82, 62.30] HU.

**Figure 13.** The reconstructed images of low-dose CT (256 projection data and the number of photon counts 3 × 10<sup>6</sup>). SIRT nonlocal TV+TKV and row-action accelerated nonlocal TV+TKV (our proposed method) were compared. All images are displayed with the same window of [7.82, 62.30] HU.

### **4. Discussion**

The experimental results (Table 1) showed that the reconstructed image by nonlocal TV+TKV was closest to the ground truth image, with good RMSE, PSNR, and SSIM values. Furthermore, no isolated points caused by outliers of the soft-thresholding, which often appear in TV-class reconstruction methods, were visible. In the sparse-view CT, the result of nonlocal TV (*t* = 1.0) using only the first-order derivative had very high contrast, but staircase artifacts appeared in the smooth intensity changes. In other words, regions with small intensity changes were over-smoothed by the regularization into an oil-painting-like appearance. In the low-dose CT, the result of nonlocal TV showed isolated points in the region close to the center of the image. In both the sparse-view CT and the low-dose CT, nonlocal TKV (*t* = 0.0) using only the second-order derivative was able to preserve fine soft tissues well, including textures and low-contrast objects; however, it suffered from blurring at edges. This is because, by using a large weight in the second-order derivative, the threshold value (τ*TKV*) of the soft-thresholding operation becomes very small (i.e., τ*TV* ≫ τ*TKV*).


