**Input and Initialization:**

Training data **X** ∈ ℝ<sup>*N*×*L*</sup>, number of iterations *r*, initial values *λ<sub>i</sub>* = 0.

**Output:**

Sparse coefficients **y** and thresholding values *λ<sub>i</sub>*.


For j=1:*L*

Compute all the possible values for *f*(*μi*) + *g*(*μi*) by

$$\begin{aligned} f(\boldsymbol{\mu}\_i^T \mathbf{X}\_{j+1}) &= f(\boldsymbol{\mu}\_i^T \mathbf{X}\_j) + \boldsymbol{\mu}\_{i(j+1)}^2 \\ g(\boldsymbol{\mu}\_i^T \mathbf{X}\_{j+1}) &= \|\overline{\mathbf{y}}\_i - \boldsymbol{\mu}\_i^T \mathbf{X} + \operatorname{sgn}(\boldsymbol{\mu}\_i^T \mathbf{X}) \lambda\_i\|\_2^2 - \beta\_{j+1} \end{aligned}$$

where $\beta\_{j+1} = \beta\_j + \big[y\_{i(j+1)} - \boldsymbol{\mu}\_i^T \mathbf{x}\_{j+1} + \operatorname{sgn}(\boldsymbol{\mu}\_i^T \mathbf{x}\_{j+1})\lambda\_i\big]^2$ and $\beta\_1 = \big[y\_{i1} - \boldsymbol{\mu}\_i^T \mathbf{x}\_1 + \operatorname{sgn}(\boldsymbol{\mu}\_i^T \mathbf{x}\_1)\lambda\_i\big]^2$. Denote the resulting values as a vector *ν*.

End for


End for
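The recursion for *β* above simply accumulates squared residuals one sample at a time, so *g* can be updated incrementally instead of recomputed from scratch. A minimal NumPy sketch (all names and toy data are illustrative, not from the paper):

```python
import numpy as np

# Toy data; dimensions and names are illustrative, not from the paper.
rng = np.random.default_rng(0)
N, L = 8, 20
X = rng.standard_normal((N, L))    # training data, one sample per column
mu = rng.standard_normal(N)        # current analysis atom mu_i
y = rng.standard_normal(L)         # target coefficient row y_i
lam = 0.1                          # current threshold lambda_i

# beta_{j+1} = beta_j + [y_{i(j+1)} - mu^T x_{j+1} + sgn(mu^T x_{j+1}) * lam]^2
beta = 0.0
for j in range(L):
    p = mu @ X[:, j]
    beta += (y[j] - p + np.sign(p) * lam) ** 2

# Sanity check: after L steps the recursion equals the batch quantity
# ||y_i - mu^T X + sgn(mu^T X) * lam||_2^2 used in g.
p_all = mu @ X
batch = np.sum((y - p_all + np.sign(p_all) * lam) ** 2)
assert np.isclose(beta, batch)
```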

#### *4.2. Dictionary Pair Update Phase*

To obtain **Ψ**, we solve the following problem with all other variables fixed:

$$\hat{\Psi} = \operatorname\*{arg\,min}\_{\Psi} \|\Psi - \mathcal{S}\_{\lambda}(\mathbf{Y}^T \mathbf{X})\|\_{F}^{2} + \frac{\gamma\_{3}}{\gamma\_{1}} \|\Psi - \mathbf{F}^{-1} \Phi\|\_{F}^{2} \tag{32}$$

This problem is a highly nonlinear optimization because of the definition of S*λ*. We therefore solve for **Ψ** column by column.
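For reference, if S*λ* denotes the usual element-wise soft-thresholding operator (an assumption consistent with the sgn(·)*λ* correction terms used above; the paper's own definition appears earlier), it can be sketched as:

```python
import numpy as np

def soft_threshold(v, lam):
    """Element-wise soft thresholding: sign(v) * max(|v| - lam, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# soft_threshold(np.array([-2.0, 0.5, 3.0]), 1.0) -> array([-1., 0., 2.])
```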

For each column *ψ<sub>i</sub>*, we solve the following subproblem:

$$\hat{\boldsymbol{\psi}}\_{i} = \underset{\boldsymbol{\psi}\_{i}}{\operatorname{arg\,min}} \, \|\overline{\mathbf{y}}\_{i} - \mathcal{S}\_{\lambda\_{i}}(\boldsymbol{\psi}\_{i}^{T}\mathbf{X})\|\_{2}^{2} + \frac{\gamma\_{3}}{\gamma\_{1}} \|\boldsymbol{\psi}\_{i} - \mathbf{F}^{-1}\boldsymbol{\Phi}\_{i}\|\_{2}^{2} \tag{33}$$

We denote by J<sub>*i*</sub> and Ĵ<sub>*i*</sub> the index sets defined as before. Set the elements of **ȳ**<sub>*i*</sub> corresponding to the indices in Ĵ<sub>*i*</sub> to zero and denote the resulting vector by **Ξ**<sub>*i*</sub>; this is justified because *ψ*<sub>*i*</sub><sup>*T*</sup>**X**<sub>Ĵi</sub> ≈ 0. Then (33) reduces to the following quadratic problem, which is easily solved by least squares:

$$\hat{\boldsymbol{\psi}}\_{i} = \operatorname\*{arg\,min}\_{\boldsymbol{\psi}\_{i}} \|\boldsymbol{\Xi}\_{i} - \boldsymbol{\psi}\_{i}^{T}\mathbf{X}\|\_{2}^{2} + \frac{\gamma\_{3}}{\gamma\_{1}} \|\boldsymbol{\psi}\_{i} - \mathbf{F}^{-1}\boldsymbol{\Phi}\_{i}\|\_{2}^{2} \tag{34}$$
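Problem (34) is a ridge-type least-squares problem, so each column admits the closed-form solution *ψ<sub>i</sub>* = (**XX**<sup>T</sup> + *ρ***I**)<sup>−1</sup>(**XΞ**<sub>*i*</sub> + *ρ***F**<sup>−1</sup>**Φ**<sub>*i*</sub>) with *ρ* = *γ*<sub>3</sub>/*γ*<sub>1</sub>. A minimal NumPy sketch (the function and variable names are illustrative):

```python
import numpy as np

def update_atom(X, xi, finv_phi, rho):
    """Closed-form minimizer of ||xi - psi^T X||_2^2 + rho * ||psi - finv_phi||_2^2.

    X        : (N, L) training data
    xi       : (L,)   target row Xi_i (entries on the zeroed index set set to 0)
    finv_phi : (N,)   the column F^{-1} Phi_i
    rho      : scalar, gamma_3 / gamma_1
    """
    N = X.shape[0]
    # Normal equations: (X X^T + rho I) psi = X xi + rho * finv_phi
    return np.linalg.solve(X @ X.T + rho * np.eye(N), X @ xi + rho * finv_phi)
```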

The optimization problem to pursue **Φ** is formulated as

$$\boldsymbol{\hat{\Phi}} = \underset{\boldsymbol{\Phi}}{\operatorname{arg\,min}} \|\mathbf{X} - \boldsymbol{\Phi}\mathbf{Y}\|\_F^2 + \gamma\_3 \|\boldsymbol{\Psi} - \mathbf{F}^{-1}\boldsymbol{\Phi}\|\_F^2 \tag{35}$$

$$\text{s.t.} \sqrt{A} \le \delta\_{\Phi} \le \sqrt{B} \tag{36}$$

where the frame operator **F** is given by **ΦΦ**<sup>*T*</sup> and **F**<sup>−1</sup> is defined in Equation (22). The objective function then becomes

$$\|\mathbf{X} - \boldsymbol{\Phi}\mathbf{Y}\|\_F^2 + \gamma\_3 \left\|\boldsymbol{\Psi} - \frac{2}{A+B}\left(2\mathbf{I} - \frac{2\boldsymbol{\Phi}\boldsymbol{\Phi}^T}{A+B}\right)\boldsymbol{\Phi}\right\|\_F^2 \tag{37}$$

which we denote by *h*(**Φ**). We apply gradient descent to the unconstrained version of Problem (35) and then project the solution onto the feasible set. The gradient takes the following rather involved form:

$$\nabla h(\boldsymbol{\Phi}) = (\mathbf{X} - \boldsymbol{\Phi}\mathbf{Y})\mathbf{Y}^T - \gamma\_3 \left\{ \frac{4}{a} \mathbf{R} + \frac{4}{a^2} \left[\boldsymbol{\Phi}\boldsymbol{\Phi}^T \mathbf{R} + \boldsymbol{\Phi}\mathbf{R}^T \boldsymbol{\Phi} + \mathbf{R}\boldsymbol{\Phi}^T\boldsymbol{\Phi}\right] \right\} \tag{38}$$

where **R** = **Ψ** − **F**<sup>−1</sup>**Φ** and *a* = *A* + *B*.

To reduce the computational complexity, the gradient can also be computed with the fixed **F** calculated in the previous step of the ADM. At the *k*-th iteration, the gradient can then be written as

$$\nabla h(\boldsymbol{\Phi}^{k}) = \left(\mathbf{X} - \boldsymbol{\Phi}^{k-1}\mathbf{Y}\right)\mathbf{Y}^{T} - \gamma\_{3}\mathbf{F}^{-1}\left(\boldsymbol{\Psi}^{k} - \mathbf{F}^{-1}\boldsymbol{\Phi}^{k-1}\right) \tag{39}$$

where **F** = **Φ**<sup>*k*−1</sup>(**Φ**<sup>*k*−1</sup>)<sup>*T*</sup>. The descent step length is obtained by solving min<sub>*θ*</sub> *h*(**Φ** + *θ*∇*h*(**Φ**)) with **F** fixed, which gives

$$\hat{\theta} = \frac{\langle \mathbf{a}, \mathbf{b}\rangle + \gamma\_3 \langle \mathbf{c}, \mathbf{d}\rangle}{\|\mathbf{a}\|\_F^2 + \gamma\_3\|\mathbf{c}\|\_F^2} \tag{40}$$

where **a** = ∇*h*(**Φ**)**Y**, **b** = **X** − **ΦY**, **c** = **F**<sup>−1</sup>∇*h*(**Φ**), and **d** = **Ψ** − **F**<sup>−1</sup>**Φ**. To enforce the frame condition √*A* ≤ *δ*<sub>Φ</sub> ≤ √*B*, we compute the SVD **Φ** = **UΣV**<sup>*T*</sup> and map the singular values linearly into the interval [√*A*, √*B*]. Denoting the mapped singular-value matrix by **Σ̂**, we reconstruct **Φ** = **UΣ̂V**<sup>*T*</sup>.
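The step-length formula (40) and the singular-value projection can be sketched together as follows (a NumPy sketch; it assumes the "linear map" rescales [*σ*<sub>min</sub>, *σ*<sub>max</sub>] onto [√*A*, √*B*], and all names are illustrative):

```python
import numpy as np

def step_length(G, Phi, Psi, X, Y, Finv, gamma3):
    """Optimal theta for min_theta h(Phi + theta * G) with F held fixed, Eq. (40)."""
    a = G @ Y                  # a = grad_h * Y
    b = X - Phi @ Y            # b = X - Phi * Y
    c = Finv @ G               # c = F^{-1} * grad_h
    d = Psi - Finv @ Phi       # d = Psi - F^{-1} * Phi
    num = np.sum(a * b) + gamma3 * np.sum(c * d)   # Frobenius inner products
    den = np.sum(a * a) + gamma3 * np.sum(c * c)
    return num / den

def project_frame(Phi, A, B):
    """Rescale the singular values of Phi linearly into [sqrt(A), sqrt(B)]."""
    U, s, Vt = np.linalg.svd(Phi, full_matrices=False)
    lo, hi = np.sqrt(A), np.sqrt(B)
    span = s.max() - s.min()
    s_hat = np.full_like(s, lo) if span == 0 else lo + (s - s.min()) * (hi - lo) / span
    return U @ np.diag(s_hat) @ Vt
```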

We summarize our algorithm in Algorithm 2.

**Algorithm 2** Dictionary pair learning algorithm
