*3.3. Learning Model of Dictionary Pair*

Let **X** ∈ ℝ<sup>*N*×*L*</sup> be the training data, whose columns are the signal vectors **x**<sub>*i*</sub> ∈ ℝ<sup>*N*</sup>, *i* = 1, 2, ..., *L*. The dictionary pair learning model can be written as

$$\min\_{\boldsymbol{\Phi}, \boldsymbol{\Psi}, \boldsymbol{\lambda}, \mathbf{Y}} \|\mathbf{X} - \boldsymbol{\Phi} \mathbf{Y}\|\_F^2 + \gamma\_1 \|\mathbf{Y} - \mathcal{S}\_{\lambda}(\boldsymbol{\Psi}^T \mathbf{X})\|\_F^2 + \gamma\_2 \|\mathbf{Y}\|\_0 + \gamma\_3 \|\boldsymbol{\Psi} - \mathbf{F}^{-1} \boldsymbol{\Phi}\|\_F^2 \tag{20}$$
 
$$\text{s.t. } \sqrt{A} \le \eta\_\Phi \le \sqrt{B}$$

However, Problem (20) is difficult to solve for two reasons. First, the inverse of the frame operator **F** has no explicit closed-form expression. Second, the thresholding operator is highly nonlinear, which makes the optimization with respect to *λ* hard.

Consider first the inverse of **F**. Fortunately, the matrix **F**<sup>−1</sup> can be expressed as a convergent series [36]:

$$\mathbf{F}^{-1} = \frac{2}{A+B} \sum\_{k=0}^{\infty} (\mathbf{I} - \frac{2\mathbf{F}}{A+B})^k \tag{21}$$

Here, we truncate the series at *k* = 1 as a tradeoff between computational complexity and approximation accuracy, which gives

$$\mathbf{F}^{-1} \approx \frac{2}{A+B}\mathbf{I} + \frac{2}{A+B}\left(\mathbf{I} - \frac{2\mathbf{F}}{A+B}\right) = \frac{2}{A+B}\left(2\mathbf{I} - \frac{2\mathbf{F}}{A+B}\right)\tag{22}$$
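
As a numerical illustration of Eqs. (21) and (22), the following NumPy sketch truncates the series after a chosen number of terms and measures how far the result is from the true inverse. It is not part of the proposed algorithm. It assumes that the frame operator is **F** = **ΦΦ**<sup>*T*</sup> for a random dictionary **Φ**, and that *A* and *B* are the smallest and largest eigenvalues of **F**; the function name `frame_inverse_neumann` and the test dimensions are hypothetical.

```python
import numpy as np


def frame_inverse_neumann(F, A, B, K=1):
    """Approximate F^{-1} by keeping the terms k = 0, ..., K of the series in Eq. (21)."""
    n = F.shape[0]
    c = 2.0 / (A + B)
    R = np.eye(n) - c * F          # I - 2F/(A+B)
    term = np.eye(n)               # R^0
    total = np.eye(n)              # running sum of R^k
    for _ in range(K):
        term = term @ R
        total = total + term
    return c * total


# Assumed setup: random N x M dictionary with M > N, so F is invertible.
rng = np.random.default_rng(0)
N, M = 8, 32
Phi = rng.standard_normal((N, M)) / np.sqrt(M)
F = Phi @ Phi.T                    # assumed form of the frame operator

# Assumed frame bounds: extreme eigenvalues of F.
eigvals = np.linalg.eigvalsh(F)
A, B = eigvals[0], eigvals[-1]
rho = (B - A) / (A + B)            # contraction factor of the series

# K = 1 should coincide with the closed form of Eq. (22).
c = 2.0 / (A + B)
closed_form = c * (2.0 * np.eye(N) - c * F)
approx_1 = frame_inverse_neumann(F, A, B, K=1)

# Residual ||I - approx @ F||_2 for several truncation orders.
errors = {}
for K in (1, 5, 20):
    approx = frame_inverse_neumann(F, A, B, K)
    errors[K] = np.linalg.norm(np.eye(N) - approx @ F, 2)
    print(f"K = {K:2d}  residual = {errors[K]:.3e}  rho^(K+1) = {rho ** (K + 1):.3e}")
```

Because **I** minus the truncated sum times **F** equals (**I** − 2**F**/(*A* + *B*))<sup>*K*+1</sup>, the residual decays geometrically at rate (*B* − *A*)/(*A* + *B*). The truncation at *k* = 1 is therefore most accurate when the frame is nearly tight, that is, when *A* and *B* are close.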

In this way, once the frame bounds are given, the inverse of **F** can be approximated easily. The optimization problem for training the RIP-dictionary pair is then formulated as

$$\min\_{\boldsymbol{\Phi}, \boldsymbol{\Psi}, \boldsymbol{\lambda}, \mathbf{Y}} \left\lVert \mathbf{X} - \boldsymbol{\Phi}\mathbf{Y} \right\rVert\_{F}^{2} + \gamma\_{1} \left\lVert \mathbf{Y} - \mathcal{S}\_{\lambda}(\boldsymbol{\Psi}^{T}\mathbf{X}) \right\rVert\_{F}^{2} + \gamma\_{2} \left\lVert \mathbf{Y} \right\rVert\_{0} + \gamma\_{3} \left\lVert \boldsymbol{\Psi} - \frac{2}{A+B}\left(2\mathbf{I} - \frac{2\mathbf{F}}{A+B}\right)\boldsymbol{\Phi} \right\rVert\_{F}^{2} \tag{23}$$
 
$$\text{s.t. } \sqrt{A} \le r\_{\Phi} \le \sqrt{B}$$

where S<sub>*λ*</sub>(·) is the elementwise thresholding operator. There are two basic thresholding methods: hard thresholding, whose operator keeps an entry *x* unchanged if |*x*| > *λ* and sets it to zero otherwise, and soft thresholding, whose operator is defined as S<sub>*λ*</sub>(·) → sgn(·) max(|·| − *λ*, 0). Both operators are nonlinear and non-smooth, which makes Problem (23) challenging to solve. The main reason is that updating the thresholding values *λ* causes non-smooth changes in the cost function. To overcome this difficulty, we design an alternating direction method based on global search and least squares, which is introduced in Section 4.1.
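
For concreteness, the two elementwise operators can be sketched in NumPy as follows. The sketch uses the standard definitions (hard thresholding keeps entries whose magnitude exceeds *λ*; soft thresholding shrinks every magnitude by *λ*) and takes *λ* as a single scalar, which is an assumption made here for brevity. The function names and the sample vector are hypothetical. The last lines illustrate why the dependence on *λ* is non-smooth: the number of surviving coefficients changes in jumps as *λ* grows.

```python
import numpy as np


def hard_threshold(Z, lam):
    """Keep entries whose magnitude exceeds lam; set all others to zero."""
    Z = np.asarray(Z, dtype=float)
    return np.where(np.abs(Z) > lam, Z, 0.0)


def soft_threshold(Z, lam):
    """Shrink every magnitude by lam; entries with magnitude <= lam become zero."""
    Z = np.asarray(Z, dtype=float)
    return np.sign(Z) * np.maximum(np.abs(Z) - lam, 0.0)


# Hypothetical coefficient vector, thresholded with lam = 1.
z = np.array([-3.0, -0.5, 0.2, 2.0])
hard = hard_threshold(z, 1.0)
soft = soft_threshold(z, 1.0)
print("hard:", hard)
print("soft:", soft)

# The support size is a piecewise-constant function of lam, so any cost that
# depends on the thresholded coefficients changes non-smoothly with lam.
lams = np.linspace(0.0, 3.5, 8)
support_sizes = [int(np.count_nonzero(soft_threshold(z, lam))) for lam in lams]
print("support sizes:", support_sizes)
```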

### **4. Dictionary Pair Learning Algorithm**

In this section, we propose a two-phase iterative algorithm for dictionary pair learning that divides Problem (23) into two subproblems: the sparse coding phase, which updates the sparse coefficients **Y** and the thresholding values *λ*, and the dictionary pair update phase, which computes **Φ** and **Ψ**.
