**Process:**

**For** *j* = 1 : *L*

Compute all the candidate values of $f(\lambda_i) + g(\lambda_i)$ via the recursions $f(\psi_i^T \mathbf{x}_j) = f(\psi_i^T \mathbf{x}_{j-1}) + y_{ij}^2$; $l(\psi_i^T \mathbf{x}_j) = l(\psi_i^T \mathbf{x}_{j-1}) + (y_{ij} - \psi_i^T \mathbf{x}_j)^2$; $g(\psi_i^T \mathbf{x}_j) = \|\psi_i^T \mathbf{X}\|_2^2 - l(\psi_i^T \mathbf{x}_j)$. Denote them as a vector $\nu$.

**End for**

4: Sort the elements of $|\psi_i^T \mathbf{X}|$ and the columns of $\mathbf{X}$ in descending order of $\nu$. Denote the first and second samples as $\mathbf{x}_{i1}$ and $\mathbf{x}_{i2}$. Set $\lambda_i = \frac{|\psi_i^T \mathbf{x}_{i1}| + |\psi_i^T \mathbf{x}_{i2}|}{2}$.

**End for**
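The threshold-selection step above can be sketched in NumPy (a minimal illustration with hypothetical names: `responses` holds the magnitudes $|\psi_i^T \mathbf{x}_j|$ and `nu` the precomputed objective values $f + g$ for each candidate):

```python
import numpy as np

def select_threshold(responses, nu):
    """Step 4 above: order the candidates by the objective values nu
    (descending) and average the two top responses |psi_i^T x_j|."""
    order = np.argsort(nu)[::-1]          # descending order of nu
    i1, i2 = order[0], order[1]           # first and second samples
    return 0.5 * (responses[i1] + responses[i2])

# toy usage with three candidate samples
lam_i = select_threshold(np.array([3.0, 1.0, 2.0]),
                         np.array([0.1, 0.9, 0.5]))
```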

Due to the definition of $\mathcal{S}_{\lambda}$, this is a highly nonlinear optimization problem. We solve for **Ψ** column-wise, updating each column of **Ψ** while fixing the others. The residual $\boldsymbol{\Psi}\boldsymbol{\Phi}^{T} - \mathbf{I}$ can be written as

$$\boldsymbol{\Psi}\boldsymbol{\Phi}^{T} - \mathbf{I} = \sum_{p=1}^{N} \boldsymbol{\psi}_{p}\boldsymbol{\phi}_{p}^{T} - \mathbf{I} = \boldsymbol{\psi}_{i}\boldsymbol{\phi}_{i}^{T} - \Big(\mathbf{I} - \sum_{p \neq i}^{N} \boldsymbol{\psi}_{p}\boldsymbol{\phi}_{p}^{T}\Big).\tag{19}$$

For each $\psi_i$, we solve the following subproblem:

$$\min_{\boldsymbol{\psi}_{i}} \|\bar{\mathbf{y}}_{i} - \mathcal{S}_{\lambda_{i}}(\boldsymbol{\psi}_{i}^{T}\mathbf{X})\|_{2}^{2} + \frac{\eta_{3}}{\eta_{1}}\|\boldsymbol{\psi}_{i}\boldsymbol{\phi}_{i}^{T} - \mathbf{z}\|_{2}^{2}, \tag{20}$$

where $\mathbf{z} = \mathbf{I} - \sum_{p \neq i}^{N} \boldsymbol{\psi}_p\boldsymbol{\phi}_p^T$. Denoting by $\mathcal{J}_i$ the index set defined before, we separate the problem into the two following sub-problems:

$$\hat{\boldsymbol{\psi}}_i^1 = \underset{\boldsymbol{\psi}_i}{\arg\min} \sum_{j \in \mathcal{J}_i} (y_{ij} - \boldsymbol{\psi}_i^T \mathbf{x}_j)^2 + \frac{\eta_3}{\eta_1} \|\boldsymbol{\psi}_i \boldsymbol{\phi}_i^T - \mathbf{z}\|_2^2, \tag{21}$$

$$\hat{\boldsymbol{\psi}}_{i}^{2} = \underset{\|\boldsymbol{\psi}_{i}\|_{2} = 1}{\arg\min} \sum_{j \in \{1, \dots, L\} \backslash \mathcal{J}_i} (\boldsymbol{\psi}_{i}^{T} \mathbf{x}_{j})^{2},\tag{22}$$

where $y_{ij}$ denotes the $(i,j)$th entry of **Y** and $\mathbf{x}_j$ denotes the $j$th column of **X**. Equation (21) is a quadratic optimization problem, while Equation (22) has a closed-form solution given by the normalized singular vector corresponding to the smallest singular value of $\mathbf{X}_{\hat{\mathcal{J}}_i}$. Based on the solutions of the two sub-problems, we give the solution of (20) as $\hat{\boldsymbol{\psi}}_i = \frac{1}{2}\big(\hat{\boldsymbol{\psi}}_i^1 + \|\hat{\boldsymbol{\psi}}_i^1\|_2\, \hat{\boldsymbol{\psi}}_i^2\big)$. Note that the second solution is scaled by the norm of the first, since (21) serves as the dominant term of the **Ψ** subproblem, while the solution of (22) carries only a direction and no energy.
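Under the definitions above, one column update can be sketched as follows. This is an illustrative implementation, not the authors' code: the quadratic problem (21) is solved here through its normal equations, and (22) through an SVD; `c` stands for the weight ratio on the frame term, and `z` is the matrix defined after (20).

```python
import numpy as np

def update_psi_column(X, y_row, J, phi_i, z, c):
    """Sketch of one column update of Psi.

    psi1 solves the quadratic problem (21) via its normal equations;
    psi2 solves (22) as the unit left singular vector of X restricted
    to the complement of J with the smallest singular value; the
    combination is psi = 0.5 * (psi1 + ||psi1||_2 * psi2).
    """
    M, L = X.shape
    Xj = X[:, J]
    # normal equations of (21): (Xj Xj^T + c ||phi_i||^2 I) psi = Xj y_J + c z phi_i
    A = Xj @ Xj.T + c * (phi_i @ phi_i) * np.eye(M)
    b = Xj @ y_row[J] + c * (z @ phi_i)
    psi1 = np.linalg.solve(A, b)
    # (22): direction of least energy over the inactive samples
    comp = np.setdiff1d(np.arange(L), J)
    U, _, _ = np.linalg.svd(X[:, comp])
    psi2 = U[:, -1]
    return 0.5 * (psi1 + np.linalg.norm(psi1) * psi2)

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 20))
y_row = rng.standard_normal(20)
J = np.arange(10)                 # toy choice of active sample indices
phi_i = rng.standard_normal(6)
z = np.eye(6)
psi_hat = update_psi_column(X, y_row, J, phi_i, z, c=0.5)
```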

#### **The Φ Subproblem**

With fixed **Y**, *λ*, and **Ψ**, the model to obtain **Φ** is given by

$$\min_{\boldsymbol{\Phi}} \|\mathbf{X} - \boldsymbol{\Phi}\mathbf{Y}\|_{F}^{2} + \eta_{3} \|\boldsymbol{\Phi}\boldsymbol{\Psi}^{T} - \mathbf{I}\|_{F}^{2}$$

$$\text{s.t. } \boldsymbol{\Phi} \boldsymbol{\Phi}^{T} = \mathbf{I}. \tag{23}$$

Using the constraint $\boldsymbol{\Phi}\boldsymbol{\Phi}^T = \mathbf{I}$, we convert (23) into the unconstrained problem

$$\min_{\boldsymbol{\Phi}} \|\mathbf{X} - \boldsymbol{\Phi} \mathbf{Y}\|_F^2 + \eta_3 \|\boldsymbol{\Phi}(\boldsymbol{\Phi} - \boldsymbol{\Psi})^T\|_F^2. \tag{24}$$

We denote the objective function of (24) by $h(\boldsymbol{\Phi})$, apply gradient descent to this unconstrained problem, and project the solution back onto the feasible set. The gradient is given by

$$\begin{split} \nabla h(\boldsymbol{\Phi}) &= (\boldsymbol{\Phi}\mathbf{Y} - \mathbf{X})\mathbf{Y}^T + \eta_3[\boldsymbol{\Phi}(\boldsymbol{\Phi} - \boldsymbol{\Psi})^T(\boldsymbol{\Phi} - \boldsymbol{\Psi}) + \boldsymbol{\Phi}(\boldsymbol{\Phi} - \boldsymbol{\Psi})^T\boldsymbol{\Phi}] \\ &= (\boldsymbol{\Phi}\mathbf{Y} - \mathbf{X})\mathbf{Y}^T + \eta_3\boldsymbol{\Phi}(\boldsymbol{\Phi} - \boldsymbol{\Psi})^T(2\boldsymbol{\Phi} - \boldsymbol{\Psi}). \end{split} \tag{25}$$
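The gradient step plus projection can be sketched as follows. The projection onto $\boldsymbol{\Phi}\boldsymbol{\Phi}^T = \mathbf{I}$ via the SVD ($\boldsymbol{\Phi} \leftarrow \mathbf{U}\mathbf{V}^T$) is our assumption for the projection step, which is not spelled out here; the factor of 2 of the quadratic term is absorbed into the step length, matching (25).

```python
import numpy as np

def update_phi(Phi, Psi, X, Y, eta3, step=0.01, iters=100):
    """Gradient descent on h(Phi) with the gradient (25), followed
    by a projection onto Phi Phi^T = I (polar projection assumed)."""
    for _ in range(iters):
        D = Phi - Psi
        grad = (Phi @ Y - X) @ Y.T + eta3 * Phi @ D.T @ (2.0 * Phi - Psi)
        Phi = Phi - step * grad
    # nearest matrix (in Frobenius norm) with orthonormal rows
    U, _, Vt = np.linalg.svd(Phi, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(1)
M, N, L = 4, 8, 32
X = rng.standard_normal((M, L))
Y = rng.standard_normal((N, L))
Psi = rng.standard_normal((M, N))
Phi = update_phi(rng.standard_normal((M, N)), Psi, X, Y, eta3=0.1)
```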

We summarize our overall algorithm in Algorithm 2.

**Algorithm 2:** Transform pair learning algorithm.

#### **Input and Initialization:**

Training data **X**, frame bounds (*A*, *B*), iteration number *num*.

Build frames $\boldsymbol{\Phi} \in \mathbb{R}^{M \times N}$ and $\boldsymbol{\Psi} \in \mathbb{R}^{M \times N}$, either with random entries or from $N$ randomly chosen data samples.

#### **Output:**

Frames **Φ**, **Ψ**, sparse coefficients **Y**, and thresholding values *λ*.

**Process:**

**For** *l* = 1 : *num*

#### **Sparse Coding Step:**
1: Compute the sparse coefficients **Y** and the thresholding values *λ* via Algorithm 1.

#### **Frame Update Step:**

2: Update **Ψ** column-wise. Compute $\mathbf{W} = \mathcal{S}_{\lambda}(\boldsymbol{\Psi}^T\mathbf{X})$.

**For** *i* = 1 : *M*

Denote by $\hat{\mathcal{J}}_i$ the indices of the zero entries in the $i$th row of $\mathbf{W}$. Set $\boldsymbol{\psi}_i^T \mathbf{X}_{\hat{\mathcal{J}}_i} = 0$. Compute $\boldsymbol{\psi}_i$ via (21) and (22).

**End for**

3: Update **Φ** via gradient descent using (25); the step length is usually set to 0.01.

**End for**
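The frame-update bookkeeping of step 2 can be sketched as follows (illustrative names; the per-column recomputation from (21) and (22) is left abstract, and $\mathcal{S}_\lambda$ is taken to be hard thresholding with a per-row threshold):

```python
import numpy as np

def hard_threshold(W, lam):
    """S_lambda: zero every entry of row i whose magnitude falls below lam[i]."""
    return W * (np.abs(W) >= lam[:, None])

def frame_update_indices(Psi, X, lam):
    """Step 2 of Algorithm 2 (a sketch): form W = S_lambda(Psi^T X)
    and collect, for each row i, the index set J_i of its zero
    entries; each psi_i would then be recomputed from (21) and (22)."""
    W = hard_threshold(Psi.T @ X, lam)
    zero_sets = [np.flatnonzero(W[i] == 0.0) for i in range(W.shape[0])]
    return W, zero_sets

# toy usage: identity analysis frame, two samples
Psi = np.eye(2)
X = np.array([[3.0, 0.1],
              [0.2, 4.0]])
lam = np.array([1.0, 1.0])
W, zero_sets = frame_update_indices(Psi, X, lam)
```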

#### **4. Image Denoising**

We introduce a novel problem formulation for signal denoising by applying the data-driven redundant transform DRTPF. Image denoising aims to reconstruct a high-quality image $\mathcal{I}$ from its noise-corrupted version $\mathcal{L}$, modeled as $\mathcal{L} = \mathcal{I} + \mathbf{n}$, where $\mathbf{n}$ is additive noise. For a signal satisfying the DRTPF, the denoising model based on DRTPF is formulated as

$$\{\hat{\mathcal{I}}, \hat{\mathbf{Y}}, \boldsymbol{\lambda}\} = \underset{\mathcal{I}, \{\mathbf{y}_i\}_{i=1}^N, \boldsymbol{\lambda}}{\arg\min} \|\mathcal{L} - \mathcal{I}\|_F^2 + \gamma \sum_i \|\mathbf{R}_i \mathcal{I} - \boldsymbol{\Phi} \mathbf{y}_i\|_F^2 + \gamma_1 \sum_i \|\mathbf{y}_i - \mathcal{S}_{\boldsymbol{\lambda}}(\boldsymbol{\Psi}^T \mathbf{R}_i \mathcal{I})\|_F^2 + \gamma_2 \sum_i \|\mathbf{y}_i\|_0,\tag{26}$$

where $\mathbf{R}_i$ is an operator that extracts the $i$th patch of the image $\mathcal{I}$, $\mathbf{y}_i$ is the $i$th column of $\mathbf{Y}$, and $\boldsymbol{\lambda}$ denotes the vector $[\lambda_1, \lambda_2, \cdots, \lambda_M]$ with $\lambda_j$ operating on the $j$th element of $\boldsymbol{\Psi}^T\mathbf{R}_i\mathcal{I}$. On the right-hand side of Equation (26), the first term is the global fidelity term, which enforces proximity between the degraded image $\mathcal{L}$ and its high-quality version $\mathcal{I}$. The remaining terms are local constraints, which ensure that every patch at location $i$ satisfies the DRTPF. This formulation assumes that the noisy image $\mathcal{L}$ can be approximated by a noiseless image $\hat{\mathcal{I}}$ whose patches, extracted by $\mathbf{R}_i$, can be sparsely represented by the given transforms $\boldsymbol{\Phi}$ and $\boldsymbol{\Psi}$.
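As a concrete reading of the patch operator $\mathbf{R}_i$, here is a minimal sketch of patch extraction in raster order (the patch size, stride, and function name are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def extract_patch(img, i, patch=8, stride=4):
    """R_i: return the i-th patch of img as a vector, scanning patch
    positions left-to-right, top-to-bottom with a fixed stride."""
    H, W = img.shape
    cols = (W - patch) // stride + 1          # patch positions per row
    r, c = divmod(i, cols)
    block = img[r * stride : r * stride + patch,
                c * stride : c * stride + patch]
    return block.reshape(-1)

# toy usage: non-overlapping 4x4 patches of an 8x8 image
img = np.arange(64, dtype=float).reshape(8, 8)
p1 = extract_patch(img, 1, patch=4, stride=4)
```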

To solve Problem (26), we apply Algorithm 1 to obtain the sparse coefficients **Y** and the threshold values *λ*; here we focus on the iterative method for obtaining $\mathcal{I}$.

Denote $\mathbf{d}^k = \boldsymbol{\Psi}^T\mathbf{R}_i\mathcal{I}^{k-1}$ and let $\mathbf{O}^k$ be the index set satisfying $|\mathbf{d}_l^k| \leq \lambda_l$, $l \in \mathbf{O}^k$. Set $\mathbf{u}^k \in \mathbb{R}^M$ as the vector with elements

$$\mathbf{u}_l^k = \begin{cases} 1 & l \in \mathbf{O}^k, \\ 0 & \text{otherwise}. \end{cases}$$

Then, the non-convex and non-smooth thresholding can be removed with the substitution $\mathbf{y}_i - \mathcal{S}_{\boldsymbol{\lambda}}(\boldsymbol{\Psi}^T\mathbf{R}_i\mathcal{I}^k) \approx \mathbf{y}_i - (\boldsymbol{\Psi}^T\mathbf{R}_i\mathcal{I}^k) \odot \mathbf{u}^k$. Thus, in the $k$th step, the problem to be solved can be expressed as

$$\hat{\mathcal{I}}^{k} = \underset{\mathcal{I}}{\arg\min}\, \|\mathcal{L} - \mathcal{I}\|_{F}^{2} + \gamma \sum_{i} \|\mathbf{R}_{i}\mathcal{I} - \boldsymbol{\Phi}\mathbf{y}_{i}\|_{F}^{2} + \gamma_{1} \sum_{i} \|\mathbf{y}_{i} - (\boldsymbol{\Psi}^{T}\mathbf{R}_{i}\mathcal{I}) \odot \mathbf{u}^{k}\|_{F}^{2}, \tag{27}$$

where $\odot$ denotes pointwise multiplication. This convex problem can be easily solved by the gradient descent algorithm.
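Since (27) is quadratic in $\mathcal{I}$, one gradient step can be sketched as follows. This is an illustrative implementation under our own conventions: `patches[i]` holds the flat pixel indices selected by $\mathbf{R}_i$, and all names are hypothetical.

```python
import numpy as np

def denoise_grad_step(I_cur, L_obs, Phi, Psi, Y, u, patches,
                      gamma, gamma1, step=0.1):
    """One gradient-descent step on the quadratic objective (27).
    R_i I is realized as I_cur.ravel()[patches[i]]."""
    g = 2.0 * (I_cur - L_obs)                 # gradient of ||L - I||_F^2
    flat, gflat = I_cur.ravel(), g.ravel()
    for i, idx in enumerate(patches):
        p = flat[idx]
        y = Y[:, i]
        gflat[idx] += 2.0 * gamma * (p - Phi @ y)
        r = (Psi.T @ p) * u - y               # (Psi^T R_i I) ⊙ u^k − y_i
        gflat[idx] += 2.0 * gamma1 * (Psi @ (u * r))
    return I_cur - step * g

# toy usage: a single 2x2 "patch" covering the whole image
I1 = denoise_grad_step(np.ones((2, 2)), np.zeros((2, 2)),
                       np.eye(4), np.eye(4), np.zeros((4, 1)),
                       np.ones(4), [np.arange(4)],
                       gamma=0.0, gamma1=0.0, step=0.25)
```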

We summarize the restoration algorithm in Algorithm 3.

#### **Algorithm 3:** Denoising algorithm.

#### **Input:**

Trained frames **Φ**, **Ψ**, iteration number *r*, a degraded image $\mathcal{L}$; set $\mathcal{I}^0 = \mathcal{L}$.

**Output:**

The high-quality image $\hat{\mathcal{I}}$.

1: Compute **Y** and *λ* via the method in Algorithm 1.


**For** *k* = 1 : *r*

2: Compute $\mathbf{d}^k = \boldsymbol{\Psi}^T\mathbf{R}_i\mathcal{I}^{k-1}$. Let $\mathbf{O}^k$ be the index set satisfying $|\mathbf{d}_l^k| \leq \lambda_l$, $l \in \mathbf{O}^k$. Set

$$\mathbf{u}_l^k = \begin{cases} 1 & l \in \mathbf{O}^k, \\ 0 & \text{otherwise}. \end{cases}$$

3: Solve Problem (27) via the gradient descent algorithm.

**End for**
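The mask of step 2 amounts to one vectorized comparison; a minimal sketch (hypothetical function name):

```python
import numpy as np

def threshold_mask(d, lam):
    """u^k from step 2: u_l = 1 where |d_l| <= lambda_l, else 0."""
    return (np.abs(d) <= lam).astype(float)

u = threshold_mask(np.array([0.5, -2.0, 1.0]),
                   np.array([1.0, 1.0, 1.0]))
```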
