*4.1. Sparse coding phase*

In this subsection, we discuss how to compute the sparse coefficients **Y** and the threshold values *λ* with **Φ** and **Ψ** given under our SSM-NTF model.

Given a pair of dictionaries **Φ** and **Ψ**, calculating **Y** and *λ* from **X** is formulated as:

$$\left\{\hat{\mathbf{Y}},\hat{\boldsymbol{\lambda}}\right\} = \operatorname\*{arg\,min}\_{\mathbf{Y},\boldsymbol{\lambda}} \|\mathbf{X} - \boldsymbol{\Phi}\mathbf{Y}\|\_F^2 + \gamma\_1 \|\mathbf{Y} - \mathcal{S}\_{\boldsymbol{\lambda}}(\boldsymbol{\Psi}^T\mathbf{X})\|\_F^2 + \gamma\_2 \|\mathbf{Y}\|\_0 \tag{24}$$
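To make the roles of the three terms in (24) concrete, the objective can be evaluated directly. The sketch below is illustrative only: the function names (`soft_threshold`, `ssm_ntf_objective`) and the NumPy setting are our assumptions, not part of the original model.

```python
import numpy as np

def soft_threshold(v, lam):
    # Elementwise soft-thresholding operator S_lambda:
    # shrinks magnitudes by lam and zeroes entries with |v| <= lam.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def ssm_ntf_objective(X, Phi, Psi, lam, Y, gamma1, gamma2):
    # Evaluates the objective of Eq. (24):
    # data fit + analysis-prior agreement + l0 sparsity penalty.
    fit = np.linalg.norm(X - Phi @ Y, 'fro') ** 2
    analysis = np.linalg.norm(Y - soft_threshold(Psi.T @ X, lam), 'fro') ** 2
    sparsity = np.count_nonzero(Y)
    return fit + gamma1 * analysis + gamma2 * sparsity
```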

We pursue the two variables alternately. First, with *λ* fixed, we obtain the sparse coefficients **Y** by solving Problem (24) through OMP [9], since it can easily be converted to the classical synthesis sparse coding problem $\min\_{\mathbf{Y}} \|\mathbf{Z} - \mathbf{D}\mathbf{Y}\|\_F^2$, subject to a sparsity constraint on $\|\mathbf{Y}\|\_0$, where $\mathbf{Z} = \begin{bmatrix}\mathbf{X} \\ \sqrt{\gamma\_1}\,\mathcal{S}\_{\boldsymbol{\lambda}}(\boldsymbol{\Psi}^T\mathbf{X})\end{bmatrix}$ and $\mathbf{D} = \begin{bmatrix}\boldsymbol{\Phi} \\ \sqrt{\gamma\_1}\,\mathbf{I}\end{bmatrix}$.
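As a concrete illustration of this conversion, the following sketch stacks **Z** and **D** and codes each column of **X** with a minimal greedy OMP. The helper names and the fixed sparsity level are our assumptions; a production implementation would use a tuned OMP solver with normalized atoms.

```python
import numpy as np

def omp(D, z, sparsity):
    # Minimal orthogonal matching pursuit (assumes sparsity >= 1):
    # greedily pick the atom most correlated with the residual,
    # then least-squares refit on the selected support.
    residual = z.copy()
    support = []
    y = np.zeros(D.shape[1])
    for _ in range(sparsity):
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        coef, *_ = np.linalg.lstsq(D[:, support], z, rcond=None)
        residual = z - D[:, support] @ coef
    y[support] = coef
    return y

def sparse_code_columns(X, Phi, Psi, lam, gamma1, sparsity):
    # Build the stacked synthesis problem min ||Z - D Y||_F^2:
    # Z stacks X on sqrt(gamma1) * S_lam(Psi^T X),
    # D stacks Phi on sqrt(gamma1) * I.
    C = Psi.T @ X
    S = np.sign(C) * np.maximum(np.abs(C) - lam, 0.0)
    Z = np.vstack([X, np.sqrt(gamma1) * S])
    D = np.vstack([Phi, np.sqrt(gamma1) * np.eye(Phi.shape[1])])
    return np.column_stack([omp(D, Z[:, j], sparsity) for j in range(X.shape[1])])
```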

Second, the pursuit of *λ* is equivalent to solving the following problem:

$$\hat{\boldsymbol{\lambda}} = \operatorname\*{arg\,min}\_{\boldsymbol{\lambda}} \|\mathbf{Y} - \mathcal{S}\_{\boldsymbol{\lambda}}(\boldsymbol{\Psi}^T \mathbf{X})\|\_F^2 \tag{25}$$

which can be decomposed into *M* individual optimization problems

$$\hat{\lambda}\_i = \operatorname\*{arg\,min}\_{\lambda\_i} \|\overline{\mathbf{y}}\_i - \mathcal{S}\_{\lambda\_i}(\boldsymbol{\psi}\_i^T \mathbf{X})\|\_2^2, \quad i = 1, \dots, M, \tag{26}$$

where $\boldsymbol{\psi}\_i$ is the *i*-th column of **Ψ** and $\overline{\mathbf{y}}\_i$ is the *i*-th row of **Y**. From the definition of the soft-thresholding operator, the objective of Problem (26) is discontinuous in $\lambda\_i$. Denoting by $\mathcal{J}\_i$ the set of data indices that remain intact after the thresholding, we split the data **X** into two parts, $\mathbf{X}^{\mathcal{J}\_i}$ and $\mathbf{X}^{\hat{\mathcal{J}}\_i}$, such that

$$\mathcal{S}\_{\lambda\_i}(\boldsymbol{\psi}\_i^T \mathbf{X}^{\mathcal{J}\_i}) = \boldsymbol{\psi}\_i^T \mathbf{X}^{\mathcal{J}\_i} - \operatorname{sgn}(\boldsymbol{\psi}\_i^T \mathbf{X}^{\mathcal{J}\_i}) \lambda\_i \tag{27}$$

$$\mathcal{S}\_{\lambda\_i}(\boldsymbol{\psi}\_i^T \mathbf{X}^{\hat{\mathcal{J}}\_i}) = \mathbf{0}. \tag{28}$$

where $\hat{\mathcal{J}}\_i$ is the complement of the intact index set $\mathcal{J}\_i$, on which the thresholding sets all elements to zero. Clearly, both $\mathcal{J}\_i$ and $\hat{\mathcal{J}}\_i$ are functions of $\lambda\_i$ without explicit expressions, which poses a major challenge for the optimization.
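The two index sets can be computed directly from the correlations. A minimal sketch (the function name `threshold_split` is ours), assuming entries with magnitude at most $\lambda\_i$ are the ones zeroed:

```python
import numpy as np

def threshold_split(psi_i, X, lam_i):
    # Split data indices per Eqs. (27)-(28): the intact set J_i collects
    # the columns x_j whose correlations psi_i^T x_j survive
    # soft-thresholding at lam_i; the complement collects those zeroed.
    c = psi_i @ X                                # psi_i^T x_j for all j
    intact = np.flatnonzero(np.abs(c) > lam_i)   # J_i
    zeroed = np.flatnonzero(np.abs(c) <= lam_i)  # complement of J_i
    return intact, zeroed
```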

In order to solve Problem (26), we introduce an intermediate variable $\mu\_i$ to separate the problem into two parts: the update of the index sets $\mathcal{J}\_i$ and $\hat{\mathcal{J}}\_i$ (determined by $\mu\_i$), and the update of the explicit threshold value $\lambda\_i$. Problem (26) is then transformed into the following optimization problem:

$$\{\hat{\lambda}\_i, \hat{\mu}\_i\} = \operatorname\*{arg\,min}\_{\lambda\_i,\mu\_i} \|\overline{\mathbf{y}}\_i^{\hat{\mathcal{J}}\_i}\|\_2^2 + \|\overline{\mathbf{y}}\_i^{\mathcal{J}\_i} - [\boldsymbol{\psi}\_i^T \mathbf{X}^{\mathcal{J}\_i} - \operatorname{sgn}(\boldsymbol{\psi}\_i^T \mathbf{X}^{\mathcal{J}\_i})\lambda\_i] \|\_2^2 + \tau/2\,\|\lambda\_i - \mu\_i\|\_2^2 \tag{29}$$

where $\mathcal{J}\_i$ and $\hat{\mathcal{J}}\_i$ are now functions of the intermediate variable $\mu\_i$.

At the *k*-th step, to obtain $\mu\_i$ we solve Problem (29) with $\lambda\_i$ fixed, writing the objective as $f(\mu\_i) + g(\mu\_i) + l(\mu\_i)$, where $f(\mu\_i) = \|\overline{\mathbf{y}}\_i^{\hat{\mathcal{J}}\_i}\|\_2^2$, $g(\mu\_i) = \|\overline{\mathbf{y}}\_i^{\mathcal{J}\_i} - [\boldsymbol{\psi}\_i^T \mathbf{X}^{\mathcal{J}\_i} - \operatorname{sgn}(\boldsymbol{\psi}\_i^T \mathbf{X}^{\mathcal{J}\_i})\lambda\_i]\|\_2^2$, and $l(\mu\_i) = \tau/2\,(\lambda\_i - \mu\_i)^2$. Optimizing this expression is non-trivial, since the objective is non-convex and highly discontinuous. However, with $\lambda\_i$ fixed, the minimization of $f(\mu\_i) + g(\mu\_i)$ can be solved globally owing to its discrete, finite nature: given a finite set of candidate values for $\mu\_i$, an exhaustive search is guaranteed to find the global minimum.

Once $\lambda\_i$ is given, $f(\mu\_i) + g(\mu\_i)$ is a piecewise-constant function: its value remains unchanged within a series of intervals determined by the breakpoints $|\boldsymbol{\psi}\_i^T \mathbf{x}\_j|$, $j = 1, 2, \dots, L$. These values can therefore be taken as the candidates for $\mu\_i$. The function $l(\mu\_i)$ attains its minimum at $\mu\_i = \lambda\_i$ and increases monotonically with the distance $|\mu\_i - \lambda\_i|$, so to minimize $l(\mu\_i)$ over an interval we only need to choose the point closest to $\lambda\_i$ within the feasible region.

Without loss of generality, we assume that the values $|\boldsymbol{\psi}\_i^T \mathbf{x}\_j|$, $j = 1, 2, \dots, L$, are sorted in ascending order and that the corresponding signals are reordered accordingly. We then compute all the possible values of $f(\mu\_i) + g(\mu\_i)$ recursively by

$$\begin{aligned} f(\boldsymbol{\psi}\_i^T \mathbf{x}\_{j+1}) &= f(\boldsymbol{\psi}\_i^T \mathbf{x}\_j) + \overline{y}\_{i(j+1)}^2 \\ g(\boldsymbol{\psi}\_i^T \mathbf{x}\_{j+1}) &= \|\overline{\mathbf{y}}\_i - \boldsymbol{\psi}\_i^T \mathbf{X} + \operatorname{sgn}(\boldsymbol{\psi}\_i^T \mathbf{X}) \lambda\_i\|\_2^2 - \beta\_{j+1} \end{aligned} \tag{30}$$

where $\beta\_{j+1} = \beta\_j + [\overline{y}\_{i(j+1)} - \boldsymbol{\psi}\_i^T \mathbf{x}\_{j+1} + \operatorname{sgn}(\boldsymbol{\psi}\_i^T \mathbf{x}\_{j+1})\lambda\_i]^2$ and $\beta\_1 = [\overline{y}\_{i1} - \boldsymbol{\psi}\_i^T \mathbf{x}\_1 + \operatorname{sgn}(\boldsymbol{\psi}\_i^T \mathbf{x}\_1)\lambda\_i]^2$. Every two adjacent breakpoints $|\boldsymbol{\psi}\_i^T \mathbf{x}\_j|$ form an interval on which the value of $f(\mu\_i) + g(\mu\_i)$ remains unchanged. In other words, on each interval the objective $f(\mu\_i) + g(\mu\_i) + l(\mu\_i)$ is minimized at the point closest to $\lambda\_i$. Thus, we compute the minimizer on every interval, and the smallest of these values must be the optimal result.
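The interval search above can be validated against a direct brute-force evaluation over the breakpoints. The sketch below recomputes *f*, *g*, and *l* from scratch at each candidate rather than using the recursion in (30); the function name `best_mu` and the scalar handling of τ are our assumptions.

```python
import numpy as np

def best_mu(y_bar, psi_i, X, lam_i, tau):
    # Brute-force search for mu_i: evaluate f(mu) + g(mu) + l(mu)
    # at every breakpoint |psi_i^T x_j| (where the piecewise-constant
    # part f + g can change value) and keep the minimizer.
    c = psi_i @ X                      # correlations psi_i^T x_j
    best, best_val = None, np.inf
    for mu in np.abs(c):
        zeroed = np.abs(c) <= mu       # complement of J_i, set by mu
        intact = ~zeroed               # J_i
        f = np.sum(y_bar[zeroed] ** 2)
        g = np.sum((y_bar[intact]
                    - (c[intact] - np.sign(c[intact]) * lam_i)) ** 2)
        l = tau / 2.0 * (lam_i - mu) ** 2
        if f + g + l < best_val:
            best, best_val = mu, f + g + l
    return best
```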

With $\mu\_i$ fixed, we solve the following problem in order to pursue $\lambda\_i$:

$$\hat{\lambda}\_i = \operatorname\*{arg\,min}\_{\lambda\_i} \|\overline{\mathbf{y}}\_i^{\mathcal{J}\_i} - [\boldsymbol{\psi}\_i^T \mathbf{X}^{\mathcal{J}\_i} - \operatorname{sgn}(\boldsymbol{\psi}\_i^T \mathbf{X}^{\mathcal{J}\_i})\lambda\_i] \|\_2^2 + \tau/2 \|\lambda\_i - \mu\_i\|\_2^2 \tag{31}$$

This is a standard continuous convex problem that can be easily solved in closed form by least squares.
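For instance, writing $c\_j = \boldsymbol{\psi}\_i^T \mathbf{x}\_j$ and $s\_j = \operatorname{sgn}(c\_j)$ for $j \in \mathcal{J}\_i$, setting the derivative of (31) to zero and using $s\_j^2 = 1$ yields the closed-form minimizer sketched below (the function name is ours):

```python
import numpy as np

def update_lambda(y_bar_J, c_J, mu_i, tau):
    # Closed-form least-squares solution of Eq. (31).
    # y_bar_J: intact entries of the i-th coefficient row (indices J_i);
    # c_J: correlations psi_i^T x_j restricted to J_i.
    # Zeroing the derivative gives
    #   lambda = (2 * sum_j s_j (c_j - y_j) + tau * mu) / (2 |J_i| + tau).
    s = np.sign(c_J)
    num = 2.0 * np.sum(s * (c_J - y_bar_J)) + tau * mu_i
    den = 2.0 * len(c_J) + tau
    return num / den
```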

We summarize our sparse coding method in Algorithm 1.

