**Appendix C. Estimation of the *L*<sub>1</sub>-Optimal Sparsity Parameter λ<sub>*k*</sub>(*t*)**

This section describes how the spectral dictionaries are equipped with adaptive sparse coding. First, the CMF model is defined in terms of the following quantities:

$$\begin{gathered}
\mathbf{W} = \begin{bmatrix} \mathbf{I}\otimes\mathbf{W}_1(\omega) & \mathbf{I}\otimes\mathbf{W}_2(\omega) & \cdots & \mathbf{I}\otimes\mathbf{W}_K(\omega) \end{bmatrix}, \qquad
e^{j\varphi(t)} = \begin{bmatrix} e^{j\varphi_1(t)} & \cdots & e^{j\varphi_K(t)} \end{bmatrix}, \\[1ex]
\underline{\mathbf{y}} = \mathrm{vec}(\mathbf{Y}) = \begin{bmatrix} \mathbf{Y}_1(:) \\ \mathbf{Y}_2(:) \\ \vdots \\ \mathbf{Y}_K(:) \end{bmatrix}, \qquad
\underline{\mathbf{h}} = \begin{bmatrix} \mathbf{H}_1(t) \\ \mathbf{H}_2(t) \\ \vdots \\ \mathbf{H}_K(t) \end{bmatrix}, \qquad
\underline{\lambda} = \begin{bmatrix} \lambda_1(t) \\ \lambda_2(t) \\ \vdots \\ \lambda_K(t) \end{bmatrix}, \qquad
\underline{\varphi} = \begin{bmatrix} \varphi_1(:,t) \\ \varphi_2(:,t) \\ \vdots \\ \varphi_K(:,t) \end{bmatrix}, \\[1ex]
\overline{\mathbf{A}} = \begin{bmatrix}
\mathbf{W}\circ e^{j\varphi(t)} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{W}\circ e^{j\varphi(t)} & \ddots & \vdots \\
\vdots & \ddots & \ddots & \mathbf{0} \\
\mathbf{0} & \cdots & \mathbf{0} & \mathbf{W}\circ e^{j\varphi(t)}
\end{bmatrix}
\end{gathered} \tag{A13}$$
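To make the block structure in Equation (A13) concrete, the following sketch (a toy example with assumed small dimensions, not values from the paper) builds the block-diagonal dictionary $\overline{\mathbf{A}}$ from the Hadamard product $\mathbf{W} \circ e^{j\varphi(t)}$ using NumPy:

```python
import numpy as np

# Toy dimensions (illustrative only): F frequency bins, T frames, K channels
F, T, K = 4, 3, 2
rng = np.random.default_rng(0)

W = rng.random((F, T))                    # nonnegative magnitude dictionary W
phi = rng.uniform(0, 2 * np.pi, (F, T))   # time-varying phase matrix

# Hadamard product W o e^{j phi}: one complex-valued dictionary block
block = W * np.exp(1j * phi)

# Block-diagonal A with K copies of the block, as in Equation (A13)
A = np.kron(np.eye(K), block)

assert A.shape == (K * F, K * T)
assert np.allclose(A[:F, T:], 0)          # off-diagonal blocks are zero
```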

where "⊗" and "◦" denote the Kronecker product and the Hadamard product, respectively, vec(·) denotes column vectorization, and **I** is the identity matrix. The goal is to compute the regularization parameter λ*k*(*t*) associated with each **H***k*(*t*). To this end, the parameter *p* in Equation (A3) is set to 1, which makes the expression linear in λ*k*(*t*). Taking the noise variance σ² into account, Equation (A3) can be rewritten concisely as:

$$F(\underline{\mathbf{h}}, \underline{\lambda}) = \frac{1}{2\sigma^2} \left\| \underline{\mathbf{y}} - \overline{\mathbf{A}}\,\underline{\mathbf{h}} \right\|_F^2 + \underline{\lambda}^{\mathrm{T}} \underline{\mathbf{h}} - (\log \underline{\lambda})^{\mathrm{T}} \mathbf{1} \tag{A14}$$

where **h** and λ are vectors of dimension *R* × 1 (with *R* = *F* × *T* × *K*), and the superscript 'T' denotes the Hermitian transpose (transpose followed by complex conjugation). The Expectation–Maximization (EM) algorithm is used to determine λ, with **h** treated as the hidden variable, so that the log-likelihood can be optimized with respect to λ. The log-likelihood function satisfies the following [12]:

$$\ln p(\underline{\mathbf{y}} \mid \underline{\lambda}, \overline{\mathbf{A}}, \sigma^2) \ge \int Q(\underline{\mathbf{h}}) \ln \left( \frac{p(\underline{\mathbf{y}}, \underline{\mathbf{h}} \mid \underline{\lambda}, \overline{\mathbf{A}}, \sigma^2)}{Q(\underline{\mathbf{h}})} \right) d\underline{\mathbf{h}} \tag{A15}$$

which follows from Jensen's inequality for any distribution *Q*(**h**). One can verify that the distribution maximizing the right-hand side of Equation (A15) is the posterior of **h**, i.e., $Q(\underline{\mathbf{h}}) = p(\underline{\mathbf{h}} \mid \underline{\mathbf{y}}, \underline{\lambda}, \overline{\mathbf{A}}, \sigma^2)$. This posterior is expressed in the form of a Gibbs distribution as follows:

$$Q(\underline{\mathbf{h}}) = \frac{1}{Z_h} \exp[-F(\underline{\mathbf{h}})] \quad \text{where} \quad Z_h = \int \exp[-F(\underline{\mathbf{h}})]\, d\underline{\mathbf{h}} \tag{A16}$$

The term *F*(**h**) in Equation (A16), acting as the energy function of the Gibbs distribution, is essential for simplifying the adaptive optimization of λ. The maximum-likelihood (ML) estimate of λ can be decomposed as follows:

$$\begin{split} \underline{\lambda}^{\mathrm{ML}} &= \underset{\underline{\lambda}}{\arg\max}\, \ln p(\underline{\mathbf{y}} \mid \underline{\lambda}, \overline{\mathbf{A}}, \sigma^2) \\ &= \underset{\underline{\lambda}}{\arg\max} \int Q(\underline{\mathbf{h}}) \left( \ln p(\underline{\mathbf{y}} \mid \underline{\mathbf{h}}, \overline{\mathbf{A}}, \sigma^2) + \ln p(\underline{\mathbf{h}} \mid \underline{\lambda}) \right) d\underline{\mathbf{h}} \\ &= \underset{\underline{\lambda}}{\arg\max} \int Q(\underline{\mathbf{h}}) \ln p(\underline{\mathbf{h}} \mid \underline{\lambda})\, d\underline{\mathbf{h}} \end{split} \tag{A17}$$

In the same way,

$$\begin{split} \sigma^2_{\mathrm{ML}} &= \underset{\sigma^2}{\arg\max}\, \ln p(\underline{\mathbf{y}} \mid \underline{\lambda}, \overline{\mathbf{A}}, \sigma^2) \\ &= \underset{\sigma^2}{\arg\max} \int Q(\underline{\mathbf{h}}) \left( \ln p(\underline{\mathbf{y}} \mid \underline{\mathbf{h}}, \overline{\mathbf{A}}, \sigma^2) + \ln p(\underline{\mathbf{h}} \mid \underline{\lambda}) \right) d\underline{\mathbf{h}} \\ &= \underset{\sigma^2}{\arg\max} \int Q(\underline{\mathbf{h}}) \ln p(\underline{\mathbf{y}} \mid \underline{\mathbf{h}}, \overline{\mathbf{A}}, \sigma^2)\, d\underline{\mathbf{h}} \end{split} \tag{A18}$$

Each element of **h** is required to be exponentially distributed with an independent decay parameter, which gives the prior $p(\underline{\mathbf{h}} \mid \underline{\lambda}) = \prod_g \lambda_g \exp(-\lambda_g h_g)$; substituting this into Equation (A17) yields

$$\underline{\lambda}^{\mathrm{ML}} = \underset{\underline{\lambda}}{\arg\max} \int Q(\underline{\mathbf{h}}) \sum_g \left( \ln \lambda_g - \lambda_g h_g \right) d\underline{\mathbf{h}} \tag{A19}$$

The term **h** is the variable of the distribution *Q*(**h**), whereas the other parameters are treated as constants. The optimization in (A19) is therefore solved by differentiating the integrand with respect to each λ*g* and setting the derivative to zero, which yields

$$\lambda_g = \frac{1}{\int h_g\, Q(\underline{\mathbf{h}})\, d\underline{\mathbf{h}}} \tag{A20}$$
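Equation (A20) states that each sparsity parameter is the reciprocal of the posterior mean of the corresponding activation. A quick Monte Carlo check, using an assumed toy posterior (an exponential with known mean, chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "posterior" over h_g: exponential with mean 0.5 (an assumption here)
true_mean = 0.5
samples = rng.exponential(true_mean, size=200_000)

# Equation (A20): lambda_g = 1 / E_Q[h_g], estimated by Monte Carlo
lam_g = 1.0 / samples.mean()

assert abs(lam_g - 1.0 / true_mean) < 0.05
```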

where *g* = 1, 2, ... , *R* and λ*g* denotes the *g*-th element of λ. The iterative update for $\sigma^2_{\mathrm{ML}}$ is given by

$$\begin{split} \sigma^2_{\mathrm{ML}} &= \underset{\sigma^2}{\arg\max} \int Q(\underline{\mathbf{h}}) \left( -\frac{N_0}{2} \ln\left(\pi\sigma^2\right) - \frac{1}{2\sigma^2} \left\| \underline{\mathbf{y}} - \overline{\mathbf{A}}\,\underline{\mathbf{h}} \right\|^2 \right) d\underline{\mathbf{h}} \\ &= \frac{1}{N_0} \int Q(\underline{\mathbf{h}}) \left\| \underline{\mathbf{y}} - \overline{\mathbf{A}}\,\underline{\mathbf{h}} \right\|^2 d\underline{\mathbf{h}} \end{split} \tag{A21}$$

where $p(\underline{\mathbf{y}} \mid \underline{\mathbf{h}}, \overline{\mathbf{A}}, \sigma^2) = (\pi\sigma^2)^{-N_0/2} \exp\left(-\frac{1}{2\sigma^2}\left\|\underline{\mathbf{y}} - \overline{\mathbf{A}}\,\underline{\mathbf{h}}\right\|^2\right)$ and $N_0 = K \times T$. However, the integrals in Equations (A20) and (A21) are difficult to compute analytically, so an approximation to *Q*(**h**) is exploited. Notice that the solution **h** naturally splits its elements into two disjoint subsets: $\underline{\mathbf{h}}_M$, consisting of the components $h_m > 0$ for all $m \in M$, and $\underline{\mathbf{h}}_P$, consisting of the components $h_p = 0$ for all $p \in P$. Hence, the following decomposition can be derived:

$$F(\underline{\mathbf{h}}, \underline{\lambda}) = F(\underline{\mathbf{h}}_M, \underline{\lambda}_M) + F(\underline{\mathbf{h}}_P, \underline{\lambda}_P) + G \tag{A22}$$
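The decomposition in Equation (A22) is exact, which can be verified numerically. The sketch below uses a real-valued toy problem (random matrices and a small positive deviation for $\underline{\mathbf{h}}_P$, all assumed for illustration) and checks the identity, with $G$ holding the cross-term and the constant $-\|\underline{\mathbf{y}}\|^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, mM, mP = 6, 3, 2
sigma2 = 0.5

A_M = rng.standard_normal((n, mM))
A_P = rng.standard_normal((n, mP))
h_M = rng.random(mM)
h_P = rng.random(mP)                  # small positive deviation of h_P
y = rng.standard_normal(n)
lam_M = rng.random(mM) + 0.1
lam_P = rng.random(mP) + 0.1

def F(y, A, h, lam):
    # F(h, lam) as in Equation (A14), real-valued case for simplicity
    return (np.sum((y - A @ h) ** 2) / (2 * sigma2)
            + lam @ h - np.log(lam).sum())

A = np.hstack([A_M, A_P])
h = np.concatenate([h_M, h_P])
lam = np.concatenate([lam_M, lam_P])

# Cross-term G of Equation (A22)
G = (2 * (A_M @ h_M) @ (A_P @ h_P) - y @ y) / (2 * sigma2)

lhs = F(y, A, h, lam)
rhs = F(y, A_M, h_M, lam_M) + F(y, A_P, h_P, lam_P) + G

assert abs(lhs - rhs) < 1e-9
```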

where $F(\underline{\mathbf{h}}_M, \underline{\lambda}_M) = \frac{1}{2\sigma^2}\left\|\underline{\mathbf{y}} - \overline{\mathbf{A}}_M\underline{\mathbf{h}}_M\right\|^2 + \underline{\lambda}_M^{\mathrm{T}}\underline{\mathbf{h}}_M - (\log \underline{\lambda}_M)^{\mathrm{T}}\mathbf{1}_M$, $F(\underline{\mathbf{h}}_P, \underline{\lambda}_P) = \frac{1}{2\sigma^2}\left\|\underline{\mathbf{y}} - \overline{\mathbf{A}}_P\underline{\mathbf{h}}_P\right\|^2 + \underline{\lambda}_P^{\mathrm{T}}\underline{\mathbf{h}}_P - (\log \underline{\lambda}_P)^{\mathrm{T}}\mathbf{1}_P$, and $G = \frac{1}{2\sigma^2}\left[2\left(\overline{\mathbf{A}}_M\underline{\mathbf{h}}_M\right)^{\mathrm{T}}\left(\overline{\mathbf{A}}_P\underline{\mathbf{h}}_P\right) - \left\|\underline{\mathbf{y}}\right\|^2\right]$. Here, $\|\underline{\mathbf{y}}\|^2$ is a constant and the cross-term $(\overline{\mathbf{A}}_M\underline{\mathbf{h}}_M)^{\mathrm{T}}(\overline{\mathbf{A}}_P\underline{\mathbf{h}}_P)$ measures the orthogonality between $\overline{\mathbf{A}}_M\underline{\mathbf{h}}_M$ and $\overline{\mathbf{A}}_P\underline{\mathbf{h}}_P$, where $\overline{\mathbf{A}}_M$ and $\overline{\mathbf{A}}_P$ denote the sub-matrices of $\overline{\mathbf{A}}$ corresponding to $\underline{\mathbf{h}}_M$ and $\underline{\mathbf{h}}_P$. To simplify Equation (A22), *F* is approximated as $F(\underline{\mathbf{h}}, \underline{\lambda}) \approx F(\underline{\mathbf{h}}_M, \underline{\lambda}_M) + F(\underline{\mathbf{h}}_P, \underline{\lambda}_P)$; the term *G* can safely be discarded, since its value is typically much smaller than those of $F(\underline{\mathbf{h}}_M, \underline{\lambda}_M)$ and $F(\underline{\mathbf{h}}_P, \underline{\lambda}_P)$. The approximation of *Q*(**h**) can thus be expressed as

$$\begin{split} Q(\underline{\mathbf{h}}) &= \frac{1}{Z_h} \exp\left[-F(\underline{\mathbf{h}}, \underline{\lambda})\right] \\ &\approx \frac{1}{Z_h} \exp\left[-\left(F(\underline{\mathbf{h}}_M, \underline{\lambda}_M) + F(\underline{\mathbf{h}}_P, \underline{\lambda}_P)\right)\right] \\ &= \frac{1}{Z_M} \exp\left[-F(\underline{\mathbf{h}}_M, \underline{\lambda}_M)\right] \cdot \frac{1}{Z_P} \exp\left[-F(\underline{\mathbf{h}}_P, \underline{\lambda}_P)\right] \\ &= Q_M(\underline{\mathbf{h}}_M)\, Q_P(\underline{\mathbf{h}}_P) \end{split} \tag{A23}$$

where $Z_M = \int \exp\left[-F(\underline{\mathbf{h}}_M, \underline{\lambda}_M)\right] d\underline{\mathbf{h}}_M$ and $Z_P = \int \exp\left[-F(\underline{\mathbf{h}}_P, \underline{\lambda}_P)\right] d\underline{\mathbf{h}}_P$. To characterize $Q_P(\underline{\mathbf{h}}_P)$, some positive deviation of $\underline{\mathbf{h}}_P$ must be allowed for, whereas all negative values are rejected because CMF admits only zero and positive values; thus $\underline{\mathbf{h}}_P$ takes only zero and positive values in $Q_P(\underline{\mathbf{h}}_P)$. The distribution $Q_P(\underline{\mathbf{h}}_P)$ is then approximated by a Taylor expansion around the *maximum a posteriori* (MAP) estimate. Therefore, with $\underline{\mathbf{h}}^{\mathrm{MAP}}$, one obtains

$$\begin{split} Q_P\left(\underline{\mathbf{h}}_P \ge 0\right) &\propto \exp\left[-\left(\left.\frac{\partial F}{\partial \underline{\mathbf{h}}}\right|_{\underline{\mathbf{h}}^{\mathrm{MAP}}}\right)_P^{\mathrm{T}} \underline{\mathbf{h}}_P - \frac{1}{2}\underline{\mathbf{h}}_P^{\mathrm{T}}\overline{\mathbf{C}}_P\,\underline{\mathbf{h}}_P\right] \\ &= \exp\left[-\left(\overline{\mathbf{C}}\,\underline{\mathbf{h}}^{\mathrm{MAP}} - \frac{1}{\sigma^2}\overline{\mathbf{A}}^{\mathrm{T}}\underline{\mathbf{y}} + \underline{\lambda}\right)_P^{\mathrm{T}} \underline{\mathbf{h}}_P - \frac{1}{2}\underline{\mathbf{h}}_P^{\mathrm{T}}\overline{\mathbf{C}}_P\,\underline{\mathbf{h}}_P\right] \end{split} \tag{A24}$$
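The linear coefficient $\overline{\mathbf{C}}\,\underline{\mathbf{h}}^{\mathrm{MAP}} - \frac{1}{\sigma^2}\overline{\mathbf{A}}^{\mathrm{T}}\underline{\mathbf{y}} + \underline{\lambda}$ appearing in Equation (A24) is simply the gradient of *F* at the expansion point, which can be checked against finite differences on a real-valued toy problem (random data, assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 7, 4
sigma2 = 0.3
A = rng.standard_normal((n, m))
y = rng.standard_normal(n)
lam = rng.random(m) + 0.1
C = A.T @ A / sigma2          # C = A^T A / sigma^2

def F_part(h):
    # h-dependent part of F(h, lam); -(log lam)^T 1 is constant in h
    return np.sum((y - A @ h) ** 2) / (2 * sigma2) + lam @ h

h0 = rng.random(m)            # expansion point (stands in for h^MAP)

# Analytic gradient, as used in Equation (A24)
grad = C @ h0 - A.T @ y / sigma2 + lam

# Central finite differences
eps = 1e-6
fd = np.array([
    (F_part(h0 + eps * e) - F_part(h0 - eps * e)) / (2 * eps)
    for e in np.eye(m)
])

assert np.allclose(grad, fd, atol=1e-4)
```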

where $\overline{\mathbf{C}}_P = \frac{1}{\sigma^2}\overline{\mathbf{A}}_P^{\mathrm{T}}\overline{\mathbf{A}}_P$ and $\overline{\mathbf{C}} = \frac{1}{\sigma^2}\overline{\mathbf{A}}^{\mathrm{T}}\overline{\mathbf{A}}$. The integral of $Q_P(\underline{\mathbf{h}}_P)$ in Equation (A24) is hard to derive in closed form, which prohibits inference of the sparsity parameters. A fixed-form distribution is therefore employed to compute a variational approximation to $Q_P(\underline{\mathbf{h}}_P)$, for which a closed-form expression is obtained. Since $\underline{\mathbf{h}}_P$ takes only nonnegative values, a suitable fixed-form distribution is the factorized exponential distribution given by

$$\hat{Q}_P\left(\underline{\mathbf{h}}_P \ge 0\right) = \prod_{p \in P} \frac{1}{u_p} \exp\left(\frac{-h_p}{u_p}\right) \tag{A25}$$

By minimizing the Kullback–Leibler divergence between $Q_P$ and $\hat{Q}_P$, the variational parameters $\underline{\mathbf{u}} = \{u_p\}$, $\forall p \in P$, can be derived as:

$$\begin{split} \underline{\mathbf{u}} &= \underset{\underline{\mathbf{u}}}{\arg\min} \int \hat{Q}_P(\underline{\mathbf{h}}_P) \ln \frac{\hat{Q}_P(\underline{\mathbf{h}}_P)}{Q_P(\underline{\mathbf{h}}_P)}\, d\underline{\mathbf{h}}_P \\ &= \underset{\underline{\mathbf{u}}}{\arg\min} \int \hat{Q}_P(\underline{\mathbf{h}}_P) \left[ \ln \hat{Q}_P(\underline{\mathbf{h}}_P) - \ln Q_P(\underline{\mathbf{h}}_P) \right] d\underline{\mathbf{h}}_P \end{split} \tag{A26}$$

Solving Equation (A26) for *up* leads to the following update [37]:

$$u_p \leftarrow u_p\, \frac{-\hat{b}_p + \sqrt{\hat{b}_p^2 + 4\,(\overline{\mathbf{C}}\,\underline{\mathbf{u}})_p / u_p}}{2\,(\overline{\mathbf{C}}\,\underline{\mathbf{u}})_p} \tag{A27}$$
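A direct implementation of the multiplicative update in Equation (A27) can be sketched as follows. The inputs are filled with toy values chosen so the iteration stays well defined: an elementwise-nonnegative $\overline{\mathbf{C}}_P$ and a positive coefficient vector $\hat{b}$ (since $\hat{b}_p$ is not defined in this excerpt, treating it as the linear coefficient from Equation (A24) is an assumption here):

```python
import numpy as np

rng = np.random.default_rng(3)
m = 4

# Toy inputs (assumptions for illustration): elementwise-nonnegative
# C = A_P^T A_P / sigma^2 and a positive coefficient vector b_hat
A_P = rng.random((8, m))
C = A_P.T @ A_P / 0.5
b_hat = rng.random(m) + 0.1

u = np.ones(m)                        # positive initialization
for _ in range(200):
    Cu = C @ u                        # (C u)_p, positive here
    u = u * (-b_hat + np.sqrt(b_hat**2 + 4 * Cu / u)) / (2 * Cu)

# The update preserves positivity of the variational means
assert np.all(u > 0) and np.all(np.isfinite(u))
```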

The approximate distribution for the components $\underline{\mathbf{h}}_M$ can be obtained by substituting $F(\underline{\mathbf{h}}_M, \underline{\lambda}_M)$ into $Q_M(\underline{\mathbf{h}}_M)$ as follows:

$$\begin{split} Q_M(\underline{\mathbf{h}}_M) &= \frac{1}{Z_M} \exp\left[-F(\underline{\mathbf{h}}_M, \underline{\lambda}_M)\right] \\ &\propto \exp\left[-\left(\frac{1}{2}\underline{\mathbf{h}}_M^{\mathrm{T}}\overline{\mathbf{C}}_M\underline{\mathbf{h}}_M - \frac{1}{\sigma^2}\underline{\mathbf{y}}^{\mathrm{T}}\overline{\mathbf{A}}_M\underline{\mathbf{h}}_M + \underline{\lambda}_M^{\mathrm{T}}\underline{\mathbf{h}}_M\right)\right] \end{split} \tag{A28}$$

In Equation (A28), the function $Q_M(\underline{\mathbf{h}}_M)$ is expressed as an unconstrained multivariate Gaussian with mean $\underline{\mathbf{h}}_M^{\mathrm{MAP}}$ and covariance $\overline{\mathbf{C}}_M^{-1}$, where $\overline{\mathbf{C}}_M$ denotes the corresponding sub-matrix of $\overline{\mathbf{C}}$. The sparsity parameter is then obtained by substituting Equations (A24), (A25), and (A28) into Equation (A20), as presented in Equation (A29):

$$\lambda_g = \begin{cases} \dfrac{1}{\int h_g\, Q_M(\underline{\mathbf{h}}_M)\, d\underline{\mathbf{h}}_M} = \dfrac{1}{h_g^{\mathrm{MAP}}} & \text{if } g \in M \\[2ex] \dfrac{1}{\int h_g\, Q_P(\underline{\mathbf{h}}_P)\, d\underline{\mathbf{h}}_P} = \dfrac{1}{u_g} & \text{if } g \in P \end{cases} \tag{A29}$$

and its covariance X is given by

$$X_{ab} = \begin{cases} \left(\overline{\mathbf{C}}_M^{-1}\right)_{ab} & \text{if } a, b \in M \\ u_p^2\, \delta_{ab} & \text{otherwise} \end{cases} \tag{A30}$$

Similarly, the inference for σ² is computed from Equation (A21) as

$$\sigma^2 = \frac{1}{N_0} \int Q(\underline{\mathbf{h}}) \left\| \underline{\mathbf{y}} - \overline{\mathbf{A}}\,\underline{\mathbf{h}} \right\|^2 d\underline{\mathbf{h}} \tag{A31}$$

where

$$h_g = \begin{cases} h_g^{\mathrm{MAP}} & \text{if } g \in M \\ u_g & \text{if } g \in P \end{cases}$$
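Collecting Equations (A29) and (A31) together with the piecewise posterior mean above, one update of λ and σ² can be sketched as follows (the MAP estimate and the variational means $u_p$ are assumed given as toy values; the expectation in Equation (A31) is approximated here by plugging in the posterior mean, which neglects the covariance contribution):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 8, 5
A = rng.random((n, m))
y = rng.random(n)

# Assumed given: MAP estimate of h; zeros define the inactive set P
h_map = np.array([0.9, 0.0, 0.4, 0.0, 1.1])
u = np.array([0.0, 0.2, 0.0, 0.05, 0.0])   # variational means for p in P

M = h_map > 0
h_bar = np.where(M, h_map, u)              # piecewise posterior mean

# Equation (A29): lambda_g = 1/h_g^MAP on M, 1/u_g on P
lam = 1.0 / h_bar

# Equation (A31) with E[h] plugged in (covariance term neglected here)
N0 = n
sigma2 = np.sum((y - A @ h_bar) ** 2) / N0

assert np.all(lam > 0) and sigma2 > 0
```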

The core procedure of the proposed CMF method is based on the *L*1-optimal sparsity parameters. The estimated sources are recovered by multiplying the respective rows of the **W***k*(ω) components with the corresponding columns of the **H***k*(*t*) weights and the time-varying phase spectrum $e^{j\varphi_k(\omega,t)}$. The separated sources $\hat{s}_j(t)$ are obtained by converting the time–frequency representations of the sources back to the time domain.
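The source-spectrogram reconstruction described above can be sketched as follows (toy dimensions assumed; the final time-domain conversion, e.g., via an inverse STFT, is omitted):

```python
import numpy as np

rng = np.random.default_rng(5)
F_bins, T_frames = 6, 4

W_k = rng.random((F_bins, 2))        # spectral dictionary for source k (2 atoms)
H_k = rng.random((2, T_frames))      # activation weights
phi_k = rng.uniform(0, 2 * np.pi, (F_bins, T_frames))  # time-varying phase

# Estimated complex spectrogram of source k: (W_k H_k) o e^{j phi_k}
S_k = (W_k @ H_k) * np.exp(1j * phi_k)

assert S_k.shape == (F_bins, T_frames)
# The magnitude equals the NMF-style product; the phase comes from phi_k
assert np.allclose(np.abs(S_k), W_k @ H_k)
```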

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
