**4. Conclusions**

A novel solution for classifying noisy sound-event mixtures recorded with a single microphone was presented. Complex matrix factorization (CMF) was extended by adaptively tuning the sparse regularization, which yields the desired $L_1$-optimal sparse decomposition. In addition, the phase estimates of the CMF could extract the recurrent pattern of the magnitude spectra. The update equations were derived through an auxiliary function. For classification, a multiclass support vector machine was used, with the mean supervector encoding the sound-event signatures. The proposed noisy sound separation and event classification method was demonstrated on four sets of noisy sound-event mixtures: door open, door knocking, footsteps, and speech. The experiments first identified the optimal STFT window length: a sliding window of 1.5 s yielded the best separation performance. Second, they identified two significant features, ZCR and MFCCs. With these settings, the proposed method achieved outstanding results in both separation and classification. In future work, the proposed method will be evaluated on a public dataset such as DCASE 2016 and compared with other machine learning algorithms.

**Author Contributions:** Conceptualization, P.P. and W.L.W.; Methodology, P.P. and N.T.; Software, P.P.; Validation, N.T. and W.L.W.; Investigation, P.P. and N.T.; Writing—original draft preparation, P.P. and W.L.W.; Writing—review and editing, N.T., M.A.M.A., O.A., and G.R.; Visualization, M.A.M.A.; Supervision, W.L.W.; Project administration, N.T., M.A.M.A., and O.A.; Funding acquisition, W.L.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the UK Global Challenge Research Fund, the National Natural Science Foundation of China (No. 61971093, No. 61401071, No. 61527803), and supported by the NSAF (Grant No. U1430115) and EPSRC IAA Phase 2 funded project: "3D super-fast and portable eddy current pulsed thermography for railway inspection" (EP/K503885/1).

**Conflicts of Interest:** The authors declare no conflict of interest.

### **Appendix A. Single-Channel Sound Event Separation**

The prior *P*(**H**|λ) corresponds to the sparsity cost, for which a natural choice is a generalized Gaussian prior:

$$P(\mathbf{H}|\lambda) = \prod_{k,t} \frac{p\lambda^k(t)}{2\Gamma(1/p)} \exp\left(-\left(\lambda^k(t)\right)^p \left|\mathbf{H}^k(t)\right|^p\right) \tag{A1}$$

where $\lambda^k(t)$ and $p$ are the shape parameters of the distribution. When $p = 1$, $P(\mathbf{H}|\lambda)$ promotes $L_1$-norm sparsity. $L_1$-norm sparsity has been shown to be probabilistically equivalent to the pseudo-norm $L_0$, which is the theoretically optimum sparsity [29,30]. However, $L_0$-norm minimization is non-deterministic polynomial-time (NP) hard and is therefore impractical for large datasets such as audio. Given Equation (3), the posterior density is defined as

$$P(\boldsymbol{\theta}|\mathbf{Y},\boldsymbol{\lambda}) \propto P(\mathbf{Y}|\boldsymbol{\theta})P(\mathbf{H}|\boldsymbol{\lambda})\tag{A2}$$
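The sparsity cost implied by the prior in (A1) can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the function name `neg_log_prior` and the array layout (shape `(K, T)` for both $\mathbf{H}^k(t)$ and $\lambda^k(t)$) are assumptions made for illustration. It evaluates the negative logarithm of (A1) and confirms that, for $p = 1$, it equals a $\lambda$-weighted $L_1$ norm of $\mathbf{H}$ minus $\sum \log \lambda^k(t)$, up to a constant.

```python
import math
import numpy as np

def neg_log_prior(H, lam, p=1.0):
    """Negative log of the generalized Gaussian prior in (A1).

    H, lam: arrays of shape (K, T) holding H^k(t) and lambda^k(t) > 0.
    (Hypothetical helper; the layout is an assumption of this sketch.)
    """
    # Log of the normalizing factor p * lambda / (2 * Gamma(1/p)).
    log_norm = np.log(p * lam) - math.log(2.0 * math.gamma(1.0 / p))
    # Exponent of (A1): (lambda^k(t))^p * |H^k(t)|^p.
    penalty = (lam ** p) * np.abs(H) ** p
    return float(np.sum(penalty - log_norm))

rng = np.random.default_rng(0)
H = rng.standard_normal((2, 5))
lam = rng.uniform(0.5, 2.0, size=(2, 5))

# For p = 1 the cost is a weighted L1 norm of H minus sum(log lambda),
# plus a constant (log 2 per entry) that does not depend on H or lambda.
l1_cost = np.sum(lam * np.abs(H) - np.log(lam)) + H.size * math.log(2.0)
cost = neg_log_prior(H, lam, p=1.0)
```

The constant offset is irrelevant for the minimization, which is why only the penalty and the $\log \lambda^k(t)$ term need to be carried into the MAP objective.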

The maximum a posteriori probability (MAP) estimation problem leads to minimizing the following objective with respect to θ:

$$f(\theta) = \sum_{f,t} \left| \mathbf{Y}(\omega, t) - \mathbf{X}(\omega, t) \right|^2 + \sum_{k,t} \left[ \left( \lambda^k(t) \right)^p \left| \mathbf{H}^k(t) \right|^p - \log \lambda^k(t) \right] \tag{A3}$$

subject to $\sum_{f} \mathbf{W}^{k}(\omega) = 1$ ($k = 1, \ldots, K$).
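The objective in (A3) can be evaluated directly once the model spectrogram is formed. The sketch below is illustrative only: it assumes the usual CMF model $\mathbf{X}(\omega,t) = \sum_k \mathbf{W}^k(\omega)\mathbf{H}^k(t)e^{j\phi^k(\omega,t)}$, and the function name `cmf_objective` and the array shapes are hypothetical choices, not taken from the paper.

```python
import numpy as np

def cmf_objective(Y, W, H, phi, lam, p=1.0):
    """Evaluate f(theta) from (A3).

    Y:   complex spectrogram, shape (F, T)
    W:   spectral bases W^k(omega), shape (K, F)
    H:   temporal codes H^k(t), shape (K, T)
    phi: phases phi^k(omega, t), shape (K, F, T)
    lam: sparsity weights lambda^k(t) > 0, shape (K, T)
    """
    # Assumed CMF model: X = sum_k W^k(omega) H^k(t) exp(j phi^k(omega, t)).
    X = np.sum(W[:, :, None] * H[:, None, :] * np.exp(1j * phi), axis=0)
    fit = np.sum(np.abs(Y - X) ** 2)
    sparsity = np.sum((lam ** p) * np.abs(H) ** p - np.log(lam))
    return float(fit + sparsity)

rng = np.random.default_rng(0)
K, F, T = 2, 4, 6
W = rng.uniform(size=(K, F))
W /= W.sum(axis=1, keepdims=True)   # constraint: sum_f W^k(omega) = 1
H = rng.uniform(size=(K, T))
phi = rng.uniform(-np.pi, np.pi, size=(K, F, T))
lam = np.ones((K, T))

# A mixture generated exactly by the model has zero data-fit error, so with
# lambda = 1 and p = 1 the objective reduces to the L1 norm of H.
Y = np.sum(W[:, :, None] * H[:, None, :] * np.exp(1j * phi), axis=0)
value = cmf_objective(Y, W, H, phi, lam)
```

Monitoring this value across iterations is one simple way to check that an implementation of the updates behaves as intended.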

The CMF parameters are updated iteratively by means of an auxiliary function. An auxiliary function for $f(\theta)$ can be constructed as follows: for any auxiliary variables $\overline{\mathbf{Y}}^{k}(\omega,t)$ with $\sum_{k} \overline{\mathbf{Y}}^{k}(\omega,t) = \mathbf{Y}(\omega,t)$, any $\beta^{k}(\omega,t) > 0$ with $\sum_{k} \beta^{k}(\omega,t) = 1$, any $\mathbf{H}^{k}(t) \in \mathbb{R}$ and $\overline{\mathbf{H}}^{k}(t) \in \mathbb{R}$, and $p = 1$, the bound $f(\theta) \le f^{+}\left(\theta,\overline{\theta}\right)$ holds, where the auxiliary function is defined as

$$f^{+}\left(\theta,\overline{\theta}\right) \equiv \sum_{f,k,t} \frac{\left|\overline{\mathbf{Y}}^{k}(\omega,t) - \mathbf{W}^{k}(\omega)\mathbf{H}^{k}(t)e^{j\phi^{k}(\omega,t)}\right|^{2}}{\beta^{k}(\omega,t)} + \sum_{k,t} \left[\left(\lambda^{k}(t)\right)^{p}\left|\overline{\mathbf{H}}^{k}(t)\right|^{p-2}\mathbf{H}^{k}(t)^{2} + (2-p)\left|\overline{\mathbf{H}}^{k}(t)\right|^{p} - \log\lambda^{k}(t)\right] \tag{A4}$$

where $\overline{\theta} = \left\{\overline{\mathbf{Y}}^{k}(\omega,t), \overline{\mathbf{H}}^{k}(t)\right\}_{1 \le f \le F,\, 1 \le t \le T,\, 1 \le k \le K}$. The function $f^{+}\left(\theta,\overline{\theta}\right)$ is minimized w.r.t. $\overline{\theta}$ when

$$\overline{\mathbf{Y}}^{k}(\omega,t) = \mathbf{W}^{k}(\omega)\overline{\mathbf{H}}^{k}(t)\cdot e^{j\phi^{k}(\omega,t)} + \beta^{k}(\omega,t)(\mathbf{Y}(\omega,t) - \mathbf{X}(\omega,t))\tag{A5}$$

$$\overline{\mathbf{H}}^k(t) = \mathbf{H}^k(t) \tag{A6}$$
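The effect of the auxiliary-variable update can be verified numerically. The sketch below is an illustration under stated assumptions: the model is taken to be $\mathbf{X}(\omega,t) = \sum_k \mathbf{W}^k(\omega)\mathbf{H}^k(t)e^{j\phi^k(\omega,t)}$, and the function name `update_aux` and the `(K, F, T)` array layout are hypothetical. It checks two properties of (A5): the auxiliary variables sum to the observation over $k$, and the data-fit term of the auxiliary function then coincides with the squared error $\sum_{f,t}|\mathbf{Y}(\omega,t) - \mathbf{X}(\omega,t)|^2$.

```python
import numpy as np

def update_aux(Y, W, H, phi, beta):
    """Auxiliary-variable update of (A5); H-bar is simply H, as in (A6)."""
    # Per-component model W^k(omega) H^k(t) exp(j phi^k(omega, t)), shape (K, F, T).
    comp = W[:, :, None] * H[:, None, :] * np.exp(1j * phi)
    X = comp.sum(axis=0)                      # assumed CMF model, shape (F, T)
    Ybar = comp + beta * (Y - X)[None, :, :]  # distribute the residual by beta
    return Ybar, comp

rng = np.random.default_rng(1)
K, F, T = 3, 4, 5
Y = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
W = rng.uniform(size=(K, F))
H = rng.uniform(size=(K, T))
phi = rng.uniform(-np.pi, np.pi, size=(K, F, T))
beta = rng.uniform(0.1, 1.0, size=(K, F, T))
beta /= beta.sum(axis=0, keepdims=True)       # enforce sum_k beta^k = 1

Ybar, comp = update_aux(Y, W, H, phi, beta)

# Data-fit term of the auxiliary function versus the true squared error.
aux_fit = np.sum(np.abs(Ybar - comp) ** 2 / beta)
true_fit = np.sum(np.abs(Y - comp.sum(axis=0)) ** 2)
```

Both quantities agree because each component receives the share $\beta^k(\omega,t)$ of the residual, and these shares sum to one.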

### **Appendix B. Estimation of the Spectral Basis and Temporal Code**

In Equation (4), the update rule for $\theta$ is derived by differentiating $f^{+}\left(\theta,\overline{\theta}\right)$ partially w.r.t. $\mathbf{W}^{k}(\omega)$ and $\mathbf{H}^{k}(t)$ and setting the derivatives to zero, which yields the following:

$$\mathbf{W}^{k}(\omega) = \frac{\sum_{t} \frac{\mathbf{H}^{k}(t)}{\beta^{k}(\omega, t)} \text{Re}\left[ \overline{\mathbf{Y}}^{k}(\omega, t)^{*} \cdot e^{j\phi^{k}(\omega, t)} \right]}{\sum_{t} \frac{\mathbf{H}^{k}(t)^{2}}{\beta^{k}(\omega, t)}} \tag{A7}$$

$$\mathbf{H}^{k}(t) = \frac{\sum_{f} \frac{\mathbf{W}^{k}(\omega)}{\beta^{k}(\omega, t)} \text{Re} \left[ \overline{\mathbf{Y}}^{k}(\omega, t)^{*} \cdot e^{j\phi^{k}(\omega, t)} \right]}{\sum_{f} \frac{\mathbf{W}^{k}(\omega)^{2}}{\beta^{k}(\omega, t)} + (\lambda^{k}(t))^{p} \left| \overline{\mathbf{H}}^{k}(t) \right|^{p-2}} \tag{A8}$$
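The closed-form updates (A7) and (A8) map naturally onto array operations. The following NumPy sketch is one possible reading of these two equations, not the authors' code: the function names `update_W`, `update_H`, and `aux_cost`, the `(K, F, T)` layout, and the random test data are all assumptions. The helper `aux_cost` collects only the terms of the auxiliary function that depend on $\mathbf{W}$ and $\mathbf{H}$, so that the effect of each update can be observed.

```python
import numpy as np

def update_W(Ybar, H, phi, beta):
    """Spectral-basis update (A7); returns an array of shape (K, F)."""
    r = np.real(np.conj(Ybar) * np.exp(1j * phi))      # Re[Ybar^* e^{j phi}]
    num = np.sum(H[:, None, :] * r / beta, axis=2)     # sum over t
    den = np.sum(H[:, None, :] ** 2 / beta, axis=2)
    return num / den

def update_H(Ybar, W, phi, beta, lam, Hbar, p=1.0):
    """Temporal-code update (A8); returns an array of shape (K, T)."""
    r = np.real(np.conj(Ybar) * np.exp(1j * phi))
    num = np.sum(W[:, :, None] * r / beta, axis=1)     # sum over f
    den = np.sum(W[:, :, None] ** 2 / beta, axis=1)
    den = den + (lam ** p) * np.abs(Hbar) ** (p - 2.0)
    return num / den

def aux_cost(Ybar, W, H, phi, beta, lam, Hbar, p=1.0):
    """Terms of the auxiliary function that depend on W and H."""
    comp = W[:, :, None] * H[:, None, :] * np.exp(1j * phi)
    fit = np.sum(np.abs(Ybar - comp) ** 2 / beta)
    quad = np.sum((lam ** p) * np.abs(Hbar) ** (p - 2.0) * H ** 2)
    return float(fit + quad)

rng = np.random.default_rng(2)
K, F, T = 2, 5, 7
Ybar = rng.standard_normal((K, F, T)) + 1j * rng.standard_normal((K, F, T))
W0 = rng.uniform(0.1, 1.0, size=(K, F))
H0 = rng.uniform(0.1, 1.0, size=(K, T))
phi = np.angle(Ybar)                    # phases aligned with Ybar
beta = np.full((K, F, T), 1.0 / K)      # uniform weights, sum_k beta^k = 1
lam = rng.uniform(0.5, 2.0, size=(K, T))

c0 = aux_cost(Ybar, W0, H0, phi, beta, lam, H0)
W1 = update_W(Ybar, H0, phi, beta)
c1 = aux_cost(Ybar, W1, H0, phi, beta, lam, H0)
H1 = update_H(Ybar, W1, phi, beta, lam, Hbar=H0)
c2 = aux_cost(Ybar, W1, H1, phi, beta, lam, H0)
```

Because each update is the exact minimizer of a quadratic in its own variable, the cost in this sketch does not increase from `c0` to `c1` to `c2`. The sketch does not enforce the normalization constraint on $\mathbf{W}^k(\omega)$.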

The update rule for the phase, $\phi^{k}(\omega, t)$, can be derived by reformulating Equation (A4) with $p = 1$ as follows:

$$\begin{split} f^{+}\left(\theta,\overline{\theta}\right) &= \sum_{k,f,t} \frac{\left|\overline{\mathbf{Y}}^{k}(\omega,t)\right|^{2} - 2\mathbf{W}^{k}(\omega)\mathbf{H}^{k}(t)\,\text{Re}\left[\overline{\mathbf{Y}}^{k}(\omega,t)\cdot e^{-j\phi^{k}(\omega,t)}\right] + \mathbf{W}^{k}(\omega)^{2}\mathbf{H}^{k}(t)^{2}}{\beta^{k}(\omega,t)} \\
&\quad + \sum_{k,t} \left[\lambda^{k}(t)\left|\overline{\mathbf{H}}^{k}(t)\right|^{-1}\mathbf{H}^{k}(t)^{2} + \left|\overline{\mathbf{H}}^{k}(t)\right|\right] - \sum_{k,t}\log\lambda^{k}(t) \\
&= A - 2\sum_{k,f,t} \frac{\mathbf{W}^{k}(\omega)\mathbf{H}^{k}(t)\left|\overline{\mathbf{Y}}^{k}(\omega,t)\right|}{\beta^{k}(\omega,t)} \left(\frac{\text{Re}\left[\overline{\mathbf{Y}}^{k}(\omega,t)\cdot e^{-j\phi^{k}(\omega,t)}\right]}{\left|\overline{\mathbf{Y}}^{k}(\omega,t)\right|}\right) \\
&= A - 2\sum_{k,f,t} \mathbf{B}^{k}(\omega,t)\,\text{Re}\left[\frac{\left(\overline{\mathbf{Y}}^{k}(\omega,t)^{(r)} + j\,\overline{\mathbf{Y}}^{k}(\omega,t)^{(i)}\right)\left(\cos\phi^{k}(\omega,t) - j\sin\phi^{k}(\omega,t)\right)}{\left|\overline{\mathbf{Y}}^{k}(\omega,t)\right|}\right] \\
&= A - 2\sum_{k,f,t} \mathbf{B}^{k}(\omega,t)\left[\cos\phi^{k}(\omega,t)\cos\Omega^{k}(\omega,t) + \sin\phi^{k}(\omega,t)\sin\Omega^{k}(\omega,t)\right] \\
&= A - 2\sum_{k,f,t} \mathbf{B}^{k}(\omega,t)\cos\left(\phi^{k}(\omega,t) - \Omega^{k}(\omega,t)\right) \end{split} \tag{A9}$$

where $A$ denotes the terms that are independent of $\phi^{k}(\omega,t)$, $\mathbf{B}^{k}(\omega,t) = \frac{\mathbf{W}^{k}(\omega)\mathbf{H}^{k}(t)\left|\overline{\mathbf{Y}}^{k}(\omega,t)\right|}{\beta^{k}(\omega,t)}$, $\cos\Omega^{k}(\omega,t) = \frac{\text{Re}\left[\overline{\mathbf{Y}}^{k}(\omega,t)\right]}{\left|\overline{\mathbf{Y}}^{k}(\omega,t)\right|}$, and $\sin\Omega^{k}(\omega,t) = \frac{\text{Im}\left[\overline{\mathbf{Y}}^{k}(\omega,t)\right]}{\left|\overline{\mathbf{Y}}^{k}(\omega,t)\right|}$. The auxiliary function $f^{+}\left(\theta,\overline{\theta}\right)$ in (A4) is minimized when $\cos\left(\phi^{k}(\omega,t) - \Omega^{k}(\omega,t)\right) = \cos\phi^{k}(\omega,t)\cos\Omega^{k}(\omega,t) + \sin\phi^{k}(\omega,t)\sin\Omega^{k}(\omega,t) = 1$, namely, $\cos\phi^{k}(\omega,t) = \cos\Omega^{k}(\omega,t)$ and $\sin\phi^{k}(\omega,t) = \sin\Omega^{k}(\omega,t)$. The update formula for $e^{j\phi^{k}(\omega,t)}$ eventually leads to

$$\begin{split} e^{j\phi^{k}(\omega,t)} &= \cos\phi^{k}(\omega,t) + j\sin\phi^{k}(\omega,t) \\ &= \frac{\mathrm{Re}\left[\overline{\mathbf{Y}}^{k}(\omega,t)\right] + j\,\mathrm{Im}\left[\overline{\mathbf{Y}}^{k}(\omega,t)\right]}{\left|\overline{\mathbf{Y}}^{k}(\omega,t)\right|} \\ &= \frac{\overline{\mathbf{Y}}^{k}(\omega,t)}{\left|\overline{\mathbf{Y}}^{k}(\omega,t)\right|} \end{split} \tag{A10}$$
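The phase update in (A10) amounts to normalizing each auxiliary variable to unit modulus. The sketch below illustrates this with NumPy; the function name `update_phase` and the small guard `eps` against division by zero are assumptions of this example and do not appear in the derivation.

```python
import numpy as np

def update_phase(Ybar, eps=1e-12):
    """Unit-modulus phase factor exp(j phi) = Ybar / |Ybar|, as in (A10)."""
    # eps is an assumed numerical guard; it is not part of the source formula.
    return Ybar / np.maximum(np.abs(Ybar), eps)

rng = np.random.default_rng(3)
Ybar = rng.standard_normal((2, 3, 4)) + 1j * rng.standard_normal((2, 3, 4))

E = update_phase(Ybar)   # exp(j phi^k(omega, t))
phi = np.angle(E)        # phi^k(omega, t) itself

# With this choice Re[Ybar * exp(-j phi)] attains its maximum |Ybar|,
# which corresponds to cos(phi - Omega) = 1 in (A9).
aligned = np.real(Ybar * np.exp(-1j * phi))
shifted = np.real(Ybar * np.exp(-1j * (phi + 0.3)))
```

Any other phase, such as the shifted one in the last line, gives a strictly smaller value of the real part and therefore a larger value of the auxiliary function.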

The update formulas for $\beta^{k}(\omega, t)$ and $\mathbf{H}^{k}(t)$, which project them onto the constraint space, are set to

$$\beta^{k}(\omega,t) = \frac{\mathbf{W}^{k}(\omega)\mathbf{H}^{k}(t)}{\sum_{k} \mathbf{W}^{k}(\omega)\mathbf{H}^{k}(t)}\tag{A11}$$

$$\mathbf{H}^{k}(t) \leftarrow \frac{\mathbf{H}^{k}(t)}{\sum_{k} \mathbf{H}^{k}(t)}\tag{A12}$$
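The two projection steps (A11) and (A12) are simple normalizations over the component index $k$. The following NumPy sketch shows them under stated assumptions: $\mathbf{W}$ and $\mathbf{H}$ are taken to be nonnegative, the function names `update_beta` and `normalize_H` are hypothetical, and `eps` is an added guard against division by zero that is not in the source.

```python
import numpy as np

def update_beta(W, H, eps=1e-12):
    """beta^k(omega, t) from (A11): each component's share of the magnitude model."""
    comp = W[:, :, None] * H[:, None, :]               # W^k(omega) H^k(t), (K, F, T)
    return comp / np.maximum(comp.sum(axis=0, keepdims=True), eps)

def normalize_H(H, eps=1e-12):
    """Normalization of H^k(t) over k, as in (A12)."""
    return H / np.maximum(H.sum(axis=0, keepdims=True), eps)

rng = np.random.default_rng(4)
K, F, T = 3, 4, 5
W = rng.uniform(0.1, 1.0, size=(K, F))   # assumed nonnegative
H = rng.uniform(0.1, 1.0, size=(K, T))   # assumed nonnegative

beta = update_beta(W, H)
Hn = normalize_H(H)
```

After these steps, `beta` is positive and sums to one over $k$ at every time-frequency point, as required of $\beta^{k}(\omega,t)$ in the construction of the auxiliary function, and `Hn` sums to one over $k$ at every frame.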
