#### *3.1.1. Variational Sparsity Versus Fixed Sparsity*

In this implementation, several experiments were conducted to investigate the effect of sparsity regularization on source separation performance. The proposed separation method with variational sparsity was compared against (1) uniform constant sparsity with low sparseness (e.g., $\lambda_t^k = 0.01$) and (2) uniform constant sparsity with high sparseness (e.g., $\lambda_t^k = 100$). The hypothesis is that the proposed variational sparsity significantly improves audio source separation compared with fixed sparsity.

To investigate the impact of the uniform sparsity parameter, sparsity regularization values from 0 to 10 at intervals of 0.5 were evaluated in each experiment on 60 mixtures of six types. The results of uniform regularization for these sparsity values (i.e., $\lambda_t^k = 0, 0.5, \ldots, 10$) are illustrated in Figure 2.
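The over- and under-sparse behavior discussed for Figure 2 can be illustrated with a toy L1-penalized nonnegative coding problem. The sketch below is not the authors' adaptive CMF: it assumes a real-valued model $V \approx WH$ with a fixed dictionary $W$, uses the standard multiplicative update with an L1 term, and sweeps the same uniform grid $\lambda = 0, 0.5, \ldots, 10$. The data, sizes, and the helper name `l1_nonneg_code` are hypothetical.

```python
import numpy as np


def l1_nonneg_code(V, W, lam, n_iter=500, eps=1e-12, seed=0):
    """Estimate a nonnegative code H for V ~ W @ H with penalty lam * sum(H).

    Multiplicative update for the Euclidean cost plus an L1 term:
        H <- H * (W^T V) / (W^T W H + lam)
    W is held fixed (hypothetical toy setting, not the CMF model).
    """
    rng = np.random.default_rng(seed)
    H = rng.uniform(0.1, 1.0, size=(W.shape[1], V.shape[1]))
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + lam + eps)
    return H


# Hypothetical toy data: 3 nonnegative basis vectors, 30 frames.
rng = np.random.default_rng(42)
W = rng.uniform(0.0, 1.0, size=(20, 3))
H_true = rng.uniform(0.0, 1.0, size=(3, 30))
V = W @ H_true

# Uniform sparsity grid: 0, 0.5, ..., 10 (21 values).
lambdas = np.linspace(0.0, 10.0, 21)

l1_norms, errors = [], []
for lam in lambdas:
    H = l1_nonneg_code(V, W, lam)
    l1_norms.append(H.sum())                       # code "size" (L1 norm)
    errors.append(np.linalg.norm(V - W @ H))       # reconstruction error

for lam, n1, err in zip(lambdas, l1_norms, errors):
    print(f"lambda={lam:4.1f}  ||H||_1={n1:8.3f}  error={err:8.3f}")
```

As $\lambda$ grows, the code shrinks toward zero and the reconstruction error rises (over-sparse), while $\lambda = 0$ fits the data without encouraging sparsity at all; a uniform value has to trade these off, which motivates adapting $\lambda$ per element.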

**Figure 2.** Separation results of the proposed method by using different uniform regularization.

Figure 2 illustrates that the unsupervised CMF performed best for $\lambda_t^k$ in the range 1.5–3, which yielded the highest SDR, over 8 dB. When $\lambda_t^k$ was set too high, the low spectral values of the sound-event signals became overly sparse, and this over-sparse code $H_k(t)$ degraded the separation performance. Conversely, when $\lambda_t^k$ was set too low, the code $H_k(t)$ was under-sparse and could not distinguish between the two sound-event signals. Without any regularization, the separation results still contained a mixture of the sounds. As the uniform sparsity results in Figure 2 show, the separation performance of the proposed method depends on the assigned sparsity value. Because the sound-event sources are not clearly distinct in the TF representation, the optimal sparseness is difficult to determine in advance, which underlines the importance of choosing the optimal λ for separation.

Table 1 shows the effect of the sparsity value on separation performance by comparing the proposed method with variational sparsity against the uniform sparsity scheme. The proposed adaptive CMF method improved on the uniform constant sparsity by 1.32 dB SDR on average, clearly indicating that adaptive sparsity yields better separation performance than the constant sparsity scheme. Hence, the proposed variational sparsity improves the recovery of the original sound-event signals by adaptively selecting an individual sparsity parameter for each element of the code:

$$\lambda_g = \begin{cases} \dfrac{1}{h_g^{\text{MAP}}} & \text{if } g \in M \\[2ex] \dfrac{1}{u_g} = \dfrac{1}{\int h_g\, Q(\underline{\mathbf{h}}_P)\, d\underline{\mathbf{h}}_P} & \text{if } g \in P \end{cases} \qquad \text{and} \qquad \sigma^2 = \frac{1}{N_0}\int Q(\underline{\mathbf{h}})\, \|\underline{\mathbf{y}} - \overline{\mathbf{A}}\,\underline{\mathbf{h}}\|^2 \, d\underline{\mathbf{h}}$$

that is, $\lambda_g = 1/\hat{h}_g$, where $\hat{h}_g = h_g^{\text{MAP}}$ if $g \in M$ and $\hat{h}_g = u_g$ if $g \in P$. Consequently, the optimized sparsity improves the estimated spectral dictionary through the estimated temporal code.

The separation performance of the proposed single-channel sound-event separation method was assessed quantitatively: the overall average signal-to-distortion ratio (SDR) was 8.62 dB, as illustrated in Figure 3.
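The adaptive rule sets the sparsity of each code element to the reciprocal of its estimated value, $\lambda_g = 1/\hat{h}_g$, and estimates the noise variance as an expected squared residual under $Q$. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes $M$ is the set of elements whose MAP estimate is positive and $P$ the remaining ones (the text does not define them here), approximates the integrals over $Q$ with Monte Carlo draws, and takes $N_0$ as the length of $\underline{\mathbf{y}}$. The helper names are hypothetical.

```python
import numpy as np


def adaptive_sparsity(h_map, u, eps=1e-12):
    """Per-element sparsity lambda_g = 1 / h_hat_g.

    h_hat_g = h_map[g] for g in M and u[g] for g in P.
    Assumption: M = elements with a positive MAP estimate, P = the rest.
    """
    h_map = np.asarray(h_map, dtype=float)
    u = np.asarray(u, dtype=float)
    in_M = h_map > 0
    h_hat = np.where(in_M, h_map, u)
    return 1.0 / (h_hat + eps)          # eps guards against division by zero


def noise_variance(y, A, h_samples):
    """Monte Carlo estimate of sigma^2 = (1/N0) * E_Q ||y - A h||^2.

    h_samples: (S, G) array of draws from Q; N0 is assumed to be len(y).
    """
    residuals = y[None, :] - h_samples @ A.T      # shape (S, N0)
    return np.mean(np.sum(residuals ** 2, axis=1)) / y.shape[0]


# Hypothetical usage with draws from an approximate posterior Q.
rng = np.random.default_rng(0)
A = rng.uniform(size=(8, 4))
h_map = np.array([0.8, 0.0, 1.2, 0.0])
h_samples = np.abs(h_map + 0.05 * rng.standard_normal((200, 4)))
y = A @ h_map
u = h_samples.mean(axis=0)              # Monte Carlo estimate of u_g
lam = adaptive_sparsity(h_map, u)
sigma2 = noise_variance(y, A, h_samples)
print(lam, sigma2)
```

Elements whose posterior mass sits near zero receive a large $\lambda_g$ and are pushed further toward zero, while strongly active elements are penalized lightly.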


**Table 1.** Comparison of average SDR performance on three types of mixtures between uniform regularization methods and the proposed method.

**Figure 3.** Average SDR results for the six mixture types.

Each sound-event signal has its own temporal pattern, which can be clearly seen in the TF representation. Examples of sound-event signals in the TF domain are illustrated in Figure 4. Through the adaptive *L*1-SCMF method, the proposed single-channel separation method can reconstruct complex temporal patterns such as speech. The separation results therefore indicate that the proposed method separates noisy sources with high SDR values.
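The SDR values reported in this section can be related to a simple energy ratio. The sketch below uses the simplified definition $\text{SDR} = 10\log_{10}\left(\|s\|^2 / \|s - \hat{s}\|^2\right)$; this is an assumption, since the evaluation toolkit is not stated here, and BSS Eval-style SDR additionally allows a filtered version of the reference, so its values generally differ. The function name `sdr_db` is hypothetical.

```python
import numpy as np


def sdr_db(reference, estimate, eps=1e-12):
    """Simplified signal-to-distortion ratio in dB.

    SDR = 10 * log10(||s||^2 / ||s - s_hat||^2), with eps to avoid log(0).
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    distortion = reference - estimate
    return 10.0 * np.log10((np.sum(reference ** 2) + eps)
                           / (np.sum(distortion ** 2) + eps))


# Hypothetical usage: an estimate with 10% amplitude error everywhere.
s = np.ones(100)
print(sdr_db(s, 0.9 * s))
```

An average SDR improvement, such as the 1.32 dB and 2.13 dB figures quoted in this section, is then the mean of per-mixture SDR differences between two methods.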

**Figure 4.** Example of time-frequency representation of four sound event classes.

#### *3.1.2. Comparison of the Proposed Adaptive CMF with Other SCBSS Methods Based on NMF*

This section compares the separation performance of the adaptive CMF against state-of-the-art NMF-based methods (i.e., CMF, SNMF, and NMF-ISD). For all compared methods, the normalized time-frequency representation was computed with the short-time Fourier transform (STFT) using a 1024-point Hanning window with 50% overlap. The number of factors was two, with a sparsity weight of 1.5. One hundred random realizations of twenty second-event mixtures were executed, and the resulting average SDRs are presented in Table 2. The proposed adaptive CMF method yielded the best separation performance, outperforming the CMF, SNMF, and NMF-ISD methods by 2.13 dB in SDR on average. Among the four event categories, the estimated door-open signals obtained the highest SDR.
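The STFT settings stated above (1024-point Hann window, 50% overlap, i.e., a hop of 512 samples) can be reproduced with SciPy. In this minimal sketch the sampling rate and the test signal are placeholders, since neither is given in this section; the complex STFT keeps the phase that complex-valued factorizations such as CMF operate on, and `istft` inverts it.

```python
import numpy as np
from scipy.signal import istft, stft

fs = 16000  # assumed sampling rate (not stated in the text)
x = np.random.default_rng(0).standard_normal(2 * fs)  # placeholder 2-s signal

# 1024-point Hann window with 50% overlap (hop of 512 samples).
f, t, X = stft(x, fs=fs, window="hann", nperseg=1024, noverlap=512)

magnitude = np.abs(X)   # what magnitude-only NMF variants factorize
phase = np.angle(X)     # retained by complex-valued models

# Inverse STFT with the same settings recovers the time-domain signal.
_, x_rec = istft(X, fs=fs, window="hann", nperseg=1024, noverlap=512)
print(X.shape, np.max(np.abs(x_rec[:x.size] - x)))
```

A 1024-point window yields 513 non-negative frequency bins; at 50% overlap the Hann window satisfies the overlap-add condition, so the round trip is exact up to floating-point error.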


**Table 2.** Comparison of average SDR and SIR performance on three types of mixtures between SCICA, NMF-ISD, SNMF, CMF, and the proposed method.

The proposed adaptive *L*1-SCMF method carefully adapts the sparsity parameter while exploiting the phase information and the temporal code of the sources, which SNMF and NMF-ISD inherently ignore; this led to an improvement of about 2 dB in SDR. In contrast, the parts decomposed by the CMF, SNMF, and NMF-ISD methods were unable to capture the phase spectra and the temporal dependency of the frequency patterns within the audio signal.

Additionally, the CMF and NMF-ISD factorizations are unique only when the signal adequately spans the positive octant; otherwise, a rotation of **W** combined with the opposite rotation of **H** produces the same product and hence the same result. Because its sparsity value is set manually, the CMF method can also easily produce an over- or under-sparse factorization.
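The rotation ambiguity can be checked numerically. In this hypothetical toy example, strictly positive **W** and **H** lie well inside the positive octant, so a small rotation $R$ applied as $\mathbf{W}R$ together with the opposite rotation $R^{\top}\mathbf{H}$ keeps both factors nonnegative while leaving the product $\mathbf{W}\mathbf{H}$ unchanged. The matrices and the angle are illustrative choices only.

```python
import numpy as np

# Hypothetical strictly positive factors (data well inside the positive octant).
W = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [1.5, 1.5]])
H = np.array([[1.0, 2.0, 1.0],
              [2.0, 1.0, 1.5]])

theta = 0.1  # small rotation angle in radians
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

W_rot = W @ R      # rotate the dictionary
H_rot = R.T @ H    # opposite (inverse) rotation of the code; R.T == inv(R)

# Both alternative factors are still nonnegative ...
print((W_rot >= 0).all(), (H_rot >= 0).all())
# ... and they reproduce exactly the same product.
print(np.allclose(W_rot @ H_rot, W @ H))
```

If the data instead touched the boundaries of the positive octant, any such rotation would push some entry negative, which is why adequate spanning is the condition for uniqueness.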

### *3.2. Performance of Event Classification Based on MSVM Algorithm*

This section describes the features and performance of the MSVM learning model. The model was first examined to find the optimal sliding-window size and then to identify the features that contribute most to classification performance. Finally, the efficiency of the MSVM model was evaluated. These topics are presented in this order in the following subsections.
