Next Article in Journal
High Latitude Ionospheric Gradient Observation Results from a Multi-Scale Network
Previous Article in Journal
Can Satellite Remote Sensing Assist in the Characterization of Yeasts Related to Biogeographical Origin?
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Sparse Component Analysis (SCA) Based on Adaptive Time—Frequency Thresholding for Underdetermined Blind Source Separation (UBSS)

by
Norsalina Hassan
1 and
Dzati Athiar Ramli
2,*
1
Department of Electrical Engineering, Politeknik Seberang Perai, Jalan Permatang Pauh, Bukit Mertajam 13700, Pulau Pinang, Malaysia
2
School of Electrical & Electronic Engineering, Universiti Sains Malaysia, Nibong Tebal 14300, Pulau Pinang, Malaysia
*
Author to whom correspondence should be addressed.
Sensors 2023, 23(4), 2060; https://doi.org/10.3390/s23042060
Submission received: 6 January 2023 / Revised: 30 January 2023 / Accepted: 8 February 2023 / Published: 11 February 2023
(This article belongs to the Section Physical Sensors)

Abstract

:
Blind source separation (BSS) recovers source signals from observations without knowing the mixing process or source signals. Underdetermined blind source separation (UBSS) occurs when there are fewer mixes than source signals. Sparse component analysis (SCA) is a general UBSS solution that benefits from sparse source signals which consists of (1) mixing matrix estimation and (2) source recovery estimation. The first stage of SCA is crucial, as it will have an impact on the recovery of the source. Single-source points (SSPs) were detected and clustered during the process of mixing matrix estimation. Adaptive time–frequency thresholding (ATFT) was introduced to increase the accuracy of the mixing matrix estimations. ATFT only used significant TF coefficients to detect the SSPs. After identifying the SSPs, hierarchical clustering approximates the mixing matrix. The second stage of SCA estimated the source recovery using least squares methods. The mixing matrix and source recovery estimations were evaluated using the error rate and mean squared error (MSE) metrics. The experimental results on four bioacoustics signals using ATFT demonstrated that the proposed technique outperformed the baseline method, Zhen’s method, and three state-of-the-art methods over a wide range of signal-to-noise ratio (SNR) ranges while consuming less time.

1. Introduction

When the original source signals of interest cannot be recorded directly and only a combination of mixtures is available, a source separation problem will exist. The objective of the source separation problem is to isolate the original signals from a combination of several source signals. Blind source separation (BSS) is a typical method used for source separation in the signal processing research community and it has gained considerable attention [1,2,3]. Recently, BSS theory has been refined and enhanced. In selecting the proper BSS approach, the link between the number of observations (sensors) and the number of underlying sources is of paramount importance. BSS is categorised as determined BSS (M = N), overdetermined BSS (M > N) and underdetermined BSS (UBSS) (M < N), where M represents the number of observations (sensors) and N represents the number of sources [4]. Despite being a trickier problem than the other two branches of BSS problems, UBSS is the most popular one because it is the most suitable BSS to suit practical applications, especially when the number of sensors is less than the number of sources [5].
UBSS has garnered substantial scholarly interest, and numerous solutions to this issue have been offered [6,7,8,9,10]. UBSS is classified into two classes, namely, statistic characteristic and sparsity [11]. Sparsity is a potent regulatory concept for reducing the number of possible solutions or locating a more realistic solution. In a given representation, a signal is considered sparse if a small number of coefficients can adequately explain the majority of its characteristics. If the source signals are sparse, it is easy to identify the mixing mechanism. The sparse component analysis (SCA) methodology has emerged as a crucial method for dealing with UBSS because it uses the sparseness of nonstationary signals [12]. UBSS based on SCA is divided into two primary stages: the estimation of the mixing matrix and the estimation of the source signal [13]. Since the exact estimation of the mixing matrix is the foundation of signal recovery, the first step must be precise. Sparsity-based practices focus on extracting the non-negative sources. Much work has been carried out on what is called SCA [14,15,16,17,18]. Several techniques for obtaining sparsity in the transform domain have been developed thus far, including the short-time Fourier Transform (STFT) and the wavelet packet transform [19,20,21,22,23]. UBSS techniques are highly dependent on the sparsity of the source signal. If the signal in the TF plane is not sufficiently sparse or if noises exist after the transformation, there will be some outliers present. To solve either of these issues and boost the estimated performance of the mixing matrix, several estimation techniques based on the detection of single-source points (SSPs) have been developed [16,24,25,26,27,28,29,30].
The authors of [25] introduced the concept of SSP detection. The authors introduced the time–frequency (TF) ratio of the mixtures (TIFROM) to further relax the assumption of a TF disjoint. This strategy provided some overlaps between the source of the TF representations. The identification of SSPs using the ratio of the TF transforms was also reported in [30]. By proposing an STFT-based technique for estimating the mixing matrix, [24] demonstrated that a TF point is an SSP when the absolute direction of the real and the imaginary parts of the TF vectors of the mixed signals are the same. The same procedure of detecting the SSPs can be found in [13]. For a time-delayed mixing matrix with a single-source point, ref. [27] showed that the time-delayed mixing model of the UBSS problem works well with the single-source identification-based mixing matrix estimation strategy. The authors of [26] utilised the single-source detection (SSD) algorithm that recognises the TF points occupied by a single source for each source. An SSP identification technique based on the transformation matrix was proposed by [28]. The number of source signals was then identified by looking for the maximum value of the potential function. However, the existence of noise in the system had a considerable impact on the selection of the peak values. Most of the SSP-based UBSS algorithms ignored the link between SSPs, resulting in low SSP identification accuracy, especially in noisy cases. The authors of [16] proposed recognising SSPs by using sparse coding, which accounts for the linear correlations between SSPs. Consequently, even at a low signal-to-noise ratio (SNR), this approach has been observed to be capable of delivering exceptional mixing matrix estimation performance. Table 1 shows a summary of the research related to SSP detection.
The method in [16] used a predetermined parameter to select the STFT coefficients before detecting the SSPs to estimate the mixing matrix. Choosing the right parameter is difficult since incorrect choices will increase the errors in the estimated source. Consequently, it remains challenging to identify SSPs reliably and effectively. To estimate the mixing matrix, this research constructed the first stage of UBSS by using SCA and a technique prior to SSP detection. The work was inspired by the following aims: (1) how to enhance the accuracy of the mixing matrix estimation and (2) how to reduce the computational time (time cost) in estimating the mixing matrix. The authors of [30] proved that only a minimal number of TF coefficients is enough to fully estimate the mixing matrix before identifying single source points. The TF selection factor is also able to increase the level of accuracy and boost the computational efficiency. From this perspective, in comparison with the method in [16], an adaptive algorithm for selecting the significant coefficient before it can be used as an input to detect the SSPs was proposed in this study.
The goal of this work was to propose an adaptive approach for selecting meaningful data from TF mixtures that facilitates better identification of SSPs. The study focused on increasing the accuracy and decreasing the time costs of estimating the mixing matrix as well as enhancing the quality of source signal separation. After identifying the SSPs in the TF plane during the estimation of the mixing matrix, these points are clustered by hierarchical clustering to estimate the mixing matrix. The second stage of SCA employed the least squares method to carry out the source recovery estimation.
The remainder of this article is organised as follows. Section 2 describes the generation of mixed signals. The proposed method is introduced in Section 3. Section 4 discusses the source recovery method. Section 5 analyses the performance of the proposed method, and the conclusions are given in Section 5.

2. The UBSS–SCA-Based Method

The UBSS approach based on SCA is made up of two phases. The initial step of the procedure is to estimate the mixing matrix, followed by estimating the source signal. The procedure starts with mixed signal generation. The flowchart of the entire process is shown in Figure 1, with the proposed technique noted in the green box.

3. Mixed Signal Generation

This study considered an underdetermined instantaneous mixed model where the mixed signal in the time domain can be expressed as
X t = A S ( t )
where X t = [ X 1 ( t ) , X 2 ( t ) , , X M ( t ) ] T represents the M-many mixed signals received by the sensors at the receiving end, A = [ a 1 , a 2 , , a N ] is the mixing matrix, the i th column is called the steering vector corresponding to the i t h source, S t = [ S 1 ( t ) , S 2 ( t ) , , S N ( t ) ] T represents the M-many sources, T is the instant time and T represents the transpose operation. STFT was used as the sparse method to demonstrate the mechanism of SCA processing. In STFT, sparsity indicates that in every particular time–frequency (TF) bin, the energy of only one of the speakers dominated. The mixed signals were made sparser by applying STFT to Equation (1). Both sides utilised STFT without noise for convenience. The transformed model (the TF domain) of the mixtures was obtained as Equation (2) as follows:
X ~ t , f = A S ~ ( t , f )
where X ~ t , f = [ X ~ 1 ( t , f ) , X ~ 2 ( t , f ) , , X ~ M ( t , f ) ] T , with X ~ i t , f standing for the STFT coefficients of the i th mixture’s mixed signals in the f th frequency bin at time t , and S ~ t , f = [ S ~ 1 ( t , f ) , S ~ 2 ( t , f ) , , S ~ N ( t , f ) ] T , with S ~ j t , f representing the STFT coefficients of the j th source signal in the f th frequency bin at time t .
The mixed signals in the time–frequency (TF) domain was used to estimate the mixing matrix. The mixing matrix estimation is an essential procedure in SCA, and it can be improved in two ways: single-source point (SSP) detection and clustering. In this work, an adaptive algorithm was applied before the SSP detection step to increase the accuracy and reduce the complexity. An SSP is a time–frequency point at which the energy of just one source is significant, while the energies of signals from other sources are either zero or very near to zero at that point. By locating the SSPs, the data clustering can be enhanced, allowing clustering algorithms to be able to estimate the mixing matrix more accurately.

4. Mixing Matrix Estimation

4.1. The Proposed Adaptive Time–Frequency Thresholding (ATFT) Method

Through the development of mixing matrix estimation, research has been carried out to find a new way to improve SCA. Estimating an unknown mixing matrix accurately is crucial to provide a sufficient condition for the recoverability of a source vector. Many different SSP detection strategies have been put forth as means of enhancing the accuracy of mixing matrix estimates. By extracting 1D subspaces from the whole collection of TF representation vectors of the observed mixed signals, ref. [16] provided a technique for identifying SSPs that took a pragmatic approach. To estimate the mixing matrix, it employed a fixed parameter to decide which STFT coefficients to use before looking for the SSPs. Only a few TF coefficients were found to be sufficient to fully estimate the mixing matrix before identifying single-source locations, as stated by [8]. However, there is another possible method of improving the accuracy and computing the efficiency, which is by using a TF selection factor. An algorithm that can adaptively determine the significant coefficients was proposed in this work to better estimate the mixing matrix. The primary contribution of this work was to formulate a method, which was called adaptive time–frequency thresholding (ATFT) and was able to reduce the dimensions of a mixture of TF vectors. This decrease was important so that the complexity was reduced, and a significant coefficient was selected accurately before it was used as an input to detect the single-source points (SSPs). The steps of the proposed ATFT method are shown in Figure 2.
According to Figure 2, the objective here is to select the significant coefficients of the TF vectors rather than blindly detecting the SSPs by using all the TF vectors. Through use of the proposed ATFT method, the significant TF vector was chosen by setting an adaptive threshold, α , to select the column with more energy than the given level. The threshold was calculated via the following steps:
Step 1: The normalisation process was executed for each element in the TF vectors of the mixed signals would have a unit norm, using the following operation:
T F n o r m j = i = 1 j X ~ M j ( t , f ) 2
where M is the number of mixed signals and j = 1,2 , 3 , D , and D is the dimension/number of elements in the array. Next, the mean of the norm of the results in Equation (3) was calculated and denoted as
E ( T F n o r m j ) = 1 D j = 1 D T F n o r m j
Step 2: An adaptive threshold, α , was set to select the significant coefficient from the TF mixtures.
  • Let i n d e x j = [ 1 , 2 , 3 , , D ] .
  • The threshold of ATFT, α , is obtained from the norm of the result of all mixtures and was computed by
    α = ( d = 0 D - 1 T F n o r m j 2 ) 1 / 2 / D
The threshold values varied for each different mixture and dataset. The selected column must meet the condition.
3.
According to Equation (5), the thresholding operation is defined as
i n d e x j = 1 , T F n o r m j > E ( T F n o r m j ) α 0 , o t h e r w i s e
where j = 1,2 , 3 , D . Here, i n d j shows that only the mixture vectors with the norm that is greater than E ( T F n o r m j ) α is used. This will determine which column of the TF vectors will be used for the following process.
Step 3: From Equation (6), the corresponding column of the TF mixture vectors will be selected as follows:
X ~ y z t , f = X ~ i j t , f index j = 1 n o t   s e l e c t , o t h e r w i s e
According to Equation (7), y = i , j = 1,2 , 3 , D , z = 1,2 , 3 , D ^ ; D ^ is the new dimension and X ~ y z t , f represents the new TF vectors which are selected adaptively from the initial TF vectors in Equation (2) using Equation (5). The purposes of this step are to select the significant coefficients and reduce the computational burden. Only X ~ y z t , f is used further for SSP detection and mixing matrix estimation. Selection of the column with a significant coefficient, the mixing matrix estimation can be improved.

4.2. Single Source Point (SSP) Detection

In this study, sparse coding was used to find and extract the SSPs from the ATFT method’s mixture vectors. Sparse coding is a technique for representing an input signal with a limited number of basic functions [31]. Sparse coding was applied for each X ~ y z t , f to detect the SSPs. Two assumptions were made to estimate the mixing matrix, i.e., (A1) in the mixing matrix A , any column vector is linearly independent; (A2) there are some TF points ( t , f ) for each source s i in which only s i is dominant, S ~ i ( t , f ) S ~ j ( t , f ) j i [19]. Based on these assumptions, for each source signal S t at any TF point such as ( u , v ) where only one source is active, S ~ i ( u , v ) is written as
X ~ u , v = S ~ i ( u , v ) a i
where ( u , v ) is an SSP corresponding to S t . Using the mixture vectors at these SSPs allows the estimation of the column vectors of the mixing matrix. At another TF point such as ( φ , ω ) with only a single active source, S ~ i ( φ , ω ) , for each source signal S t , Equation (9) is obtained as
X ~ φ , ω = S ~ i ( φ , ω ) a i
From Equations (8) and (9), both the SSPs, X ~ u , v and X ~ φ , ω , are collinear with a i at u , v and φ , ω respectively. As validated by [16], if X ~ u , v and X ~ φ , ω are both SSPs corresponding to the same active sources, they can be linearly represented as
X ~ u , v = X ~ φ , ω r
where r is a real number. The SSP can be represented as a one-dimensional (1D) subspace, and its identification can be viewed as the identification of the 1D subspaces in datasets of mixed signals using sparse coding techniques. To obtain the sparse coding solution, each mixture’s ATFT vectors can be written as a linear combination of other TF mixtures’ vectors, Y i ~ :
Y i ~ = Y ~ b i ,   s . t .   b i i = 0
where Y ~ [ y 1 , y 2 , , y q ] ; y 1 , y 2 , , y q form the set of all the mixtures’ ATFT vectors; b i [ b i 1 , b i 2 , , b i q ] is the coding coefficient vector and q is the number of TF points. The trivial approach of expressing a mixture TF vector by itself is eliminated by the constraint b i i = 0 . Sparse coding attempts to find a solution, b i , in which nonzero entries are a mixture of the TF vectors from the same subspace as Y i ~ ; b i is the sparse representation of Y i ~ . If b i is sufficiently sparse, the solution to Equation (11) is acquired by optimising the following objective function:
J b i ; λ = λ b i 1 + 1 2 Y ~ i - Y ~ b i 2 2   s . t . b i i = 0
where λ > 0 is a regularisation parameter that controls the trade-off between good reconstruction and sparsity in the representation. The value of 0.001 was chosen for λ in this study. A column vector of Y ~ was extracted corresponding to the only nonzero element of b corresponding to one point of the 1D subspace of Y ~ . The l 1 -norm solver from the MATLAB package was used, and this convex minimisation problem effectively solved Equation (12) by using the convex optimisation method.
After obtaining the sparse coding solutions for each TF point, TF points with just one nonzero element were considered in the sparse coding coefficient vector to be SSPs. Following the identification of the SSPs, the next step was to estimate the mixing matrix. The detected SSPs consisted of the set of normalised column vectors of the mixing matrix denoted as Ω, as stated in Equation (13):
Ω = { y 1 , y 2 , , y N }
To aid the clustering analysis, the data must be normalised after the SSP detection process, which entails mapping the data points to the positive half-unit circle. The normalised SSPs then will be clustered by the clustering method. In summary, the proposed ATFT and SSP approaches are described in Algorithm 1.
Algorithm 1: Mixing matrix estimation procedure using the proposed ATFT method
Input: The mixed signal, X ^ t .
  • X ^ t is transformed from the time domain into the TF domain using STFT to produce X ~ t , f .
  • Each element in the mixture vectors, X ~ t , f , is normalised to have a unit norm, T F n o r m j = i = 1 j X ~ M j ( t , f ) 2 where j = 1,2 , 3 , D . D is the dimension.
  • The mean of the unit norm, E ( T F n o r m j ) = 1 D j = 1 D T F n o r m j , i s   c o m p u t e d .
  • The threshold α is used to select the significant coefficient.
    • Let i n d e x j = [ 1 , 2 , 3 , , D ]
      i n d e x j = 1 , T F n o r m j > E ( T F n o r m j ) α 0 , o t h e r w i s e
      where j = 1,2 , 3 , D .   α = 3 ( d = 0 D - 1 T F n o r m j 2 ) 1 / 2 / D , and N = D
    • X ~ y z t , f = X ~ i j ( t , f ) i n d e x j = 1 n o t   s e l e c t , o t h e r w i s e
      where u = i , j = 1,2 , 3 , D , v = 1,2 , 3 , D ^ . D ^ = new dimension
  • For each X ~ y z t , f , the sparse coding coefficients are computed by using l 1 -norm optimisation.
  • The mixture of TF vectors with sparse coding coefficient vectors containing only one nonzero element is added into Ω.
  • The elements are normalised on Ω.
Output: Normalised Ω.

4.3. Clustering

The hierarchical clustering algorithm [24] was used to cluster and extract the SSPs vectors, = { y 1 , y 2 , , y N } , followed by grouping its elements into n clusters. The initial cluster centre was set as C = [ c 1 , c 2 , , c n ] and n represented the cluster’s centre. In addition, 1 - cos θ was used as the distance measure, where cos θ = X ~ m T X ~ n / ( X ~ m X ~ m ) is the cosine of the angle between the m t h and n t h sample vectors X ~ m and X ~ n in X ~ . The centroid of each cluster was identified after clustering to compute the column vectors of the mixing matrix. To further reduce the error of the mixing matrix estimation, outlier points were those that were far from the cluster’s mean direction and were eliminated. Points in the data that met the requirement Ø Q i - μ Ø Q > ϵ σ Ø Q were removed, where Ø Q i is the absolute direction of the i t h sample in the Q t h cluster and μ Ø Q is the mean of the absolute direction of the samples in the Q t h cluster. The new cluster centre C n e w = [ c n e w 1 , c n e w 2 , , c n e w n ] was obtained via the mean value of each cluster. The procedure is summarised in Algorithm 2.
Algorithm 2: The procedure of mixing matrix estimation using hierarchical clustering
Input: The extracted SSP vectors, = { y 1 , y 2 , , y N } .
  • The clustering method is applied to the extracted SSPs to group its elements into n clusters.
  • The centres of these n clusters are calculated as the estimations for the columns of the mixing matrix to produce the estimated mixing matrix.
  • The outliers are removed and C n e w is produced.
  • The centres of the clusters are calculated as the estimated mixing matrix, A ~ .
Output: The estimated mixing matrix, A ~ is formed.

5. Source Recovery Estimation

To construct a sparse TF representation of the recovered sources, a sequence of least square problems was used to recover the source signals, minimising the error function by selecting the optimal M × M - 1 submatrix of A ~ . Let A be a set composed of all M × M - 1 submatrices of the estimated mixing matrix, A ~ ; that is,
A = A i A i = [ a ^ θ 1 , a ^ θ 1 , , a ^ θ M - 1 ]
where the indices of these ( M - 1 )-many nonzero elements are denoted as θ 1 , θ 2 , , θ M - 1 and A has C N ( M - 1 ) elements, i = 1 , 2 , , C N ( M - 1 ) . For each TF point X ~ t , f , there is submatrix A * = [ a ^ 1 , a ^ 2 , , a ^ M - 1 ] in the set A that satisfies Equation (15):
X ~ ( t , f ) = A * A * X ~ ( t , f )
where A * is the pseudo-inverse of A * . A * constructs an n -dimensional vector S t , f by setting its j th element as
S ~ j t , f = e i , i f   j = i 0 , o t h e r w i s e
where e = [ e 1 , e 2 , , e M - 1 ] T = A * X ~ t , f , and A * is obtained from Equation (17):
A * = arg min A i A X ~ t , f - A i A i X ~ ( t , f ) 2
Finally, by inverting the STFT, the time domain of the predicted source signals S ~ t was easily retrieved. In summary, Algorithm 3 describes the source recovery procedure of the technique suggested in of this research.
Algorithm 3: Source recovery estimation
Input: The mixed signal, X ^ t , and the estimated mixing matrix, A ~ .
  • X ^ t   i s   t r a n s f o r m e d into the TF domain using STFT.
  • All M × M - 1 submatrices of A ~ are placed into A .
  • The source of TF representation is estimated for each TF point.
  • Using inverse STFT, the estimated source signals are transformed back into the time domain.
Output: The estimated source signals, S ~ .

6. Numerical Simulations

In this section, numerical results demonstrating the efficiency of the proposed methods are presented. Taking into consideration the fact that bioacoustic source separation is comparatively understudied in comparison with human speech separation [32], this study addressed the problem of source separation using the bioacoustic mixed signals consisting of four frog species with unique vocal behaviour. Four real bioacoustic signals, S N ( t ) , acquired from the database of [33] were considered. All the signals were recorded in the mono channel at 16-bit resolution, had a 16 kHz sampling rate and were in .wav format. The size of the STFT was set to 1024 in all experiments, the time step was equal to 512, and the weighting function was the Hanning window.
In all experiments, the source signal was synthetically mixed to create the mixtures. Four source signals with 15,000 samples were randomly selected and mixed with different random 3 × 4 mixing matrices for each simulation test to generate the mixed signals. For each of the mixture signals, the proposed algorithm was implemented to separate these bioacoustic source signals. The mixed signals were combined with Gaussian noise, and the noise performance of the algorithm at various signal-to-noise ratios (SNR) ranging from 5 dB to 45 dB was tested. The average of 100 Monte Carlo simulation tests was obtained to evaluate the performance of the proposed method. Figure 3 shows the four original bioacoustics signals and the three mixed signal waveforms produced when the SNR was 45 dB.
Natural signals such as bioacoustics signals are not often extremely sparse in the temporal domain. The source signals were made sparser by applying STFT. Figure 4a,b demonstrates the scatterplots of the underdetermined mixtures of three mixtures and four sources of bioacoustics signal in the time domain and TF domain, respectively. The three coordinates in the figures are the three components of the mixed vector, i.e., X 1 , X 2 and X 3 . Figure 4a shows that the scattered points in the time domain are disorganised, and these points provided limited insight about the mixing matrix. Because of the moderate sparsity of the sources, the directions of the mixtures were obscure. STFT was applied and the column orientations of the mixing matrix became visible in the scatterplot, as shown in Figure 4b. It is clear from the diagram that there were four clumping lines that corresponded to the column vectors of the mixing matrix.
Figure 5 shows the scatterplot of the TF mixtures after selection by the ATFT and Zhen’s methods. It can be noticed that the ATFT method has a smaller coefficient and a clearer column direction than Zhen’s method. This is because the ATFT method only selects the significant coefficients from the full time–frequency (TF) representation of the vectors of the observed mixed signals. However, Zhen’s method used a fixed parameter to decide which STFT coefficients to use before looking for the SSPs.
Figure 6 depicts the scatterplot following SSP detection using the ATFT and Zhen’s method. There are four lines visible, indicating that the number of sources is four, and their orientation is more distinct than in Figure 5. If we compare Figure 6a,b, it can be observed that the SSP detected in the TF mixtures’ vectors selected by the ATFT method had fewer data points than in Zhen’s method. This result shows that more outliers were effectively removed, indicating the ability of the ATFT method to generate better results than Zhen’s method.
The performance of the proposed technique was assessed by two categories of assessment metrics.

6.1. The Performance of Mixing Matrix Estimation

The following performance indicator was used in this work to assess the performance of mixing matrix estimation:
E r r o r = 1 n i = 1 n ( 1 - a i T a ^ i a i a ^ i )
where a ^ i denotes the estimation of the mixing vector a i and n stands for the number of sources. In general, as the error reduces, the accuracy of the mixing matrix estimation increases. To assess the efficacy of the investigated technique, the error rate was used to assess the accuracy of the mixing matrix estimation and compare it to Zhen’s method. Figure 7 illustrates the average error obtained after 100 Monte Carlo trials had been run with the ATFT method and Zhen’s method when the SNR ranged from 5 dB to 45 dB.
The simulation results showed that for each method, the accuracy improved with an increase in the SNR. Furthermore, the tests revealed that the ATFT approach was able to predict the mixing matrix more accurately than the existing method. Because of the adaptive threshold setting process, it was able to automatically adapt to the data and consistently achieved accurate estimations. The proposed technique provides an excellent alternative for mixing matrix estimation over the baseline techniques, thus lowering the errors. To analyse the complexity of the research method, the time cost was used to represent the total time in second (s) taken by the CPU for estimating the mixing matrix. The simulations were performed using MATLAB 2016(a) and tested in the Windows 7 operating system with an Intel Core i7 with a 2.93 GHz CPU and 8 GB of RAM. Table 2 shows the time cost of estimating the mixing matrix after 100 Monte Carlo trials of ATFT and Zhen’s method.
The results showed that the ATFT method was slightly faster than Zhen’s method for all sets in the bioacoustic database, including the speech signals. Zhen’s technique takes the sparse linear representation of the relationships among all time–frequency (TF) mixing vectors into account to determine the locations of the single sources (SSPs). In contrast, the ATFT method uses only the most significant mixtures. Because of the computational difficulty of the approach, the size of the mixture vectors in the ATFT method were sufficient for obtaining an accurate estimation of the mixing matrix. Therefore, the ATFT method cost less than Zhen’s method, since the number of samples needed to estimate the mixing matrix was smaller than in Zhen’s method. In conclusion, the ATFT method provides a more efficient and lower time cost, and it performs better in real-time applications than Zhen’s method.
To further validate the relevance of the ATFT method, additional related techniques were also used to demonstrate the performance of the suggested method. The results of [25,34,35] were used to make a quantitative comparison in estimating the mixing matrix. Figure 8 presents the comparative performance of different methods in estimating the mixing matrix against different signal-to-noise ratios (SNR) after 100 Monte Carlo trials for bioacoustic signals.
When the SNR was less than 30 dB, the difference in estimation performance between the methods of DUET, TIFROM, Reju, and Zhen, and the ATFT method was rather large. When the SNR exceeded 30 dB, the performance of all estimation methods reduced. Both the ATFT and Zhen’s methods had a more stable and reliable performance. When the error changed to be not more than 0.2 dB, as the SNR decreased from 45 dB to 5 dB, it was observed that the ATFT method still performed the best. In the noisy case, the ATFT method outperformed the methods developed by [25,34,35] with more than a 0.5 dB difference in the error when the SNR decreased from 45 dB to 5 dB.
The performance of the ATFT algorithm was further evaluated by setting different numbers of mixtures M and different numbers of sources N , where M × N = 3 × 4 , 3 × 5 , 3 × 6 , 4 × 5 , 4 × 6 , 4 × 7 . Figure 9 and Figure 10 depict the mean error resulting from 100 Monte Carlo simulations used to estimate the mixing matrix for three and four mixtures with varying numbers of sources, respectively.
When the performance of different settings is compared between Figure 9 and Figure 10, the setting with three mixtures has smaller average errors at 45 dB SNR. This demonstrated that the setting with three mixtures outperformed the setting with four mixtures in terms of the quality of the mixing matrix estimation. This result demonstrated that the estimation of the mixing matrix for three mixtures was superior to the estimation for four mixtures. The two figures demonstrated that the performance of various sensor setups declined as the number of sources increased. This finding showed a significant correlation between the number of sources and sensors, and the performance.

6.2. The Performance of Source Recovery Estimation

The performance of the source recovery estimation was examined using the following method to accomplish bioacoustic source recovery:
M S E = 10 l o g 10 ( 1 n i = 1 n min δ s i - δ s ^ i 2 2 s i 2 2 )
where s i denotes the estimation of the source signals and δ is a scalar reflecting the scalar ambiguity.
Figure 11 depicts the average source recovery performance over 100 Monte Carlo trials against SNR for the ATFT method and Zhen’s method. This simulation used a 3 × 4 mixing model and the estimated mixing matrix from Section 6.1 to obtain the estimated source by inserting them into the source recovery algorithm. It was discovered that as the SNR increased, the MSE for both methods decreased. This inversely proportional relationship occurred because the noise term diminishes with a higher SNR. A smaller MSE indicates superior quality. The first step of SCA is estimation of the mixing matrix. This step is important because it is required to recover the source signals accurately and will directly affect the second stage. Since the ATFT technique outperformed Zhen’s method in predicting the mixing matrix in Section 6.1, it was demonstrated that the proposed method consistently outperformed Zhen’s method for all levels of SNR.
The simulation results of the original sources and the estimated sources are shown in Figure 12. The first-row plots are the original sources. The second-row plots represent the estimated sources using the ATFT method. The third-row plots represent the estimated sources using Zhen’s method. The estimated sources obtained using the ATFT method are rather similar to the original sources in comparison with those of Zhen’s method. Despite both the ATFT method and Zhen’s method using the same algorithm for source recovery, the proposed method generated significantly better results. In conclusion, the accuracy of source recovery depends on the estimation accuracy of the mixing matrix.

7. Conclusions

The primary objectives of this work were to develop an adaptive algorithm that selected significant coefficients from time–frequency (TF) mixtures to promote better detection of SSPs for estimating the mixing matrix by using the SCA-based UBSS technique. In comparing the proposed method with the benchmark method and several state-of-the-art methods after 100 Monte Carlo trials, the results revealed that the TF selection factor of the proposed method provided higher accuracy than the others for estimating the mixing matrix under different signal-to-noise ratios (SNR) ranging from 5 to 45 dB. In terms of source recovery at all levels of SNR, the proposed method excelled in comparison with the benchmark method approach because of its the more accurate estimation of the mixing matrix that it provided. The simulation results illustrated that the estimated sources were recovered successfully for bioacoustic signals. In conclusion, a method based on the SCA technique has successfully been developed to estimate the mixing matrix and source recovery, and it can address the problems concerning the accuracy, time costs and separation quality of UBSS.

Author Contributions

Conceptualisation, N.H. and D.A.R.; methodology, N.H. and D.A.R.; software, N.H.; validation, N.H. and D.A.R.; formal analysis, N.H. and D.A.R.; investigation, N.H. and D.A.R.; resources, N.H. and D.A.R.; data curation, N.H. and D.A.R.; writing—original draft preparation, N.H.; writing—review and editing, N.H. and D.A.R.; visualisation, N.H.; supervision, D.A.R.; project administration, D.A.R.; funding acquisition, D.A.R. All authors have read and agreed to the published version of the manuscript.

Funding

This paper is supported under the Ministry of Higher Education Malaysia for Fundamental Research’s Grant Scheme with project code: FRGS/1/2020/ICT03/USM/02/1.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank the Ministry of Higher Education Malaysia for Fundamental Research’s Grant Scheme (project code: FRGS/1/2020/ICT03/USM/02/1).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Comon, P. Independent component analysis, A new concept? Signal Process. 1994, 36, 287–314. [Google Scholar] [CrossRef]
  2. Babatas, E.; Erdogan, A.T. Time and frequency based sparse bounded component analysis algorithms for convolutive mixtures. Signal Process. 2020, 173, 107590. [Google Scholar] [CrossRef]
  3. Sawada, H.; Ono, N.; Kameoka, H.; Kitamura, D.; Saruwatari, H. A review of blind source separation methods: Two converging routes to ILRMA originating from ICA and NMF. APSIPA Trans. Signal Inf. Process. 2019, 8, e12. [Google Scholar] [CrossRef]
  4. Wang, L. A Study on Multi-Subspace Representation of Nonlinear Mixture with Application in Blind Source Separation: Modeling and Performance Analysis. Ph.D. Thesis, Keio University, Tokyo, Japan, 2019. [Google Scholar]
  5. Guan, W.; Dong, L.; Cai, Y.; Yan, J.; Han, Y. Sparse component analysis with optimized clustering for underdetermined blind modal identification. Meas. Sci. Technol. 2019, 30, 125011. [Google Scholar] [CrossRef]
  6. Guo, Q.; Ruan, G.; Liao, Y. A time-frequency domain underdetermined blind source separation algorithm for MIMO radar signals. Symmetry 2017, 9, 104. [Google Scholar] [CrossRef]
  7. Guo, Q.; Li, C.; Ruan, G. Mixing matrix estimation of underdetermined blind source separation based on data field and improved FCM clustering. Symmetry 2018, 10, 21. [Google Scholar] [CrossRef]
  8. Lu, J.; Cheng, W.; Zi, Y. A novel underdetermined blind source separation method and its application to source contribution quantitative estimation. Sensors 2019, 19, 1413. [Google Scholar] [CrossRef]
  9. Yu, G. An underdetermined blind source separation method with application to modal identification. Shock Vib. 2019, 2019, 1–15. [Google Scholar] [CrossRef]
  10. Qiu, P.; Zhang, Y.; Wang, Y.; Yin, Q.; Wang, Q. Underdetermined Speech Source Separation Based on Hybrid Clustering. In Proceedings of the 2021 40th Chinese Control Conference (CCC), Shanghai, China, 26–28 July 2021; Volume 430074, pp. 3077–3081. [Google Scholar]
  11. Su, Q.; Shen, Y.; Wei, Y.; Deng, C. Underdetermined blind source separation by a novel time–frequency method. AEU -Int. J. Electron. Commun. 2017, 77, 43–49. [Google Scholar] [CrossRef]
  12. Li, Y.; Wang, Y.; Dong, Q. A novel mixing matrix estimation algorithm in instantaneous underdetermined blind source separation. Signal Image Video Process. 2020, 14, 1001–1008. [Google Scholar] [CrossRef]
  13. He, X.; He, F. Underdetermined mixing matrix estimation based on artificial bee colony optimization and single-source-point detection. Multimed. Tools Appl. 2020, 79, 13061–13087. [Google Scholar] [CrossRef]
  14. Li, Y.; Amari, S.I.; Cichocki, A.; Ho, D.W.C.; Xie, S. Underdetermined blind source separation based on sparse representation. IEEE Trans. Signal Process. 2006, 54, 423–437. [Google Scholar]
  15. Liu, C.; Li, Y.; Nie, W. A new underdetermined blind source separation algorithm under the anechoic mixing model. Int. Conf. Signal Process. Proc. ICSP 2016, 2019, 1799–1803. [Google Scholar]
  16. Zhen, L.; Peng, D.; Yi, Z.; Xiang, Y.; Chen, P. Underdetermined Blind Source Separation Using Sparse Coding. IEEE Trans. Neural Networks Learn. Syst. 2017, 28, 3102–3108. [Google Scholar] [CrossRef]
  17. Eqlimi, E.; Makkiabadi, B.; Samadzadehaghdam, N.; Khajehpour, H.; Mohagheghian, F.; Sanei, S. A Novel Underdetermined Source Recovery Algorithm Based on k-Sparse Component Analysis. Circuits Syst. Signal Process. 2019, 38, 1264–1286. [Google Scholar] [CrossRef]
  18. Cheng, W.; Jia, Z.; Chen, X.; Han, L.; Zhou, G.; Gao, L. Underdetermined convolutive blind source separation in time-frequency domain based on single source points and experimental validation. Meas. Sci. Technol. 2020, 31, 095001. [Google Scholar] [CrossRef]
  19. Wang, T.; Yang, F.; Yang, J. Convolutive Transfer Function-Based Multichannel Nonnegative Matrix Factorization for Overdetermined Blind Source Separation. IEEE/ACM Trans. Audio Speech Lang. Process. 2022, 30, 802–815. [Google Scholar] [CrossRef]
  20. Li, Y.; Cichocki, A.; Amari, S. Sparse component analysis for blind source separation with less sensors than sources. Ica 2003, 2003, 89–94. [Google Scholar]
  21. Linh-Trung, N.; Aïssa-El-Bey, A.; Abed-Meraim, K.; Belouchrani, A. Underdetermined blind source separation of non-disjoint nonstationary sources in the time-frequency domain. In Proceedings of the 8th International Symposium on Signal Processing and its Applications, ISSPA, Sydney, NSW, Australia, 28–31 August 2005; IEEE: Piscatevi, NJ, USA, 2005; Volume 1, pp. 46–49. [Google Scholar]
  22. He, X.S.; He, F.; Cai, W.H. Underdetermined BSS Based on K-means and AP Clustering. Circuits Syst. Signal Process. 2016, 35, 2881–2913. [Google Scholar] [CrossRef]
  23. Shi, F.; Liu, C. Mixing matrix estimation algorithm for underdetermined instantaneous mixing model. Int. J. Perform. Eng. 2019, 15, 337–345. [Google Scholar] [CrossRef]
  24. Reju, V.G.; Koh, S.N.; Soon, I.Y. An algorithm for mixing matrix estimation in instantaneous blind source separation. Signal Process. 2009, 89, 1762–1773. [Google Scholar] [CrossRef]
  25. Abrard, F.; Deville, Y. A time-frequency blind signal separation method applicable to underdetermined mixtures of dependent sources. Signal Process. 2005, 85, 1389–1403. [Google Scholar] [CrossRef]
  26. Kim, S.G.; Yoo, C.D. Underdetermined blind source separation based on subspace representation. IEEE Trans. Signal Process. 2009, 57, 2604–2614. [Google Scholar]
  27. Zhang, L.; Yang, J.; Guo, Z.; Zhou, Y. Underdetermined blind source separation from time-delayed mixtures based on prior information exploitation. J. Electr. Eng. Technol. 2015, 10, 2179–2188. [Google Scholar] [CrossRef] [Green Version]
  28. Ye, F.; Chen, J.; Gao, L.; Nie, W.; Sun, Q. A Mixing Matrix Estimation Algorithm for the Time-Delayed Mixing Model of the Underdetermined Blind Source Separation Problem. Circuits Syst. Signal Process. 2019, 38, 1889–1906. [Google Scholar] [CrossRef]
  29. Kumar, M.; Jayanthi, V.E. Underdetermined blind source separation using CapsNet. Soft Comput. 2020, 24, 9011–9019. [Google Scholar] [CrossRef]
  30. Wang, J.; Chen, X.; Zhao, H.; Li, Y.; Yu, D. An Effective Two-Stage Clustering Method for Mixing Matrix Estimation in Instantaneous Underdetermined Blind Source Separation and Its Application in Fault Diagnosis. IEEE Access 2021, 9, 115256–115269. [Google Scholar] [CrossRef]
  31. Wang, X.; Wang, S.; Huang, Z.; Du, Y. Structure regularized sparse coding for data representation. Knowl. -Based Syst. 2019, 174, 87–102. [Google Scholar] [CrossRef]
  32. Bermant, P.C. BioCPPNet: Automatic bioacoustic source separation with deep neural networks. Sci. Rep. 2021, 11, 1–13. [Google Scholar] [CrossRef] [PubMed]
  33. Frogs of Australia > Taxonomy. Available online: https://frogs.org.au/frogs/search.php (accessed on 10 December 2022).
  34. Yilmaz, Ö.; Rickard, S. Blind separation of speech mixtures via time-frequency masking. IEEE Trans. Signal Process. 2004, 52, 1830–1846. [Google Scholar] [CrossRef]
  35. Reju, V.G.; Koh, S.N.; Soon, I.Y. Underdetermined convolutive blind source separation via time-frequency masking. IEEE Trans. Audio Speech Lang. Process. 2010, 18, 101–116. [Google Scholar] [CrossRef]
Figure 1. General flow chart of the entire process.
Figure 1. General flow chart of the entire process.
Sensors 23 02060 g001
Figure 2. The flowchart of the proposed ATFT method.
Figure 2. The flowchart of the proposed ATFT method.
Sensors 23 02060 g002
Figure 3. Four original bioacoustic signals and three mixed signals.
Figure 3. Four original bioacoustic signals and three mixed signals.
Sensors 23 02060 g003
Figure 4. Scatterplot of mixed signals for a scenario with three sensors (M = 3) and four sources (N = 4) of bioacoustic signals. (a) Mixtures in the time domain, (b) Mixtures in the time–frequency domain.
Figure 4. Scatterplot of mixed signals for a scenario with three sensors (M = 3) and four sources (N = 4) of bioacoustic signals. (a) Mixtures in the time domain, (b) Mixtures in the time–frequency domain.
Sensors 23 02060 g004
Figure 5. Scatterplot of the TF mixtures’ vectors by (a) the ATFT method and (b) Zhen’s method.
Figure 5. Scatterplot of the TF mixtures’ vectors by (a) the ATFT method and (b) Zhen’s method.
Sensors 23 02060 g005
Figure 6. Scatterplot after SSP detection: (a) ATFT method and (b) Zhen’s method.
Figure 6. Scatterplot after SSP detection: (a) ATFT method and (b) Zhen’s method.
Sensors 23 02060 g006
Figure 7. Comparison of the performance of mixing matrix estimation using the ATFT method and Zhen’s method, tested on bioacoustic signals.
Figure 7. Comparison of the performance of mixing matrix estimation using the ATFT method and Zhen’s method, tested on bioacoustic signals.
Sensors 23 02060 g007
Figure 8. Comparison of the performance of mixing matrix estimation by different methods.
Figure 8. Comparison of the performance of mixing matrix estimation by different methods.
Sensors 23 02060 g008
Figure 9. Comparison of the performance of mixing matrix estimation for three mixtures with different numbers of sources using the ATFT method.
Figure 9. Comparison of the performance of mixing matrix estimation for three mixtures with different numbers of sources using the ATFT method.
Sensors 23 02060 g009
Figure 10. Comparison of the performance of mixing matrix estimation for four mixtures with different numbers of sources using the ATFT method.
Figure 10. Comparison of the performance of mixing matrix estimation for four mixtures with different numbers of sources using the ATFT method.
Sensors 23 02060 g010
Figure 11. Comparison of the performance of source recovery estimation by the ATFT method and Zhen’s method on bioacoustic signals.
Figure 11. Comparison of the performance of source recovery estimation by the ATFT method and Zhen’s method on bioacoustic signals.
Sensors 23 02060 g011
Figure 12. Simulation results of source recovery. First row: the original source signals of four bioacoustic signals. Second row: the estimated sources using the ATFT method. Third row: the estimated sources using Zhen’s method.
Figure 12. Simulation results of source recovery. First row: the original source signals of four bioacoustic signals. Second row: the estimated sources using the ATFT method. Third row: the estimated sources using Zhen’s method.
Sensors 23 02060 g012
Table 1. Summary of research related to the detection of single-source points.
Table 1. Summary of research related to the detection of single-source points.
ReferencesMixing SystemSingle-Source Point DetectionLimitations
[25]Two mixtures of three sources from two guitars and one voiceTime–frequency ratio of the mixtures (TIFROM)Low estimation accuracy for noisy and insufficiently sparse sources
[24]Three, four and five mixtures each for four to seven speech utterancesCompared the absolute directions of the real and imaginary parts of the TF points in the mixing signalsLimited application, requires real-valued entries in the mixing matrix
[26]Three mixtures of four sets of sources consisting of the genres of music, speech, instruments and various soundsAn SSD algorithm that recognises the TF points occupied by a single source for each source.Loses efficiency when the mixing matrix is complex and not real
[27]Two mixtures of three speech signalsExtracts prior information from the complex-valued mixing matrix at the receiver’s endToo much computation or poor robustness
[16]Three mixtures of four speech sourcesSparse codingHas a fixed parameter to select the STFT coefficients before SSP detection
[28]Two mixtures of four speech signalsAn SSP detection technique based on the transformation matrixThe selection of the peak value used to determine the number of source signals is greatly affected by noise
[29]Two mixtures of two sets of sources consisting of three male and female speech signalsCalculates the mixing ratioSensitive to noise in real-world systems
[13]Three mixtures of six flutes Calculating the mixing ratioSensitive to noise
Table 2. The time cost of estimating the mixing matrix.
Table 2. The time cost of estimating the mixing matrix.
DatabaseZhen’s MethodATFT Method
Bioacoustic signals2.33211.9218
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Hassan, N.; Ramli, D.A. Sparse Component Analysis (SCA) Based on Adaptive Time—Frequency Thresholding for Underdetermined Blind Source Separation (UBSS). Sensors 2023, 23, 2060. https://doi.org/10.3390/s23042060

AMA Style

Hassan N, Ramli DA. Sparse Component Analysis (SCA) Based on Adaptive Time—Frequency Thresholding for Underdetermined Blind Source Separation (UBSS). Sensors. 2023; 23(4):2060. https://doi.org/10.3390/s23042060

Chicago/Turabian Style

Hassan, Norsalina, and Dzati Athiar Ramli. 2023. "Sparse Component Analysis (SCA) Based on Adaptive Time—Frequency Thresholding for Underdetermined Blind Source Separation (UBSS)" Sensors 23, no. 4: 2060. https://doi.org/10.3390/s23042060

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop