Orthogonalization of the Sensing Matrix Through Dominant Columns in Compressive Sensing for Speech Enhancement
Abstract
1. Introduction
2. Related Work
3. Theoretical Background
3.1. Preprocessing
3.2. Discrete Cosine Transform (DCT) and Inverse DCT (IDCT)
3.3. Framing and Deframing
3.4. Voice Activity Detector
- Spectral Shape: The short-term power spectrum envelope of the speech signal.
- Spectro–temporal modulations: The spectral scales and temporal rates of speech.
- Voicing: The vibrations of the vocal cords (folds) that appear in speech as fundamental and harmonic components of the pitch (frequency).
- Long-term variability: The variations in speech caused by successive phone generation.
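The VAD used in this work (ref. [57]) combines the cues listed above. As a much simpler, hedged illustration of the VAD's role in the pipeline (frames in, speech/non-speech flags out), the sketch below uses only frame energy; the function name `energy_vad`, the threshold factor, and the frame size are illustrative assumptions and not the paper's method.

```python
import numpy as np

def energy_vad(frames, noise_frames=1, factor=3.0):
    """Toy energy-based VAD: flag a frame as speech when its energy exceeds
    `factor` times the energy estimated from the first `noise_frames` frames
    (assumed to contain noise only)."""
    energies = np.sum(frames ** 2, axis=1)
    noise_energy = np.mean(energies[:noise_frames])
    return energies > factor * noise_energy

# Example: 10 synthetic "frames" of 256 samples, with louder frames 4-6
rng = np.random.default_rng(0)
frames = rng.normal(scale=0.01, size=(10, 256))
frames[4:7] += rng.normal(scale=0.2, size=(3, 256))
print(energy_vad(frames))        # expect True for the louder frames
```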
3.5. Particle Swarm Optimization (PSO)
Algorithm 1: PSO Algorithm
Initialize:
  Set the initial position and velocity of each particle randomly within a specified range.
  Set the inertia weight w.
  Set the acceleration constants c1 and c2.
  Set pbest_i for each particle i as its initial position.
  Set gbest as the best position among all particles.
Main:
  While the termination criterion is not met do:
    For each particle i do:
      Evaluate the fitness of particle i through the objective function presented in Equation (18).
      If the current position is better than pbest_i:
        Update pbest_i with the current position.
      If the current position is better than gbest:
        Update gbest with the current position.
    For each particle i do:
      Update the particle velocity using Equation (7).
      Update the particle position using Equation (8).
  End While
  Return gbest as the best solution found.
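For concreteness, the following sketch implements the generic PSO loop of Algorithm 1 for an arbitrary objective to be minimized. The function name `pso`, the default inertia weight, acceleration constants, bounds, and the toy sphere objective are illustrative assumptions, not the settings or the Equation (18) objective used in the paper.

```python
import numpy as np

def pso(objective, dim, n_particles=20, n_iter=100,
        w=0.7, c1=1.5, c2=1.5, bounds=(-1.0, 1.0), seed=0):
    """Minimal particle swarm optimization (minimization)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))            # positions
    v = np.zeros((n_particles, dim))                       # velocities
    pbest = x.copy()                                       # per-particle best positions
    pbest_f = np.array([objective(p) for p in x])          # and their fitness
    g = pbest[np.argmin(pbest_f)].copy()                   # global best position
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)   # velocity update
        x = np.clip(x + v, lo, hi)                              # position update
        f = np.array([objective(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, pbest_f.min()

# Toy usage: minimize the sphere function in 5 dimensions
best, best_f = pso(lambda p: np.sum(p ** 2), dim=5)
print(best_f)
```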
3.6. Compressive Sensing (CS)
- The signal x is required to be a k-sparse vector that satisfies k ≪ N, where N denotes the length of x.
- The sensing matrix Φ is required to be a matrix having full rank.
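As a small numerical illustration of these two requirements (not code from the paper), the sketch below builds a k-sparse vector with k much smaller than N, forms a full-rank Gaussian sensing matrix, and computes the compressed measurement y = Φx. The dimensions N, M, and k are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, k = 256, 64, 8                       # signal length, measurements, sparsity (k << N)

# k-sparse signal: only k nonzero entries
x = np.zeros(N)
support = rng.choice(N, size=k, replace=False)
x[support] = rng.normal(size=k)

# Random Gaussian sensing matrix; with probability 1 it has full rank M
Phi = rng.normal(size=(M, N)) / np.sqrt(M)
print("rank(Phi) =", np.linalg.matrix_rank(Phi))   # expect M

y = Phi @ x                                 # compressed measurement (length M < N)
print("compression ratio:", M / N)
```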
3.7. Orthogonal Matching Pursuit (OMP)
- The signal x needs to be k-sparse, where k ≪ N.
- The matrix Φ is required to fulfill the condition k < (1/2)(1 + 1/μ(Φ)), where μ(Φ) denotes the mutual coherence of the column vectors of the matrix Φ and is calculated as follows:
  μ(Φ) = max over i ≠ j of |⟨φ_i, φ_j⟩| / (‖φ_i‖₂ ‖φ_j‖₂)
Algorithm 2: OMP Algorithm
Inputs:
  Input signal vector x, which must be k-sparse.    /* sparse representation of a speech frame using DCT */
  Sensing matrix Φ of size M × N, where M < N.
  Termination residue threshold ε.    /* unwanted or noise components */
  Maximum number of iterations t_max.
Initialize:
  t = 0    /* iteration count */
  y = Φx    /* measurement vector creation */
  r = y    /* current residue */
  Λ = ∅    /* indices of dominant columns */
  c_Λ = ∅    /* contributions of dominant columns */
  Φ_Λ = [ ]    /* dominant-columns matrix */
  x̂ = 0    /* estimated filtered x */
Main:
  φ_j = φ_j / ‖φ_j‖₂, for j = 1, …, N    /* column normalization of the sensing matrix */
  While (‖r‖₂ > ε and t < t_max) do:
    λ = argmax_j |⟨φ_j, r⟩|    /* finds the column (λ) with the highest contribution */
    If (λ ∉ Λ) then:    /* verify that the column index is not already in the dominant-column list */
      Λ = Λ ∪ {λ}    /* add the column index to the list */
      c_Λ = c_Λ ∪ {⟨φ_λ, r⟩}    /* add the contribution of the column */
      Φ_Λ = [Φ_Λ φ_λ]    /* update the dominant-columns matrix */
    Else:
      Update the contribution of column λ in c_Λ only, as its index already exists in Λ.
    End If
    x_Λ = argmin_z ‖y − Φ_Λ z‖₂    /* projection of y onto the dominant basis */
    r = y − Φ_Λ x_Λ    /* remaining residue */
    t = t + 1
  End While
  For i = 1 to M do:
    For j = 1 to |Λ| do:
      Φ_red(i, j) = Φ(i, Λ(j))    /* generate the reduced-dimension sensing matrix from the dominant columns */
    End For
  End For
  x̂_red = Φ_red† y    /* estimate the reduced-dimension x̂_red from y through the pseudo-inverse of Φ_red */
  For j = 1 to |Λ| do:
    x̂(Λ(j)) = x̂_red(j)    /* fill the full-dimension x̂ from x̂_red */
  End For
Outputs:
  Estimated sparse vector x̂, dominant-column indices Λ, their contributions c_Λ, and reduced sensing matrix Φ_red.
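A compact reference implementation of the recovery loop in Algorithm 2 is sketched below under the usual OMP formulation (greedy column selection, least-squares fit on the selected support, residual update). The function name `omp`, the variable names, and the stopping rule combining a residual threshold with a maximum iteration count are written to mirror Algorithm 2, but the code is an illustrative reconstruction rather than the authors' implementation.

```python
import numpy as np

def omp(Phi, y, max_iter, res_thresh=1e-6):
    """Orthogonal matching pursuit: greedily recover a sparse x from y ≈ Phi @ x."""
    M, N = Phi.shape
    Phi_n = Phi / np.linalg.norm(Phi, axis=0)      # column-normalized copy for selection
    r = y.astype(float).copy()                     # current residue
    support = []                                   # indices of dominant columns
    x_hat = np.zeros(N)
    for _ in range(max_iter):
        if np.linalg.norm(r) <= res_thresh:
            break
        j = int(np.argmax(np.abs(Phi_n.T @ r)))    # column with the highest contribution
        if j not in support:
            support.append(j)                      # add its index to the dominant-column list
        # Least-squares projection of y onto the dominant-column subspace
        coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        x_hat[:] = 0.0
        x_hat[support] = coeffs
        r = y - Phi[:, support] @ coeffs           # remaining residue
    return x_hat, support

# Usage: exact recovery of a synthetic k-sparse vector
rng = np.random.default_rng(1)
N, M, k = 128, 48, 5
Phi = rng.normal(size=(M, N)) / np.sqrt(M)
x = np.zeros(N)
x[rng.choice(N, size=k, replace=False)] = rng.normal(size=k)
x_rec, supp = omp(Phi, Phi @ x, max_iter=k)
print("recovery error:", np.linalg.norm(x - x_rec))
```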
3.8. Dominant Columns Group Orthogonalization of Sensing Matrix (DCGOSM)
- Let x_s and x_n be the sparse representations of the speech and noise signals, respectively.
- If the sensing matrix is denoted by Φ, then the compressed speech and noise signals can be obtained as y_s = Φx_s and y_n = Φx_n, respectively.
- The OMP algorithm can be used to recover the enhanced speech signal from the compressed noisy speech. OMP finds the dominant columns, their contributions, and the reduced sensing matrix containing only the dominant columns (denoted by Λ, c_Λ, and Φ_red, respectively, in Algorithm 2) to estimate the enhanced speech signal while leaving the residual noise-component columns intact.
- If c_s and c_n are the reduced sensing matrix columns’ contributions to the speech and noise signals obtained through OMP, respectively, then the orthogonalization of the dominant-column groups of the sensing matrix can be achieved by optimizing the columns of the sensing matrix such that the speech-dominant and noise-dominant column groups do not overlap (Equation (17)), as illustrated in the sketch below.
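The core idea can be illustrated with a hedged sketch of one fitness evaluation: run OMP separately on compressed speech and noise frames, collect the two sets of dominant column indices, and score how strongly the groups overlap. The function `dominant_overlap_fitness` and its overlap count are an assumed stand-in for the paper's Equation (18), and the code reuses the `omp` sketch from Section 3.7.

```python
import numpy as np

def dominant_overlap_fitness(Phi, speech_frames, noise_frames, k, res_thresh=1e-3):
    """Hypothetical fitness: count sensing-matrix columns that dominate BOTH
    speech and noise frames (lower is better; 0 means the speech-dominant and
    noise-dominant column groups are fully separated).
    Reuses omp() from the Section 3.7 sketch."""
    speech_cols, noise_cols = set(), set()
    for frame in speech_frames:                    # sparse (DCT-domain) speech frames
        _, supp = omp(Phi, Phi @ frame, max_iter=k, res_thresh=res_thresh)
        speech_cols.update(supp)
    for frame in noise_frames:                     # sparse (DCT-domain) noise frames
        _, supp = omp(Phi, Phi @ frame, max_iter=k, res_thresh=res_thresh)
        noise_cols.update(supp)
    return len(speech_cols & noise_cols)           # overlap between the two groups
```

Inside a PSO loop such as the one sketched in Section 3.5, each particle would be reshaped into an M × N matrix, column-normalized (USE), and scored with a fitness of this kind; the particle with the smallest score yields the optimized sensing matrix.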
3.9. Proposed DCGOSM-Based Speech Enhancement Technique
3.9.1. Process of Obtaining the Required Sensing Matrix
- They assume that the noise components are spread equally over all columns, i.e., that the noise is white. For noise types other than white noise, this distribution of components does not hold, and the performance of these approaches naturally decreases.
- Since the dominant columns are not known initially, they need to be searched every time from all columns, which increases processing time.
- The algorithms have no awareness of speech and noise signals, and the selection of dominant columns is done solely on the basis of contribution. The algorithms always assume that higher contribution is due to speech components. Therefore, during higher noise conditions, the algorithms may select columns with dominating noise components. The condition may worsen in non-white noise cases.
- In the first step, the clean speech sample is mixed with the noise using the mixer block, which adjusts the amplitude of the noise signal according to the given SNR and then adds it to the clean speech to generate the noisy speech sample.
- The noisy speech sample is passed through the preprocessing block, which normalizes the signal. Normalization helps to reduce the impact of variations in recording conditions, microphone characteristics, and speaker differences. By normalizing the speech signals, the relative differences in amplitude caused by these factors are minimized, making the subsequent processing algorithms more robust and less sensitive to such variations.
- Subsequently, the signal is divided into frames using overlapping Hamming windows. Considering the block-based processing nature of the CS, these windows are required to divide the speech signal into frames. This process also reduces the computational complexity of subsequent processing steps, and the analysis can be performed on smaller segments, reducing the amount of data to be processed and improving efficiency. The framing of the signal causes spectral leakage at the edges of the frame. The overlapping Hamming windows are used to mitigate spectral leakage by smoothly transitioning between adjacent frames, reducing the abrupt changes at the frame boundaries.
- Signal sparsity is the fundamental requirement of CS. Therefore, to make each frame sparse, the frame is converted into the frequency domain using the discrete cosine transform (DCT), and the insignificant DCT components within the transformed frame are set to zero. Here, “significant” refers to the components that together carry most of the frame energy (above 90%), which is achieved by retaining the largest 25% of the coefficients (see the sketch after this list).
- The non-transformed speech frames are sent to the VAD block for identification of the speech and non-speech frames. In addition, a preliminary noise variance estimate is obtained from the initial frame, which is assumed to carry only noise.
- The noisy speech frames and the non-speech (noise-only) frames are then grouped into separate databases. This helps in obtaining the dominant columns separately for the noisy speech and the noise-only frames.
- To minimize the optimization time, only a small number of frames are chosen from these databases. The selected noise frames are further used for the final noise variance estimation, which is used during the final enhancement process. Estimating the noise variance from multiple selected frames reduces the possibility of error, especially when the noise is not uniformly distributed over the entire signal duration.
- PSO is initialized according to the number of particles and the maximum number of generations. Each particle P_i^g (where the superscript g represents the generation and the subscript i represents the particle index) is defined as an (M × N)-dimensional row vector, with each element bounded within a specified range.
- Each of these particles P_i^g represents a sensing matrix Φ_i when reshaped into a matrix with M rows and N columns.
- However, before using this matrix as a sensing matrix, it must be made a sample from the uniform spherical ensemble (USE). This is done by dividing each element of each column of the matrix by that column’s norm.
- Using the sensing matrix Φ_i formed by particle P_i^g, the observation vectors are generated for the speech frames and for the noise frames.
- OMP is applied to the speech and noise observation vectors independently to reconstruct the speech and noise frames, respectively. However, the aim is not to reconstruct the frames themselves, but to find the dominating basis vectors (columns) of Φ_i for the speech and for the noise. OMP uses the estimated noise variance as the termination threshold.
- Our aim is to obtain a dominant-columns-group orthogonal sensing matrix in which the columns used to represent the speech signals and the noise signals are different. Therefore, the speech signal can be extracted from the noisy samples using only the speech-dominant column group.
- To meet the above-mentioned requirement, the matrix must satisfy the condition given in Equation (17).
- However, this condition is achievable only in the ideal case. Therefore, the optimization minimizes the objective value as far as possible, or until a given termination threshold is reached (Equation (18)).
- The fitness value is calculated through Equation (18) for each particle P_i^g.
- Once the fitness is calculated for each particle in the current generation, the pbest (particle’s best) and gbest (global best) values are estimated on the basis of the values obtained from all generations so far.
- Based on these estimations, the new positions of particles are calculated as described in Equations (7) and (8).
- PSO repeats the process until the proper matrix is generated.
- After completion of this process, we obtain the optimized sensing matrix, the final noise variance, the dominant columns for the speech signal components, and the noisy-speech frames database.
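The preparation steps above (SNR-controlled mixing, Hamming-window framing, DCT-domain sparsification with the top 25% of coefficients, and USE column normalization of a candidate sensing matrix) can be sketched as below. The helper names `mix_at_snr`, `frame_signal`, `sparsify_dct`, and `use_normalize`, as well as the frame length, hop size, and sampling rate, are illustrative assumptions rather than the paper's exact settings; only the 25% retention ratio comes from the text.

```python
import numpy as np
from scipy.fft import dct

def mix_at_snr(clean, noise, snr_db):
    """Scale the noise so that the clean/noise power ratio equals the target SNR, then add."""
    noise = noise[:len(clean)]
    scale = np.sqrt(np.sum(clean ** 2) / (np.sum(noise ** 2) * 10 ** (snr_db / 10)))
    return clean + scale * noise

def frame_signal(x, frame_len=256, hop=128):
    """Split into overlapping Hamming-windowed frames (one frame per row)."""
    win = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] * win for i in range(n_frames)])

def sparsify_dct(frame, keep_ratio=0.25):
    """DCT of a frame, keeping only the largest-magnitude 25% of coefficients."""
    c = dct(frame, norm='ortho')
    thresh = np.sort(np.abs(c))[-int(keep_ratio * len(c))]
    return np.where(np.abs(c) >= thresh, c, 0.0)

def use_normalize(Phi):
    """Divide each column by its norm (sample from the uniform spherical ensemble)."""
    return Phi / np.linalg.norm(Phi, axis=0)

# Illustrative usage on a synthetic signal (2 s tone at an assumed 8 kHz rate)
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 8000)
noisy = mix_at_snr(clean, rng.normal(size=16000), snr_db=5)
sparse_frames = np.stack([sparsify_dct(f) for f in frame_signal(noisy)])
```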
3.9.2. Process of Speech Enhancement Using the Obtained Sensing Matrix
- Speech enhancement is performed only on the noisy speech frames separated by the VAD during the generation of the sensing matrix. The enhancement is performed on a frame-by-frame basis using the obtained sensing matrix, the final noise variance, and the OMP algorithm. However, because the dominant columns for the speech signal components are already known, OMP iterates only over these columns instead of over all the columns of the sensing matrix (see the sketch after this list).
- The enhanced speech frames are then converted into the time domain by taking IDCT and finally converted into a single enhanced speech stream through the de-framing operation.
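Because the speech-dominant column indices are already known at this stage, the per-frame recovery can be illustrated as a least-squares fit over just those columns followed by the IDCT, as in the hedged sketch below. The function name `enhance_frame` is hypothetical, and the sketch is a simplification of running the full OMP loop restricted to the known columns; the role of the final noise variance as the termination threshold is omitted here.

```python
import numpy as np
from scipy.fft import idct

def enhance_frame(Phi, y, speech_cols):
    """Recover the DCT-domain frame from its measurement y using only the known
    speech-dominant columns of Phi, then return the time-domain frame."""
    coeffs, *_ = np.linalg.lstsq(Phi[:, speech_cols], y, rcond=None)
    x_hat = np.zeros(Phi.shape[1])
    x_hat[speech_cols] = coeffs              # noise-dominant columns stay at zero
    return idct(x_hat, norm='ortho')         # back to the time domain (IDCT)
```

The recovered frames would then be overlap-added (de-framed) into the single enhanced speech stream described above.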
4. Performance Evaluation and Result Analysis
4.1. Sensing Matrix Optimization
4.2. Speech Quality Analysis
4.3. Recovery Time Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Donoho, D.L. Compressed Sensing. IEEE Trans. Inf. Theory 2006, 52, 1289–1306.
- Candes, E.J.; Romberg, J.; Tao, T. Robust Uncertainty Principles: Exact Signal Reconstruction from Highly Incomplete Frequency Information. IEEE Trans. Inf. Theory 2006, 52, 489–509.
- Rani, M.; Dhok, S.B.; Deshmukh, R.B. A Systematic Review of Compressive Sensing: Concepts, Implementations and Applications. IEEE Access 2018, 6, 4875–4894.
- Xia, K.; Pan, Z.; Mao, P. Video Compressive Sensing Reconstruction Using Unfolded LSTM. Sensors 2022, 22, 7172.
- Vanjari, H.B.; Kolte, M.T. Comparative Analysis of Speech Enhancement Techniques in Perceptive of Hearing Aid Design. In Proceedings of the Third International Conference on Information Management and Machine Intelligence, Jaipur, India, 23–24 December 2021; Springer Nature: Singapore, 2023; pp. 117–125.
- Calisesi, G.; Ghezzi, A.; Ancora, D.; D’Andrea, C.; Valentini, G.; Farina, A.; Bassi, A. Compressed Sensing in Fluorescence Microscopy. Prog. Biophys. Mol. Biol. 2022, 168, 66–80.
- Kwon, H.-M.; Hong, S.-P.; Kang, M.; Seo, J. Data Traffic Reduction with Compressed Sensing in an AIoT System. Comput. Mater. Contin. 2022, 70, 1769–1780.
- Shannon, C.E. Communication in the Presence of Noise. Proc. IRE 1949, 37, 10–21.
- Donoho, D.L.; Stark, P.B. Uncertainty Principles and Signal Recovery. SIAM J. Appl. Math. 1989, 49, 906–931.
- Candès, E.J.; Romberg, J.K.; Tao, T. Stable Signal Recovery from Incomplete and Inaccurate Measurements. Commun. Pure Appl. Math. 2006, 59, 1207–1223.
- Amini, A.; Marvasti, F. Deterministic Construction of Binary, Bipolar, and Ternary Compressed Sensing Matrices. IEEE Trans. Inf. Theory 2011, 57, 2360–2370.
- Ben-Haim, Z.; Eldar, Y.C.; Elad, M. Coherence-Based Performance Guarantees for Estimating a Sparse Vector Under Random Noise. IEEE Trans. Signal Process. 2010, 58, 5030–5043.
- Baraniuk, R.; Davenport, M.; DeVore, R.; Wakin, M. A Simple Proof of the Restricted Isometry Property for Random Matrices. Constr. Approx. 2008, 28, 253–263.
- Abrol, V.; Sharma, P.; Budhiraja, S. Evaluating Performance of Compressed Sensing for Speech Signals. In Proceedings of the 2013 3rd IEEE International Advance Computing Conference (IACC), Ghaziabad, India, 22–23 February 2013; pp. 1159–1164.
- Xu, S.-F.; Chen, X.-B. Speech Signal Acquisition Methods Based on Compressive Sensing. In Systems and Computer Technology; CRC Press: Boca Raton, FL, USA, 2015; ISBN 978-0-429-22578-9.
- Swami, P.D.; Sharma, R.; Jain, A.; Swami, D.K. Speech Enhancement by Noise Driven Adaptation of Perceptual Scales and Thresholds of Continuous Wavelet Transform Coefficients. Speech Commun. 2015, 70, 1–12.
- Donoho, D.L. For Most Large Underdetermined Systems of Linear Equations the Minimal ℓ1-Norm Solution Is Also the Sparsest Solution. Commun. Pure Appl. Math. 2006, 59, 797–829.
- Yang, H.; Hao, D.; Sun, H.; Liu, Y. Speech Enhancement Using Orthogonal Matching Pursuit Algorithm. In Proceedings of the 2014 International Conference on Orange Technologies, Xi’an, China, 20–23 September 2014; pp. 101–104.
- Needell, D.; Tropp, J.A. CoSaMP: Iterative Signal Recovery from Incomplete and Inaccurate Samples. Appl. Comput. Harmon. Anal. 2009, 26, 301–321.
- Pilastri, A.L.; Tavares, J.M.R. Reconstruction Algorithms in Compressive Sensing: An Overview. In Proceedings of the 11th edition of the Doctoral Symposium in Informatics Engineering (DSIE-16), Porto, Portugal, 3 February 2016.
- Firouzeh, F.F.; Ghorshi, S.; Salsabili, S. Compressed Sensing Based Speech Enhancement. In Proceedings of the 2014 8th International Conference on Signal Processing and Communication Systems (ICSPCS), Gold Coast, Australia, 15–17 December 2014; pp. 1–6.
- Wu, D.; Zhu, W.-P.; Swamy, M.N.S. Compressive Sensing-Based Speech Enhancement in Non-Sparse Noisy Environments. IET Signal Process. 2013, 7, 450–457.
- Gemmeke, J.F.; Van Hamme, H.; Cranen, B.; Boves, L. Compressive Sensing for Missing Data Imputation in Noise Robust Speech Recognition. IEEE J. Sel. Top. Signal Process. 2010, 4, 272–287.
- Wu, D.; Zhu, W.-P.; Swamy, M.N.S. The Theory of Compressive Sensing Matching Pursuit Considering Time-Domain Noise with Application to Speech Enhancement. IEEE/ACM Trans. Audio Speech Lang. Process. 2014, 22, 682–696.
- Quackenbush, S.R.; Barnwell, T.P.; Clements, M.A. Objective Measures of Speech Quality; Prentice Hall: Upper Saddle River, NJ, USA, 1988; ISBN 978-0-13-629056-8.
- ITU-T Recommendation Database. Available online: https://www.itu.int/ITU-T/recommendations/rec.aspx?rec=14949&lang=en (accessed on 7 June 2023).
- Kleijn, W.B.; Paliwal, K.K. (Eds.) Speech Coding and Synthesis; Elsevier: Amsterdam, The Netherlands; New York, NY, USA, 1995; ISBN 978-0-444-82169-0.
- Hu, Y.; Loizou, P.C. Evaluation of Objective Quality Measures for Speech Enhancement. IEEE Trans. Audio Speech Lang. Process. 2008, 16, 229–238.
- Tribolet, J.; Noll, P.; McDermott, B.; Crochiere, R. A Study of Complexity and Quality of Speech Waveform Coders. In Proceedings of the ICASSP’78. IEEE International Conference on Acoustics, Speech, and Signal Processing, Tulsa, OK, USA, 10–12 April 1978; Volume 3, pp. 586–590.
- Hansen, J.H.L.; Pellom, B.L. An Effective Quality Evaluation Protocol for Speech Enhancement Algorithms. In Proceedings of the 5th International Conference on Spoken Language Processing (ICSLP 1998), Sydney, Australia, 30 November 1998; ISCA; p. 0917-0.
- Klatt, D. Prediction of Perceived Phonetic Distance from Critical-Band Spectra: A First Step. In Proceedings of the ICASSP’82. IEEE International Conference on Acoustics, Speech, and Signal Processing, Paris, France, 3–5 May 1982; Volume 7, pp. 1278–1281.
- P.862: Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-End Speech Quality Assessment of Narrow-Band Telephone Networks and Speech Codecs. Available online: https://www.itu.int/rec/T-REC-P.862 (accessed on 7 June 2023).
- P.862.3: Application Guide for Objective Quality Measurement Based on Recommendations P.862, P.862.1 and P.862.2. Available online: https://www.itu.int/rec/T-REC-P.862.3/_page.print (accessed on 7 June 2023).
- Crespo Marques, E.; Maciel, N.; Naviner, L.; Cai, H.; Yang, J. A Review of Sparse Recovery Algorithms. IEEE Access 2019, 7, 1300–1322.
- Mallat, S.G.; Zhang, Z. Matching Pursuits with Time-Frequency Dictionaries. IEEE Trans. Signal Process. 1993, 41, 3397–3415.
- de Paiva, N.M.; Marques, E.C.; de Barros Naviner, L.A. Sparsity Analysis Using a Mixed Approach with Greedy and LS Algorithms on Channel Estimation. In Proceedings of the 2017 3rd International Conference on Frontiers of Signal Processing (ICFSP), Paris, France, 6–8 September 2017; pp. 91–95.
- Dai, W.; Milenkovic, O. Subspace Pursuit for Compressive Sensing Signal Reconstruction. IEEE Trans. Inf. Theory 2009, 55, 2230–2249.
- Donoho, D.L.; Tsaig, Y.; Drori, I.; Starck, J.-L. Sparse Solution of Underdetermined Systems of Linear Equations by Stagewise Orthogonal Matching Pursuit. IEEE Trans. Inf. Theory 2012, 58, 1094–1121.
- Needell, D.; Vershynin, R. Uniform Uncertainty Principle and Signal Recovery via Regularized Orthogonal Matching Pursuit. Found. Comput. Math. 2009, 9, 317–334.
- Wang, J.; Kwon, S.; Shim, B. Generalized Orthogonal Matching Pursuit. IEEE Trans. Signal Process. 2012, 60, 6202–6216.
- Sun, H.; Ni, L. Compressed Sensing Data Reconstruction Using Adaptive Generalized Orthogonal Matching Pursuit Algorithm. In Proceedings of the 2013 3rd International Conference on Computer Science and Network Technology, Dalian, China, 12–13 October 2013; pp. 1102–1106.
- Bi, X.; Leng, L.; Kim, C.; Liu, X.; Du, Y.; Liu, F. Constrained Backtracking Matching Pursuit Algorithm for Image Reconstruction in Compressed Sensing. Appl. Sci. 2021, 11, 1435.
- GBRAMP: A Generalized Backtracking Regularized Adaptive Matching Pursuit Algorithm for Signal Reconstruction. Available online: https://www.sciencedirect.com/science/article/abs/pii/S0045790621001907 (accessed on 7 June 2023).
- Zhang, C. An Orthogonal Matching Pursuit Algorithm Based on Singular Value Decomposition. Circuits Syst. Signal Process. 2020, 39, 492–501.
- Das, S.; Mandal, J.K. An Enhanced Block-Based Compressed Sensing Technique Using Orthogonal Matching Pursuit. Signal Image Video Process. 2021, 15, 563–570.
- Blumensath, T.; Davies, M.E. Gradient Pursuits. IEEE Trans. Signal Process. 2008, 56, 2370–2382.
- Kwon, S.; Wang, J.; Shim, B. Multipath Matching Pursuit. IEEE Trans. Inf. Theory 2014, 60, 2986–3001.
- Elad, M. Optimized Projections for Compressed Sensing. IEEE Trans. Signal Process. 2007, 55, 5695–5702.
- Singh, R.; Bhattacharjee, U.; Singh, A.K. Performance Evaluation of Normalization Techniques in Adverse Conditions. Procedia Comput. Sci. 2020, 171, 1581–1590.
- Blinn, J.F. What’s That Deal with the DCT? IEEE Comput. Graph. Appl. 1993, 13, 78–83.
- Stanković, L.; Brajović, M. Analysis of the Reconstruction of Sparse Signals in the DCT Domain Applied to Audio Signals. IEEE/ACM Trans. Audio Speech Lang. Process. 2018, 26, 1220–1235.
- Reznik, Y.A. Relationship between DCT-II, DCT-VI, and DST-VII Transforms. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; pp. 5642–5646.
- Prabhu, K.M.M. Window Functions and Their Applications in Signal Processing; Taylor & Francis: Oxford, UK, 2014; ISBN 978-1-4665-1584-0.
- Shukla, V.; Swami, P.D. Speech Enhancement Using VAD for Noise Estimation in Compressive Sensing. In Proceedings of the Data, Engineering and Applications; Sharma, S., Peng, S.-L., Agrawal, J., Shukla, R.K., Le, D.-N., Eds.; Springer Nature: Singapore, 2022; pp. 357–369.
- Lokesh, S.; Devi, M.R. Speech Recognition System Using Enhanced Mel Frequency Cepstral Coefficient with Windowing and Framing Method. Clust. Comput. 2019, 22, 11669–11679.
- Van Segbroeck, M.; Tsiartas, A.; Narayanan, S.S. A Robust Frontend for VAD: Exploiting Contextual, Discriminative and Spectral Cues of Human Voice. In Proceedings of the INTERSPEECH, Lyon, France, 25–29 August 2013; pp. 704–708.
- Kennedy, J.; Eberhart, R. Particle Swarm Optimization. In Proceedings of the ICNN’95—International Conference on Neural Networks, Perth, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948.
- Candes, E.J.; Wakin, M.B. An Introduction To Compressive Sampling. IEEE Signal Process. Mag. 2008, 25, 21–30.
- Jerri, A.J. The Shannon Sampling Theorem—Its Various Extensions and Applications: A Tutorial Review. Proc. IEEE 1977, 65, 1565–1596.
- Yuan, X.; Haimi-Cohen, R. Image Compression Based on Compressive Sensing: End-to-End Comparison with JPEG 2020. IEEE Trans. Multimed. 2020, 22, 2889–2904.
- Fira, M.; Costin, H.-N.; Goraș, L. A Study on Dictionary Selection in Compressive Sensing for ECG Signals Compression and Classification. Biosensors 2022, 12, 146.
- Golub, G.; Kahan, W. Calculating the Singular Values and Pseudo-Inverse of a Matrix. J. Soc. Ind. Appl. Math. Ser. B Numer. Anal. 1965, 2, 205–224.
- Haneche, H.; Boudraa, B.; Ouahabi, A. A New Way to Enhance Speech Signal Based on Compressed Sensing. Measurement 2020, 151, 107117.
- Fu, S.-W.; Liao, C.-F.; Tsao, Y. Learning With Learned Loss Function: Speech Enhancement With Quality-Net to Improve Perceptual Evaluation of Speech Quality. IEEE Signal Process. Lett. 2020, 27, 26–30.
- Martin-Doñas, J.M.; Gomez, A.M.; Gonzalez, J.A.; Peinado, A.M. A Deep Learning Loss Function Based on the Perceptual Evaluation of the Speech Quality. IEEE Signal Process. Lett. 2018, 25, 1680–1684.
- Varga, A.; Steeneken, H.J.M. Assessment for Automatic Speech Recognition: II. NOISEX-92: A Database and an Experiment to Study the Effect of Additive Noise on Speech Recognition Systems. Speech Commun. 1993, 12, 247–251.
- Hu, Y.; Loizou, P.C. Subjective Comparison and Evaluation of Speech Enhancement Algorithms. Speech Commun. 2007, 49, 588–601.
| Input SNR (dB) | OMP SNR (dB) | OMP SSNR (dB) | CoSaMP SNR (dB) | CoSaMP SSNR (dB) | StOMP SNR (dB) | StOMP SSNR (dB) | Proposed SNR (dB) | Proposed SSNR (dB) |
|---|---|---|---|---|---|---|---|---|
| 0 | 5.74 | 4.631 | 3.113 | 2.814 | 4.417 | 3.581 | 6.380 | 5.874 |
| 5 | 9.124 | 8.049 | 7.481 | 6.328 | 8.392 | 7.164 | 9.627 | 9.157 |
| 10 | 12.339 | 11.385 | 11.542 | 10.221 | 11.916 | 10.773 | 12.639 | 12.298 |
| 15 | 14.862 | 14.101 | 14.658 | 13.527 | 14.696 | 13.807 | 14.993 | 14.745 |
| 20 | 16.266 | 15.941 | 16.267 | 15.765 | 16.299 | 15.944 | 16.390 | 16.386 |

| Input SNR (dB) | OMP PESQ | OMP STOI | CoSaMP PESQ | CoSaMP STOI | StOMP PESQ | StOMP STOI | Proposed PESQ | Proposed STOI |
|---|---|---|---|---|---|---|---|---|
| 0 | 1.801 | 0.688 | 1.759 | 0.634 | 1.796 | 0.608 | 2.296 | 0.748 |
| 5 | 2.319 | 0.764 | 2.069 | 0.689 | 2.173 | 0.699 | 2.795 | 0.731 |
| 10 | 2.570 | 0.793 | 2.403 | 0.734 | 2.509 | 0.768 | 3.092 | 0.809 |
| 15 | 2.759 | 0.837 | 2.611 | 0.776 | 2.708 | 0.817 | 3.201 | 0.825 |
| 20 | 2.957 | 0.877 | 2.760 | 0.824 | 2.874 | 0.869 | 3.220 | 0.862 |
| Input SNR (dB) | OMP SNR (dB) | OMP SSNR (dB) | CoSaMP SNR (dB) | CoSaMP SSNR (dB) | StOMP SNR (dB) | StOMP SSNR (dB) | K-SVDCS SNR (dB) | K-SVDCS SSNR (dB) | Proposed SNR (dB) | Proposed SSNR (dB) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.560 | 2.385 | 0.758 | 1.638 | 1.388 | 1.853 | -- | 3.80 | 3.112 | 3.072 |
| 5 | 6.201 | 5.835 | 5.193 | 4.602 | 5.681 | 4.965 | -- | 3.50 | 6.673 | 6.592 |
| 10 | 10.119 | 9.345 | 9.349 | 8.471 | 9.532 | 8.708 | -- | 3.06 | 10.176 | 9.885 |
| 15 | 13.118 | 12.438 | 13.119 | 12.195 | 13.074 | 12.168 | -- | 1.60 | 13.361 | 13.056 |
| 20 | 15.383 | 14.873 | 15.453 | 14.890 | 15.156 | 14.685 | -- | -0.94 | 15.516 | 15.325 |

| Input SNR (dB) | OMP PESQ | OMP STOI | CoSaMP PESQ | CoSaMP STOI | StOMP PESQ | StOMP STOI | K-SVDCS PESQ | K-SVDCS STOI | Proposed PESQ | Proposed STOI |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.780 | 0.652 | 1.917 | 0.683 | 1.969 | 0.644 | 1.96 | 0.66 | 2.304 | 0.615 |
| 5 | 2.216 | 0.736 | 2.251 | 0.726 | 2.308 | 0.715 | 2.28 | 0.72 | 2.384 | 0.689 |
| 10 | 2.434 | 0.811 | 2.513 | 0.759 | 2.513 | 0.771 | 2.52 | 0.79 | 2.674 | 0.748 |
| 15 | 2.606 | 0.827 | 2.646 | 0.774 | 2.709 | 0.805 | 2.69 | 0.81 | 2.903 | 0.785 |
| 20 | 2.667 | 0.859 | 2.772 | 0.816 | 2.853 | 0.847 | 2.85 | 0.83 | 3.194 | 0.824 |
| Input SNR (dB) | OMP SNR (dB) | OMP SSNR (dB) | CoSaMP SNR (dB) | CoSaMP SSNR (dB) | StOMP SNR (dB) | StOMP SSNR (dB) | Proposed SNR (dB) | Proposed SSNR (dB) |
|---|---|---|---|---|---|---|---|---|
| 0 | 3.288 | 2.490 | 1.292 | 1.744 | 2.056 | 2.036 | 4.687 | 4.058 |
| 5 | 7.220 | 6.101 | 5.639 | 4.656 | 6.266 | 5.129 | 7.926 | 7.454 |
| 10 | 10.689 | 9.631 | 9.844 | 8.445 | 10.264 | 8.901 | 11.206 | 10.748 |
| 15 | 13.772 | 12.803 | 13.599 | 12.358 | 13.517 | 12.366 | 14.037 | 13.602 |
| 20 | 15.686 | 15.095 | 15.711 | 14.933 | 15.629 | 14.949 | 15.821 | 15.599 |

| Input SNR (dB) | OMP PESQ | OMP STOI | CoSaMP PESQ | CoSaMP STOI | StOMP PESQ | StOMP STOI | Proposed PESQ | Proposed STOI |
|---|---|---|---|---|---|---|---|---|
| 0 | 1.684 | 0.659 | 1.887 | 0.629 | 1.942 | 0.591 | 2.304 | 0.677 |
| 5 | 2.101 | 0.751 | 2.183 | 0.682 | 2.268 | 0.680 | 2.384 | 0.685 |
| 10 | 2.509 | 0.787 | 2.496 | 0.745 | 2.577 | 0.756 | 2.674 | 0.772 |
| 15 | 2.654 | 0.836 | 2.632 | 0.756 | 2.747 | 0.808 | 2.903 | 0.791 |
| 20 | 2.838 | 0.862 | 2.747 | 0.796 | 2.878 | 0.839 | 3.194 | 0.808 |
| SNR (dB) | DNN (MSE) [64] | DNN-PMSQE [65] | BLSTM (MSE) [64] | BLSTM (PMSQE) [65] | DNN-Quality-Net [64] | Proposed |
|---|---|---|---|---|---|---|
| 18 | 2.810 | 3.082 | 3.287 | 2.899 | 3.377 | 3.243 |
| 12 | 2.576 | 2.819 | 2.908 | 2.777 | 3.010 | 3.024 |
| 6 | 2.275 | 2.497 | 2.504 | 2.578 | 2.614 | 2.777 |
| 0 | 1.912 | 2.111 | 2.065 | 2.261 | 2.171 | 2.337 |
| −6 | 1.530 | 1.711 | 1.569 | 1.865 | 1.671 | 2.244 |
| SNR (dB) | DNN (MSE) [64] | DNN-PMSQE [65] | BLSTM (MSE) [64] | BLSTM (PMSQE) [65] | DNN-Quality-Net [64] | Proposed |
|---|---|---|---|---|---|---|
| 18 | 0.855 | 0.886 | 0.972 | 0.882 | 0.966 | 0.842 |
| 12 | 0.831 | 0.865 | 0.942 | 0.867 | 0.937 | 0.821 |
| 6 | 0.788 | 0.822 | 0.885 | 0.836 | 0.882 | 0.815 |
| 0 | 0.715 | 0.741 | 0.796 | 0.773 | 0.794 | 0.807 |
| −6 | 0.604 | 0.615 | 0.663 | 0.667 | 0.663 | 0.718 |
| Dimensions | OMP | CoSaMP | StOMP | Proposed |
|---|---|---|---|---|
| (8,32,64) | 0.7956 | 0.8071 | 0.3439 | 0.3261 * |
| (16,64,128) | 0.9434 | 0.7652 | 0.3694 | 0.3527 |
| (32,128,256) | 1.3928 | 1.2097 | 0.4637 | 0.4165 |
| (64,256,512) | 2.8048 | 2.0487 | 0.9845 | 0.8542 |