Environmental Noise Classification with Inception-Dense Blocks for Hearing Aids
Abstract
1. Introduction
2. Related Work
2.1. Time-Frequency Representations for Noise Signals
1. Signal pre-processing.
2. A Fourier transform to obtain the signal spectrogram.
3. Mapping of the spectrogram into a mel spectrogram through triangular overlapping windows whose center frequencies are distributed on the mel scale. The function $B$ for computing the $m$-th mel frequency from a frequency $f$ in hertz, and its inverse, are given by [24]:
$$B(f) = 2595\,\log_{10}\!\left(1 + \frac{f}{700}\right), \qquad B^{-1}(m) = 700\left(10^{m/2595} - 1\right)$$
4. Taking the logarithm (in decibels) of the mel spectrogram, as illustrated by the sketch after this list.
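As a concrete illustration of steps 2–4, here is a minimal sketch using librosa [39]; the file name, sample rate, FFT size, and hop length are illustrative assumptions, while the 128 mel bands match the (128, 128, 1) network input used later.

```python
import librosa
import numpy as np

# Step 1: load and pre-process the clip (file name and sr are assumptions).
y, sr = librosa.load("siren.wav", sr=22050)

# Steps 2-3: short-time Fourier transform followed by projection onto
# 128 triangular mel filters (n_fft and hop_length are assumptions).
S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                   hop_length=512, n_mels=128)

# Step 4: convert power to decibels, yielding the log-mel spectrogram.
log_S = librosa.power_to_db(S, ref=np.max)
```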
2.2. Conventional Noise Classification Algorithms
2.3. Deep Convolutional Neural Network
3. Proposed Methodology
3.1. Inception Block with Dense Connectivity
3.2. Depthwise-Separable Convolution
3.3. Network Structure
4. Experiments
4.1. Dataset
4.2. Data Preprocessing
4.3. Data Augmentation
4.4. Training Settings
5. Results
5.1. Classification Results on Urbansound8k
5.2. Classification Results on the Hearing Aids Noisy Sound (HANS) Dataset
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Löhler, J.; Walther, L.E.; Hansen, F.; Kapp, P.; Meerpohl, J.; Wollenberg, B.; Schönweiler, R.; Schmucker, C. The prevalence of hearing loss and use of hearing aids among adults in Germany: A systematic review. Eur. Arch. Oto-Rhino 2019, 276, 945–956.
2. World Health Organization. Deafness and Hearing Loss. 2020. Available online: https://www.who.int/news-room/fact-sheets/detail/deafness-and-hearing-loss (accessed on 26 April 2021).
3. Mulrow, C.D.; Tuley, M.R.; Aguilar, C. Sustained benefits of hearing aids. J. Speech Lang. Hear. Res. 1992, 35, 1402–1405.
4. Vestergaard Knudsen, L.; Öberg, M.; Nielsen, C.; Naylor, G.; Kramer, S.E. Factors influencing help seeking, hearing aid uptake, hearing aid use and satisfaction with hearing aids: A review of the literature. Trends Amplif. 2010, 14, 127–154.
5. Skagerstrand, A.; Stenfelt, S.; Arlinger, S.; Wikström, J. Sounds perceived as annoying by hearing-aid users in their daily soundscape. Int. J. Audiol. 2014, 53, 259–269.
6. Shi, W.; Zhang, X.; Zou, X.; Han, W. Deep neural network and noise classification-based speech enhancement. Mod. Phys. Lett. B 2017, 31, 1740096.
7. Park, G.; Cho, W.; Kim, K.S.; Lee, S. Speech Enhancement for Hearing Aids with Deep Learning on Environmental Noises. Appl. Sci. 2020, 10, 6077.
8. Zhang, L.; Wang, M.; Zhang, Q.; Liu, M. Environmental Attention-Guided Branchy Neural Network for Speech Enhancement. Appl. Sci. 2020, 10, 1167.
9. Lee, K.; Ellis, D.P. Audio-based semantic concept classification for consumer video. IEEE Trans. Audio Speech Lang. Process. 2009, 18, 1406–1416.
10. Temko, A.; Monte, E.; Nadeu, C. Comparison of sequence discriminant support vector machines for acoustic event classification. In Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toulouse, France, 14–19 May 2006; Volume 5, p. V.
11. Chu, S.; Narayanan, S.; Kuo, C.C.J. Environmental sound recognition with time-frequency audio features. IEEE Trans. Audio Speech Lang. Process. 2009, 17, 1142–1158.
12. Geiger, J.T.; Helwani, K. Improving event detection for audio surveillance using Gabor filterbank features. In Proceedings of the 2015 23rd European Signal Processing Conference (EUSIPCO), Nice, France, 31 August–4 September 2015; pp. 714–718.
13. Valero, X.; Alias, F. Gammatone cepstral coefficients: Biologically inspired features for non-speech audio classification. IEEE Trans. Multimed. 2012, 14, 1684–1689.
14. Uzkent, B.; Barkana, B.D.; Cevikalp, H. Non-speech environmental sound classification using SVMs with a new set of features. Int. J. Innov. Comput. Inf. Control 2012, 8, 3511–3524.
15. Stowell, D.; Giannoulis, D.; Benetos, E.; Lagrange, M.; Plumbley, M.D. Detection and classification of acoustic scenes and events. IEEE Trans. Multimed. 2015, 17, 1733–1746.
16. Piczak, K.J. ESC: Dataset for environmental sound classification. In Proceedings of the 23rd ACM International Conference on Multimedia, Brisbane, Australia, 26–30 October 2015; pp. 1015–1018.
17. Kons, Z.; Toledo-Ronen, O.; Carmel, M. Audio Event Classification Using Deep Neural Networks. In Proceedings of Interspeech 2013; pp. 1482–1486. Available online: https://www.isca-speech.org/archive/archive_papers/interspeech_2013/i13_1482.pdf (accessed on 21 December 2020).
18. McLoughlin, I.; Zhang, H.; Xie, Z.; Song, Y.; Xiao, W. Robust sound event classification using deep neural networks. IEEE/ACM Trans. Audio Speech Lang. Process. 2015, 23, 540–552.
19. Salamon, J.; Bello, J.P. Deep convolutional neural networks and data augmentation for environmental sound classification. IEEE Signal Process. Lett. 2017, 24, 279–283.
20. Palanisamy, K.; Singhania, D.; Yao, A. Rethinking CNN models for audio classification. arXiv 2020, arXiv:2007.11154.
21. Sehgal, A.; Kehtarnavaz, N. Guidelines and benchmarks for deployment of deep learning models on smartphones as real-time apps. Mach. Learn. Knowl. Extr. 2019, 1, 450–465.
22. Huzaifah, M. Comparison of time-frequency representations for environmental sound classification using convolutional neural networks. arXiv 2017, arXiv:1706.07156.
23. Su, Y.; Zhang, K.; Wang, J.; Zhou, D.; Madani, K. Performance analysis of multiple aggregated acoustic features for environment sound classification. Appl. Acoust. 2020, 158, 107050.
24. Stevens, S.S.; Volkmann, J.; Newman, E.B. A scale for the measurement of the psychological magnitude pitch. J. Acoust. Soc. Am. 1937, 8, 185–190.
25. Hagan, M.T.; Demuth, H.B.; Beale, M. Neural Network Design; PWS Publishing Co.: Boston, MA, USA, 1997.
26. Nordqvist, P.; Leijon, A. An efficient robust sound classification algorithm for hearing aids. J. Acoust. Soc. Am. 2004, 115, 3033–3041.
27. Büchler, M.; Allegro, S.; Launer, S.; Dillier, N. Sound classification in hearing aids inspired by auditory scene analysis. EURASIP J. Adv. Signal Process. 2005, 2005, 387845.
28. Abe, K.; Sakaue, H.; Okuno, T.; Terada, K. Sound classification for hearing aids using time-frequency images. In Proceedings of the 2011 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, Victoria, BC, Canada, 23–26 August 2011; pp. 719–724.
29. Zhang, Z.; Xu, S.; Cao, S.; Zhang, S. Deep convolutional neural network with mixup for environmental sound classification. In Chinese Conference on Pattern Recognition and Computer Vision (PRCV); Springer: Cham, Switzerland, 2018; pp. 356–367.
30. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
31. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
32. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
33. Singh, J.; Joshi, R. Background Sound Classification in Speech Audio Segments. In Proceedings of the 2019 International Conference on Speech Technology and Human-Computer Dialogue (SpeD), Timisoara, Romania, 10–12 October 2019; pp. 1–6.
34. Park, G.; Lee, S. Environmental Noise Classification Using Convolutional Neural Networks with Input Transform for Hearing Aids. Int. J. Environ. Res. Public Health 2020, 17, 2270.
35. Roedily, W.; Ruan, S.J.; Li, L.P.H. Real-Time Noise Classifier on Smartphones. IEEE Consum. Electron. Mag. 2020, 10, 37–42.
36. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
37. Lin, M.; Chen, Q.; Yan, S. Network in network. arXiv 2013, arXiv:1312.4400.
38. Salamon, J.; Jacoby, C.; Bello, J.P. A dataset and taxonomy for urban sound research. In Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA, 3–7 November 2014; pp. 1041–1044.
39. McFee, B.; Raffel, C.; Liang, D.; Ellis, D.P.; McVicar, M.; Battenberg, E.; Nieto, O. librosa: Audio and music signal analysis in Python. In Proceedings of the 14th Python in Science Conference, Austin, TX, USA, 6–12 July 2015; Volume 8, pp. 18–25.
40. McFee, B.; Humphrey, E.; Bello, J. A software framework for musical data augmentation. In Proceedings of the 16th International Society for Music Information Retrieval Conference, Malaga, Spain, 26–30 October 2015.
41. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond empirical risk minimization. arXiv 2017, arXiv:1710.09412.
42. Chollet, F. Keras. 2015. Available online: https://keras.io (accessed on 5 April 2021).
| Layer | Filter Size/Stride (Number of Filters) | Output Shape | Params |
|---|---|---|---|
| Input | – | (128, 128, 1) | – |
| Conv1 | /1 (16) | (128, 128, 16) | 228 |
| Conv2 | /1 (16) | (128, 128, 16) | 2,384 |
| MaxPool1 | /1 (None) | (42, 42, 16) | – |
| Inception (a) | [/1 (16) conv] ×1; [/1 (16) conv, /1 (32) conv] ×1; [/1 (16) conv, /1 (32) conv, /1 (32) conv] ×1; [/1 (None) average pool, /1 (16) conv] | (42, 42, 112) | 20,128 |
| Inception (b) | [/1 (32) conv] ×1; [/1 (32) conv, /1 (32) conv] ×1; [/1 (32) conv, /1 (32) conv, 3×3/1 (32) conv] ×1; [/1 (None) average pool, /1 (32) conv] | (42, 42, 240) | 21,152 |
| Inception (c) | [/1 (32) conv] ×1; [/1 (32) conv, /1 (32) conv] ×1; [/1 (32) conv, /1 (32) conv, /1 (32) conv] ×1; [/1 (None) average pool, /1 (32) conv] | (42, 42, 368) | 39,584 |
| Conv3 | /1 (32) | (42, 42, 32) | 13,408 |
| AveragePool | / (None) | (42, 10, 32) | – |
| GlobalAveragePool1 | – | (32) | – |
| Dense | number of classes | (number of classes) | 330 |
| Total Params | | | 97,214 |
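The channel counts in the table are consistent with each inception block concatenating its own input with the four branch outputs (16 + 96 = 112, 112 + 128 = 240, 240 + 128 = 368). Below is a minimal Keras sketch of that connectivity pattern; the 3 × 3 kernels, ReLU activations, and use of SeparableConv2D in every branch are assumptions, since most kernel sizes were lost from the extracted table.

```python
from tensorflow.keras import layers

def inception_dense_block(x, f_in, f_mid):
    """Inception block whose output densely concatenates the block input
    with all four branch outputs (kernel sizes are assumptions)."""
    # Branch 1: pointwise separable conv
    b1 = layers.SeparableConv2D(f_in, 1, padding="same", activation="relu")(x)
    # Branch 2: pointwise -> 3x3
    b2 = layers.SeparableConv2D(f_in, 1, padding="same", activation="relu")(x)
    b2 = layers.SeparableConv2D(f_mid, 3, padding="same", activation="relu")(b2)
    # Branch 3: pointwise -> 3x3 -> 3x3 (a factorised larger receptive field)
    b3 = layers.SeparableConv2D(f_in, 1, padding="same", activation="relu")(x)
    b3 = layers.SeparableConv2D(f_mid, 3, padding="same", activation="relu")(b3)
    b3 = layers.SeparableConv2D(f_mid, 3, padding="same", activation="relu")(b3)
    # Branch 4: average pool -> pointwise
    b4 = layers.AveragePooling2D(3, strides=1, padding="same")(x)
    b4 = layers.SeparableConv2D(f_in, 1, padding="same", activation="relu")(b4)
    # Dense connectivity: reuse the block input alongside the branch outputs.
    return layers.Concatenate()([x, b1, b2, b3, b4])
```

With f_in = 16 and f_mid = 32 on the 16-channel input, the concatenated output has 16 + 16 + 32 + 32 + 16 = 112 channels, matching Inception (a); two further calls with f_in = f_mid = 32 reproduce the 240 and 368 channels of blocks (b) and (c).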
| Fold | UrbanSound8K: Set A | Set B | Set C | UrbanSound8K (aug.): Set A | Set B | Set C |
|---|---|---|---|---|---|---|
| 1 | 5238 | 2619 | 873 | 69,593 | 34,792 | 12,841 |
| 2 | 5328 | 2664 | 888 | 70,740 | 35,372 | 12,964 |
| 3 | 5550 | 2775 | 925 | 73,766 | 36,885 | 13,528 |
| 4 | 5940 | 2970 | 990 | 78,853 | 39,429 | 14,390 |
| 5 | 5616 | 2808 | 936 | 74,556 | 37,280 | 13,555 |
| 6 | 4938 | 2469 | 823 | 65,599 | 32,800 | 11,966 |
| 7 | 5028 | 2514 | 838 | 66,804 | 33,405 | 12,250 |
| 8 | 4836 | 2418 | 806 | 64,211 | 32,114 | 11,964 |
| 9 | 4896 | 2448 | 816 | 65,035 | 32,519 | 12,118 |
| 10 | 5022 | 2511 | 837 | 66,698 | 33,350 | 12,376 |
| Group | Sample | r |
|---|---|---|
| 1.5 | Even | 2 |
| 2 | Each | 2 |
| 5 | Each | −2.5, −2, −1, 1, 2 |
| 6 | Each | −2.5, −2, −1, 1, 2, 2.5 |
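If r denotes a pitch-shift amount in semitones (an assumption; the flattened table does not state its unit), each augmentation group could be generated with librosa along these lines:

```python
import librosa

def pitch_shift_group(y, sr, r_values):
    """Return one pitch-shifted copy of y per value in r_values,
    assuming r is a shift in semitones."""
    return [librosa.effects.pitch_shift(y, sr=sr, n_steps=r) for r in r_values]

# Illustrative file name; group 6 applies all six r values to each sample.
y, sr = librosa.load("chainsaw.wav", sr=22050)
group6 = pitch_shift_group(y, sr, [-2.5, -2, -1, 1, 2, 2.5])
```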
| Major Category | Class | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 |
|---|---|---|---|---|---|---|
| Vehicle engine (ve) | Helicopter | 6 | 6 | 6 | 6 | 6 |
| | Airplane | 6 | 6 | 6 | 6 | 6 |
| | Train | 6 | 6 | 6 | 6 | 6 |
| Machine tools (ma) | Drilling | None | None | None | 2 | 1.5 |
| | Jackhammer | 2 | None | None | 1.5 | None |
| | Chainsaw | 6 | 6 | 6 | 6 | 6 |
| Household appliance (ha) | Air conditioner | None | 1.5 | None | None | 1.5 |
| | Vacuum cleaner | 6 | 6 | 6 | 6 | 6 |
| | Washing machine | 6 | 6 | 6 | 6 | 6 |
| Natural (na) | Thunderstorm | 5 | 5 | 5 | 5 | 5 |
| | Sea waves | 5 | 5 | 5 | 5 | 5 |
| | Rain | 5 | 5 | 5 | 5 | 5 |
| | Wind | 5 | 5 | 5 | 5 | 5 |
| Human speech (hu) | Children playing | None | None | None | None | None |
| | Street music | 1.5 | None | None | 2 | 2 |
| Approach | Mean Acc (%) | Parameters | FLOPs |
|---|---|---|---|
| PiczakCNN [16] | 73.09 | 109,134,090 | 515,806,780 |
| ResNet [20] | 73.26 | 23,608,202 | 5,044,643,516 |
| Inception [20] | 75.24 | 21,823,274 | 3,196,935,548 |
| DenseNet [20] | 76.30 | 18,341,194 | 5,367,216,252 |
| SBCNN [19] | 79.00 | 874,746 | 170,694,732 |
| ZhangCNN [29] | 82.60 | 1,186,322 | 882,779,336 |
| Proposed model | 83.03 | 97,214 | 394,483,170 |
| Approach | Params | Set | Acc (%) | FLOPs | Inference Time (s): Model | Inference Time (s): FLM |
|---|---|---|---|---|---|---|
| Roedily [35] | 116,869 | A | 57.70 | 6,184,606 | 0.012 | 0.033 |
| | | B | 57.48 | 9,234,590 | 0.023 | 0.045 |
| | | C | 58.72 | 13,191,710 | 0.036 | 0.057 |
| Zhang [29] | 1,183,081 | A | 71.29 | 377,229,476 | 0.064 | 0.084 |
| | | B | 72.56 | 582,501,796 | 0.093 | 0.112 |
| | | C | 71.01 | 860,721,316 | 0.130 | 0.146 |
| Singh [33] | 4,694,473 | A | 72.53 | 3,072,945,444 | 0.264 | 0.283 |
| | | B | 73.78 | 4,637,731,620 | 0.391 | 0.408 |
| | | C | 74.75 | 6,676,496,676 | 0.549 | 0.565 |
| Proposed model | 97,049 | A | 73.20 | 187,247,748 | 0.032 | 0.061 |
| | | B | 74.84 | 280,864,388 | 0.049 | 0.067 |
| | | C | 75.27 | 394,482,820 | 0.067 | 0.084 |
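Per-clip inference times such as those above can be approximated with a simple wall-clock loop. The model below is only a self-contained stand-in with the paper's (128, 128, 1) input shape, not the proposed network; substitute the classifier under test.

```python
import time
import numpy as np
import tensorflow as tf

# Placeholder classifier with the (128, 128, 1) log-mel input used here.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(128, 128, 1)),
    tf.keras.layers.SeparableConv2D(16, 3, padding="same", activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

x = np.random.rand(1, 128, 128, 1).astype("float32")
model(x, training=False)  # warm-up call to exclude one-time tracing cost

t0 = time.perf_counter()
for _ in range(100):
    model(x, training=False)
print("mean inference time: %.4f s" % ((time.perf_counter() - t0) / 100))
```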