Sound-Event Detection of Water-Usage Activities Using Transfer Learning
Abstract
1. Introduction
2. Materials and Methods
2.1. Sound-Event Detection by YAMNet
2.2. Water-Sound Activity Classifier: W-YAMNet
3. Experimental Results
3.1. Overview of Experiments
3.2. Water-Related Sound Score Using the YAMNet Output
3.3. Prediction Performance of W-YAMNet
3.3.1. Training of W-YAMNet
3.3.2. Frame-Level and Clip-Level Classification and Clip Length
3.3.3. Data Size for W-YAMNet Training
3.4. Real-Time Application
4. Discussion
5. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Gokalp, H.; Clarke, M. Monitoring Activities of Daily Living of the Elderly and the Potential for Its Use in Telecare and Telehealth: A Review. Telemed. e-Health 2013, 19, 910–923.
- Chan, M.; Esteve, D.; Fourniols, J.; Escriba, C.; Campo, E. Smart wearable systems: Current status and future challenges. Artif. Intell. Med. 2012, 56, 137–156.
- Townsend, D.; Knoefel, F.; Goubran, R. Privacy versus autonomy: A tradeoff model for smart home monitoring technologies. In Proceedings of the 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Boston, MA, USA, 30 August–3 September 2011; pp. 4749–4752.
- Ding, D.; Cooper, R.A.; Pasquina, P.F.; Fici-Pasquina, L. Sensor technology for smart homes. Maturitas 2011, 69, 131–136.
- Hyun, S.; Chee, Y. The Edge Computing System for the Detection of Water Usage Activities with Sound Classification. J. Biomed. Eng. Res. 2023, 44, 147–156.
- Cakir, E.; Parascandolo, G.; Heittola, T.; Huttunen, H.; Virtanen, T. Convolutional recurrent neural networks for polyphonic sound event detection. IEEE/ACM Trans. Audio Speech Lang. Process. 2017, 25, 1291–1303.
- Adavanne, S.; Parascandolo, G.; Pertilä, P.; Heittola, T.; Virtanen, T. Sound event detection using spatial features and convolutional recurrent neural network. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Canada, 15–20 April 2018; pp. 771–775.
- Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? In Proceedings of the Advances in Neural Information Processing Systems, Palais des Congrès de Montréal, Montréal, Canada, 8–13 December 2014.
- Audio Classification Tensorflow. Available online: https://www.tensorflow.org/lite/examples/audio_classification (accessed on 4 October 2023).
- Hershey, S.; Chaudhuri, S.; Ellis, D.P.; Gemmeke, J.F.; Jansen, A.; Moore, R.C.; Wilson, K. CNN architectures for large-scale audio classification. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 131–135.
- AudioSet—A Sound Vocabulary and Dataset. Available online: https://research.google.com/audioset (accessed on 4 October 2023).
- Gemmeke, J.F.; Ellis, D.P.; Freedman, D.; Jansen, A.; Lawrence, W.; Moore, R.C.; Plakal, M.; Ritter, M. AudioSet: An ontology and human-labeled dataset for audio events. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 776–780.
- Campana, M.G.; Delmastro, F.; Pagani, E. Transfer learning for the efficient detection of COVID-19 from smartphone audio data. Pervasive Mob. Comput. 2023, 89, 101754.
- Sound Classification with YAMNet. Available online: https://www.tensorflow.org/hub/tutorials/yamnet (accessed on 4 October 2023).
- Tsalera, E.; Papadakis, A.; Samarakou, M. Comparison of pre-trained CNNs for audio classification using transfer learning. J. Sens. Actuator Netw. 2021, 10, 72.
- Bozkurt, B.; Germanakis, I.; Stylianou, Y. A study of time-frequency features for CNN-based automatic heart sound classification for pathology detection. Comput. Biol. Med. 2018, 100, 132–143.
- Zhang, T.; Feng, G.; Liang, J.; An, T. Acoustic scene classification based on Mel spectrogram decomposition and model merging. Appl. Acoust. 2021, 182, 108258.
- IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events. Available online: https://dcase.community/challenge2021 (accessed on 4 October 2023).
- Choi, K.; Fazekas, G.; Sandler, M.; Cho, K. Transfer learning for music classification and regression tasks. arXiv 2017, arXiv:1703.09179.
- Mobile Web Application Development Platform. Available online: https://firebase.google.com/?hl=ko (accessed on 4 October 2023).
- Google Colab. Available online: https://colab.research.google.com/?hl=ko (accessed on 4 October 2023).
- Abayomi, O.; Damaševičius, R.; Qazi, A.; Adedoyin-Olowe, M.; Misra, S. Data Augmentation and Deep Learning Methods in Sound Classification: A Systematic Review. Electronics 2022, 11, 3795.
- Cheek, P.; Nikpour, L.; Nowlin, H. Aging well with smart technology. Nurs. Adm. Q. 2005, 29, 329–338.
| Rank | Class Name | YAMNet Output | Class Index ¹ |
|---|---|---|---|
| 1 | Water ² | 0.2001 | 282 |
| 2 | Water tap, faucet | 0.1788 | 364 |
| 3 | Sink (filling or washing) | 0.1434 | 365 |
| 4 | Toilet flush | 0.1088 | 368 |
| 5 | Inside, small room | 0.0762 | 500 |
| 6 | Bathtub (filling or washing) | 0.0637 | 366 |
| 7 | Liquid | 0.0585 | 438 |
| 8 | Hiss | 0.0474 | 79 |
| 9 | Fill (with liquid) | 0.0471 | 776 |
| 10 | Steam | 0.0441 | 290 |
| … | … | … | … |
| 16 | Drip ³ | 0.0205 | 442 |
Dataset sizes per bathroom (T01–T03):

| Dataset | T01 | T02 | T03 |
|---|---|---|---|
| Training (frames, 0.96 s) | 850 | 780 | 1340 |
| Validation (frames, 0.96 s) | 280 | 260 | 440 |
| Test (audio clips, 5 s) | 60 | 102 | 78 |
F1 scores by decision interval and decision unit (columns T01–T03 are bathrooms):

| Decision Interval | Decision Unit | T01 | T02 | T03 |
|---|---|---|---|---|
| 5 s | Frame level | 0.950 | 0.900 | 0.918 |
| 5 s | Clip level | 1.000 | 0.971 | 1.000 |
| 4 s | Frame level | 0.953 | 0.893 | 0.917 |
| 4 s | Clip level | 1.000 | 0.952 | 0.990 |
| 3 s | Frame level | 0.943 | 0.896 | 0.919 |
| 3 s | Clip level | 1.000 | 0.953 | 0.977 |
| 2 s | Frame level | 0.939 | 0.880 | 0.899 |
| 2 s | Clip level | 0.993 | 0.937 | 0.980 |
| 1 s | Frame level | 0.916 | 0.853 | 0.856 |
| 1 s | Clip level | 0.964 | 0.902 | 0.935 |
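A clip-level decision aggregates the per-frame (0.96 s) predictions within each decision interval into a single label. The exact aggregation rule is not stated in this excerpt; the sketch below assumes simple majority voting over frame labels, which is one common choice:

```python
from collections import Counter

def clip_level_decision(frame_predictions):
    """Aggregate per-frame class labels into one clip-level label.

    Majority voting is assumed here; it is a common aggregation rule,
    not necessarily the one used in the paper.
    """
    votes = Counter(frame_predictions)
    label, _count = votes.most_common(1)[0]
    return label

# e.g. five 0.96 s frames inside a 5 s decision interval
print(clip_level_decision(["SH", "SH", "FA", "SH", "FL"]))  # SH
```

Under this rule, a clip is classified correctly even when a minority of its frames are misclassified, which is consistent with the clip-level F1 scores in the table exceeding the frame-level ones.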
Clip-level confusion matrices per decision interval (rows: true class; each cell lists counts predicted as FL / SH / FA) and the resulting F1 score per bathroom:

| Decision Interval | True Class | T01 (FL / SH / FA) | T02 (FL / SH / FA) | T03 (FL / SH / FA) |
|---|---|---|---|---|
| 5 s | FL | 18 / 0 / 0 | 34 / 0 / 0 | 22 / 0 / 0 |
| | SH | 0 / 21 / 0 | 0 / 34 / 0 | 0 / 30 / 0 |
| | FA | 0 / 0 / 21 | 1 / 1 / 32 | 0 / 0 / 26 |
| | F1 score | 1.000 | 0.980 | 1.000 |
| 4 s | FL | 22 / 0 / 0 | 42 / 0 / 0 | 28 / 0 / 0 |
| | SH | 0 / 26 / 0 | 0 / 38 / 4 | 0 / 37 / 1 |
| | FA | 0 / 0 / 26 | 0 / 2 / 40 | 0 / 0 / 33 |
| | F1 score | 1.000 | 0.952 | 0.990 |
| 3 s | FL | 30 / 0 / 0 | 57 / 0 / 0 | 37 / 0 / 0 |
| | SH | 0 / 35 / 0 | 0 / 50 / 6 | 0 / 49 / 2 |
| | FA | 0 / 0 / 35 | 1 / 1 / 54 | 1 / 0 / 43 |
| | F1 score | 1.000 | 0.953 | 0.977 |
| 2 s | FL | 45 / 0 / 0 | 85 / 0 / 0 | 56 / 0 / 0 |
| | SH | 0 / 53 / 0 | 1 / 74 / 10 | 0 / 73 / 3 |
| | FA | 0 / 1 / 52 | 3 / 2 / 80 | 1 / 0 / 66 |
| | F1 score | 0.993 | 0.937 | 0.980 |
| 1 s | FL | 90 / 0 / 0 | 169 / 0 / 2 | 111 / 0 / 2 |
| | SH | 2 / 101 / 4 | 1 / 140 / 29 | 3 / 140 / 10 |
| | FA | 0 / 5 / 101 | 11 / 7 / 152 | 10 / 1 / 123 |
| | F1 score | 0.964 | 0.902 | 0.935 |
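The tabulated F1 scores can be recovered from the confusion matrices by macro averaging (computing F1 per class and averaging with equal weight); whether the paper uses macro averaging is an assumption here, but it reproduces the reported values. A minimal sketch, checked against the T02, 5 s matrix:

```python
def macro_f1(cm):
    """Macro-averaged F1 from a square confusion matrix.

    cm[i][j] = number of clips with true class i predicted as class j.
    """
    n = len(cm)
    f1_per_class = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                      # missed clips of class c
        fp = sum(cm[i][c] for i in range(n)) - tp  # other classes predicted as c
        f1_per_class.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1_per_class) / n

# T02, 5 s decision interval (rows: true FL / SH / FA)
cm_t02 = [[34, 0, 0],
          [0, 34, 0],
          [1, 1, 32]]
print(round(macro_f1(cm_t02), 3))  # 0.98
```

This matches the table's 0.980 for T02 at the 5 s interval.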
Effect of training-set size: clip-level confusion matrices (rows: true class; each cell lists counts predicted as FL / SH / FA) and F1 scores when training on the full set, 2/3, and 1/3 of the data:

| Used Data | True Class | T01 (FL / SH / FA) | T02 (FL / SH / FA) | T03 (FL / SH / FA) |
|---|---|---|---|---|
| Full | FL | 18 / 0 / 0 | 34 / 0 / 0 | 22 / 0 / 0 |
| | SH | 0 / 21 / 0 | 0 / 34 / 0 | 0 / 30 / 0 |
| | FA | 0 / 0 / 21 | 1 / 1 / 32 | 0 / 0 / 26 |
| | F1 score | 1.000 | 0.980 | 1.000 |
| 2/3 | FL | 18 / 0 / 0 | 34 / 0 / 0 | 22 / 0 / 0 |
| | SH | 0 / 21 / 0 | 0 / 28 / 6 | 0 / 30 / 0 |
| | FA | 0 / 0 / 21 | 1 / 1 / 32 | 0 / 0 / 26 |
| | F1 score | 1.000 | 0.921 | 1.000 |
| 1/3 | FL | 18 / 0 / 0 | 34 / 0 / 0 | 22 / 0 / 0 |
| | SH | 0 / 21 / 0 | 0 / 24 / 10 | 0 / 30 / 0 |
| | FA | 0 / 0 / 21 | 0 / 0 / 34 | 2 / 0 / 24 |
| | F1 score | 1.000 | 0.853 | 0.919 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Hyun, S.H. Sound-Event Detection of Water-Usage Activities Using Transfer Learning. Sensors 2024, 24, 22. https://doi.org/10.3390/s24010022