Article

Evaluating Convolutional Neural Networks and Vision Transformers for Baby Cry Sound Analysis

Computer Engineering Department, Arab Academy of Science and Technology and Maritime Transport, Alexandria 1029, Egypt
*
Author to whom correspondence should be addressed.
Future Internet 2024, 16(7), 242; https://doi.org/10.3390/fi16070242
Submission received: 21 May 2024 / Revised: 24 June 2024 / Accepted: 25 June 2024 / Published: 7 July 2024

Abstract

Crying is a newborn’s main way of communicating. Although newborn cries may sound alike, they are produced by distinct physical mechanisms and carry distinct acoustic characteristics. Experienced medical professionals, nurses, and parents are able to recognize these variations based on their prior interactions. Nonetheless, interpreting a baby’s cries can be challenging for carers, first-time parents, and inexperienced paediatricians. This paper proposes a novel approach to baby cry classification using advanced deep learning techniques. This study aims to accurately classify different cry types associated with everyday infant needs, including hunger, discomfort, pain, tiredness, and the need for burping. The proposed model achieves an accuracy of 98.33%, surpassing the performance of existing studies in the field. IoT-enabled sensors are utilized to capture cry signals in real time, ensuring continuous and reliable monitoring of the infant’s acoustic environment. This integration of IoT technology with deep learning enhances the system’s responsiveness and accuracy. Our study highlights the significance of accurate cry classification in understanding and meeting the needs of infants, as well as its potential impact on improving infant care practices. The methodology, including the dataset, preprocessing techniques, and architecture of the deep learning model, is described. The results demonstrate the performance of the proposed model, and the discussion analyzes the factors contributing to its high accuracy.
Keywords: audio processing; cry sound analysis; deep learning; spectrogram; transformer models; convolutional neural networks
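The abstract and keywords indicate that cry recordings are converted into spectrograms before being fed to the CNN and vision-transformer models. The paper's exact preprocessing pipeline is not reproduced on this page, so the following is only a minimal sketch of the general technique: framing a waveform, windowing each frame, and taking the log-magnitude FFT to obtain a spectrogram image. All parameter values (16 kHz sample rate, 25 ms frames, 10 ms hop) and the synthetic sine-wave input are illustrative assumptions, not values from the study.

```python
import numpy as np

def log_spectrogram(signal, frame_len=400, hop=160):
    """Frame the waveform, apply a Hann window to each frame,
    and return the log-magnitude FFT per frame (time x frequency)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(mag)  # log compression stabilizes the dynamic range

# One second of synthetic 16 kHz "audio" stands in for a recorded cry.
sr = 16000
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440 * t)

spec = log_spectrogram(wave)
print(spec.shape)  # (98, 201): 98 frames, 201 frequency bins
```

A 2-D array like this can then be treated as a single-channel image: a CNN convolves over it directly, while a vision transformer would first split it into fixed-size patches.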

Graphical Abstract

Share and Cite

MDPI and ACS Style

Younis, S.A.; Sobhy, D.; Tawfik, N.S. Evaluating Convolutional Neural Networks and Vision Transformers for Baby Cry Sound Analysis. Future Internet 2024, 16, 242. https://doi.org/10.3390/fi16070242


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
