Search Results (18)

Search Parameters:
Keywords = vocal tract resonance

12 pages, 1214 KB  
Article
The Reliability and Validity of a New Laryngeal Palpation Tool for Static and Dynamic Examination
by Isabelle Bargar, Melina Maria Ippers, Katrin Neumann, Philipp Mathmann and Ben Barsties v. Latoszek
J. Clin. Med. 2025, 14(17), 6309; https://doi.org/10.3390/jcm14176309 - 6 Sep 2025
Viewed by 912
Abstract
Background/Objectives: Voice disorders caused by laryngeal hypertension can impact volume, quality, pitch, resonance, flexibility, and stamina. Laryngeal palpation is a tactile-perceptual assessment and one of the few examination methods available to evaluate laryngeal hypertension. It is a manual examination of the extrinsic and paralaryngeal tissues of the larynx (e.g., lateral laryngeal mobility, thyrohyoid and cricothyroid spaces, vertical laryngeal position/mobility, and pain) performed with the examiner's fingers, either at rest (static assessment) or during phonation (dynamic assessment). This study aimed to validate a novel laryngeal palpation tool with quantitative ordinal scores by assessing its reliability and diagnostic accuracy, establishing preliminary clinical cut-off values, and examining its correlations with self-reported voice disorder symptoms. Methods: In a prospective, controlled validation study, 33 participants were selected to assess the validity and reliability of the novel diagnostic tool in a clinical sample and healthy controls. The clinical sample (n = 19) comprised individuals diagnosed with voice disorders, whereas the healthy control group (n = 14) included participants with no history or symptoms of voice pathology. The novel laryngeal palpation tool was employed by two independent examiners to assess both static and dynamic laryngeal function in all participants. In addition, each participant completed the following questionnaires: the 30-item Voice Handicap Index (VHI-30), the Vocal Fatigue Index (VFI), and the Vocal Tract Discomfort Scale (VTD). Results: Static palpatory assessment of laryngeal tension demonstrated excellent discriminatory power between groups and tension levels (AROC = 0.979), along with high intra-rater (ICC = 0.966) and inter-rater reliability (ICC = 0.866). Significant correlations were found between the static palpation results and the VHI scores (r = 0.496; p < 0.01) and VFI scores (r = 0.514; p < 0.01). The dynamic evaluation of the palpation tool showed comparable validity (AROC = 0.840) and reliability (inter-rater: ICC = 0.800; intra-rater: ICC = 0.840). However, no significant correlations were found between dynamic palpation and the self-report questionnaires, in contrast to static palpation. The validity of the total score was AROC = 0.992. Conclusions: The static and dynamic assessments using the novel laryngeal palpation tool demonstrated promising reliability and diagnostic accuracy, providing initial evidence to support its clinical utility. Further studies are needed to establish broader validation.
(This article belongs to the Special Issue New Advances in the Management of Voice Disorders: 2nd Edition)
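The AROC values reported above can be read as the probability that a randomly chosen patient scores higher than a randomly chosen control. A minimal sketch of that computation (the Mann–Whitney interpretation of the ROC area), using made-up ordinal palpation scores rather than the study's data:

```python
def auc_roc(patient_scores, control_scores):
    """Area under the ROC curve via the Mann-Whitney statistic:
    the fraction of patient/control pairs ranked correctly (ties count half)."""
    wins = 0.0
    for p in patient_scores:
        for c in control_scores:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))

patients = [4, 5, 3, 5, 4, 2]  # hypothetical ordinal palpation scores
controls = [1, 2, 1, 0, 2]
print(auc_roc(patients, controls))  # ≈ 0.967
```

An AROC near 1.0, as reported for the static assessment, means patients and controls are almost perfectly separated by the score.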

17 pages, 559 KB  
Systematic Review
Acoustic Voice Analysis as a Tool for Assessing Nasal Obstruction: A Systematic Review
by Gamze Yesilli-Puzella, Emilia Degni, Claudia Crescio, Lorenzo Bracciale, Pierpaolo Loreti, Davide Rizzo and Francesco Bussu
Appl. Sci. 2025, 15(15), 8423; https://doi.org/10.3390/app15158423 - 29 Jul 2025
Viewed by 2040
Abstract
Objective: This study aims to critically review and synthesize the existing literature on the use of voice analysis in assessing nasal obstruction, with a particular focus on acoustic parameters. Data sources: PubMed, Scopus, Web of Science, Ovid Medline, and ScienceDirect. Review methods: A comprehensive literature search was conducted without any restrictions on publication year, employing Boolean search techniques. The selection and review process of the studies followed PRISMA guidelines. The inclusion criteria comprised studies with participants aged 18 years and older who had nasal obstruction evaluated using acoustic voice analysis parameters, along with objective and/or subjective methods for assessing nasal obstruction. Results: Of the 174 abstracts identified, 118 were screened after the removal of duplicates. The full texts of 37 articles were reviewed, and only 10 studies met the inclusion criteria. The majority of these studies found no significant correlations between voice parameters and nasal obstruction. Among the various acoustic parameters examined, shimmer was the most consistently affected, with statistically significant changes identified in three independent studies. A smaller number of studies reported notable findings for fundamental frequency (F0) and noise-related measures such as NHR/HNR. Conclusion: This systematic review critically evaluates existing studies on the use of voice analysis for assessing and monitoring nasal obstruction and hyponasality. The current evidence remains limited, as most investigations focus predominantly on glottic sound and dysphonia, with insufficient attention to the influence of the vocal tract, particularly the nasal cavities, on voice production. A notable gap exists in the integration of advanced analytical approaches, such as machine learning, in this field. Future research should apply such approaches to isolate the contribution of nasal resonance to the voice, thus defining the specific parameters in the voice spectrogram that can give precise information on nasal obstruction.
(This article belongs to the Special Issue Innovative Digital Health Technologies and Their Applications)
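Of the acoustic parameters surveyed, shimmer is among the simplest to state: local shimmer is the mean absolute difference between consecutive cycle peak amplitudes, normalized by the mean amplitude. A rough sketch, with invented per-cycle amplitudes for illustration:

```python
def shimmer_local(peak_amplitudes):
    """Local shimmer: mean absolute difference of consecutive per-cycle
    peak amplitudes, divided by the mean amplitude (often reported in %)."""
    diffs = [abs(b - a) for a, b in zip(peak_amplitudes, peak_amplitudes[1:])]
    mean_amp = sum(peak_amplitudes) / len(peak_amplitudes)
    return (sum(diffs) / len(diffs)) / mean_amp

amps = [1.00, 0.92, 1.05, 0.97, 1.01]  # hypothetical cycle peak amplitudes
print(round(100 * shimmer_local(amps), 2))  # shimmer in percent, ≈ 8.3
```

Cycle-to-cycle amplitude irregularity of this kind is what the reviewed studies found to change significantly under nasal obstruction.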

21 pages, 9934 KB  
Article
On the Alignment of Acoustic and Coupled Mechanic-Acoustic Eigenmodes in Phonation by Supraglottal Duct Variations
by Florian Kraxberger, Christoph Näger, Marco Laudato, Elias Sundström, Stefan Becker, Mihai Mihaescu, Stefan Kniesburges and Stefan Schoder
Bioengineering 2023, 10(12), 1369; https://doi.org/10.3390/bioengineering10121369 - 28 Nov 2023
Cited by 6 | Viewed by 1827
Abstract
Sound generation in human phonation and the underlying fluid–structure–acoustic interaction that describes the sound production mechanism are not fully understood. A previous experimental study, with a silicone vocal fold model connected to a straight vocal tract pipe of fixed length, showed that vibroacoustic coupling can cause a deviation in the vocal fold vibration frequency. This occurred when the fundamental frequency of the vocal fold motion was close to the lowest acoustic resonance frequency of the pipe. What is not fully understood is how the vibroacoustic coupling is influenced by a varying vocal tract length. Presuming that this effect is a pure coupling of the acoustical effects, a numerical simulation model is established based on the computation of the mechanical–acoustic eigenvalues. By varying the pipe length, the lowest acoustic resonance frequency was adjusted in the experiments and, correspondingly, in the simulation setup. In doing so, the evolution of the vocal folds' coupled eigenvalues and eigenmodes is investigated, which confirms the experimental findings. Finally, it was shown that for normal phonation conditions, the mechanical mode is the most efficient vibration pattern whenever the acoustic resonance of the pipe (the lowest formant) is far away from the vocal folds' vibration frequency. Whenever the lowest formant is slightly lower than the mechanical vocal fold eigenfrequency, the coupled vocal fold motion pattern at the formant frequency dominates.
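The dependence of the lowest acoustic resonance on pipe length that this study manipulates follows, to first order, the quarter-wave behavior of a uniform tube closed at the glottis and open at the lips. A back-of-the-envelope sketch of that idealized relationship (not the paper's coupled eigenvalue model):

```python
def quarter_wave_resonances(length_m, n_modes=3, c=343.0):
    """Resonances (Hz) of an idealized uniform tube, closed at one end and
    open at the other: f_k = (2k - 1) * c / (4 * L)."""
    return [(2 * k - 1) * c / (4.0 * length_m) for k in range(1, n_modes + 1)]

# A ~17 cm tube (typical adult vocal tract length) puts the lowest
# resonance near 500 Hz; lengthening the pipe lowers it.
print([round(f) for f in quarter_wave_resonances(0.17)])  # → [504, 1513, 2522]
```

Sweeping the pipe length in this model moves the lowest resonance through the vocal fold eigenfrequency, which is exactly the regime where the coupling effects described above appear.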

18 pages, 24683 KB  
Article
An Investigation of Acoustic Back-Coupling in Human Phonation on a Synthetic Larynx Model
by Christoph Näger, Stefan Kniesburges, Bogac Tur, Stefan Schoder and Stefan Becker
Bioengineering 2023, 10(12), 1343; https://doi.org/10.3390/bioengineering10121343 - 22 Nov 2023
Cited by 6 | Viewed by 2143
Abstract
In the human phonation process, acoustic standing waves in the vocal tract can influence the fluid flow through the glottis as well as vocal fold oscillation. To investigate the amount of acoustic back-coupling, the supraglottal flow field was recorded via high-speed particle image velocimetry (PIV) in a synthetic larynx model for several configurations with different vocal tract lengths. Based on the obtained velocity fields, acoustic source terms were computed. Additionally, the sound radiation into the far field was recorded via microphone measurements, and the vocal fold oscillation via high-speed camera recordings. The PIV measurements revealed that near a vocal tract resonance frequency fR, the vocal fold oscillation frequency fo (and therefore also the flow field's fundamental frequency) jumps to fR. This is accompanied by a substantial relative increase in aeroacoustic sound generation efficiency. Furthermore, the measurements show that fo–fR coupling increases vocal efficiency, signal-to-noise ratio, harmonics-to-noise ratio, and cepstral peak prominence. At the same time, the glottal volume flow needed for stable vocal fold oscillation decreases strongly. All of this results in improved voice quality and phonation efficiency, so that a person phonating with fo–fR coupling can phonate longer and with better voice quality.
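The harmonics-to-noise ratio mentioned above is commonly derived from the height of the normalized autocorrelation peak of the voice signal (Boersma's definition). A sketch, assuming the peak value has already been measured:

```python
import math

def hnr_db(r_peak):
    """Harmonics-to-noise ratio in dB from the normalized autocorrelation
    peak r_peak in (0, 1): HNR = 10 * log10(r / (1 - r))."""
    return 10.0 * math.log10(r_peak / (1.0 - r_peak))

print(round(hnr_db(0.99), 1))  # a strongly periodic signal, ~20 dB
```

An r_peak of 0.5 (equal harmonic and noise energy) gives 0 dB, so the increase in HNR under fo–fR coupling reflects a more strongly periodic, less noisy voice signal.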

16 pages, 6517 KB  
Article
Super-Resolved Dynamic 3D Reconstruction of the Vocal Tract during Natural Speech
by Karyna Isaieva, Freddy Odille, Yves Laprie, Guillaume Drouot, Jacques Felblinger and Pierre-André Vuissoz
J. Imaging 2023, 9(10), 233; https://doi.org/10.3390/jimaging9100233 - 20 Oct 2023
Cited by 3 | Viewed by 2642
Abstract
MRI is the gold-standard modality for speech imaging. However, it remains relatively slow, which complicates imaging of fast movements; thus, MRI of the vocal tract is often performed in 2D. While 3D MRI provides more information, the quality of such images is often insufficient. The goal of this study was to test the applicability of super-resolution algorithms for dynamic vocal tract MRI. In total, 25 sagittal slices of 8 mm with an in-plane resolution of 1.6 × 1.6 mm2 were acquired consecutively using a highly undersampled radial 2D FLASH sequence. The volunteers read a text in French under two different protocols. The slices were aligned using the simultaneously recorded sound. The super-resolution strategy was used to reconstruct 1.6 × 1.6 × 1.6 mm3 isotropic volumes. The resulting images were less sharp than the native 2D images but demonstrated a higher signal-to-noise ratio. It was also shown that super-resolution eliminates inconsistencies, producing smooth transitions between the slices. Additionally, it was demonstrated that using visual stimuli and shorter text fragments improves inter-slice consistency and super-resolved image sharpness. Therefore, with a suitable choice of speech task, the proposed method allows for the reconstruction of high-quality dynamic 3D volumes of the vocal tract during natural speech.
(This article belongs to the Section Medical Imaging)

20 pages, 4358 KB  
Article
Automatic Multiple Articulator Segmentation in Dynamic Speech MRI Using a Protocol Adaptive Stacked Transfer Learning U-NET Model
by Subin Erattakulangara, Karthika Kelat, David Meyer, Sarv Priya and Sajan Goud Lingala
Bioengineering 2023, 10(5), 623; https://doi.org/10.3390/bioengineering10050623 - 22 May 2023
Cited by 5 | Viewed by 3473
Abstract
Dynamic magnetic resonance imaging has emerged as a powerful modality for investigating upper-airway function during speech production. Analyzing the changes in the vocal tract airspace, including the position of soft-tissue articulators (e.g., the tongue and velum), enhances our understanding of speech production. The advent of various fast speech MRI protocols based on sparse sampling and constrained reconstruction has led to the creation of dynamic speech MRI datasets on the order of 80–100 image frames/second. In this paper, we propose a stacked transfer learning U-NET model to segment the deforming vocal tract in 2D mid-sagittal slices of dynamic speech MRI. Our approach leverages (a) low- and mid-level features and (b) high-level features. The low- and mid-level features are derived from models pre-trained on labeled open-source brain tumor MR and lung CT datasets, and an in-house airway-labeled dataset. The high-level features are derived from labeled protocol-specific MR images. The applicability of our approach to segmenting dynamic datasets is demonstrated on data acquired from three fast speech MRI protocols. Protocol 1: a 3 T radial acquisition scheme coupled with a non-linear temporal regularizer, where speakers produced French speech tokens. Protocol 2: a 1.5 T uniform-density spiral acquisition scheme coupled with temporal finite difference (FD) sparsity regularization, where speakers produced fluent speech tokens in English. Protocol 3: a 3 T variable-density spiral acquisition scheme coupled with manifold regularization, where speakers produced various speech tokens from the International Phonetic Alphabet (IPA). Segmentations from our approach were compared to those from an expert human user (a vocologist) and from a conventional U-NET model without transfer learning, with segmentations from a second expert human user (a radiologist) used as ground truth. Evaluations were performed using the quantitative Dice similarity metric, the Hausdorff distance metric, and a segmentation count metric. The approach was successfully adapted to different speech MRI protocols with only a handful of protocol-specific images (on the order of 20), and provided accurate segmentations similar to those of an expert human.
(This article belongs to the Special Issue AI in MRI: Frontiers and Applications)
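The Dice similarity metric used for evaluation rewards overlap between a predicted and a reference segmentation mask. A minimal sketch, with masks represented as sets of pixel indices (toy data, not the study's):

```python
def dice(pred, truth):
    """Dice similarity coefficient between two binary masks,
    each given as a set of (row, col) pixel indices."""
    if not pred and not truth:
        return 1.0  # both empty: perfect agreement by convention
    overlap = len(pred & truth)
    return 2.0 * overlap / (len(pred) + len(truth))

pred = {(0, 0), (0, 1), (1, 0), (1, 1)}
truth = {(0, 1), (1, 1), (1, 2), (2, 2)}
print(dice(pred, truth))  # → 0.5
```

A Dice score of 1.0 means the predicted airway mask matches the radiologist's ground truth exactly; 0.0 means no overlap at all.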

22 pages, 2587 KB  
Review
Impacts of Development, Dentofacial Disharmony, and Its Surgical Correction on Speech: A Narrative Review for Dental Professionals
by Christine Bode, Nare Ghaltakhchyan, Erika Rezende Silva, Timothy Turvey, George Blakey, Raymond White, Jeff Mielke, David Zajac and Laura Jacox
Appl. Sci. 2023, 13(9), 5496; https://doi.org/10.3390/app13095496 - 28 Apr 2023
Cited by 7 | Viewed by 6471
Abstract
Speech is a communication method found only in humans that relies on precisely articulated sounds to encode and express thoughts. Anatomical differences in the maxilla, mandible, tooth position, and vocal tract affect tongue placement and broadly influence the patterns of airflow and resonance during speech production. Alterations in these structures can create perceptual distortions in speech known as speech sound disorders (SSDs). As craniofacial development occurs, the vocal tract, jaws, and teeth change in parallel with the stages of speech development, from babbling to adult phonation. Deviations from a normal Class 1 dental and skeletal relationship can impact speech. Dentofacial disharmony (DFD) patients have jaw disproportions with a high prevalence of SSDs, and the severity of malocclusion correlates with the degree of speech distortion. DFD patients often seek orthodontic and orthognathic surgical treatment, but dental providers have limited familiarity with the impacts of malocclusion and its correction on speech. We sought to review the interplay between craniofacial and speech development and the impacts of orthodontic and surgical treatment on speech. Shared knowledge can facilitate collaborations between dental specialists and speech pathologists for the proper diagnosis, referral, and treatment of DFD patients with speech pathologies.
(This article belongs to the Special Issue Advances in Maxillofacial and Oral Surgery)

14 pages, 4798 KB  
Article
Vocal Tract Resonance Detection at Low Frequencies: Improving Physical and Transducer Configurations
by Jithin Thilakan, Balamurali B.T., Sarun P.M. and Jer-Ming Chen
Sensors 2023, 23(2), 939; https://doi.org/10.3390/s23020939 - 13 Jan 2023
Viewed by 2326
Abstract
Broadband excitation introduced at the speaker's lips and evaluation of the corresponding relative acoustic impedance spectrum allow fast, accurate, and non-invasive estimation of vocal tract resonances during speech and singing. However, due to radiation impedance interactions at the lips at low frequencies, it is challenging to make reliable measurements of resonances below 500 Hz because of poor signal-to-noise ratios, limiting investigations of the first vocal tract resonance using such a method. In this paper, various physical configurations that may optimize the acoustic coupling between transducers and the vocal tract are investigated, and the practical arrangement that yields the best vocal tract resonance detection sensitivity at low frequencies is identified. To support the investigation, two quantitative analysis methods are proposed to facilitate comparison of the sensitivity and quality of the resonances identified. The optimal configuration has better acoustic coupling and low-frequency response than existing arrangements and is shown to reliably detect resonances down to 350 Hz (and possibly lower), thereby allowing the first resonance of a wide range of vowel articulations to be estimated with confidence.

7 pages, 818 KB  
Article
Destruction of Vowel Space Area in Patients with Dysphagia after Stroke
by Min Kyu Choi, Seung Don Yoo and Eo Jin Park
Int. J. Environ. Res. Public Health 2022, 19(20), 13301; https://doi.org/10.3390/ijerph192013301 - 15 Oct 2022
Cited by 3 | Viewed by 2119
Abstract
Dysphagia is associated with dysarthria in stroke patients. Vowel space decreases in stroke patients with dysarthria, and destruction of the vowel space is often observed. We determined the correlation of destruction of the acoustic vowel space with dysphagia in stroke patients. Seventy-four individuals with dysphagia and dysarthria who had experienced stroke were enrolled. For the /a/, /ae/, /i/, and /u/ vowels, we determined the formant parameters (which reflect vocal tract resonance frequencies as two-dimensional coordinate points), the formant centralization ratio (FCR), and the quadrilateral vowel space area (VSA). Swallowing function was assessed using the videofluoroscopic dysphagia scale (VDS) during videofluoroscopic swallowing studies. Pearson's correlation and linear regression were used to determine the correlation between VSA, FCR, and VDS. Subgroups were created based on VSA, and vowel space destruction groups were compared using ANOVA and Scheffe's test. VSA and FCR were negatively and positively correlated with VDS, respectively. Groups were separated based on the mean and standard deviation of VSA. One-way ANOVA revealed significant differences in VDS, FCR, and age between the VSA groups, and no significant differences in VDS between the mild and moderate VSA reduction groups and the vowel space destruction group. VSA and FCR values correlated with swallowing function. Vowel space destruction has characteristics similar to moderate-to-severe VSA reduction and has utility as an indicator of dysphagia severity.
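The two acoustic measures used here are easy to state concretely: the quadrilateral VSA is the area of the polygon spanned by the (F1, F2) points of the corner vowels, and the FCR (Sapir et al.) is (F2u + F2a + F1i + F1u) / (F2i + F1a), rising above 1 as vowels centralize. A sketch with hypothetical formant values:

```python
def vsa(points):
    """Quadrilateral vowel space area via the shoelace formula.
    `points` are (F1, F2) pairs in Hz, ordered around the quadrilateral."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def fcr(f1, f2):
    """Formant centralization ratio; f1 and f2 map vowel label -> Hz."""
    return (f2["u"] + f2["a"] + f1["i"] + f1["u"]) / (f2["i"] + f1["a"])

# Hypothetical corner vowels (F1, F2) in Hz, ordered /i/, /ae/, /a/, /u/:
corners = [(300, 2300), (650, 1700), (800, 1200), (350, 800)]
f1 = {"i": 300, "ae": 650, "a": 800, "u": 350}
f2 = {"i": 2300, "ae": 1700, "a": 1200, "u": 800}
print(vsa(corners))            # Hz^2; shrinks as articulation centralizes
print(round(fcr(f1, f2), 3))   # rises toward and above 1 with centralization
```

This is why the study finds VSA negatively and FCR positively correlated with dysphagia severity: both track the same collapse of the vowel quadrilateral from opposite directions.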

22 pages, 3364 KB  
Article
Sound Visualization Demonstrates Velopharyngeal Coupling and Complex Spectral Variability in Asian Elephants
by Veronika C. Beeck, Gunnar Heilmann, Michael Kerscher and Angela S. Stoeger
Animals 2022, 12(16), 2119; https://doi.org/10.3390/ani12162119 - 18 Aug 2022
Cited by 8 | Viewed by 5918
Abstract
Sound production mechanisms set the parameter space available for transmitting biologically relevant information in vocal signals. Low-frequency rumbles play a crucial role in coordinating social interactions in elephants' complex fission–fusion societies. By emitting rumbles through either the oral or the three-times-longer nasal vocal tract, African elephants alter their spectral shape significantly. In this study, we used an acoustic camera to visualize the sound emission of rumbles in Asian elephants, which have received far less research attention than African elephants. We recorded nine adult captive females and analyzed the spectral parameters of 203 calls, including vocal tract resonances (formants). We found that the majority of rumbles (64%) were nasally emitted, 21% orally, and 13% simultaneously through the mouth and trunk, demonstrating velopharyngeal coupling. Some of the rumbles were combined with orally emitted roars. The nasal rumbles concentrated most spectral energy in lower frequencies, exhibiting two formants, whereas the oral and mixed rumbles contained higher formants and higher spectral energy concentrations and were louder. The roars were the loudest, highest, and broadest in frequency. This study is the first to demonstrate velopharyngeal coupling in a non-human animal. Our findings provide a foundation for future research into the adaptive functions of the elephants' acoustic variability for information coding, localizability, or sound transmission, as well as vocal flexibility across species.
(This article belongs to the Special Issue Elephant Communication)

15 pages, 14826 KB  
Article
Data-Driven Analysis of European Portuguese Nasal Vowel Dynamics in Bilabial Contexts
by Nuno Almeida, Samuel Silva, Conceição Cunha and António Teixeira
Appl. Sci. 2022, 12(9), 4601; https://doi.org/10.3390/app12094601 - 3 May 2022
Viewed by 3057
Abstract
European Portuguese (EP) is characterized by a large number of nasals, encompassing five phonemic nasal vowels. One notable characteristic of these sounds is their dynamic nature, involving both oral and nasal gestures, which makes their study and characterization challenging. The study of nasal vowels, in particular, has been addressed using a wide range of technologies: early descriptions were based on acoustics and nasalance, later expanded with articulatory data obtained from EMA and real-time magnetic resonance imaging (RT-MRI). While providing important results, these studies were limited by the discrete nature of the EMA pellets, which capture only a small part of the vocal tract; by the low temporal resolution of the MRI data; and by the small number of speakers. To tackle these limitations, and to take advantage of recent advances in RT-MRI allowing 50 fps, novel articulatory data were acquired for 11 EP speakers. The work presented here explores the capabilities of recently proposed data-driven approaches to model articulatory data extracted from RT-MRI, to assess their suitability for investigating the dynamic characteristics of nasal vowels. To this end, we explore vocal tract configurations over time, along with the coordination of velum and lip aperture, in oral and nasal bilabial contexts for nasal vowels and their oral congeners. Overall, the results show that both generalized additive mixed models (GAMMs) and functional linear mixed models (FLMMs) provide an elegant approach to tackling data from multiple speakers. More specifically, we found oro-pharyngeal differences in the tongue configurations for low and mid nasal vowels: vocal tract aperture was larger in the pharyngeal region and smaller in the palatal region for the three non-high nasal vowels, providing evidence of a raised and more advanced tongue position for the nasal vowels. Even though this work is aimed at exploring the applicability of the methods, the outcomes already highlight interesting data for the dynamic characterization of EP nasal vowels.

20 pages, 2992 KB  
Article
Data-Driven Critical Tract Variable Determination for European Portuguese
by Samuel Silva, Nuno Almeida, Conceição Cunha, Arun Joseph, Jens Frahm and António Teixeira
Information 2020, 11(10), 491; https://doi.org/10.3390/info11100491 - 21 Oct 2020
Cited by 3 | Viewed by 3460
Abstract
Technologies such as real-time magnetic resonance imaging (RT-MRI) can provide valuable information to evolve our understanding of the static and dynamic aspects of speech by contributing to the determination of which articulators are essential (critical) in producing specific sounds and how (gestures). While a visual analysis and comparison of imaging data or vocal tract profiles can already provide relevant findings, the sheer amount of available data demands unsupervised data-driven approaches and can strongly profit from them. Recent work in this regard has asserted the possibility of determining critical articulators from RT-MRI data by considering a representation of vocal tract configurations based on landmarks placed on the tongue, lips, and velum, yielding meaningful results for European Portuguese (EP). Advancing this previous work to obtain a characterization of EP sounds grounded in Articulatory Phonology, which is important for exploring critical gestures and advancing, for example, articulatory speech synthesis, entails the consideration of a novel set of tract variables. To this end, this article explores critical variable determination considering a vocal tract representation aligned with Articulatory Phonology and the Task Dynamics framework. The overall results, obtained using data for three EP speakers, show the applicability of this approach and are consistent with existing descriptions of EP sounds.
(This article belongs to the Special Issue Selected Papers from PROPOR 2020)

15 pages, 3634 KB  
Article
First-Step PPG Signal Analysis for Evaluation of Stress Induced during Scanning in the Open-Air MRI Device
by Jiří Přibil, Anna Přibilová and Ivan Frollo
Sensors 2020, 20(12), 3532; https://doi.org/10.3390/s20123532 - 22 Jun 2020
Cited by 21 | Viewed by 6986
Abstract
The paper describes first-step experiments with parallel measurement of cardiovascular parameters using a photoplethysmographic (PPG) optical sensor and standard portable blood pressure monitors in different situations of body relaxation and stimulation. Changes in the human cardiovascular system are mainly manifested by differences in the Oliva–Roztocil index, the instantaneous heart rate, and variations in blood pressure. In the auxiliary experiments, different physiological and psychological stimuli were applied to test whether relaxation and activation phases produce different measured parameters suitable for further statistical analysis and processing. The principal investigation is aimed at analyzing the impact of vibration and acoustic noise on the physiological and psychological state of a person lying inside a low-field open-air magnetic resonance imager (MRI). The obtained results will be used to analyze, quantify, and suppress a possible stress factor that affects the speech signal recorded during scanning in the MRI device, in research aimed at 3D modeling of the human vocal tract.
(This article belongs to the Section Biomedical Sensors)
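The instantaneous heart rate referred to above is simply the reciprocal of each inter-beat interval between successive PPG peaks. A sketch, assuming the peak times have already been detected from the sensor waveform:

```python
def instantaneous_hr(peak_times_s):
    """Instantaneous heart rate (beats/min) from successive PPG peak times
    in seconds: one value per inter-beat interval."""
    return [60.0 / (t2 - t1) for t1, t2 in zip(peak_times_s, peak_times_s[1:])]

# Hypothetical peak times; a shortening interval means a rising rate,
# e.g. under acoustic-noise stimulation inside the scanner.
print(instantaneous_hr([0.0, 0.8, 1.6, 2.3]))
```

Comparing such per-beat series between relaxation and stimulation phases is the kind of statistical contrast the study describes.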

16 pages, 3046 KB  
Article
Non-Contact Speech Recovery Technology Using a 24 GHz Portable Auditory Radar and Webcam
by Yue Ma, Hong Hong, Hui Li, Heng Zhao, Yusheng Li, Li Sun, Chen Gu and Xiaohua Zhu
Remote Sens. 2020, 12(4), 653; https://doi.org/10.3390/rs12040653 - 17 Feb 2020
Cited by 7 | Viewed by 4894
Abstract
Language has been one of the most effective means of human communication and information exchange. To address the problem of non-contact robust speech recognition, recovery, and surveillance, this paper presents a speech recovery technology based on a 24 GHz portable auditory radar and a webcam. The continuous-wave auditory radar is used to extract the vocal vibration signal, and the webcam is used to obtain the fitted formant frequencies. A traditional formant speech synthesizer is used to synthesize and recover speech, with the vocal vibration signal as the sound source excitation and the fitted formant frequencies as the vocal tract resonance characteristics. Experiments on reading single English characters and words were carried out. Using microphone recordings as a reference, the effectiveness of the proposed speech recovery technology is verified. Mean opinion scores show a relatively high consistency between the synthesized speech and the original acoustic speech.
(This article belongs to the Special Issue Radar Remote Sensing on Life Activities)
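A classical formant synthesizer of the kind referred to above shapes a source excitation with one second-order digital resonator per formant. A minimal single-resonator sketch (Klatt-style coefficients; the parameter values are illustrative, not those of the paper):

```python
import math

def resonator_coeffs(f_hz, bw_hz, fs_hz):
    """Coefficients of a 2nd-order IIR resonator centred at f_hz with
    bandwidth bw_hz (Klatt-style): y[n] = b0*x[n] + a1*y[n-1] + a2*y[n-2],
    normalized for unity gain at DC."""
    r = math.exp(-math.pi * bw_hz / fs_hz)
    theta = 2.0 * math.pi * f_hz / fs_hz
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    b0 = 1.0 - a1 - a2
    return b0, a1, a2

def filter_signal(x, b0, a1, a2):
    """Run the resonator over a source signal x."""
    y1 = y2 = 0.0
    out = []
    for s in x:
        y0 = b0 * s + a1 * y1 + a2 * y2
        out.append(y0)
        y2, y1 = y1, y0
    return out

# Shape an impulse "source" with a resonator at 500 Hz, 80 Hz bandwidth:
b0, a1, a2 = resonator_coeffs(500.0, 80.0, 16000.0)
ring = filter_signal([1.0] + [0.0] * 99, b0, a1, a2)  # decaying ~500 Hz ringing
```

In the paper's setup, the radar-derived vocal vibration signal would take the place of the impulse source, and the webcam-fitted formant frequencies would set each resonator's center frequency.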

10 pages, 543 KB  
Proceeding Paper
A New Approach to the Formant Measuring Problem
by Marnix Van Soom and Bart de Boer
Proceedings 2019, 33(1), 29; https://doi.org/10.3390/proceedings2019033029 - 25 Dec 2019
Cited by 4 | Viewed by 2007
Abstract
Formants are characteristic frequency components in human speech that are caused by resonances in the vocal tract during speech production. They are of primary concern in acoustic phonetics and speech recognition. Despite this, making accurate measurements of the formants, which we dub "the formant measurement problem" for convenience, is not yet considered fully resolved. One particular shortcoming is the lack of error bars on the formant frequency estimates. As a first step towards remedying this, we propose a new approach to the formant measurement problem in the particular case of steady-state vowels, a case which occurs quite abundantly in natural speech. The approach is to look at the formant measurement problem from the viewpoint of Bayesian spectrum analysis. We develop a pitch-synchronous linear model for steady-state vowels and apply it to the open-mid front unrounded vowel [ɛ] observed in a real speech utterance.
