Communication

Do We Get What We Need from Clinical Acoustic Voice Measurements?

by
Meike Brockmann-Bauser
1,2,* and
Maria Francisca de Paula Soares
1,3,*
1
Department of Phoniatrics and Speech Pathology, Clinic for Otorhinolaryngology, Head and Neck Surgery, University Hospital Zurich, 8091 Zurich, Switzerland
2
Medical Faculty, University of Zurich, 8006 Zurich, Switzerland
3
Department of Speech, Language and Hearing Science, Multidisciplinary Institute for Rehabilitation and Health, Federal University of Bahia, Salvador 40110-170, Brazil
*
Authors to whom correspondence should be addressed.
Appl. Sci. 2023, 13(2), 941; https://doi.org/10.3390/app13020941
Submission received: 29 September 2022 / Revised: 28 December 2022 / Accepted: 4 January 2023 / Published: 10 January 2023
(This article belongs to the Special Issue Current Trends and Future Directions in Voice Acoustics Measurement)

Abstract

Featured Application

This article discusses the current state of the art and knowledge gaps in clinical acoustic analysis of voice quality and proposes future directions.

Abstract

Instrumental acoustic measurements of the human voice have enormous potential to objectively describe pathology and, thereby, to assist clinical treatment decisions. Despite the increasing application and accessibility of technical knowledge and equipment, recent research has highlighted a lack of understanding of physiologic, speech/language-, and culture-related influencing factors. This article presents a critical review of the current state of the art in the clinical application of instrumental acoustic voice quality measurements and points out future directions for improving its applications and dissemination in less privileged populations. The main barriers to this research relate to (a) standardization and reporting of acoustic analysis techniques; (b) understanding of the relation between perceptual and instrumental acoustic results; (c) the necessity to account for natural speech-related covariables, such as differences in speaking voice sound pressure level (SPL) and fundamental frequency f0; (d) the need for a much larger database to understand normal variability within and between voice-disordered and vocally healthy individuals related to age, training, and physiologic factors; and (e) affordable equipment, including mobile communication devices, accessible in various settings. This calls for further research into technical developments and optimal assessment procedures for pathology-specific patient groups.

1. Introduction

Voice disorders may substantially affect professional and personal daily life activities, as well as the general well-being of the person concerned [1]. The cost of treating voice disorders is estimated to be up to USD 13.5 billion per year in the U.S., comparable to the healthcare costs of chronic obstructive pulmonary disease (COPD), asthma, diabetes, and allergic rhinitis [2].
Voice dysfunction may be caused by changes in the structure, innervation, or function of the laryngeal mechanism [3]. Dysphonia, characterized as reduced voice capacity and involuntary alterations of vocal quality, pitch, and loudness, has been described as one of the major traits of disordered voice production [2]. Therefore, an objective measurement of these characteristics is key to determining a conclusive diagnosis and treatment plan. Especially for patients with so-called functional dysphonia, comprising around 34% of outpatients in voice clinics, this is the only way to objectively describe voice pathology at all. Thus, a reliable and valid measurement of vocal output is highly desirable from a patient-centered and socioeconomic viewpoint, but is also basic to evidence-based practice. Consequently, a useful clinical application of acoustic analysis techniques depends on the objectivity and reproducibility of measurement results, a clear hypothesis of what is measured and how it relates to dysfunction, and valid cutoff values for the discrimination of pathology. Moreover, ease of measurement techniques and of result interpretation as well as costs play a determining role in a successful widespread application.
In the present work, we will review the suitability of currently recommended methods to assess vocal quality in clinical voice diagnostics and point out future directions for improving its applications and dissemination in less privileged populations.

2. Background: Current State of the Art and Challenges in Clinical Acoustic Measurements

2.1. Hypothesis Underlying Measurements and Current Clinical Assessment Recommendations

Instrumental acoustic measurements rely on the hypothesis that the functionality of the human voice production system is reflected in measurable changes in the acoustic voice signal. Thanks to huge technical developments in the last 30 years, there are several measurement methods at hand for clinical applications, which are easy to apply and have a variously described relation to voice dysfunction [4,5,6]. According to current guidelines, both quantitative and qualitative characteristics of vocal output are measured to allow indirect conclusions to be drawn about the pathophysiology and extent of vocal function restriction [5,7]. Quantitative measures comprise fundamental frequency (f0) and voice intensity, or sound pressure level (SPL), measurements. These include speaking and singing voice range profiles (VRP), meaning distribution maps of f0 and SPL while performing different voice tasks, including sustained vowels, speaking, counting, reading, and singing [8]. Moreover, so-called voice quality measures, including jitter, Harmonics-to-Noise ratio (HNR), Cepstral Peak Prominence (CPP), and combined indices such as the Dysphonia Severity Index (DSI), have been recommended to objectively quantify dysphonia [5,7,8].
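As a rough illustration of what a voice range profile represents, the following sketch bins paired f0 and SPL readings into a two-dimensional count map. The semitone reference of 55 Hz, the 1 dB bin width, and the function name are assumptions chosen for this example and are not taken from the cited guidelines.

```python
import math
from collections import Counter


def voice_range_profile(f0_hz, spl_db, semitone_ref_hz=55.0, spl_bin_db=1.0):
    """Count how often each (semitone, SPL bin) cell is visited.

    f0_hz and spl_db are equally long sequences of frame-wise readings.
    The semitone reference and bin width are illustrative assumptions.
    """
    cells = Counter()
    for f0, spl in zip(f0_hz, spl_db):
        # Convert f0 to the nearest semitone above the reference frequency.
        semitone = round(12 * math.log2(f0 / semitone_ref_hz))
        # Lower edge of the SPL bin that contains this reading.
        spl_bin = math.floor(spl / spl_bin_db) * spl_bin_db
        cells[(semitone, spl_bin)] += 1
    return cells


# Hypothetical readings: two frames near 220 Hz / 65 dB, one at 440 Hz / 80 dB.
profile = voice_range_profile([220.0, 220.0, 440.0], [65.2, 65.9, 80.0])
```

The occupied cells outline the range of f0 and SPL covered in a task, and the counts show where the voice was used most.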
Over the years, guidelines have been published proposing technical equipment, procedures, and speech tasks to promote the reproducibility and reliability of acoustic analysis. The most important recommendations are described in Table 1 [5,6,7,9].

2.2. Technical Challenges

Acoustic voice measures are objective in nature and will provide a measurement of any input signal with a given algorithm, usually regardless of signal quality or suitability. Fundamental frequency has been shown to be the most robust measure, even under suboptimal recording conditions, allowing data comparability between different hardware and software types [10,11,12] and audio file formats [13]. However, voice intensity and quality measurements are significantly influenced by characteristics of the equipment and environment, including the microphone, signal-to-noise ratio (SNR) of recording systems, data processing steps, analysis algorithms, room acoustics, and background noise [5,6,11,14,15]. Consequently, most acoustic parameters are not easily comparable to measures from other clinics, centers, practices, or research studies. Thus, without reporting of salient technical characteristics, results lack the basis for data interpretation and transfer. However, such comprehensive reporting is not always provided.
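To make the SNR term concrete, a minimal sketch of one way to estimate it from a recording is given below. It assumes that a voiced segment and a background-only segment (for example, a pause before phonation) have already been cut from the same recording; the function names are illustrative, and the guidelines cited above may prescribe a different procedure.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def snr_db(voiced, background):
    """Approximate SNR in dB from a voiced and a background-only segment.

    The voiced segment still contains the background noise, so this is an
    approximation that is reasonable only when the voice clearly dominates.
    """
    return 20.0 * math.log10(rms(voiced) / rms(background))


# Synthetic example: voice amplitude 1.0, background amplitude 0.1.
example_snr = snr_db([1.0, -1.0] * 100, [0.1, -0.1] * 100)
```

Reporting a value of this kind alongside the microphone and room description gives readers a first indication of whether intensity and quality measures can be compared across sites.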
During the COVID-19 pandemic, new aspects challenged the application of instrumental analysis. Wearing a mask, as well as the type of mask (N95, surgical, cloth), have been shown to affect speech signal quality [16,17]. These effects may be partially mitigated by adapting voice recording strategies, such as placing the microphone differently or using additional amplification [16]. Other issues relate to remote assessments, with unspecified background noise and the availability of high-quality recording hardware and software in the patients’ home environments. In general, studies agree that it is possible to use mobile communication devices (MCDs) for voice analysis. However, identical characteristics to those in clinical settings must be considered to achieve basic data comparability [18,19,20]. This includes reporting of the use and type of mask during a recording session, and the specification of equipment and procedures. Further research must be conducted to systematize the procedures, establish minimal requirements, and propose calibration solutions for more reliable voice measurements with MCDs to support the broad use of these tools.
In summary, standardization of measurement technology and reporting is key to objectively describing voice dysfunction and developing a sufficient database to describe physiologic and pathologic influencing factors for each of our clinically applied parameters. Moreover, this is the basis for reliably measuring treatment effects and further developing applications for MCDs.

2.3. Dysphonia Limits the Objective Measurement of Vocal Quality

Dysphonic voice signals are characterized by an increase in irregularity related to fundamental frequency and voice intensity, and reduced organization or energy in the overtone spectrum. Thus, so-called traditional time-based vocal quality measures, including jitter (f0 variation) and shimmer (SPL variation), will have increased values with signal irregularity. Spectral measures such as Harmonics-to-Noise ratio (HNR) and Cepstral Peak Prominence (CPP) will decrease due to irregularity, as will indices incorporating these measures [4,5,6].
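The sketch below illustrates one common "local" formulation of these perturbation measures: the mean absolute difference between consecutive cycles, divided by the mean value and expressed as a percentage. Applied to cycle periods it gives a jitter value, and applied to cycle peak amplitudes it gives a shimmer value. This is only one of several variants in use, the function name is an assumption, and the input numbers are invented for illustration.

```python
def local_perturbation_percent(values):
    """Mean absolute cycle-to-cycle difference relative to the mean, in percent.

    Pass cycle periods (s) for local jitter or cycle peak amplitudes for
    local shimmer. Cycle detection itself is assumed to be done elsewhere.
    """
    if len(values) < 2:
        raise ValueError("at least two cycles are required")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


# Perfectly regular cycles give zero perturbation.
regular = local_perturbation_percent([0.005, 0.005, 0.005])
# Strongly alternating cycle lengths give a very high value.
irregular = local_perturbation_percent([0.004, 0.006, 0.004, 0.006])
```

Because both results depend entirely on the cycle boundaries supplied, any error in f0 tracking feeds directly into the reported perturbation.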
Single and combined acoustic voice quality measures have been related to breathiness, roughness, and overall dysphonia in a comparatively large body of literature [5,21,22,23]. Moreover, the discrimination of dysphonia may be possible with an Area Under the Curve (AUC) of up to 91% for CPPS as a single measure [24]. Combined indices, including variants of CPP, HNR, and shimmer or jitter, showed an AUC of up to 94% for overall dysphonia and breathiness [21,23].
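For readers less familiar with the AUC, the following sketch computes it as the proportion of dysphonic and healthy speaker pairs that a measure ranks in the expected order, with ties counted as half. The CPPS values are invented for illustration and do not come from the cited studies. Since CPPS decreases with dysphonia, the values are negated so that a higher score means more severe.

```python
def auc(scores_positive, scores_negative):
    """Probability that a positive case scores higher than a negative case."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical CPPS values in dB (not real data).
cpps_dysphonic = [8.0, 10.5, 12.0]
cpps_healthy = [11.0, 14.0, 15.5]

# Negate CPPS so that higher scores indicate stronger dysphonia.
example_auc = auc([-v for v in cpps_dysphonic], [-v for v in cpps_healthy])
```

In this toy example, eight of the nine pairs are ordered correctly, which corresponds to an AUC of about 0.89.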
Based on the current literature, we can conclude that there is clearly a relation between perceptual dysphonia or voice disorder and several inferential acoustic measures. However, in a clinical population there are several pragmatic problems to address. Measures such as jitter, shimmer, and HNR rely on correct recognition of f0 and SPL and, in the case of HNR, also on successful spectrum analysis. Since acoustic signals may simply be too irregular for these analysis steps, it has been recommended that different analysis strategies be applied based on voice signal regularity. According to Titze, nearly periodic Type 1 voices were classified as highly suitable for perturbation (such as jitter and shimmer) analysis. Type 2 voices with strong modulations or subharmonics approaching f0 in energy, as well as irregular or aperiodic Type 3 signals, were recommended for spectrographic and perceptual analysis methods [6]. As an addition to this model, Type 4 voices have been described as Type 3 signals that are stochastic in nature and, therefore, suitable for nonlinear analysis [25].
Thus, in more irregular Type 3 or 4 signals, results of f0 and SPL-based measures may be erroneously high [25]. In principle, similar measurement problems apply to indices incorporating one of these parameters, such as the DSI, Cepstral Spectral Index of Dysphonia (CSID), or Acoustic Voice Quality Index (AVQI) [7,26]. Since a typical clinical population incorporates approximately 20% of patients with so-called Type 3 or 4 signals, this will result in potentially distorted results in analyses of unselected voice-disordered populations [27].
Other methods, now increasingly based on machine learning, have been applied to several databases in efforts to improve the automatic classification of voice disorders. The results are promising, but more studies in the field are still needed [28].
As a consequence, we cannot simply assume that every acoustic measure can be reliably applied to every voice patient. To address this issue, CPP or CPPS based on spectral analysis were recommended for clinical voice assessments in 2019 [5]. However, from an international viewpoint, there is still a broad application of parameters such as jitter and shimmer, which are not applicable, per se, in severely dysphonic voices. As an illustration, in June 2022, we performed a short search in PubMed using the keywords “CPP dysphonia” and “jitter dysphonia” for the timespan of 2019 to 2022. This resulted in 30 results for “CPP”, while “jitter” resulted in 88 entries in the same period. Moreover, combined indices incorporating jitter or shimmer variants have been increasingly applied to dysphonic voices [23,28].
Therefore, recommendations for measurements of dysphonic voices should include a clear characterization of suitable voice signal quality for different types of parameters, including combined indices. The application of these rules should be reinforced by the software applied (such as by simply not measuring signals that are too irregular) and by publishing bodies, including reviewers.
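A software-side safeguard of this kind could look like the following sketch, which follows the signal typing described above (Type 1 for perturbation analysis, Types 2 and 3 for spectrographic and perceptual analysis, Type 4 for nonlinear analysis). The names and the error-raising behavior are assumptions for illustration; how a signal is assigned its type is outside the scope of this example.

```python
# Recommended analysis family per signal type, as summarized in the text.
RECOMMENDED_ANALYSIS = {
    1: "perturbation (e.g., jitter, shimmer)",
    2: "spectrographic and perceptual",
    3: "spectrographic and perceptual",
    4: "nonlinear",
}


class UnsuitableSignalError(ValueError):
    """Raised when a measure is requested for a signal that is too irregular."""


def require_perturbation_suitability(signal_type):
    """Refuse perturbation analysis unless the signal is nearly periodic (Type 1)."""
    if signal_type not in RECOMMENDED_ANALYSIS:
        raise ValueError(f"unknown signal type: {signal_type}")
    if signal_type != 1:
        raise UnsuitableSignalError(
            f"Type {signal_type} signal: use "
            f"{RECOMMENDED_ANALYSIS[signal_type]} analysis instead"
        )


def try_perturbation(signal_type):
    """Return True if perturbation analysis may proceed, False otherwise."""
    try:
        require_perturbation_suitability(signal_type)
    except UnsuitableSignalError:
        return False
    return True
```

Refusing to return a number, instead of returning a questionable one, makes the limitation visible to the clinician at the moment of measurement.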
Moreover, due to the issues described above, normative values will be either based on data sets excluding highly irregular signals, leaving out a substantial proportion of patients, or be based on data including these signals, leading to overly high normative values for moderately and especially severely dysphonic groups. In both cases, we might argue that there will still be highly valid observations in group comparisons. However, this shows that normative values are not necessarily useful benchmarks for individuals as diagnostic markers.
However, as things stand, sufficiently exhaustive reporting of experimental setups and procedures can become very technical and tedious. Moreover, controlling such detailed reporting in clinical and research contexts may be very time consuming and, thus, be deemed impractical. It would be expedient if (commercial) measurement equipment, software, and procedures, as well as, perhaps, personnel training were instead independently certified as being qualified to produce recordings at an ISO-standardized quality level. In most cases, it could then suffice to say that the experimental setup and the training of the involved personnel were compliant with a certain level. Such levels could be specified in detail in a standards document and could be designated, for instance, as Level 0: unspecified; Level 1: adequate for preliminary subjective assessment and/or telepractice; Level 2: adequate for most analyses of f0, perturbations, and spectra; Level 3: highest quality for any kind of voice research. While companies have piloted similar schemes for their own product range, to our knowledge, no such scheme has been published or independently validated yet. Similarly, many educational faculties and schools invest substantial efforts into comprehensive training of future clinicians and researchers, but so far, these efforts have not led to the publication of transferable standards for instrumental acoustic analysis.

2.4. Dysphonia and Acoustic Measurements Depend on the Token and Elicitation Task

Dysphonia has primarily been a concept of how a voice is perceived against an internal standard of “normal”. In clinical application, standardized assessment schemes such as the Grading–Roughness–Breathiness–Asthenia–Strain (GRBAS) scale or the CAPE-V protocol have been recommended to focus on specific characteristics. However, even with standardized scales, there will be variance in judgments due to differences in training or the cultural background of the examiner [29,30]. Thus, an instrumental acoustic measurement of dysphonia seems to be a logical solution for an objective assessment of voice dysfunction.
However, there are some observations that may challenge the idea of capturing “dysphonia” per se. First, humans may weight and integrate information about aberrant voice function differently than single and combined acoustic indices do [30,31,32]. Moreover, when human assessment is based on cultural and mental concepts, how do we compare this information to objective measures? In recent years, considerable effort has been invested into establishing language-specific normative values for CPP and the combined indices AVQI and CSID [23,28,33,34]. However, if we use language-specific normative values, this may undermine the comparability and transferability of study results between languages. Especially when native speakers assess the voice samples, there may be substantial underlying differences in perceptual expectation and training [31]. From a technical viewpoint, some phonemes, such as the fricatives /f/, /s/, or /sch/, can act as additive noise in speech samples. Depending on the applied parameter, this may inflate assessment results and, thus, the impression of pathology, when these phonemes are more frequent in a language. These issues may be addressed by analyzing only vowels; however, this leads to the clinical problem of the ecological validity of the token. Several studies have shown differences for jitter, shimmer, and CPPS in sustained vowel phonation as compared to speech, reading, or counting tasks [33,35,36,37,38]. Moreover, voice patients may perform well in one voice task in terms of voice technique, and sound much more dysphonic in another. Thus, the global representativeness of a normative value or token cannot simply be assumed.
This calls for a standardization of tokens across languages, including standardization of vowel phonation and description of the phonetic context for speech and text. Moreover, a broad database is needed to understand what is comparable between languages and what is language- (and culture-) specific in voice acoustics. In a clinical setting, we also need to consider what patients are able to perform, and under which conditions their voice problems will be most measurable. This calls for a disease- and function-specific assessment approach. For example, in cases of neurological pathology, investigation of the alternating motion rate (AMR), or use of acoustic parameters such as voice onset time (VOT) and vowel space area, were recommended for the acoustic analysis protocol [9]. Furthermore, specific populations, such as professional voice users, may need the exploration of endurance tasks or extended VRPs. Further research is needed into adequate language-neutral and language-specific tokens considering the pathology and the specific population.

2.5. Changes in Voice Quality Relate to f0 and SPL

Several objective measures related to perceptual dysphonia, including jitter, shimmer, HNR, and CPPS, have been shown to be significantly influenced by habitual speaking voice SPL and f0. This was the case in sustained vowel phonation and speech in both healthy and voice-disordered individuals. When taking speaking SPL into account as a confounding variable, there were no differences between vocally healthy and matched voice-disordered women of the same age and similar professions (Figure 1) [4,35,36,37,39,40,41,42,43]. To date, there are no normative values available with correction for SPL and f0 for single voice quality measures or combined indices. Moreover, under the current guidelines to speak or phonate a vowel “at comfortable loudness and pitch”, these effects are not sufficiently controlled for [4,35,36,43].
From a clinical viewpoint, this leads to several pragmatic problems: patients will not necessarily speak at the same or comparable SPL and f0 or with the same voice quality when assessed at different points in time during the day (Figure 1 and Figure 2) [44,45]. In particular, cepstral and multiparametric measures have been described as worse in the morning. Studies in young adults within a day and between days have partially shown a moderate test–retest reliability for several acoustic parameters, including jitter, shimmer, and CPPS [44,45]. Moreover, in professional voice users such as teachers, an increase in mean speaking f0 and SPL and a reduction in jitter and shimmer during a working day have been observed [46,47]. While not all of the observed changes are related to f0 and SPL, there is a lack of baseline data to understand which of these changes are physiologic and related to SPL and f0 only, and which must be classified as pathologic.
Thus, patients with a lower habitual speaking voice SPL will present with reduced periodicity in voicing, which is not related to vocal impairment or lower voice quality per se (Figure 1 and Figure 3, Table 2). Moreover, women tend to speak more softly than men in the same clinical voice tasks, which has, most likely, affected currently known normative values for gender [39,40,48]. Thus, basic comparability between and even within individuals is not a given, since measurements may be performed at a different speaking SPL and/or f0. This hinders the application of normative values for voice assessments as well as comparisons before and after treatment.
Moreover, some patients will undergo systematic changes in SPL and f0 through treatment. This is the case in training programs targeted at changing speaking SPL and pitch, such as Lee Silverman Voice Training or Vocal Function Exercises [49,50]. Thus, under current guidelines, basic comparability before and after treatment of popular acoustic indices is not given. Therefore, SPL and f0 should be controlled and registered during recording sessions, and should thereafter be included in reporting of the results.

2.6. Do We Know Enough about Normal Voice Function?

Patient characteristics such as gender, age, voice training, and voice use level have been shown to influence vocal function characteristics, including vocal maps, jitter, shimmer, HNR, and CPPS [42,44,51,52,53,54,55]. Age effects may be attenuated by voice training and appear less distinct in elderly individuals with a good general physical condition [56,57]. Thus, in the clinical application, we rely on an understanding of how and to what extent our target measures will be affected by these factors to understand pathological voice function. To date, it is unclear for most of these characteristics how we should take them into account when making a clinical decision. Moreover, further physiologic influencing factors, including heartbeat and motor neuron activity, have been shown to affect acoustic perturbation [58]. From a clinical viewpoint, this substantially reduces the usefulness of acoustic voice quality measures for determining the extent of functional restriction.
Given all of these circumstances, there is a need for far more baseline data for Voice Range Profiles and so-called voice quality measures, including technically well-researched normative values for gender and age, at least, to better understand the normal biological variation in voice production within and between individuals. This is the basis for an objective description of voice function restrictions to derive patient-specific treatment goals, and to reliably measure treatment effects.

2.7. How Funding Influences Diagnostic Standards

While the importance of reliable benchmark values is highly evident, we should consider the circumstances that have limited these developments so far. Besides the technical and procedural limitations discussed above, several monetary reasons may need to be addressed in the future. In several countries, there is no specific reimbursement for instrumental acoustic assessments. Moreover, equipment in some countries is very costly as compared to rates for specialist sessions. As a consequence, the application and improvement of techniques is restricted to a few well-financed services and commercial suppliers, with a relatively small group of potential customers. This limits the development of improved procedures, analysis techniques, and data bodies. Given the resulting comparatively weak evidence base, research funding options may be limited. In that view, the current situation seems like a classical chicken-and-egg problem, with the need for more data to prove the substantial need for more funding.

3. Discussion: Perspectives for Improving the Clinical Application of Voice Quality Measurements and Current Barriers

Based on the current evidence and the issues discussed above, we must improve our knowledge and practice regarding how and what we measure, as well as how to interpret measurement outcomes. In the following section, we will discuss several potential measures and perspectives to address the issues raised.

3.1. Reinforcing Standards for Acoustic Measurement and Reporting: Is Certification a Solution?

To obtain basic comparability between centers, studies, and countries, the application of measurement guidelines with detailed technical and procedural recommendations, including tokens for instrumental voice assessments, should be reinforced by clinician associations, publishing bodies, reviewers, and (further) education programs [5]. We suggest the application of international standards in the making and reporting of instrumental acoustic measurements. In reference to what are currently the most comprehensive standards published by Patel et al., these should include a reporting of the following:
  • All equipment (including SNR), acoustic wave recognition strategies, and analysis algorithms applied (software)
  • Acoustic environment description (normal room acoustics, quiet room, soundproof booth)
  • Voice tasks, including the number of repetitions and an indication of whether excerpts or the entire phonation were used for analysis
  • Reporting of mean (with SD) of the speaking SPL and f0 by analyzed token
  • Use and type of mask during the recording session
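One way to make such a reporting list checkable is to store it as a structured record, as in the sketch below. The field names and the helper function are assumptions for illustration and are not part of the cited standards; each field simply mirrors one item in the list above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class RecordingReport:
    """Reporting items for one analyzed token; None means 'not reported'."""
    equipment: Optional[str] = None            # microphone, interface, device
    snr_db: Optional[float] = None             # SNR of the recording system
    analysis_software: Optional[str] = None    # software and algorithms used
    environment: Optional[str] = None          # e.g., "quiet room"
    voice_task: Optional[str] = None           # e.g., "sustained /a/"
    repetitions: Optional[int] = None
    whole_phonation_analyzed: Optional[bool] = None
    spl_mean_db: Optional[float] = None
    spl_sd_db: Optional[float] = None
    f0_mean_hz: Optional[float] = None
    f0_sd_hz: Optional[float] = None
    mask_type: Optional[str] = None            # use "none" if no mask was worn


def missing_items(report: RecordingReport) -> List[str]:
    """Names of all reporting items that have not been filled in."""
    return [f.name for f in fields(report) if getattr(report, f.name) is None]


# Hypothetical, partially completed report.
report = RecordingReport(
    equipment="headset condenser microphone",
    environment="quiet room",
    voice_task="sustained /a/",
    repetitions=3,
    mask_type="none",
)
gaps = missing_items(report)
```

A journal submission system or clinic database could reject or flag records for which the list of missing items is not empty.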
Moreover, far more research is warranted to explore more technical solutions using widespread low-cost equipment, such as smartphones, to allow for a wider application of analysis techniques in smaller practices and economically less privileged countries and populations. As discussed above, measurement reliability relies on a thorough technical description of the applications. In this context, the development of simple and mistake-proof technical concepts for the calibration of equipment and the handling of background noise is key to offering current technologies to a wider population. An ISO quality level standardization of the experimental setup and the training of the involved personnel may be a possible future solution to avoid complex descriptions of technical details in the communication of acoustic analysis results.

3.2. Relating Voice Quality Measures to Voice SPL, f0 and Token Type

In the clinical application, we are faced with the dilemma of how to control for SPL and f0 without interfering with habitual voice production. In theory, there are different ways to address this. As discussed above, there are two focal points for technical and procedural questions.
As shown by regression analysis, SPL-related effects on CPPS, jitter, shimmer, and HNR may be controlled for by using a correction factor/formula, or by calculating and reporting measurement results in reference to a standard voice intensity level (Figure 1). However, this approach does not take into account the associated natural increase in f0, which has its own effect on all of these measures. Another approach may be the application of voice range profiles (VRP) supplemented with acoustic quality data. These provide a voice quality measurement related to SPL and f0, thereby giving a voice function map related to the voice range profile [59,60,61]. In addition, difference maps have been shown to provide a quantitative analysis of voice changes while accounting for the individual’s dependencies on SPL and f0 (Figure 2 and Figure 3) [62]. However, patients will have to be verbally motivated and, probably, trained to provide comparable SPL and f0 ranges before and after treatment. This shows that technical solutions may not fully work without adaptation of voice tasks.
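The correction-factor idea can be sketched as follows, under the simplifying assumption that the measure changes linearly with SPL. A slope is first estimated from reference data, and each measurement is then re-expressed at a chosen reference SPL. The numbers are invented for illustration, the function names are assumptions, and, as noted above, this simple form ignores the accompanying change in f0.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def normalize_to_reference_spl(value, spl_db, slope, reference_spl_db):
    """Re-express a measurement as if it had been taken at the reference SPL."""
    return value - slope * (spl_db - reference_spl_db)


# Hypothetical reference data: CPPS (dB) rising with speaking SPL (dB).
spl_reference_data = [60.0, 70.0, 80.0]
cpps_reference_data = [10.0, 12.0, 14.0]
slope, intercept = fit_line(spl_reference_data, cpps_reference_data)

# A CPPS of 10 dB measured at 60 dB SPL, expressed at a 70 dB reference level.
cpps_at_70 = normalize_to_reference_spl(10.0, 60.0, slope, 70.0)
```

Whether a single slope is adequate across patients, tasks, and measures is an empirical question that the baseline data called for in this article would need to answer.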
Another possible method related to the recording procedure may be to ask patients to produce a specific predefined voice SPL, or to match their own SPL level from the first assessment. Studies in children and adults have shown that this is, in principle, feasible when providing visual feedback [41]. In addition, vowels could be examined at different loudness conditions, such as individually “soft”, “comfortable”, and “loud” levels, to evaluate changes in CPPS over a large SPL span within an individual.
However, to date, these approaches have not been systematically explored and applied in voice-disordered individuals. Moreover, the voice tasks themselves have been shown to influence natural voice behavior, which is presumed to contribute to voice disorders. Considering that natural speech is the gold standard for dysphonia in daily life, a variety of voice tasks should be combined, including very open tasks to induce much more prolonged natural speech, such as by asking for a standardized picture description. Combined with methodologically tightly controlled tokens, such as vowel phonation and speech of controlled phonetic content with control of SPL under visual feedback, this will cover both habitual and standardized phonation types. To date, there are not enough data available on how voice function changes with token and task (or verbal instruction) type, nor on which patients can perform which tasks. Since not all patients are able to perform every task, pathology-specific assessment protocols should be evaluated, such as those proposed by Rusz et al. for hypokinetic or hyperkinetic dysarthria [9]. Furthermore, clinical acoustic analysis in comparatively large natural speech samples requires automated tools with easy interfaces and affordable prices for widespread clinical implementation. This calls for far more research into technical solutions and suitable voice task types for the clinical application.

3.3. How Technical Solutions May Help to Improve Baseline Data and Application of Techniques

For a reliable and valid diagnostic application of voice quality measures, an extensive database with normative data for age, gender, training, token, language, and phonetic context effects, considering SPL and f0, is required. Moreover, the clinical usefulness of group comparisons or cut-off values for acoustic voice quality characteristics such as jitter, shimmer, HNR, CPPS, and combined indices must be reviewed. Thus, at present, interindividual comparisons seem far more valid and informative for clinical applications.
From a technical viewpoint, big data analysis techniques may help to gather more profound knowledge about voice function plasticity within and between individuals with and without dysphonia. Natural Language Processing (NLP) techniques are especially targeted at using samples of uninfluenced speech tokens, and have the potential to support the development of the much-needed massive normative databases [63]. Furthermore, machine learning has been applied in scientific databases and has shown great potential in the detection of voice disorders [64].
Thus, funding and publishing bodies should reinforce the availability of more open-source data to allow further data use and standardization across centers and countries. Again, this will require more research to systematize the procedures, establish minimal requirements, and propose calibration solutions for mobile communication devices to support the broad use of these tools in less economically privileged circumstances. This will help to further develop the huge potential of instrumental acoustic voice quality measures to objectively describe voice function. This would also improve the evidence basis for clinical decisions, as well as research funding and clinical cost coverage applications.

4. Conclusions

Acoustic voice analysis techniques are highly important tools in voice clinics to offer an objective estimate of pathological alterations in voice function, as well as to supplement a diagnosis, tailor a patient-specific treatment plan, and document treatment effects. Knowledge, measurement technique, and paid time are determining factors for the further development and application of clinically useful procedures. Currently, there are several major challenges for widespread reliable clinical application, including (a) the need to follow standardized recording, analysis, and reporting protocols; (b) improving the understanding of the relation between perceptual and instrumental acoustic results; (c) the necessity to account for covariables related to speech and language, such as differences in speaking voice SPL and f0, phonetic context, and content; (d) the need for a far larger database to understand normal variability within and between individuals both without and with dysphonia, related to age and gender; and (e) affordable recording and analysis instruments, including mobile communication devices, that can be accessed in various settings. This calls for further research into technical developments and optimal assessment procedures for specific patient groups and the establishment of a massive database to understand biological human voice variation. All in all, we must realize that voice production is a highly complex phenomenon, and that fulfilling this list will be a challenge.

Author Contributions

Conceptualization, M.B.-B. and M.F.d.P.S.; methodology M.B.-B. and M.F.d.P.S.; writing—original draft preparation, M.B.-B.; writing—review and editing, M.B.-B. and M.F.d.P.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The data presented here are based on a reanalysis of data obtained with protocols approved by the institutional review board of Partners HealthCare System at Massachusetts General Hospital, Boston, USA. References to the original study are given in the captions of Figure 2 and Figure 3.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Acknowledgments

We thank N. Iob, Dept. of Phoniatrics and Speech Pathology, Clinic for Otorhinolaryngology, Head and Neck Surgery, University Hospital Zürich, for the construction of Figure 2. Moreover, we thank S. Ternström, Division of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden, for the provision of Figure 3.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Deary, V.; McColl, E.; Carding, P.; Miller, T.; Wilson, J. A psychosocial intervention for the management of functional dysphonia: Complex intervention development and pilot randomised trial. Pilot Feasibility Stud. 2018, 4, 46.
  2. Schwartz, S.R.; Cohen, S.M.; Dailey, S.H.; Rosenfeld, R.M.; Deutsch, E.S.; Gillespie, M.B.; Granieri, E.; Hapner, E.R.; Kimball, C.E.; Krouse, H.J.; et al. Clinical practice guideline: Hoarseness (dysphonia). Otolaryngol. Head Neck Surg. 2009, 141 (Suppl. S2), S1–S31.
  3. Houtte, V.; Van Lierde, K.; Claeys, S. Pathophysiology and treatment of muscle tension dysphonia: A review of the current knowledge. J. Voice 2011, 25, 202–207.
  4. Florencio, V.D.O.; Almeida, A.A.; Balata, P.; Nascimento, S.; Brockmann-Bauser, M.; Lopes, L.W. Differences and Reliability of Linear and Nonlinear Acoustic Measures as a Function of Vocal Intensity in Individuals With Voice Disorders. J. Voice 2021.
  5. Patel, R.R.; Awan, S.N.; Barkmeier-Kraemer, J.; Courey, M.; Deliyski, D.; Eadie, T.; Paul, D.; Svec, J.G.; Hillman, R. Recommended Protocols for Instrumental Assessment of Voice: American Speech-Language-Hearing Association Expert Panel to Develop a Protocol for Instrumental Assessment of Vocal Function. Am. J. Speech Lang. Pathol. 2018, 27, 887–905.
  6. Titze, I.R. Workshop on Acoustic Analysis: Summary Statement; National Center for Voice and Speech, USA: Iowa City, IA, USA, 1995.
  7. Dejonckere, P.H.; Bradley, P.; Clemente, P.; Cornut, G.; Crevier-Buchman, L.; Friedrich, G.; Van De Heyning, P.; Remacle, M.; Woisard, V. A basic protocol for functional assessment of voice pathology, especially for investigating the efficacy of (phonosurgical) treatments and evaluating new assessment techniques. Eur. Arch. Oto-Rhino-Laryngol. 2001, 258, 77–82.
  8. Ternström, S.; Pabon, P.; Södersten, M. The Voice Range Profile: Its Function, Applications, Pitfalls and Potential. Acta Acust. United Acust. 2016, 102, 268–283.
  9. Rusz, J.; Tykalova, T.; Ramig, L.O.; Tripoliti, E. Guidelines for Speech Recording and Acoustic Analyses in Dysarthrias of Movement Disorders. Mov. Disord. 2021, 36, 803–814.
  10. Oliveira, G.; Fava, G.; Baglione, M.; Pimpinella, M. Mobile Digital Recording: Adequacy of the iRig and iOS Device for Acoustic and Perceptual Analysis of Normal Voice. J. Voice 2017, 31, 236–242.
  11. Maryn, Y.; Ysenbaert, F.; Zarowski, A.; Vanspauwen, R. Mobile Communication Devices, Ambient Noise, and Acoustic Voice Measures. J. Voice 2017, 31, 248.e11–248.e23.
  12. Zhang, C.; Jepson, K.; Lohfink, G.; Arvaniti, A. Comparing acoustic analyses of speech data collected remotely. J. Acoust. Soc. Am. 2021, 149, 3910–3916.
  13. Fuchs, R. The Effects of mp3 Compression on Acoustic Measurements of Fundamental Frequency and Pitch Range. In Proceedings of the International Conference on Speech Prosody, Boston University, Boston, MA, USA, 31 May–3 June 2016.
  14. Boersma, P. Should jitter be measured by peak picking or by waveform matching? Folia Phoniatr. Logop. 2009, 61, 305–308.
  15. Maryn, Y.; Corthals, P.; De Bodt, M.; Van Cauwenberge, P.; Deliyski, D. Perturbation measures of voice: A comparative study between Multi-Dimensional Voice Program and Praat. Folia Phoniatr. Logop. 2009, 61, 217–226.
  16. Magee, M.; Lewis, C.; Noffs, G.; Reece, H.; Chan, J.C.S.; Zaga, C.J.; Paynter, C.; Birchall, O.; Azocar, S.R.; Ediriweera, A.; et al. Effects of face masks on acoustic analysis and speech perception: Implications for peri-pandemic protocols. J. Acoust. Soc. Am. 2020, 148, 3562–3568.
  17. Cavallaro, G.; Di Nicola, V.; Quaranta, N.; Fiorella, M.L. Acoustic voice analysis in the COVID-19 era. Acta Oto-Rhino-Laryngol. Ital. 2021, 41, 1–5.
  18. Chen, Z.; Li, M.; Wang, R.; Sun, W.; Liu, J.; Li, H.; Wang, T.; Lian, Y.; Zhang, J.; Wang, X. Diagnosis of COVID-19 via acoustic analysis and artificial intelligence by monitoring breath sounds on smartphones. J. Biomed. Inform. 2022, 130, 104078.
  19. Jannetts, S.; Schaeffler, F.; Beck, J.; Cowen, S. Assessing voice health using smartphones: Bias and random error of acoustic voice parameters captured by different smartphone types. Int. J. Lang. Commun. Disord. 2019, 54, 292–305.
  20. Grillo, E.U.; Brosious, J.N.; Sorrell, S.L.; Anand, S. Influence of Smartphones and Software on Acoustic Voice Measures. Int. J. Telerehabil. 2016, 8, 9–14.
  21. Barsties v Latoszek, B.; Kim, G.H.; Delgado Hernandez, J.; Hosokawa, K.; Englert, M.; Neumann, K.; Hetjens, S. The validity of the Acoustic Breathiness Index in the evaluation of breathy voice quality: A Meta-Analysis. Clin. Otolaryngol. 2021, 46, 31–40.
  22. Maryn, Y.; Roy, N.; De Bodt, M.; Van Cauwenberge, P.; Corthals, P. Acoustic measurement of overall voice quality: A meta-analysis. J. Acoust. Soc. Am. 2009, 126, 2619–2634.
  23. Jayakumar, T.; Benoy, J.J. Acoustic Voice Quality Index (AVQI) in the Measurement of Voice Quality: A Systematic Review and Meta-Analysis. J. Voice 2022.
  24. Sauder, C.; Bretl, M.; Eadie, T. Predicting voice disorder status from smoothed measures of cepstral peak prominence using Praat and Analysis of Dysphonia in Speech and Voice (ADSV). J. Voice 2017, 31, 557–566.
  25. Sprecher, A.; Olszewski, A.; Jiang, J.J.; Zhang, Y. Updating signal typing in voice: Addition of type 4 signals. J. Acoust. Soc. Am. 2010, 127, 3710–3716.
  26. Wuyts, F.L.; Bodt, M.S.D.; Molenberghs, G.; Remacle, M.; Heylen, L.; Millet, B.; Lierde, K.V.; Raes, J.; Heyning, P.H.V.D. The dysphonia severity index: An objective measure of vocal quality based on a multiparameter approach. J. Speech Lang. Hear. Res. 2000, 43, 796–809.
  27. Carding, P.; Steen, I.; Webb, A.; MacKenzie, K.; Deary, I.; Wilson, J. The reliability and sensitivity to change of acoustic measures of voice quality. Clin. Otolaryngol. 2004, 29, 538–544.
  28. Barsties, V.L.B.; Mathmann, P.; Neumann, K. The cepstral spectral index of dysphonia, the acoustic voice quality index and the acoustic breathiness index as novel multiparametric indices for acoustic assessment of voice quality. Curr. Opin. Otolaryngol. Head Neck Surg. 2021, 29, 451–457.
  29. Kreiman, J.; Gerratt, B.R. Comparing two methods for reducing variability in voice quality measurements. J. Speech Lang. Hear. Res. 2011, 54, 803–812.
  30. Kreiman, J.; Gerratt, B.R.; Ito, M. When and why listeners disagree in voice quality assessment tasks. J. Acoust. Soc. Am. 2007, 122, 2354–2364.
  31. Kreiman, J.; Gerratt, B.R. Perceptual sensitivity to first harmonic amplitude in the voice source. J. Acoust. Soc. Am. 2010, 128, 2085–2089.
  32. Samlan, R.A.; Story, B.; Bunton, K. Relation of perceived breathiness to laryngeal kinematics and acoustic measures based on computational modeling. J. Speech Lang. Hear. Res. 2013, 56, 1209–1223.
  33. Aghajanzadeh, M.; Saeedi, S. Efficacy of Cepstral Measures in Voice Disorder Diagnosis: A Literature Review. Mod. Rehabil. 2022, 16, 120–129.
  34. Mizuta, M.; Abe, C.; Taguchi, E.; Takeue, T.; Tamaki, H.; Haji, T. Validation of Cepstral Acoustic Analysis for Normal and Pathological Voice in the Japanese Language. J. Voice 2022, 36, 770–776.
  35. Sampaio, M.; Masson, M.L.V.; Soares, M.F.D.P.; Bohlender, J.E.; Brockmann-Bauser, M. Effects of Fundamental Frequency, Vocal Intensity, Sample Duration, and Vowel Context in Cepstral and Spectral Measures of Dysphonic Voices. J. Speech Lang. Hear. Res. 2020, 63, 1326–1339.
  36. Sampaio, M.C.; Bohlender, J.E.; Brockmann-Bauser, M. Fundamental Frequency and Intensity Effects on Cepstral Measures in Vowels from Connected Speech of Speakers with Voice Disorders. J. Voice 2021, 35, 422–431.
  37. Phadke, K.V.; Laukkanen, A.-M.; Ilomäki, I.; Kankare, E.; Geneid, A.; Švec, J.G. Cepstral and Perceptual Investigations in Female Teachers With Functionally Healthy Voice. J. Voice 2020, 34, 485.e33–485.e43.
  38. Brockmann, M.; Drinnan, M.J.; Storck, C.; Carding, P.N. Reliable Jitter and Shimmer Measurements in Voice Clinics: The Relevance of Vowel, Gender, Vocal Intensity, and Fundamental Frequency Effects in a Typical Clinical Task. J. Voice 2011, 25, 44–53.
  39. Brockmann-Bauser, M. Improving Jitter and Shimmer Measurements in Normal Voices; Schulz-Kirchner Verlag: Idstein, Germany, 2012.
  40. Brockmann-Bauser, M.; Beyer, D.; Bohlender, J.E. Clinical relevance of speaking voice intensity effects on acoustic jitter and shimmer in children between 5;0 and 9;11 years. Int. J. Pediatr. Otorhinolaryngol. 2014, 78, 2121–2126.
  41. Brockmann-Bauser, M.; Beyer, D.; Bohlender, J.E. Reliable acoustic measurements in children between 5;0 and 9;11 years: Gender, age, height and weight effects on fundamental frequency, jitter and shimmer in phonations without and with controlled voice SPL. Int. J. Pediatr. Otorhinolaryngol. 2015, 79, 2035–2042.
  42. Brockmann-Bauser, M.; Bohlender, J.E.; Mehta, D.J. Acoustic Perturbation Measures Improve with Increasing Vocal Intensity in Individuals With and Without Voice Disorders. J. Voice 2018, 32, 162–168.
  43. Brockmann-Bauser, M.; Van Stan, J.H.; Sampaio, M.C.; Bohlender, J.E.; Hillman, R.E.; Mehta, D.D. Effects of Vocal Intensity and Fundamental Frequency on Cepstral Peak Prominence in Patients with Voice Disorders and Vocally Healthy Controls. J. Voice 2021, 35, 411–417.
  44. Pierce, J.L.; Tanner, K.; Merrill, R.M.; Shnowske, L.; Roy, N. Acoustic Variability in the Healthy Female Voice Within and Across Days: How Much and Why? J. Speech Lang. Hear. Res. 2021, 64, 3015–3031.
  45. Park, Y.; Stepp, C.E. Test-Retest Reliability of Relative Fundamental Frequency and Conventional Acoustic, Aerodynamic, and Perceptual Measures in Individuals With Healthy Voices. J. Speech Lang. Hear. Res. 2019, 62, 1707–1718.
  46. Laukkanen, A.-M.; Ilomäki, I.; Leppänen, K.; Vilkman, E. Acoustic measures and self-reports of vocal fatigue by female teachers. J. Voice 2008, 22, 283–289.
  47. Laukkanen, A.-M.; Kankare, E. Vocal loading-related changes in male teachers’ voices investigated before and after a working day. Folia Phoniatr. Logop. 2006, 58, 229–239.
  48. Brockmann, M.; Storck, C.; Carding, P.N.; Drinnan, M.J. Voice Loudness and Gender Effects on Jitter and Shimmer in Healthy Adults. J. Speech Lang. Hear. Res. 2008, 51, 1152–1160.
  49. Pu, T.; Huang, M.; Kong, X.; Wang, M.; Chen, X.; Feng, X.; Wei, C.; Weng, X.; Xu, F. Lee Silverman Voice Treatment to Improve Speech in Parkinson’s Disease: A Systemic Review and Meta-Analysis. Park. Dis. 2021, 2021, 3366870.
  50. Angadi, V.; Croake, D.; Stemple, J. Effects of Vocal Function Exercises: A Systematic Review. J. Voice 2019, 33, 124.e13–124.e34.
  51. Watts, C.R.; Ronshaugen, R.; Saenz, D. The effect of age and vocal task on cepstral/spectral measures of vocal function in adult males. Clin. Linguist. Phon. 2015, 29, 415–423.
  52. Stathopoulos, E.; Huber, J.; Sussmann, J. Changes in Acoustic Characteristics of the Voice across the Life-span: Measures from 4-93 Year Olds. J. Speech Lang. Hear. Res. 2011, 54, 1011–1021.
  53. Schaeffer, N.; Knudsen, M.; Small, A. Multidimensional Voice Data on Participants With Perceptually Normal Voices From Ages 60 to 80: A Preliminary Acoustic Reference for the Elderly Population. J. Voice 2015, 29, 631–637.
  54. Walzak, P.; McCabe, P.; Madill, C.; Sheard, C. Acoustic changes in student actors’ voices after 12 months of training. J. Voice 2008, 22, 300–313.
  55. Brown, W.; Rothman, H.; Sapienza, C. Perceptual and acoustic study of professionally trained versus untrained voices. J. Voice 2000, 14, 301–309.
  56. Ramig, L.A.; Ringel, R.L. Effects of physiological aging on selected acoustic characteristics of voice. J. Speech Lang. Hear. Res. 1983, 26, 22–30.
  57. Lortie, C.L.; Rivard, J.; Thibeault, M.; Tremblay, P. The Moderating Effect of Frequent Singing on Voice Aging. J. Voice 2016, 31, 112.e1–112.e12.
  58. Titze, I.R. A model for neurologic sources of aperiodicity in vocal fold vibration. J. Speech Lang. Hear. Res. 1991, 34, 460–472.
  59. Pabon, J. Objective acoustic voice-quality parameters in the computer phonetogram. J. Voice 1991, 5, 203–216.
  60. Pabon, J.P.; Plomp, R. Automatic phonetogram recording supplemented with acoustical voice-quality parameters. J. Speech Lang. Hear. Res. 1988, 31, 710–722.
  61. Pabon, P.; Ternstrom, S. Feature Maps of the Acoustic Spectrum of the Voice. J. Voice 2018, 34, 161.e1–161.e26.
  62. Patel, R.R.; Ternström, S. Quantitative and Qualitative Electroglottographic Wave Shape Differences in Children and Adults Using Voice Map-Based Analysis. J. Speech Lang. Hear. Res. 2021, 64, 2977–2995.
  63. Mehta, N.; Pandit, A. Concurrence of big data analytics and healthcare: A systematic review. Int. J. Med. Inform. 2018, 114, 57–65.
  64. Syed, S.A.; Rashid, M.; Hussain, S. Meta-analysis of voice disorders databases and applied machine learning techniques. Math. Biosci. Eng. 2020, 17, 7958–7979.
Figure 1. Smoothed Cepstral Peak Prominence (CPPS) with reference to sound pressure level (SPL) (@10 cm distance) in 58 voice-disordered and 58 age-matched, vocally healthy women. All phonated at individually “soft”, “comfortable”, and “loud” intensity levels. In this sample, “comfortable” phonation spanned a total range of 71–99 dB SPL (in between dashed lines) (Figure: own image, derived from [42,43]).
Figure 2. Sound pressure level (SPL) distribution (@10 cm) in “soft”, “comfortable”, and “loud” phonations of 58 women with voice disorders and 58 age-matched, vocally healthy controls. The area under each curve indicates the probability (termed density) that a value will fall within the indicated interval. (Figure: own image, derived from [42,43]).
Figure 3. Voice maps of Smoothed Cepstral Peak Prominence (CPPS) (dB) values for one male patient with cysts during text reading before (left) and after (middle) surgical treatment, plotted with the software FonaDyn. The (right) figure shows the differences between pre- and post-measurements. This figure illustrates that voice function varies by voice fundamental frequency and intensity. (Figure: provided by S. Ternström, based on one individual from [42,43]).
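The principle behind a voice map such as the one in Figure 3 can be sketched in a few lines: each analysis frame is assigned to a cell defined by its f0 and SPL, and the metric of interest is averaged per cell. The sketch below is a simplified illustration and not the FonaDyn implementation. The cell size of one semitone by one decibel, the function names, and the example frames are assumptions made for this example.

```python
import math
from collections import defaultdict

def hz_to_semitone(f0_hz):
    """MIDI-style semitone number (A4 = 440 Hz corresponds to 69)."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)

def voice_map(frames):
    """Average a metric per (semitone, dB SPL) cell.

    frames: iterable of (f0_hz, spl_db, metric_value) tuples,
    for example one tuple per analysis frame with CPPS as the metric.
    """
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [running total, count]
    for f0_hz, spl_db, value in frames:
        cell = (round(hz_to_semitone(f0_hz)), round(spl_db))
        sums[cell][0] += value
        sums[cell][1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}

def difference_map(pre, post):
    """Post minus pre, restricted to cells present in both maps."""
    return {cell: post[cell] - pre[cell] for cell in pre.keys() & post.keys()}

# Hypothetical frames: (f0 in Hz, SPL in dB, CPPS in dB)
pre = voice_map([(220.0, 70.2, 10.0), (220.0, 69.8, 12.0), (440.0, 80.0, 15.0)])
post = voice_map([(220.0, 70.0, 14.0), (330.0, 75.0, 16.0)])
diff = difference_map(pre, post)
```

Restricting the difference map to cells visited in both recordings means that pre- and post-treatment values are only compared at matching f0 and SPL, which is the motivation for the map-based display.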
Table 1. Summary of Acoustic Analysis Recommendations and Guidelines.

Titze, I. R. (1995) [6]
General Description:
  • Summary statement of expert panel regarding standardization and utility of voice perturbation methods.
  • Review-based, detailed technical guidelines with a classification scheme for voice recordings according to signal quality and suitability for analysis methods (Types 1 to 3).
Parameters with Specific Recommendations:
  • Fundamental frequency (f0) and amplitude (SPL)
  • Perturbation parameters jitter and shimmer in more regular Type 1 voice signals
  • Short average cyclic parameter contour (f0 and SPL) in Type 1 and 2 signals
  • Spectrographic and perceptual analysis in irregular or aperiodic Type 3 signals
  • Recommends assessment of f0, SPL and voice quality for inter- and intra-subject comparisons
Recording Equipment, Environment and File Formats:
  • 16-bit A/D converter or DAT recorder
  • Sampling frequencies 20–100 kHz
  • Sound-treated room with ambient noise < 50 dB
  • Control of room reverberation
  • File formats: specific formats developed for scientific purposes or .wav files
Vocal Tasks:
  • Sustained vowels [a], [I], [u], at comfortable, high, and low pitch and medium, soft, and loud intensity levels
  • Multiple tokens of sustained vowels (10×)
  • Utterances with induced f0 glides and intensity glides
  • Maximum phonation time [s] and [z]
  • Counting 1 to 100 at comfortable pitch and loudness
  • Sentences at soft, medium and loud intensity levels
  • Descriptive speech (picture)
  • Reading standard text (“Rainbow passage”)
  • Parent-child speech (“Goldilocks and The Three Little Bears”)
  • Dramatic speech with emotions (such as sadness or happiness)
  • Singing passage (“Happy Birthday”)

Dejonckere, P. H., et al. (2001) [7]
General Description:
  • Recommendations of ELS experts and members for technical recording standards and token types for multidimensional functional voice assessment
  • Summary of evidence, calls attention to the low reliability of acoustic analysis in strongly aperiodic signals
Parameters with Specific Recommendations:
  • VRP (highest frequency, softest intensity, and lowest frequency)
  • Perturbation parameters jitter and shimmer
  • Noise and spectral measures: NNE, HNR, CPP
Recording Equipment, Environment and File Formats:
  • Digital recording. Sampling frequency at least 20–100 kHz
  • Voice quality measurements: microphone-to-mouth distance of 10 cm, with 45° to 90° angle. VRP measurements with 30 cm distance.
  • Sound-treated booth or quiet room with ambient noise < 50 dB
Vocal Tasks:
  • [a:] at comfortable pitch and loudness (3×)
  • [a:] louder than comfortable
  • Single sentence or short standard passage

Patel, R. R., et al. (2018) [5]
General Description:
  • Recommendations of ASHA expert panel for technical recording and reporting standards and token types for clinical voice assessments to supplement diagnosis
  • Summary and review of evidence base, comprehensive description of role of technical characteristics for measurement reliability
Parameters with Specific Recommendations:
  • f0 (mean, SD, minimum and maximum)
  • SPL (habitual, minimum, maximum). Recommends calibration procedures for SPL
Recording Equipment, Environment and File Formats:
  • Minimum 16 bits of resolution
  • Adjust gain to avoid saturation or peak clipping, especially in loudest phonation
  • Omnidirectional head-mounted microphone at 4–10 cm from mouth, with 45° to 90° angle
  • Ambient noise level < 10 dB weaker than the level of the quietest phonation
  • SNR for signal quality > 30 dB
  • File formats: .wav files, or audio file format with no compression
Vocal Tasks:
  • [a:] at habitual pitch and loudness, for 3 s to 5 s (3×)
  • Pitch range: [a:] high and low, for 2 s (3×)
  • Loudness range: [a:] soft and loud, for 2 s to 3 s (3×)
  • Standard reading passage (“Rainbow passage”)

Rusz, J., et al. (2021) [9]
General Description:
  • Recommendations for recording technique, environment, and process with acoustic outcome data for patients with dysarthria
  • Summary of evidence base with description of recording and analysis technique, with specific proposals for patients with dysarthria
Parameters with Specific Recommendations:
  • f0 mean and variability (SD in contour-semitones)
  • SPL mean and variability
  • Jitter and shimmer
  • HNR and CPPS
  • Vowel space area–F1 and F2 plot [a, I, u]
  • VOT
  • Articulation rate and pause (connected speech)
Recording Equipment, Environment and File Formats:
  • Digital recorder with preamplifier, resolution of 16 bits. Sampling frequency of 44 kHz
  • Disabling of any mechanism that could modify original audio signal
  • Omnidirectional head-mounted condenser microphone at 10 cm from mouth, with 45° to 90° angle
  • Preferable soundproof booth or quiet room
  • Ambient noise level < 10 dB weaker than the level of the quietest phonation
  • Microphone-to-mouth distance can be reduced to 4 cm to ensure the speech sample is at least 10 dB greater than the background noise
  • Avoid large, empty rooms with numerous equipment
  • Chair without wheels and armrest
  • File formats: .wav files, or audio file format with no compression
Vocal Tasks:
  • [a:] at habitual pitch and loudness, for 3 s to 6 s
  • SMR [papapa] for 5 s and AMR [pataka] (12×)
  • Reading passage at comfortable pitch and loudness (80 to 120 words)
  • Monologue for 60 s to 90 s

Abbreviations: f0 = fundamental frequency; SPL = sound pressure level; ELS = European Laryngological Society; VRP = voice range profile; [a:] = sustained vowel [a]; NNE = normalized noise energy; HNR = harmonics-to-noise ratio; CPP = cepstral peak prominence; ASHA = American Speech-Language-Hearing Association; SD = standard deviation; SNR = signal-to-noise ratio; SMR = sequential motion rate; AMR = alternating motion rate; CPPS = cepstral peak prominence smoothed; VOT = voice onset time.
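Several of the technical criteria in Table 1 are simple thresholds and could be checked automatically before a recording is analyzed. The following sketch encodes four criteria listed under Patel et al. [5]: at least 16 bits of resolution, no compression, ambient noise relative to the quietest phonation, and SNR above 30 dB. The class and function names are hypothetical, and the ambient noise row is read here as "ambient noise at least 10 dB below the quietest phonation", which is an interpretation rather than a quotation.

```python
from dataclasses import dataclass

@dataclass
class RecordingInfo:
    bit_depth: int                # bits of resolution
    compressed: bool              # True for compressed formats such as mp3
    ambient_noise_db: float       # measured room noise level
    quietest_phonation_db: float  # level of the softest voice token
    snr_db: float                 # signal-to-noise ratio of the recording

def check_recording(info):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if info.bit_depth < 16:
        problems.append("resolution below 16 bits")
    if info.compressed:
        problems.append("compressed audio format")
    # Assumed reading of Table 1: noise should sit at least 10 dB below the voice.
    if info.quietest_phonation_db - info.ambient_noise_db < 10.0:
        problems.append("ambient noise less than 10 dB below quietest phonation")
    if info.snr_db <= 30.0:
        problems.append("SNR not above 30 dB")
    return problems

# Hypothetical examples
clinic = RecordingInfo(bit_depth=16, compressed=False, ambient_noise_db=35.0,
                       quietest_phonation_db=60.0, snr_db=42.0)
phone = RecordingInfo(bit_depth=8, compressed=True, ambient_noise_db=55.0,
                      quietest_phonation_db=60.0, snr_db=25.0)
print(check_recording(clinic))
print(check_recording(phone))
```

Microphone type, distance, and angle are left out because they cannot be verified from the audio file alone and would have to be documented at the time of recording.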
Table 2. Distribution of mean Smoothed Cepstral Peak Prominence (CPPS) (dB) values with standard deviation (SD) (in brackets) for a group of individuals without voice disorders at soft, comfortable, and loud intensity phonation. (Table: own table, derived from [39]).

Speaking Condition    Soft            Comfortable     Loud
                      Mean (SD)       Mean (SD)       Mean (SD)
CPPS (dB)             13.3 (2.2)      16 (2.3)        18.0 (2.0)
SPL (dB)              81.1 (6.0)      87.7 (5.6)      95.8 (4.7)
f0 (Hz)               244.1 (41.2)    249.2 (36.5)    266.6 (43.6)
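The dependence of CPPS on SPL in Table 2 can be made concrete with a short calculation. The sketch below fits a least-squares line through the three group means of Table 2, giving a rough figure of about 0.3 dB of CPPS per 1 dB of SPL. It uses only three group means, ignores the concurrent rise in f0, and is meant as an illustration of why SPL should be reported with CPPS, not as a validated correction factor. The function name is hypothetical.

```python
# Group means from Table 2 (soft, comfortable, loud)
spl_db = [81.1, 87.7, 95.8]
cpps_db = [13.3, 16.0, 18.0]

def least_squares_line(x, y):
    """Ordinary least-squares slope and intercept for paired values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = least_squares_line(spl_db, cpps_db)
print(f"CPPS changes by about {slope:.2f} dB per dB SPL")
```

Applied to the "comfortable" range of 71–99 dB SPL reported in Figure 1, a slope of this size would correspond to a CPPS difference of several decibels between speakers who are all phonating at a self-selected comfortable level.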
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
