1. Introduction
This article investigates the occurrence of non-modal voicing in word-final vowels in Spanish. Modal voicing is characterized by periodic vibration of the vocal folds (
Bissiri et al. 2011;
Garellek 2014), while non-modal voicing involves non-periodicity and/or a degree of noise (
Laver 1980;
Esling et al. 2019). Modal voicing is commonly referred to in the literature as “voicing”, while non-modal voicing encompasses phonatory qualities such as voicelessness, creaky voice and breathy voice. Voicelessness is defined by a lack of vocal fold vibration, while creaky voice (or creak) involves a glottal constriction, a low rate of vocal fold vibration (i.e., low pitch), and/or irregular F0 (
Ladefoged 1971;
Gordon and Ladefoged 2001;
Garellek 2019, among others). Breathy voice involves voicing in addition to noise, and concentration of acoustic energy in the F3 region (
Laver 1980;
Keating et al. 2015;
Garellek 2014,
2019;
Esling et al. 2019). These phonation qualities relate to the relative degree of vocal fold aperture, which is widest for voicelessness, followed by breathy voice and then modal voice, and most constricted for creaky voice (see, for example,
Gordon and Ladefoged 2001).
In Spanish, as in many other languages, vowels are typically voiced, but non-modal realizations are attested in specific phonological environments and dialects. For example, vowel sequences across words (as in
la uva ‘the grape’) can have creaky voice (
Lorenzo Criado 1948;
Lope Blanch 1987;
González and Weissglass 2017). This is often the case when Spanish is in a contact situation with a language that has phonological glottalization, such as Yucatec Maya, Guaraní, or Arabic (
Thon 1989;
Valentín-Márquez 2006;
Chappell 2013;
Michnowicz and Kagan 2016;
McKinnon 2018;
Mohamed et al. 2019;
Gynan and López Almada 2020). Vowels can also have creaky voice word-finally in various dialects, including Peninsular, Chilean, and Mexican Spanish (
Morrison and Escudero 2007;
Garellek and Keating 2015;
Bolyanatz 2023; cf.
Kim 2017). Fewer studies focusing on breathy voice are available for Spanish.
Mendoza et al. (
1996) and
Trittin and de Santos y Lleó (
1995) find that Spanish-speaking females have breathier voices than males. This is also the case in English and other languages, and it is considered to result from anatomical differences between males and females: the larynx in females tends to have a longer opening, particularly posteriorly, which is conducive to aspiration noise in the F3 region (
Klatt and Klatt 1990). Breathy vowels are also reported to occur utterance-finally in Andalusian Spanish (
O’Neill 2005). As for voiceless vowels, they are also attested word- and utterance-finally in Spanish, often when preceded (or followed) by voiceless consonants (
Delforge 2009,
2012;
Torreira and Ernestus 2011;
Sessarego 2012;
Dabkowski 2018).
In a previous study,
González et al. (
2022) examined the occurrence of creaky voice in word-final Spanish vowels. Creaky voice was analyzed via visual inspection of waveforms and spectrograms, following
Dilley et al. (
1996),
Redi and Shattuck-Hufnagel (
2001) and
Ladefoged and Maddieson (
1996). Specifically, creaky voice was coded when one or more of the following acoustic cues was present: aperiodicity (non-regular duration of pulses), creak (gradual pulse widening with F0 lowering and damping), diplophonia (variable pulse amplitude or shape), glottal squeak (sudden F0 increase), or the presence of a glottal stop.
González et al. (
2022) found a prevalence of creaky voice word-finally for vowels across several Spanish dialects, particularly for men, and at the end of higher prosodic constituents. The data were originally examined for creaky voice only, but further inspection showed that at least some of the vowels coded as creaky tended to end in a noisy and/or non-periodic interval which could be consistent with breathy voice and/or devoicing.
In this study, we re-examine our original dataset to provide a more nuanced analysis of non-modal phonation in word-final Spanish vowels, focusing not just on creaky voice, but also on breathy voice and devoicing. We provide a more detailed analysis of non-modal phonation using waveforms and spectrograms, and, unlike in
González et al. (
2022), we additionally include a measure of spectral tilt, i.e., the degree of energy present in lower vs. higher frequencies. We focus in particular on H1–H2 values, i.e., the relative amplitude of the first harmonic (H1) (corresponding to the fundamental frequency or F0) compared to the second harmonic (H2), since previous studies show that H1–H2 correlates well with differences between modal voice, breathy voice, and creaky voice (see, for example,
Klatt and Klatt 1990;
Hillenbrand et al. 1994;
Trittin and de Santos y Lleó 1995;
Gordon and Ladefoged 2001;
Kreiman and Gerratt 2012;
Keating et al. 2015;
Kim 2017;
Garellek 2019). Specifically, breathy voice has a higher spectral tilt (and, therefore, a higher H1–H2 amplitude) than modal voice, and modal voice has in turn a higher spectral tilt than creaky voice. These differences spring from open quotient differences related to phonation. Creaky voice, for example, has a low open quotient since it involves glottal constriction and increased medial vocal fold thickness. Open quotient is high for modal voice, and highest for breathy voice, since the vocal folds have a wider opening in the latter.
This study focuses on two research questions: (1) what is the prevalence of non-modal phonation beyond creaky voice in word-final vowels, and (2) does the distribution of phonation type differ in terms of speaker sex (i.e., male vs. female) or prosodic context (i.e., at the end of full or intermediate intonational phrases)? Regarding (1), we expect to find examples of both breathy voice and devoicing in addition to creaky voice in our dataset. For (2), we hypothesize that non-modal phonation will be more prevalent at the end of full intonational phrases (IPs) than at the end of intermediate ones (ips), and that creaky voice will be more prevalent in males and breathy voice in females.
The remainder of this paper is organized as follows.
Section 2 outlines the methodology of the study; experimental findings are presented in
Section 3.
Section 4 provides a discussion, and
Section 5 closes with concluding remarks.
2. Methodology
This study is part of a larger project and builds on the analysis of creaky voice reported in
González et al. (
2022). The participants were 10 native Spanish speakers from a range of different dialects (Argentina, Bolivia, Colombia (2), Cuba (2), Peru, Puerto Rico, Spain, Venezuela). The participants from Spain, Argentina, Peru, and one from Colombia were male; the rest were female. All were 20–39 years old at the time of recording and had spent between 0 and 14 years in the US. All were raised in Spanish-speaking countries and spoke Spanish daily in their personal and professional life.
The study was approved by the Institutional Review Board (IRB) of Florida State University. Participant data were collected in the phonetics laboratory after obtaining written informed consent. A digital recorder with a high-quality cardioid condenser microphone and a presence boost adapter was used to record audio data. Recordings were obtained in .wav format, in mono, with a sampling rate of 44,100 Hz.
This phase of the project involved a picture identification task comprising 12 images that participants had to name and frame in a short sentence. For example, when participants were shown a picture of a window, they were asked to say Ventana. Es una ventana. (‘Window. It’s a window.’). A short training phase preceded the task. The picture identification task involved 10 token words and two distractors, one at the beginning and one at the end, to avoid list intonation effects. The task was conducted twice per participant.
Stimuli had penultimate stress, were two or three syllables long, and ended in /a/ or /o/ (
Table 1). The final vowel of each token word occurred at the end of two different prosodic contexts: (i) a lower prosodic constituent, corresponding to an intermediate intonational phrase (ip) in the Spanish Tones and Breaks Indices framework (Sp_ToBI; see
Beckman et al. 2002;
Sosa 2003;
Aguilar et al. 2009;
Estebas-Vilaplana and Prieto 2009); and (ii) a higher prosodic constituent, corresponding to a full intonational phrase (IP) (1). The ends of full intonational phrases in our data involve a distinct pause coinciding with the end of the participant’s turn and tend to be realized with a final low boundary tone (L%). In contrast, the ends of intermediate phrases are cued by a slight or no pause and often involve a rise (a high boundary tone H-) to indicate the message is continuing (
Frota et al. 2007;
Aguilar et al. 2009;
Baxter 2017).
(1) | a. | [Ventana.]ip | [Es | una | ventana.]IP |
| | Window. | (It) is | a | window. |
A total of 400 words were analyzed (10 participants × 10 tokens × 2 prosodic contexts × 2 repetitions) using Praat (
Boersma and Weenink 2023). All measurements were taken by hand. Acoustic analysis involved inspection of the waveform and spectrogram for cues of modal and non-modal voicing. As in
González et al. (
2022), modal voice was characterized by periodicity, while creaky voice was characterized by one or more of the following: (i) irregular F0 (‘aperiodicity’), (ii) F0 lowering, (iii) changes in pulse amplitude or shape (‘diplophonia’), and/or (iv) presence of silence followed by a stop burst (‘glottal stop’) (
Dilley et al. 1996;
Docherty and Foulkes 2005;
Gordon and Ladefoged 2001;
Huber 1988;
Keating et al. 2015;
Ladefoged and Maddieson 1996;
Redi and Shattuck-Hufnagel 2001). All token vowels were also examined for intervals involving lack of voicing—which were coded as devoiced—and for intervals involving noise in the F3 region, coded as breathy voice (
Laver 1980;
Keating et al. 2015;
Garellek 2014,
2019;
Esling et al. 2019). H1–H2 measurements were also taken via FFT spectra. Afterwards, token vowels were analyzed and divided into phonation intervals of modal voicing, creaky voice, breathy voice, and voicelessness based on visual cues from the spectrogram and waveform. FFT spectra were generated at the middle of each phonation interval to measure the relative amplitude of the first harmonic (H1) compared to the second harmonic (H2). As indicated in
Section 1, H1–H2 is highest for breathy voice and lowest for creaky voice (
Garellek 2019).
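As an illustration of the H1–H2 measure described above, the following is a minimal Python sketch, not the procedure used in this study (where measurements were taken by hand from FFT spectra in Praat). It assumes that F0 is already known for the interval, evaluates a Hann-windowed spectrum only at F0 and 2 × F0, and reports the level difference in dB. The function names and the two-harmonic test signal are hypothetical.

```python
import cmath
import math


def harmonic_level_db(samples, sample_rate, freq_hz):
    """Level (dB) of the spectral component at freq_hz.

    Uses a Hann-windowed single-frequency DFT, i.e. the spectrum is
    evaluated only at freq_hz rather than computing a full FFT.
    """
    n = len(samples)
    acc = 0j
    for k, x in enumerate(samples):
        window = 0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1))
        acc += window * x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate)
    # Guard against log10(0) for an all-zero interval.
    return 20 * math.log10(max(abs(acc), 1e-12))


def h1_minus_h2(samples, sample_rate, f0_hz):
    """H1-H2 in dB: level of the first harmonic minus level of the second."""
    h1 = harmonic_level_db(samples, sample_rate, f0_hz)
    h2 = harmonic_level_db(samples, sample_rate, 2 * f0_hz)
    return h1 - h2


def synth(a1, a2, f0_hz=200.0, sample_rate=44100, dur_s=0.05):
    """Hypothetical test signal: two harmonics with amplitudes a1 and a2."""
    n = int(sample_rate * dur_s)
    return [
        a1 * math.sin(2 * math.pi * f0_hz * k / sample_rate)
        + a2 * math.sin(2 * math.pi * 2 * f0_hz * k / sample_rate)
        for k in range(n)
    ]


# A first harmonic stronger than the second gives a positive H1-H2
# (the breathy/modal direction); the reverse gives a negative value.
tilt_positive = h1_minus_h2(synth(1.0, 0.5), 44100, 200.0)
tilt_negative = h1_minus_h2(synth(0.5, 1.0), 44100, 200.0)
```

In real speech the harmonics would first have to be located (for example, from an F0 track), and formant effects on harmonic amplitudes are not corrected in this sketch.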
Figure 1,
Figure 2 and
Figure 3 provide examples of waveforms, spectrograms, and FFT spectra for vowels fully realized as breathy, modal, and creaky (see
Figure 4 and
Figure 5 below for examples of vowels involving more than one phonation interval). As shown in
Figure 1b, breathy voice is characterized by a much higher amplitude of the first harmonic compared to the second. For modal voice, the first harmonic is higher in amplitude than the second (
Figure 2b), but not as much as with breathy voice. On the other hand, creaky voice is characterized by a higher amplitude of the second harmonic compared to the first (
Figure 3b).
All coding and measurements were checked by at least two of the authors. Note that we chose to conduct the spectral analysis of phonation intervals rather than full vowels or syllables, unlike in previous studies focusing on Spanish (see, for example,
Trittin and de Santos y Lleó 1995;
Mendoza et al. 1996;
Kim 2017). This methodological approach allows for a fine-grained exploration of voice quality, capturing cases where vowels have two or more phonation type sequences or ‘dynamic combinations of non-modal phonations’ in the words of
Esposito and Khan (
2020, p. 2) (for example, vowels that begin as creaky but end as breathy, as shown in
Figure 4 and
Figure 5 below; see also
Ladefoged 1983;
DiCanio 2009).
For the analysis of phonation type distribution in word-final vowels, 18 tokens were discarded since they involved errors (4 tokens) or were produced as non-final in the ip context (14 tokens). Descriptive statistics were produced for 382 vowels. When determining if phonation type was associated with prosodic context and/or speaker sex, the 382 vowels were partitioned into 685 phonation intervals coded as modal, creaky, breathy, or devoiced. The analysis of H1–H2 did not include voiceless intervals (
n = 72) since harmonics are only present in periodic sounds. In addition, any phonation intervals occupying less than 30% of the vowel duration (
n = 64) were also discarded. The reason for this exclusion is two-fold: we were interested in obtaining clear H1–H2 cut-off points among modal, creaky, and breathy voice, which would be facilitated by longer analysis windows in FFT spectra. In addition, the 30% threshold has been used in other studies focusing on the analysis of creaky voice (
Bolyanatz 2023), based on the fact that at least 30% of a vowel needs to have creak to be perceived as such utterance-finally (
Crowhurst 2018). This resulted in the H1–H2 analysis of 549 phonation intervals (females,
n = 354; males,
n = 195).
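The token accounting and exclusion criteria described above can be summarized in a short Python sketch. The counts are those reported in the text; the dictionary keys (`phonation`, `duration_s`, `vowel_duration_s`) and the function name are hypothetical and only illustrate how the two exclusion rules might be applied to a coded interval.

```python
# Counts reported in the text.
TOTAL_TOKENS = 10 * 10 * 2 * 2   # participants x words x contexts x repetitions
DISCARDED = 4 + 14               # errors + tokens produced as non-final in ip
VOWELS_ANALYZED = TOTAL_TOKENS - DISCARDED

PHONATION_INTERVALS = 685
VOICELESS = 72                   # no harmonics to measure
TOO_SHORT = 64                   # under 30% of the vowel duration
H1H2_INTERVALS = PHONATION_INTERVALS - VOICELESS - TOO_SHORT


def keep_for_h1h2(interval, min_share=0.30):
    """True if an interval is voiced and spans at least min_share of its vowel."""
    voiced = interval["phonation"] != "devoiced"
    share = interval["duration_s"] / interval["vowel_duration_s"]
    return voiced and share >= min_share
```

With these figures, `VOWELS_ANALYZED` is 382 and `H1H2_INTERVALS` is 549, matching the totals given above (354 intervals for females and 195 for males).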
Statistical analyses were performed using IBM SPSS Statistics (Version 29) (
IBM Corp. 2022) and R (Version 4.3.1) (
R Core Team 2023); these included ANOVAs, chi-squared tests, and Bonferroni post hoc tests. Statistical analyses were considered significant if
p ≤ 0.05.
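For readers unfamiliar with the chi-squared test of independence mentioned above, the following Python sketch computes the Pearson statistic and its degrees of freedom for a contingency table such as phonation type by prosodic context. It is not the SPSS or R code used in this study, and the counts in the example are made up for illustration only.

```python
def chi_squared_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Made-up counts for illustration only; NOT the study's data.
# Rows: two phonation types; columns: ip vs. IP context.
hypothetical_counts = [[20, 10], [10, 20]]
statistic, dof = chi_squared_independence(hypothetical_counts)
```

The statistic is then referred to a chi-squared distribution with `dof` degrees of freedom; with one degree of freedom, values above 3.841 correspond to p < 0.05.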
4. Discussion
Our study re-examines the dataset in
González et al. (
2022) to investigate the occurrence of non-modal voice quality beyond creaky voice at the end of prosodic constituents in Spanish, an area that remains generally understudied. Our results show that both creaky voice and breathy voice occur frequently at the end of ips and IPs; devoicing, while attested, is much less common.
Unlike previous studies, we move beyond considering one type of non-modal phonation only (as in
Kim 2017 or
González et al. 2022 for creaky voice; or Mendoza et al. 1996; and Trittin and de Santos y Lleó 1995, which focus on breathy voice) and provide a fine-grained description and analysis of phonation combinations in word-final vowels in Spanish. We find that vowels with double phonation are the most widespread in our data, particularly those beginning with modal voice and ending in breathy voice, followed by vowels with single phonation, especially fully creaky vowels. In addition, 11% of the vowels in our dataset involved triple phonation. The latter tend to begin with modal voice, although some instances involving initial creaky voice are also attested.
Fully devoiced vowels are unattested in the contexts examined; in addition, devoicing is positionally restricted to the end of the vowel. Other languages with positional restrictions for non-modal phonation include Santa Ana del Valle Zapotec, where non-modal phonation occurs only vowel-finally (
Esposito 2005), and White Hmong, where breathy voice can only occur vowel-initially (
Keating et al. 2010). In our dataset, vowels with double or triple phonation overwhelmingly tend to end in breathy voice or devoicing. As suggested by one reviewer, it is possible that this relates to vocal fold spreading in anticipation of a pause or breath intake. The fact that dynamic phonation combinations are common in Spanish vowels, and that some are more frequent than others, is not only a novel finding, but might be helpful for the segmentation of Spanish vowels in future studies, particularly those focusing on word-final contexts.
Our findings reveal a significant effect of prosodic context on phonation type. Modal voice is more widespread at the end of intermediate phrases (ips) overall, while creaky and breathy voice are more common at the end of full intonational phrases (IPs). These results are in line with prior studies showing that non-modal phonation is a cue to the end of higher prosodic constituents across several languages. This is the case for devoicing and breathy voice in French (
Smith 1999) and for creaky voice in English, Spanish, and Italian, among other languages (
Keating et al. 2015;
Dilley et al. 1996;
González et al. 2022;
Di Napoli 2015).
In addition, our results show a significant effect of sex on phonation type, with creaky voice being more common in males, and modal and breathy voice occurring more frequently in females. Similar results for creaky and modal voice were reported in
González et al. (
2022). Biologically, males tend to have longer and thicker vocal folds than females, resulting in a lower pitch/F0, which may be more conducive to creaky voice overall, particularly at the end of prosodic constituents when intonation falls. On the other hand, females’ vocal folds tend to not close completely when vibrating, which can often result in breathy voice (
Laver 1980). Of particular interest is the fact that, while males favored creaky voice overall, they tended to use breathy voice more often at the end of IPs. Conversely, while females tended to prefer modal or breathy voice, they had significantly more creaky voice at the end of IPs. These findings suggest that both creaky voice and breathy voice can cue the end of higher prosodic units in Spanish, depending on the sex of the speaker. It is interesting that, as shown in
Table 2, among non-modal phonation types creaky voice has the longest duration, followed by breathy voice, while devoicing tends to be very short. This might indicate that creaky voice is a more salient prosodic cue at the end of constituents in Spanish than breathy voice or devoicing.
Our study also investigates the relative alignment between spectral tilt, as measured by H1–H2, and visual phonation cues in waveforms and spectrograms. As expected, spectral tilt is highest for breathy voice, both for males and females (see, for example,
Klatt and Klatt 1990;
Garellek 2019). For males, spectral tilt is also higher in modal voice than in creaky voice, as expected, although the average H1–H2 value for creak is not necessarily negative, as reported in some of the prior literature. For females, however, H1–H2 values for modal and creaky voice are practically the same. As
Esposito and Khan (
2020, p. 8) point out, H1–H2 sometimes fails to successfully measure specific voice qualities. For example, in Marathi H1–H2 is a reliable acoustic cue of breathy voice for male speakers but not so consistent for female ones (
Berkson 2012).
There are several possible reasons explaining the lack of complete alignment between spectral tilt and visual phonation cues for creaky and modal voice in the female participants in our dataset. These include (i) a wide range of individual variation in spectral tilt for Spanish females; (ii) the occurrence of nasalization in stimuli where the final vowel is preceded by a nasal segment (as in
ventana ‘window’); (iii) a possible effect of vowel quality on harmonic amplitude (
Klatt and Klatt 1990;
Garellek et al. 2016;
Garellek 2019,
2022). We consider (iii) to be the most likely explanation: our study included tokens ending in /a/, which has a high F1 that does not largely impact voice quality, but also ending in /o/, usually involving a relatively low first formant that might have influenced the amplitudes of the first and second harmonics. Further studies investigating voice quality in Spanish could include the use of additional spectral tilt measures and/or use formant correction algorithms in the analysis of H1–H2 (
Iseli et al. 2007;
Esposito et al. 2021). In any case, we conclude that the identification of breathy voice in Spanish vowels benefits from using H1–H2 measurements, for both males and females. For creaky voice, however, visual examination of cues such as aperiodicity and diplophonia in waveforms and spectrograms might be enough to identify it consistently.
The participants in this study were Spanish-dominant bilingual speakers with English as their L2 who used Spanish daily in personal and professional interactions. Some had just arrived in the US, and some had been living in this country for up to 14 years.
Cantor-Cutiva et al. (
2023) report that Spanish-dominant bilingual speakers do not produce creaky voice as often as English-dominant bilinguals, while
Kim (
2017) shows that Spanish–English bilinguals often transfer creaky voice from English into Spanish.
González et al. (
2022), who examined creaky voice in the participants of the present study, argue against English transfer of this voice quality based on the fact that (i) creaky voice was as frequent in participants who had just moved to the US as in participants who had been residing in the US for many years, and (ii) creaky voice is more frequent for females than males in American English, unlike in the Spanish dataset, which shows the opposite trend. While the present study still shows this pattern, we find it intriguing that the female participants had significantly more creak at the end of IPs, while male participants preferred breathy voice in the same context. We consider then that transfer of creaky voice from English might be a possibility at the end of higher prosodic units, at least for females, but we leave this point for future investigation.
Finally, the participants of this study were from eight different Spanish-speaking countries. Except for the case of Colombian Spanish, all Spanish dialects represented included male or female speakers, but not both. Some dialectal differences in phonation are reported for Spanish; for example,
Kim (
2017) did not find creaky voice utterance-finally in native speakers of Mexican Spanish (unlike
Garellek and Keating 2015). We leave a more detailed investigation of dialectal differences in the voice quality of Spanish vowels for future research.
5. Conclusions
This study contributes to the investigation of phonation in Spanish vowels. It shows that creaky voice and breathy voice are pervasive word-finally, particularly at the end of full intonational phrases (IPs). While creaky voice was favored by males and modal and breathy voice by females, at the end of full intonational phrases males tended to have more breathy voice, and females more creaky voice. Our findings also show that word-final Spanish vowels often involve double or triple phonation types. In vowels realized with multiple phonation types, modal voice tends to precede other voice qualities; devoicing, when present, always occurs at the end of the vowel. These results have implications for future acoustic studies of Spanish vowels, which are often assumed to be modally voiced throughout.
Our study included the examination of phonatory visual cues in waveforms and spectrograms in addition to the measurement of H1–H2 values. The latter was calculated for each phonation interval, rather than as the average for the entire vowel, resulting in a more fine-grained investigation of voice quality. To the best of our knowledge, this is the first time this methodology has been employed, at least for Spanish. Our results show that H1–H2 differences between modal and breathy voice align well with phonatory visual cues in acoustic displays. However, H1–H2 values consistently distinguish modal and creaky voice for males only. Future studies on Spanish vowels should include formant correction algorithms for H1–H2 and/or the use of additional spectral tilt measures, particularly if focusing on female speakers. It is our hope that other scholars continue to investigate phonation in vowels, since it can inform our understanding of their acoustic and perceptual characteristics both in L1 and L2 Spanish.