1. Introduction
This study aims to investigate pre-vocalic consonant devoicing in Brazilian Portuguese (BP), in order to ascertain whether an ongoing sound change is taking place. Consonant devoicing (CD) is a linguistic phenomenon where voiced consonants such as /b, d, g/ are pronounced as voiceless segments such as [p, t, k]. In BP, certain patterns suggest that sibilants that were formerly reliably voiced may now be devoicing, even in contexts where voicing would be expected. For example, the plural form redes ‘hammocks’ is typically pronounced [hedz] before a word-initial vowel, but speakers may also be producing [heds] or [hets]. This variability raises questions about whether BP is experiencing a sound change affecting its word-final sibilants, particularly in environments previously associated with voicing.
In BP, /s/ and /z/ are distinct phonemes, occurring in minimal pairs of words such as
casa /’kazɐ/ (‘house’) and
caça /’kasɐ/ (‘hunt’). However, the contrast between /s/ and /z/ is neutralized word-finally, such that a final /z/ does not typically occur. For example, the plural marker in BP, spelled
-s, is realized as either [s] or a neutralized sibilant segment depending on the phonological context. When followed by a vowel or a voiced consonant, this segment undergoes regressive assimilation, as in
casas amarelas [’kazɐzamaɾɛlɐs] (‘yellow houses’), where the plural marker /s/ in
casas becomes voiced [z] due to the following voiced segment [a] in
amarelas (
Cristófaro-Silva & Mendes, 2022).
Traditionally, this assimilation results in a voiced realization [z], mirroring processes observed in other Romance languages. Catalan, for example, neutralizes voiced and voiceless obstruents word-finally (e.g.,
amiga [ə’miɡə] ‘female friend’ vs.
amic [ə’mik] ‘male friend’) but voices them before vowels or voiced consonants. In
cas (‘case’), the final /s/ is voiceless when followed by a pause [’kas], but it is realized as [z] when followed by a vowel-initial word, as in
cas obert (‘open case’) [‘kaz u’βɛrt] (
Carbonell & Llisterri, 2009).
However, if the sibilants in (stop + sibilant) clusters (e.g., [ts], [ps]) surface as voiceless in Brazilian Portuguese even when followed by a word-initial vowel—for instance, in phrases like
botes amarelos (“yellow boats”) or
envelopes azuis (“blue envelopes”)—this might suggest that a devoicing phenomenon is taking place. Here, the sibilant remains voiceless despite the following vowel, indicating a resistance to the expected assimilation observed elsewhere.
Preliminary spectrographic analyses (
Mendes, 2023) provide evidence for such voiceless realizations, which contrast with the general trend of regressive voicing in BP. This behavior warrants further investigation into the phonetic and phonological factors governing such devoicing.
CD, especially in word-final position, is a common sound change pattern that has been confirmed by numerous cross-linguistic studies and that consistently appears in both first and second language acquisition (
Blevins, 2006;
Albuquerque, 2011;
Broselow, 2018;
Jatteau et al., 2019a). Various motivations for final devoicing have been proposed in the literature. These include the absence of clear transitions between consonants and vowels essential for perceiving voicing contrasts (
Jatteau et al., 2019a); anticipatory glottal opening for respiratory purposes (
Myers, 2012;
Hutin et al., 2020); reduction in subglottal pressure toward the end of utterances leading to voicing cessation before obstruent release (
Westbury & Keating, 1986); and difficulties in voicing production and perception during final lengthening (
Blevins, 2006;
Ohala, 1997). While these explanations account for devoicing in pre-pausal environments, the pattern observed in Brazilian Portuguese occurs in pre-vocalic contexts, raising the question of what might have led to this pattern.
When stop + sibilant clusters undergo devoicing in BP, one might initially attribute the sibilant’s devoicing to the influence of a preceding voiceless segment. However, pre-vocalic sibilant devoicing also appears in contexts with a preceding voiced stop—for instance, in
bodes amarelos ‘yellow goats’ ([‘bɔds.a.ma.’ɾɛ.lus] ~ [‘bɔdz.a.ma.’ɾɛ.lus]) and
jegues argentinos ‘Argentine mules’ ([‘ʒɛgs.ah.ʒẽ.’tʃi.nʊs] ~ [‘ʒɛgz.ah.ʒẽ.’tʃi.nʊs]), where the devoiced sibilant follows [d] or [g].
Such examples indicate that devoicing cannot be explained solely by the preceding stop’s voicelessness; rather, the broader preceding phonological environment—including both voiced and voiceless stops—must be evaluated to fully understand the conditions under which sibilant devoicing occurs.
CD may also interact with the following phonological environment. The study of
Strycharczuk (
2012) highlights how the following phonological context can either trigger or inhibit word-final devoicing in West Flemish. In this dialect, word-final obstruents typically undergo devoicing in isolation or when followed by a voiceless segment. However, when a sonorant consonant or vowel follows, devoicing may be blocked, and instead, pre-sonorant voicing emerges, maintaining or introducing voicing in the obstruent. For example, in
dat mens (‘that person’), the final fricative in
mens would usually devoice to [mɛns], but in the context of the voiced sonorant /ɪ/ in
dat mens is (‘that person is’), the fricative surfaces as voiced, yielding [mɛn.zɪs]. This shows that the following phonological environment, particularly the presence of a sonorant, can prevent word-final devoicing.
Another factor involves the influence of orthographic representations on CD.
Hayes-Harb et al. (
2018) explored this by exposing native English speakers to German-like words (e.g., /ʃtɑit/ and /ʃtɑid/, both pronounced [ʃtɑit]), paired with pictures and, in some cases, their written forms (e.g.,
steit and
steid). During the test, participants who had seen the written forms were more likely to produce final voiced obstruents when naming the pictures. This suggests that visual access to the written forms interfered with their ability to acquire target-like pronunciations.
The influence of word frequency on CD has also been explored.
De Schryver et al. (
2008) found that Dutch low-frequency words, like
plonzen [plɔnzən] ~ [plɔnsən] (to splash) and
omhelzen [ɔm’ɦɛl.zən] ~ [ɔm’ɦɛlsən] (to embrace), were more susceptible to /z/ devoicing than high-frequency words. The study showed that less frequent words, which have weaker mental representations, are more likely to deviate from standard pronunciation patterns. For instance, low-frequency words were more often produced with devoiced fricatives compared to high-frequency words like
reizen (to travel) or
blozen (to blush), which resisted devoicing due to their frequent use.
Nevertheless, the literature presents two opposing views on the role of word frequency in lenition processes like final devoicing. On the one hand,
Bybee (
2010) underscores that frequent usage promotes segmental weakening, whereby words become more streamlined. For example, in English, high-frequency phrases such as “gonna” (going to) and “wanna” (want to) demonstrate how frequent words undergo phonological reduction to facilitate faster and more efficient speech. On the other hand,
De Schryver et al. (
2008) emphasize that high-frequency words maintain stronger lexical representations, which can shield them from undergoing devoicing.
These phonological insights underline that even in languages like contemporary Brazilian Portuguese, where certain consonantal contrasts (such as /s/-/z/) are already neutralized word-finally, variable final devoicing can still emerge. In BP, consonant devoicing might reflect an ongoing sound change. Earlier research, referred to here as ‘Stage 1’ (
Cristófaro-Silva, 2003;
Bisol, 2005;
Seara et al., 2017), characterizes the expected realizations of word-final stops followed by /s/ as follows:
a. Voiced stop + /s/ before pause → [s], e.g., /bs/## → [bs]##
b. Voiced stop + /s/ before a vowel → [z], e.g., /bs/#V → [bz]#V
c. Voiceless stop + /s/ before pause → [s], e.g., /ps/## → [ps]##
d. Voiceless stop + /s/ before a vowel → [z], e.g., /ps/#V → [pz]#V
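The four Stage 1 expectations in (a) to (d) reduce to a single generalization: the sibilant's voicing depends only on what follows the word, not on the voicing of the preceding stop. The sketch below restates that generalization in Python. The function names and the string notation are illustrative choices made here, not part of the cited descriptions.

```python
from typing import Literal

Following = Literal["pause", "vowel"]


def stage1_sibilant(following: Following) -> str:
    """Expected Stage 1 sibilant: [s] before a pause, [z] before a vowel."""
    if following == "pause":
        return "s"
    if following == "vowel":
        return "z"
    raise ValueError(f"unknown following context: {following!r}")


def stage1_cluster(stop: str, following: Following) -> str:
    """Expected Stage 1 surface cluster, in the notation of (a) to (d).

    The stop is passed through unchanged: under Stage 1 its voicing
    neither changes nor affects the sibilant.
    """
    boundary = "##" if following == "pause" else "#V"
    return f"[{stop}{stage1_sibilant(following)}]{boundary}"


if __name__ == "__main__":
    for stop in ("b", "p"):
        for context in ("pause", "vowel"):
            print(f"/{stop}s/ before {context}: {stage1_cluster(stop, context)}")
```

Any departure from these four outputs in pre-vocalic position, such as a voiceless sibilant before a vowel, is what the remainder of the paper treats as potential evidence of change.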
In other words, for a form like
clubes, the expected surface pattern would be [klubs] (with no laryngeal assimilation from [b] to [s]) if followed by a pause. Under this Stage 1 description, there was no devoicing reported in the position of the prevocalic voiced sibilant (i.e., the /s/ following a voiced stop was realized as [z] before a vowel).
However, the current study suggests a ‘Stage 2’, in which devoicing now affects precisely those environments where the literature predicted voiced realizations. Specifically, we observe that (1) instead of the expected [bz]#V in (b), speakers increasingly produce [bs]#V or even [ps]#V; and (2) instead of the expected [pz]#V in (d), speakers now frequently realize [ps]#V. This shift might indicate that BP is in the midst of a sound change.
Building upon the phonetic and phonological factors discussed above, this study aims to determine which mechanisms might be driving consonant devoicing in Brazilian Portuguese. By examining the phonetic properties of CD in BP and assessing predictors such as adjacent phonological context, orthographic input (task type), and word frequency—as well as incorporating lexical item and individual speaker variation as random effects—we aim to determine whether BP is undergoing a sound change. This investigation will not only shed light on the specific dynamics of CD in BP but also contribute to the broader understanding of how phonetic tendencies interact with language-specific phonological systems.
This paper is structured as follows: Section 2 outlines what is currently known about consonant devoicing in BP, Section 3 describes the methodology used in this study, and Section 4 reports the results, followed by the conclusions.
2. Consonant Devoicing in Brazilian Portuguese
To the best of our knowledge, prior investigations into the phenomenon of obstruent devoicing in L1 BP are lacking. However, some studies have provided insights into the devoicing of BP vowels (
Meneses, 2012,
2016), as well as the devoicing of L2 English consonants by BP speakers (
Albuquerque, 2011;
Cristófaro-Silva & Mendes, 2022). These studies will be described below, as they may shed light on the potential occurrence of CD in BP.
Meneses (
2012) examined the devoicing of the high vowels /u/ and /i/ in BP, exemplified by words such as
[kus]tódia for ‘custódia’ and
[bis]coito for ‘biscoito’, respectively. The author states that, in phonetic terms, devoicing can be understood as a result of coarticulation between adjacent segments during everyday speech: an increase in the overlap of articulatory gestures, influenced by structural factors, may lead to the compression of the vocalic gesture relative to the glottal gesture.
Meneses (
2012) found that total vowel devoicing occurred in 38% of the data, with partial devoicing observed in 23%, suggesting that the phenomenon is not categorical and may represent an ongoing sound change in BP. Furthermore, the findings imply a tendency for vowel devoicing when followed by a voiceless consonant, indicating a phonetic assimilation between the unstressed vowel and the adjacent voiceless segment.
Subsequently,
Meneses (
2016) also investigated devoicing in BP, focusing on the reduction of post-stressed vowels (e.g.,
lance ‘bid’ [lã.si] ~ [lã.si̥] ~ [lãs]). The study suggests that post-stressed vowels undergo a reduction process that leads to devoicing rather than outright deletion, challenging traditional views that consider this process as apocope. The findings indicate that vowel devoicing is facilitated by an extreme overlap between consonant and vowel gestures, which results in insufficient time to maintain voicing during the vowel segment. This articulatory overlap creates a narrow oral aperture, reducing the ability to sustain voicing, thus leading to devoicing. This pattern is not merely epiphenomenal but arises from specific phonetic conditions that involve adjustments in the motor program of speech.
Other studies have investigated devoicing processes in the development of L2 English consonants by BP speakers.
Albuquerque’s (
2011) study, for instance, revealed difficulties encountered by BP speakers in perceiving contrasts such as ‘cap /kæp/ vs. cab /kæb/’, ‘bat /bæt/ vs. bad /bæd/’, and ‘back /bæk/ vs. bag /bæg/’. The findings indicated a common tendency among participants to misperceive voiced stops as voiceless ones, raising the question of whether this difficulty could be attributed to cross-linguistic interference. Results also showed greater accuracy in discriminating the voiced–voiceless contrast for bilabials compared to alveolars and velars, regardless of the participants’ proficiency level.
Still within the domain of L2, but this time concerning production, the study conducted by
Cristófaro-Silva and Mendes (
2022) assessed devoicing patterns of voiced alveolar fricatives among BP speakers of L2 English. Their research examined the pronunciation of word-final /z/ in English plural forms—such as in
labs,
sides, and
bags—employing visual identification of the voicing bar as a method to evaluate voicing. Results showed that only 14% of words with underlying /z/ were produced as voiced when the fricative was followed by a pause. When the word was followed by a word-initial vowel, voicing rates were higher (41%), reflecting a regressive assimilation rule from BP. The authors then ask whether the tendency to devoice in the L2 could stem from the speakers’ L1, rather than from mimicking English input, where word-final fricatives are only partially voiced (
Maniwa et al., 2009).
The present study aims to address the gap highlighted by
Cristófaro-Silva and Mendes (
2022) by investigating whether BP word-final sibilants tend to devoice before a vowel and which factors influence this. To comprehend the potential devoicing of word-final obstruents, one must also consider another emerging phenomenon that interacts with CD: the weakening of high-front vowels in BP.
An integral aspect of BP phonology involves the strategic insertion of a high-front vowel to prevent the formation of illicit consonant clusters, typically represented in spelling by two consecutive consonant letters (
Collischonn, 1996). This phonetic tactic applies across the native lexicon and loanwords, typically involving vowel insertion between two consonants. For instance, in native words like
dogma [‘dɔ.gi.mə] ‘dogma’ and
afta [‘a.fi.tə] ‘cold sore’, and in loanwords like
podcast [pɔ.dʒi.’kɛs.tʃi],
the inserted vowel, most often [i] but sometimes [e], surfaces between the consonants to resolve marked clusters. In the case of #sC- clusters such as in
Skype [is.’kaj.pi], the vowel is instead introduced at the onset of the word, effectively breaking up the initial cluster. However, recent studies indicate a gradual decline in the usage of high-front vowels within word-final post-tonic syllables in BP (
Soares, 2016). Consequently, formerly illicit consonant clusters are beginning to emerge and alternate with sequences that still retain the vowel (e.g., [duks] ~ [‘du.kis] for
duques, meaning “dukes”). To fully understand the phenomenon of BP devoicing, we should also acknowledge this ongoing sound alternation in BP, which will be formalized as [Cs] ~ [Cis].
Please refer to
Table 1 for illustrative examples of this phenomenon.
Table 1 exhibits BP nouns in their singular orthographic form in the first column, followed by their corresponding transcriptions in the second column, which include an unstressed high-front vowel word-finally. The third column displays the plural forms, appending the letter <s> to the singular form. Our primary interest, shown in the fourth column, is the alternation between [Cs] and [Cis] word-finally. As mentioned earlier, the alternation between [Cs] ~ [Cis] in BP stems from the reduction and eventual disappearance of unstressed high-front vowels when flanked by a consonant and a final sibilant (
Cristófaro-Silva & Leite, 2015;
Soares, 2016). The fluctuation between the presence and absence of an unstressed high vowel between consonants signifies an emerging phonetic pattern in BP undergoing gradual stabilization (
Cristófaro-Silva, 2016;
Mendes, 2023).
Furthermore, as also outlined in
Section 1, a more recent development—what we label “Stage 2”—manifests when a voiceless alveolar fricative appears even before a word-initial vowel, contrary to the more conventional expectation of a voiced fricative. For instance, [hets.ah.ʒẽ.’tʃi.nəs] is produced instead of [hedz.ah.ʒẽ.’tʃi.nəs] for “redes argentinas” (Argentine hammocks). Consider
Figure 1.
Figure 1 displays spectrograms and waveforms of the string [hedza] from the phrase “redes argentinas” (Argentine hammocks) as produced by three participants. In all panels, there is no vowel between the stop and the sibilant, visible in the continuous closure and frication without intervening periodic energy. In
Figure 1a, a pronounced voicing bar indicates a fully voiced cluster [dz].
Figure 1b reveals partial voicing, with a faint voicing bar present. By contrast,
Figure 1c shows no clear voicing bar at all, indicating a voiceless cluster [ts] in a context where a voiced cluster might typically be expected.
The gradient nature of the spectrographic findings in
Figure 1 raises the question of how best to model such variable patterns. In a traditional model of phonology, such as Generative Phonology, CD would be seen as a categorical transformational rule: an underlying voiced obstruent becomes voiceless in word-final position. However, contemporary models such as Exemplar Theory (ET) (
Johnson, 1997;
Pierrehumbert, 2001;
Bybee, 2010) provide a more dynamic perspective by emphasizing the role of frequency in shaping speech sounds. According to ET, sounds are not governed solely by abstract rules but are stored as numerous detailed instances (exemplars) in memory. Crucially, the frequency with which specific exemplars are encountered plays a significant role in determining their prominence in shaping speech perception and production. More frequent exemplars become stronger and more easily accessed. In this sense, the process of devoicing is not a categorical rule but emerges from the accumulation of frequent instances of word-final voiceless obstruents.
ET refines the notion of phonemic categories by proposing that they emerge from clusters of stored exemplars along a continuum of variation. While traditional binary distinctions (e.g., voiced vs. voiceless consonants, tense vs. lax vowels) remain relevant as points of contrast, ET posits that phonemes are better understood as distributions of related exemplars rather than strictly discrete entities. These gradient distributions can influence both the perception and production of speech sounds, emphasizing that a single “phoneme” may encompass significant internal variability.
Furthermore, ET provides a framework for understanding how phonetic detail can be influenced by a wide range of linguistic and non-linguistic factors. Factors such as word frequency, orthography, lexical robustness, speech rate, and speaker identity can all shape the distribution of exemplars and, consequently, affect the realization of speech sounds (
Bybee, 2001). This holistic approach to phonetic variation and change highlights the dynamic nature of language use and challenges the notion of a fixed set of phonological rules governing speech production. Such premises will serve as the focal point of our analysis.
4. Results
This section explores the voicing rates of word-final sibilants in BP, in order to account for a possible ongoing devoicing process. The data include target words produced with and without the presence of orthographic input, and in two contexts: plural nouns with a final sibilant followed by a pause (e.g., cidades [sidads#] ‘cities’) and plural nouns with a final sibilant followed by a vowel (e.g., cidades azuis [sidadzazuɪs] ‘blue cities’).
Given that the production of final sibilants in BP is variable, we investigated the impact of fixed effects—namely, preceding and following phonological contexts, task type, and word frequency—on HNR values. We also treated individual behavior and lexical item as random effects to capture speaker-specific variation and idiosyncrasies tied to particular words. These factors will be addressed in the following subsections.
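The model structure described in this section can be summarized schematically as follows. This is a sketch under stated assumptions rather than the authors' exact specification: it assumes random intercepts only (for speaker $i$ and lexical item $j$), a log-transformed frequency predictor, and treatment coding of the categorical predictors. The symbols $\beta$, $u_i$, $w_j$, and $\varepsilon_{ijk}$ are introduced here for illustration.

```latex
\begin{align*}
\mathrm{HNR}_{ijk} ={}& \beta_0
  + \beta_{1}\,\mathrm{Following}_{ijk}
  + \boldsymbol{\beta}_{2}^{\top}\,\mathrm{Preceding}_{ijk}
  + \boldsymbol{\beta}_{3}^{\top}\,(\mathrm{Preceding}\times\mathrm{Following})_{ijk} \\
 &+ \beta_{4}\,\mathrm{Task}_{ijk}
  + \beta_{5}\,\log \mathrm{Freq}_{j}
  + u_i + w_j + \varepsilon_{ijk}, \\
u_i \sim{}& \mathcal{N}(0,\sigma_u^2), \qquad
w_j \sim \mathcal{N}(0,\sigma_w^2), \qquad
\varepsilon_{ijk} \sim \mathcal{N}(0,\sigma^2)
\end{align*}
```

Here $\mathrm{HNR}_{ijk}$ is the HNR (in dB) of the $k$-th token of word $j$ produced by speaker $i$. Because the preceding context has three levels ([i], voiced consonant, voiceless consonant), $\boldsymbol{\beta}_2$ and $\boldsymbol{\beta}_3$ each contain two coefficients under treatment coding.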
4.1. Following Phonological Context
Table 5 displays the number of tokens and the mean, median, and standard deviation of the harmonics-to-noise ratio (HNR), expressed in dB, for word-final sibilants in BP.
As described in the Methodology section, higher HNR values are associated with the production of segments with a higher degree of voicing. According to
Table 5, sibilants before a pause (e.g.,
cheques [ʃɛks#], ‘cheques’), which are expected to be produced as [s], have a mean value of 2.8 dB and a median of 2.6 dB. On the other hand, sibilants before a vowel (e.g.,
cheques amarelos [ʃɛkz a.ma.’ɾɛ.lʊs], ‘yellow cheques’), which are expected to be produced as [z], have a mean value of 7.9 dB and a median of 7.4 dB.
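To make the dB values above easier to interpret, the sketch below uses one common definition of HNR, in which the ratio compares the share of signal energy in the periodic (harmonic) part with the share in the noise part. Whether the measurements reported here were computed exactly this way is an assumption, since the Methodology section is not reproduced in this excerpt. Both function names are illustrative.

```python
import math


def hnr_db(periodic_fraction: float) -> float:
    """HNR in dB from the fraction of energy in the periodic part.

    Assumed definition: 10 * log10(r / (1 - r)), with 0 < r < 1.
    """
    if not 0.0 < periodic_fraction < 1.0:
        raise ValueError("periodic_fraction must lie strictly between 0 and 1")
    return 10.0 * math.log10(periodic_fraction / (1.0 - periodic_fraction))


def periodic_fraction_from_hnr(hnr: float) -> float:
    """Inverse of hnr_db: the periodic energy share implied by an HNR in dB."""
    ratio = 10.0 ** (hnr / 10.0)
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    # Mean HNR values quoted in the text for the two following contexts.
    for label, value in (("pre-pausal", 2.8), ("pre-vocalic", 7.9)):
        share = periodic_fraction_from_hnr(value)
        print(f"{label}: {value} dB -> periodic share {share:.3f}")
```

Under this assumed definition, 0 dB corresponds to equal energy in the periodic and noise components. The pre-pausal mean of 2.8 dB then implies that roughly two thirds of the energy is periodic, and the pre-vocalic mean of 7.9 dB implies roughly 86%.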
Analyzing the standard deviation values in
Table 5, it can be seen that there is greater dispersion around the mean values of [z] than of [s]. In terms of ET, we can assume that currently, exemplars that incorporate the phonetic detail of the voiceless sibilant are more robust (i.e., consistent) in BP. On the other hand, exemplars associated with the production of the voiced sibilant show greater phonetic gradience, thus being more variable. This alternation contradicts the premises of traditional phonological literature, where there would be the categorical production of either voiced or voiceless sibilants.
As established in
Section 1 (
Bisol, 2005;
Seara et al., 2017), the alveolar fricative /s/ in BP is typically voiceless in word-final or syllable-final position but becomes voiced before a vowel or voiced consonant. Consequently, we predict higher HNR indices when sibilants are followed by vowels. Refer to
Figure 2.
Figure 2 provides a visual complement to the data reported in
Table 5 by displaying how final sibilant voicing varies according to the following phonological context. The results support the earlier observation that voicing levels increase in pre-vocalic environments (e.g., potes [pɔts#] ‘pots’ vs. potes azuis [pɔtzazuɪs] ‘blue pots’). Analysis from the linear mixed-effects model confirmed this factor as statistically significant, showing a substantial positive effect of 5.04 dB for sibilants preceding vowels (t ≈ 28.76,
p < 0.0001) (see
Table A1 in
Appendix A for the entire regression output).
This pattern parallels observations by
Cristófaro-Silva and Mendes (
2022), who reported that L2 English voiced sibilants produced by Brazilian speakers exhibit significant variability when followed by vowels, being produced both with and without voicing. In order to understand the variability observed in the production of these sibilants in BP, we revisited our spectrographic analysis. Refer to
Figure 3.
Figure 3 displays the spectrogram and waveform of the phrase “os alpes italianos [...]” (the Italian Alps), produced by a male participant during the reading task. Notice the presence of the voicing bar during the production of the sibilant in the target word. That is, the form [awpzitaljənʊs] was produced due to the influence of the following vowel, as predicted by the principle of regressive assimilation in BP. However, the results reported in
Figure 2 indicate that many sibilants remain voiceless at word boundaries even when followed by a vowel, as a substantial number of tokens are concentrated in the bottommost part of the violin plot. In order to illustrate such tokens, refer to
Figure 4.
Figure 4 displays the spectrogram and waveform of the phrase “duas redes argentinas [...]” (two Argentine hammocks), produced by a male participant. Notice the absence of the voicing bar during the production of the sibilant in the target word. Also note the absence of the dark voicing bar during the production of the stop consonant. This indicates that not only did the sibilant remain voiceless, but the preceding stop also assimilated in voicelessness.
Thus, there was the production of the form [‘hetsah.ʒẽ.’tʃi.nəs] instead of the traditionally expected [‘hedizah.ʒẽ.’tʃi.nəs]. It is worth noting that such behavior was observed in all cluster types followed by vowels evaluated in this paper: [pz#V], [tz#V], [kz#V], [bz#V], [dz#V], [gz#V] became [ps#V], [ts#V], [ks#V] or [bs#V], [ds#V], [gs#V]. Although infrequently, this behavior was observed even in words where consonantal devoicing results in a loss of phonemic contrast (cf.
Section 4.5). For instance, “grades” (fences) and “grátis” (free) were sometimes pronounced the same ([’gɾats]). Similar cases included pairs like “sedes” (headquarters) and “setes” (sets) [’sɛts], “ringues” (boxing rings) and “rinques” (ice rinks) [’hĩks], and “tardes” (afternoons) and “tartes” (pies) [’tahts].
Hence, the data reported above suggest that the devoicing of (stop + sibilant) sequences in BP may represent a gradually unfolding sound change. Notably, devoicing processes have also been documented in other Romance languages, including European Portuguese (
Jesus & Shadle, 2002), Catalan (
Carbonell & Llisterri, 2009), French (
Jatteau et al., 2019a,
2019b), and Romanian (
Hutin et al., 2020).
4.2. Preceding Phonological Context
This section investigates whether the preceding phonological environments influence sibilant (de)voicing in BP. We begin by examining the presence or absence of a high-front vowel [i] before /s/.
Of the 2833 tokens retained after filtering, 64% are stop + sibilant sequences produced without an intrusive high-front vowel, as in
cheques pronounced [ʃɛks]. Conversely, 36% of the tokens include gradient productions of an intrusive [i], as in
cheques pronounced [‘ʃɛ.kis]. This distribution indicates that, although the vowel is still produced, it is omitted in the majority of tokens, reflecting the reported gradual decline of this vowel insertion in unstressed positions in BP (see
Soares, 2016;
Cristófaro-Silva & Mendes, 2022). Consider
Figure 5.
Figure 5 illustrates the voicing rates of final sibilants per preceding phonological context. Sibilants preceded by a high-front vowel [i] (left violin plot) exhibit an average HNR of 6 dB. Sibilants preceded by voiced consonants (middle violin plot) show an average of 6.2 dB. In contrast, sibilants preceded by voiceless consonants (right violin plot) display a lower average HNR of 4 dB.
Although these descriptive means suggest that sibilants are somewhat more voiced when preceded by a vowel or a voiced consonant, the model does not reveal a statistically significant main effect of the preceding environment in isolation (voiced consonant: Estimate = −0.1538,
t = −0.648,
p = 0.5170; voiceless consonant: Estimate = −0.2364,
t = −1.007,
p = 0.3143) (cf.
Appendix A,
Table A1).
However, when we re-leveled the factor so that ‘voiced consonant’ became the reference rather than ‘vowel’, the model did uncover a significant difference between ‘voiceless consonant’ and ‘voiced consonant’ (Estimate = −0.506,
t = −2.537,
p < 0.05). This direct comparison aligns with the descriptive means in
Figure 5, where voiceless consonants show lower HNR values. The effect also emerges more clearly when we take into account the following phonological context. Let us now focus on the results regarding the interaction between the preceding and following phonological environments.
4.3. Interaction Between Adjacent Contexts
To capture whether sibilant voicing differs depending on both the preceding and following phonological environments, the model was fit with an interaction between the adjacent contexts. The model reveals a highly significant effect (β ≈ −2.15,
t ≈ −7.69,
p < 0.0001) indicating that, in pre-vocalic contexts, sibilants preceded by voiceless stops are realized with notably lower voicing levels than those preceded by voiced stops (see
Table A1 in
Appendix A for the entire regression output).
Because the LME model selected [i] as the reference level for the preceding context variable, the estimates for voiceless and voiced consonants are compared to the effect of the [i] vowel. Results indicate that when the following context is a pause (baseline), neither a preceding voiced consonant (Estimate = −0.1538, t = −0.648) nor a preceding voiceless consonant (Estimate = −0.2364, t = −1.007) significantly affects HNR relative to the [i] baseline. In contrast, in a pre-vocalic environment, the interaction term for voiceless consonants is highly significant (Estimate = −2.1465, t = −7.694, p < 0.001), indicating a substantial decrease in sibilant voicing compared to [i].
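The way these treatment-coded estimates combine can be shown with a short worked example. The sketch below adds up the coefficients quoted in the text to obtain each condition's offset from the reference cell ([i] before a pause). It rests on two assumptions: that the 5.04 dB estimate for a following vowel comes from the same model and therefore applies at the [i] reference level, and that the intercept, task type, word frequency, and random effects can be set aside for the comparison. The interaction term for voiced consonants is not quoted in the text, so that cell is deliberately left uncomputed. All names are illustrative.

```python
# Coefficients quoted in the text, in dB (see Table A1 for the full output).
FOLLOWING_VOWEL = 5.04  # assumed to be the vowel effect at the [i] reference level
PRECEDING = {"i": 0.0, "voiced": -0.1538, "voiceless": -0.2364}
# Interaction with a following vowel; the voiced term is not quoted in the text.
INTERACTION_VOWEL = {"i": 0.0, "voiceless": -2.1465}


def offset_from_reference(preceding: str, following: str) -> float:
    """Predicted HNR offset (dB) relative to the [i] + pause reference cell."""
    if following not in ("pause", "vowel"):
        raise ValueError(f"unknown following context: {following!r}")
    offset = PRECEDING[preceding]
    if following == "vowel":
        if preceding not in INTERACTION_VOWEL:
            raise KeyError(f"interaction term for {preceding!r} is not quoted in the text")
        offset += FOLLOWING_VOWEL + INTERACTION_VOWEL[preceding]
    return offset


if __name__ == "__main__":
    for prec, foll in (("i", "vowel"), ("voiceless", "pause"), ("voiceless", "vowel")):
        print(f"{prec} + {foll}: {offset_from_reference(prec, foll):+.4f} dB")
```

Under these assumptions, a following vowel raises HNR by 5.04 dB after [i] but by only about 2.66 dB after a voiceless stop, so the pre-vocalic voiceless-stop condition sits about 2.38 dB below the pre-vocalic [i] condition.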
Post-hoc pairwise comparisons (see
Table A2 in
Appendix A) confirm this pattern in two ways. First, when the following context is a vowel (i.e., in pre-vocalic position), sibilants preceded by voiceless stops have significantly lower HNR than both those preceded by the [i] vowel and those preceded by voiced stops (difference ≈ 2.71 dB,
p < 0.0001). Second, when the following context is a pause, neither a preceding voiceless stop nor a preceding voiced stop leads to a significant HNR difference relative to the [i] baseline (both
p > 0.05).
These results regarding the interaction between preceding and following environments also allow us to compare two situations originally labeled (b) and (d) in
Section 1, where we discussed four possible outcomes for stop + /s/ sequences. Specifically, (b) is the case of a voiced stop + /s/ before a vowel, hypothesized to devoice from [bz] to [bs], and (d) is the case of a voiceless stop + /s/ before a vowel, realized as [ps] instead of [pz]. The interactions (cf.
Appendix A,
Table A1) show that sibilants following voiceless consonants remain significantly less voiced in pre-vocalic contexts. Crucially, however, the data do not provide strong evidence that sibilants after voiced stops are devoicing before a word-initial vowel: the average HNR for /voiced C + s/ #V remains relatively high (i.e., closer to [z]-like voicing). This asymmetry suggests that the situation described in (d)—the shift toward voiceless realizations (e.g., /ps/#V → [ps]#V)—is robustly supported by the current findings, whereas the situation proposed in (b)—devoicing /bs/#V → [bs]#V—does not receive equivalent statistical support. In other words, progressive voiceless assimilation from a preceding voiceless stop (as in (d)) appears to override the vowel’s regressive voicing effect on the adjacent sibilant, whereas a preceding voiced stop still preserves the expected voicing (as in [bz]#V). Notably, the post-hoc comparisons further indicate that HNR after a voiceless stop in a pre-vocalic context remains significantly lower than in the pause condition (see
Table A2), reinforcing that progressive voiceless assimilation is robust across these contexts.
4.4. Task Type
Regarding task type, we predicted that the reading task would elicit lower rates of sibilant voicing than the picture-naming task. This prediction stemmed from the notion that visual exposure to the grapheme <s> might prompt participants to associate it with the phoneme /s/, which is less voiced than [z], and, by extension, to link it to patterns of word-final devoicing. Consider Figure 6, which displays the voicing rates of final sibilants by task type.
Sibilants produced in the picture-naming task (with no orthographic input) showed an average HNR of 5.4 dB, whereas those produced in the reading task (with orthographic input) averaged 5.5 dB. Statistical analysis confirmed that this difference was not significant (Estimate = 0.13, SE = 0.12, t = 1.03, p = 0.30).
This suggests that other variables, such as phonetic context, lexical properties, and individual linguistic behavior, may exert greater influence on consonant devoicing patterns than orthographic input does. Having considered the potential impact of phonetic and orthographic factors on sibilant voicing, we now turn to the influence of lexical properties, particularly the characteristics of individual words.
4.5. Word Frequency, Lexical Item and Homophony Avoidance
In ET, word frequency is regarded as an important factor shaping phonetic variation. Consider Figure 7.
Figure 7 presents a polar plot illustrating the relationship between Log Word Frequency and Mean HNR for different lexical items. Each word is positioned radially according to its log-transformed frequency: the farther from the center, the more frequent the word. The color gradient, ranging from blue (lower HNR values) to red (higher HNR values), represents the mean HNR for each word. Although some frequent words (e.g., “cidades”, “redes”) appear toward the red end of the spectrum and some infrequent words (e.g., “chopes”, “leques”) appear in the blue, the overall distribution shows no consistent pattern. Indeed, the LME model indicates that word frequency does not significantly affect HNR values (β = −0.11, t = −0.96, p = 0.34), so there is no evidence that frequency influences voicing in these data.
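As a rough consistency check on the two non-significant effects reported so far (task type and word frequency), the sketch below converts each reported t value into a two-sided p-value. It assumes a standard normal approximation to the t distribution, which is reasonable only when the degrees of freedom are large; the degrees-of-freedom method actually used for the LME model is not stated here, so this illustrates the arithmetic and does not reproduce the original analysis. The function name `two_sided_p` is hypothetical.

```python
from statistics import NormalDist


def two_sided_p(t_value: float) -> float:
    """Two-sided p-value for a test statistic, using a standard normal
    approximation (an assumption: valid only for large degrees of freedom)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_value)))


# t values as reported in the text for the two fixed effects.
reported_t = {"task type": 1.03, "log word frequency": -0.96}

p_values = {name: two_sided_p(t) for name, t in reported_t.items()}

for name, p in p_values.items():
    print(f"{name}: t = {reported_t[name]:+.2f}, approximate p = {p:.2f}")
```

Under this approximation, both values round to the p-values given in the text (0.30 and 0.34), which suggests the reported statistics are internally consistent.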
Given that ET posits that individual words serve as the primary locus of representation, it was also anticipated that each word would exhibit varying levels of sibilant voicing. Consider Figure 8.
Figure 8 exhibits the voicing rates of final sibilants in BP per lexical item. The results are grouped as follows: nouns with a final sibilant followed by a pause are located in the upper part and are represented by yellow boxplots; nouns with a final sibilant followed by a vowel are located in the lower part and are represented by green boxplots. The distribution of lexical items is displayed in descending order considering the average values of HNR in both production contexts (i.e., sibilant followed by pause or vowel).
A closer inspection of Figure 8 shows that the words with the highest HNR were ‘plebes’ [plɛbz], ‘caribes’ [ka.’ɾibz], ‘sangues’ [sɐ̃gz], and ‘jegues’ [ʒɛgz]. Each of these words has a voiced stop preceding the sibilant. Conversely, the words with the lowest HNR values, ‘cliques’ [kliks], ‘botes’ [bɔts], ‘artes’ [ahts], and ‘chutes’ [ʃuts], contain a voiceless stop preceding the sibilant and cluster on the right side of the graph. These findings corroborate the earlier observation that sibilants tend to exhibit a higher degree of voicing when preceded by a voiced stop. Furthermore, they also confirm that HNR values are highest in intervocalic contexts: as shown in the lower part of the graph, sibilants preceded by a voiced stop and followed by a vowel display the most elevated HNR values. This pattern underscores how the surrounding phonological environment, particularly the presence of adjacent vowels or voiced segments, fosters greater sibilant voicing at word boundaries.
An examination of the random effects from the LME model indicates that the random intercept for words (i.e., (1 ∣ word)) has a variance of approximately 0.038, suggesting that different lexical items contribute relatively little variability to baseline HNR values. However, the random slope of the following phonological environment by word (i.e., (1 + following environment ∣ word)) shows a larger variance of about 0.754, indicating that some words exhibit a bigger difference in voicing (HNR) depending on whether the sibilant is followed by a pause or a vowel. The correlation between the random intercept and slope is 1.00, which means the two random effects are not being estimated independently of each other. A correlation at this boundary value typically signals a singular fit, possibly due to specific lexical properties or data limitations, so the by-word variance estimates should be interpreted with caution.
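To make the consequence of a correlation of 1.00 concrete, the following sketch rebuilds the by-word random-effects covariance matrix implied by the reported values. Only the two variances and the correlation come from the text; the helper names are hypothetical, and the calculation is an illustration of the general property, not output from the fitted model.

```python
import math

# Values reported in the text for the by-word random effects.
var_intercept = 0.038  # random intercept variance for word
var_slope = 0.754      # random slope variance (following environment by word)
corr = 1.00            # reported intercept-slope correlation


def covariance_matrix(var_a: float, var_b: float, r: float) -> list[list[float]]:
    """2x2 covariance matrix implied by two variances and their correlation."""
    cov = r * math.sqrt(var_a * var_b)
    return [[var_a, cov], [cov, var_b]]


def determinant(m: list[list[float]]) -> float:
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


sigma = covariance_matrix(var_intercept, var_slope, corr)
det = determinant(sigma)

# With |r| = 1, each word's slope deviation is a fixed multiple of its
# intercept deviation; the multiple is the ratio of the standard deviations.
slope_per_intercept = math.sqrt(var_slope / var_intercept)

print(f"determinant of covariance matrix: {det:.6f}")
print(f"slope deviation per unit of intercept deviation: {slope_per_intercept:.2f}")
```

The determinant is zero up to floating-point error, meaning the matrix has rank one: the model effectively estimates a single by-word dimension, with each word's slope deviation fixed at roughly 4.45 times its intercept deviation.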
In practical terms, these findings imply that while the overall contribution of individual words to HNR baseline (the intercept) is relatively small, how much an item’s final sibilant voicing increases in vowel environments versus pause environments can differ substantially across lexical items. This variability could stem from word-level factors such as phonotactic constraints, word similarity, or lexical frequency.
In light of Exemplar Theory (ET), the results suggest that learners store multiple phonetic representations for words, encompassing varying degrees of voicing in (stop + sibilant) clusters. ET posits that these exemplars reflect different voicing patterns shaped by prior exposure, which may explain the observed variability in voicing rates across words. Although frequency does not significantly affect HNR values, individual lexical items still exhibit distinct voicing tendencies, likely due to the strength and recurrence of particular exemplar-based representations. Indeed, some exemplars capture fully devoiced clusters preceding vowels, illustrating how phonetic details can spread among related words. In the next section, we will explore how these exemplar-driven patterns are further shaped by individual experience, leading to variation across speakers.
4.6. Individual Behavior
It is expected that different individuals exhibit varying rates of sibilant voicing. This is because, according to ET, interindividual variation in a linguistic system is dynamic and typically unpredictable, given that speakers have individual learning experiences with the language (Bybee, 2010). Consider Figure 9.
From the mixed-effects model, the random intercept variance for participants is approximately 0.59, indicating that some speakers consistently produce higher or lower HNR overall, possibly due to individual voice quality or habitual articulatory settings. More importantly, the random slope variance for the following context (pause vs. vowel) is about 3.21, which underscores that speakers differ substantially in how strongly they voice sibilants in pre-vocalic environments. In other words, some participants exhibit a large boost in HNR before vowels, while others show a smaller or even negligible difference between the two contexts.
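Variances are easier to interpret once converted to standard deviations, which are on the same dB scale as HNR. The sketch below does this for the two by-participant variances reported above and, assuming normally distributed random effects (the usual LME assumption), gives the range within which about 95% of speaker-specific deviations would be expected to fall. These are deviations from the average effect, which is not restated in this section; the function names are hypothetical.

```python
import math

# By-participant random-effect variances as reported in the text.
intercept_variance = 0.59  # baseline HNR differences between speakers
slope_variance = 3.21      # differences in the pause-vs-vowel effect


def standard_deviation(variance: float) -> float:
    """Standard deviation (in dB) corresponding to a variance."""
    return math.sqrt(variance)


def central_95_range(sd: float) -> tuple[float, float]:
    """Approximate central 95% range of deviations, assuming normality."""
    return (-1.96 * sd, 1.96 * sd)


intercept_sd = standard_deviation(intercept_variance)
slope_sd = standard_deviation(slope_variance)

print(f"intercept SD: {intercept_sd:.2f} dB")
print(f"slope SD: {slope_sd:.2f} dB")
low, high = central_95_range(slope_sd)
print(f"approx. 95% of speaker slope deviations: {low:.1f} to {high:.1f} dB")
```

On these figures the slope standard deviation (about 1.8 dB) is more than twice the intercept standard deviation (about 0.8 dB), which is the sense in which speakers differ more in their response to a following vowel than in their overall HNR level.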
Looking at the specific HNR ranges in Figure 9 helps illustrate these findings. When the sibilant precedes a pause, HNR values across individuals cluster in a relatively narrow band (approximately 1.6 to 4.9 dB), reflecting the robust stability of the voiceless [s] in BP before a pause. By contrast, when a sibilant is expected to be voiced [z] before a vowel, HNR values span a much wider range (about 3.4 to 11.7 dB). This substantial inter-speaker variability is captured statistically by the large random slope variance mentioned above: some participants produce near-categorical voicing, whereas others barely increase their HNR from the pause baseline, resulting in a rather dynamic pattern of voicing.
For instance, participant P, a 16-year-old from Belo Horizonte, demonstrates one of the highest mean HNR values (11.7 dB) in the vowel context, whereas participant I—also 16 and from Belo Horizonte—shows a much lower mean value (3.4 dB). These extremes highlight how individuals can diverge in their phonetic realization of sibilants, even within the same dialect community.
Taken together, these observations reinforce two key points. First, the voiceless sibilant [s] remains categorically stable before a pause for most speakers. Second, the voiced sibilant [z] before a vowel shows extensive phonetic variability, driven both by the phonological environment and by ongoing processes of devoicing. Within ET (Bybee, 2010), such variability reflects a competition between established patterns (fully voiced clusters before vowels) and emerging realizations (devoiced sibilants), ultimately underscoring how sound changes may unfold differently across speakers.
5. Discussion and Conclusions
This study sheds light on the occurrence of consonant devoicing in Brazilian Portuguese. It introduces a methodological innovation by assessing sibilant voicing gradiently, using the harmonics-to-noise ratio (HNR). This approach captures fine phonetic detail and contributes to our understanding of sibilant variation in BP.
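For readers less familiar with HNR as a gradient voicing measure, the sketch below shows one common definition, in which HNR in dB is derived from the peak of the normalized autocorrelation of the signal, r, as 10·log10(r / (1 − r)). This is the definition used in Praat's harmonicity analysis; that the values in this study were computed in exactly this way is an assumption made for illustration. The function names are hypothetical.

```python
import math


def hnr_db(r: float) -> float:
    """HNR in dB from the normalized autocorrelation peak r (0 < r < 1),
    where r is read as the share of signal energy that is periodic."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    return 10.0 * math.log10(r / (1.0 - r))


def periodic_share(hnr: float) -> float:
    """Inverse of hnr_db: the periodic share of energy implied by an HNR in dB."""
    ratio = 10.0 ** (hnr / 10.0)
    return ratio / (1.0 + ratio)


# An HNR of 0 dB corresponds to equal periodic and noise energy.
print(hnr_db(0.5))

# Periodic share implied by the lowest and highest speaker means reported
# for the pre-vocalic context (3.4 dB and 11.7 dB).
for value in (3.4, 11.7):
    print(f"{value} dB -> periodic share of about {periodic_share(value):.2f}")
```

Read this way, a higher HNR means more periodic (voiced) energy relative to frication noise, which is why it can serve as a continuous index of sibilant voicing instead of a binary voiced/voiceless label.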
Our results indicate that the [z] sibilant, which typically occurs in BP when followed by vowels and voiced consonants (e.g., cheques amarelos [ʃɛkz a.ma.’ɾɛ.lʊs] ‘yellow checks’), now exhibits significant rates of pre-vocalic devoicing. These findings suggest that a sound change could be taking place in the Belo Horizonte dialect of Brazilian Portuguese. Sibilant devoicing appears to be more frequent than reported in earlier studies (e.g., Cristófaro-Silva, 2003; Bisol, 2005; Seara et al., 2017), indicating a possible shift in this particular dialect. However, our current data do not allow for broader generalizations regarding Brazilian Portuguese as a whole. Additional research is needed to determine whether this devoicing pattern is limited to synchronic variation in the Belo Horizonte dialect or reflects a more widespread phonological development in BP.
By employing a Linear Mixed-Effects model, we investigated multiple predictors of sibilant voicing, including adjacent phonological contexts, task type, and word frequency. The model incorporated random slopes for speaker and word, capturing variability in how individual participants and lexical items responded to the presence or absence of a following vowel.
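One way to write down a model with this structure is sketched below in LaTeX. The exact fixed-effect coding, the inclusion of a preceding-by-following interaction term, and all symbol names are assumptions inferred from the description in this section, not the authors' stated specification. Here $i$ indexes speakers, $j$ words, and $k$ tokens; $P_j$ is the voicing of the preceding stop, $F_{ijk}$ the following context (pause vs. vowel), $T_{ijk}$ the task type, and $f_j$ the word frequency.

```latex
\begin{aligned}
\mathrm{HNR}_{ijk} ={}& \beta_0 + \beta_1 P_j + \beta_2 F_{ijk} + \beta_3 P_j F_{ijk}
  + \beta_4 T_{ijk} + \beta_5 \log f_j \\
 &+ u_{0i} + u_{1i} F_{ijk} + w_{0j} + w_{1j} F_{ijk} + \varepsilon_{ijk}, \\
(u_{0i}, u_{1i}) \sim{}& \mathcal{N}(\mathbf{0}, \Sigma_{\text{speaker}}), \quad
(w_{0j}, w_{1j}) \sim \mathcal{N}(\mathbf{0}, \Sigma_{\text{word}}), \quad
\varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

In this notation, $u_{1i}$ and $w_{1j}$ are the by-speaker and by-word random slopes for the following context, whose variances are discussed in Sections 4.5 and 4.6.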
Results of the LME model indicate that sibilant voicing depends on both the preceding and following phonological environments, as shown by a significant interaction in the statistical model. In particular, when the following context is a vowel, sibilants preceded by voiceless stops exhibit substantially lower voicing levels than those preceded by voiced stops. Post-hoc pairwise comparisons confirm that these pre-vocalic sibilants maintain a robust voiceless realization when following a voiceless stop, suggesting strong progressive devoicing (e.g., /ps/ → [ps] before a vowel).
In contrast, pre-vocalic sibilants following a voiced stop generally exhibit relatively high HNR values, indicating a tendency toward [z]-like voicing—for example, /bs/#V often surfaces as [bz]#V. However, it should be noted that there is still some overlap in HNR values across voiced- and voiceless-stop contexts, so these sibilants are not categorically fully voiced in every token. Although progressive assimilation leading to voicelessness is clearly observed after voiceless stops, there is no comparably strong evidence for devoicing when the sibilant follows a voiced stop; nonetheless, some degree of partial devoicing may occur in certain cases. This discrepancy highlights an asymmetry: a preceding voiceless stop consistently overrides the vowel’s regressive voicing effect, whereas a preceding voiced stop tends to preserve higher levels of sibilant voicing before a word-initial vowel—albeit with variability in individual tokens.
Our analysis of task type revealed that visual exposure to the grapheme <s> did not significantly influence participants’ production of the sibilant, as no statistically significant differences were found between the reading and picture-naming tasks. Similarly, lexical frequency showed no significant effect on voicing rates.
In analyzing different lexical items, we found that while the overall influence of each word on baseline voicing remained relatively modest, some words displayed substantial shifts in sibilant voicing across pause and vowel environments. This variation likely reflects word-specific properties—such as phonotactic constraints or segmental composition—that can either enhance or suppress voicing in final sibilants.
At the speaker level, participants showed little variability in producing expected pre-pausal voiceless [s] sibilants, but significant variability occurred with expected pre-vocalic voiced [z], as many tokens were devoiced. Speakers also varied considerably in how strongly they voice sibilants before vowels versus pauses. Some showed a large difference (high HNR in vowel contexts), whereas others were more uniform across contexts.
Our findings highlight the competitive dynamics between phonetic variants within the mental lexicon, showing that Exemplar Theory offers an insightful framework for investigating the kind of phonetic variation under study (Johnson, 1997; Pierrehumbert, 2001; Bybee, 2010). Specifically, exemplars associated with traditionally categorical sound patterns, such as the production of voiced clusters followed by vowels, are competing with an emerging sound pattern characterized by the devoicing of word-final clusters. The competition between progressive voiceless assimilation from the preceding stop and regressive voicing assimilation from the following vowel necessitates a dynamic, stochastic model of sound change, as outlined in Exemplar Theory, where phonological representations are continually reshaped by language use and exposure to variable exemplars.
According to Exemplar Theory, fine phonetic detail plays a crucial role in shaping phonological representations. The complex relationship between sibilant voicing and the voicing of their preceding stops in BP suggests an ongoing sound change influenced by multiple phonological and aerodynamic factors, consistent with cross-linguistic findings (De Schryver et al., 2008; Strycharczuk, 2012; Hayes-Harb et al., 2018; Hutin et al., 2020). Moreover, the variability we observed, both across speakers and across lexical items, fits well with the notion that each individual’s mental lexicon stores a multitude of exemplars, and that phonological categories are continually updated as speakers encounter both traditional and emerging patterns.
In conclusion, our gradient, variable, and mixed-effects approach provides a richer understanding of consonant devoicing in BP than previously available. By examining sibilant voicing in detail with HNR measures, we demonstrate that devoicing patterns are not limited to a single context but arise from the interplay of multiple phonological and aerodynamic factors. The statistical modeling incorporating random slopes shows that (1) contextual (preceding and following) environments remain the strongest predictors of devoicing, and (2) individual and lexical variation are substantial, suggesting that phonological change may be spreading unevenly through the speech community and the lexicon.
Future research could examine whether devoicing extends beyond word-final sibilants to other segments, including the occasional devoicing of preceding stops observed in this paper through spectrographic analyses (e.g., redes argentinas produced as [’hets ah.ʒẽ.’tʃi.nəs] rather than [’hedʒiz ah.ʒẽ.’tʃi.nəs]). Additionally, it would be important to investigate how the presence of a following voiced consonant might influence such patterns, in order to provide a more complete picture of devoicing in Brazilian Portuguese. Investigating additional cluster types (e.g., fricative + stop, nasal + stop, or liquids + sibilants) and dialects of BP could further clarify whether these patterns reflect a broader phenomenon, and shed light on the role of fine phonetic detail in shaping phonological representations.