**Information-Seeking Question Intonation in Basque Spanish and Its Correlation with Degree of Contact and Language Attitudes**

#### **Magdalena Romera <sup>1</sup> and Gorka Elordieta <sup>2,\*</sup>**


Received: 18 November 2020; Accepted: 9 December 2020; Published: 14 December 2020

**Abstract:** The present study analyzes the prosodic characteristics of the variety of Spanish in contact with Basque (in the Basque Country, Spain). We focus on information-seeking yes/no questions, which present different intonation contours in Spanish and Basque. In Castilian Spanish, these sentences end in a rising contour, whereas in Basque, they end in a falling or rising–falling circumflex contour. In our previous work, this topic was investigated among the urban populations of Bilbao and San Sebastian. The results showed that 79% of information-seeking yes/no questions had final falling intonational configurations. All speakers produced a substantial proportion of final falls regardless of their linguistic profile, but the proportion varied across speakers. A correlation was observed between the dependent variable 'frequency of occurrence of final falls in absolute interrogatives' and social factors such as 'degree of contact with Basque' and 'attitudes towards Basque and the Basque ethnolinguistic group': the higher the degree of contact with Basque and the more positive the attitudes towards Basque and the Basque ethnolinguistic group, the greater the frequency of final falling intonational contours in information-seeking absolute interrogatives. We interpreted this correlation as showing that adopting the characteristic Basque prosody allows speakers to be recognized as members of the Basque community. In the present study, we focused on rural areas. Falling intonational contours at the end of information-seeking absolute interrogatives were even more common there than in urban areas (93.4%), and no correlation was found with either degree of contact with Basque or attitudes towards Basque.
Our interpretation is that in rural areas the presence of Basque in daily life is stronger, and that there is a consolidated variety of Spanish used by all speakers regardless of their attitudes. Thus, adopting the intonational features of Basque is not the only marker of belonging to the Basque ethnolinguistic group. Our study highlights the relevance of subjective social factors, such as language attitudes, to the degree of convergence between two languages.

**Keywords:** intonation; language contact; bilingualism; language attitudes; social factors; Spanish; Basque

#### **1. Introduction**

Within the growing field of research on phonetic and phonological aspects of language contact and bilingualism, suprasegmental phenomena, especially prosody and intonation, have begun to receive more attention (a comprehensive list of references is provided in Elordieta and Romera (2020a)). A particularly interesting issue is that the presence of features of one language variety (LV-B) in another language variety (LV-A) varies within the contact population. The main goal of this paper is to show that individual social factors of speakers of LV-A may help explain the differences among speakers in the degree of presence of LV-B features. Such factors include the degree of contact of LV-A speakers with LV-B speakers and the attitudes of LV-A speakers towards LV-B and the LV-B ethnolinguistic group.

Sociolinguistic studies commonly explain linguistic variation in contact situations as the result of two types of factors: internal tendencies of the languages that favor a linguistic change, and the external influence that one language exerts on another in contact situations (Winford 2005, 2014; Poplack and Levey 2010, among others). Aspects such as the extent of contact, the density of speakers of each language, the relative prestige of each language, or the speakers' knowledge and relative use of each language are usually invoked to explain the transfer of features or convergence. However, other factors of a psychosocial nature also play a determining role in the adoption of linguistic features by speakers of languages in contact. Work along these lines includes that of Romera and Elordieta (2013) on Catalan intonational features in Spanish, Kozminska (2019) on the presence of English intonational features in the variety of Polish spoken by Polish immigrants, and Elordieta and Romera (2020a) on falling intonational contours in yes/no questions, typical of Basque, in the variety of Spanish spoken in the Basque Country.

Elordieta and Romera (2020a) investigated the presence of (rising–)falling intonational configurations at the end of information-seeking absolute interrogatives in Basque Spanish as a feature inherited from Basque, directly or indirectly. That study focused on the cities of Bilbao and San Sebastian. These two cities have always been in contact with Basque, although Spanish is the dominant language. In San Sebastian, a vernacular variety of Gipuzkoan Basque is still spoken, and in Bilbao, a variety of northern Bizkaian was spoken until the beginning of the last century. Near the two cities, there are towns where Bizkaian and Gipuzkoan Basque are still spoken, and historically, there has been close contact with the inhabitants of those towns through commercial relationships and local immigration to the two capital cities. Bilbao and San Sebastian have populations of 343,430 and 181,652, respectively (858,236 and 327,428 when their metropolitan areas, including the smaller surrounding towns, are considered; cf. (Eustat 2019)). The 1,185,664 inhabitants of the Bilbao and San Sebastian metropolitan areas amount to 54% of the population of the Autonomous Community of the Basque Country (2,188,017).
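As a quick check of the population figures above, the metropolitan total and its share of the Autonomous Community can be recomputed from the Eustat numbers quoted in the text. This is a minimal sketch; the variable names are our own.

```python
# Population figures as quoted in the text (Eustat 2019)
metro_bilbao = 858_236
metro_san_sebastian = 327_428
basque_autonomous_community = 2_188_017

# Combined metropolitan population and its share of the Autonomous Community
metro_total = metro_bilbao + metro_san_sebastian
share = metro_total / basque_autonomous_community

print(f"Metropolitan total: {metro_total:,}")              # Metropolitan total: 1,185,664
print(f"Share of the Autonomous Community: {share:.1%}")   # Share of the Autonomous Community: 54.2%
```

The unrounded share (54.2%) is consistent with the 54% given in the text.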

Elordieta and Romera (2020a) found that 79% of all information-seeking yes/no questions had final falling configurations in these two cities. The circumflex nuclear contour of information-seeking yes/no interrogatives in Basque Spanish is found in Castilian Spanish in absolute interrogatives that are not information seeking in nature but rather have other pragmatic nuances, at least in elicited speech (from read sentences or from Discourse Completion Tasks). Escandell Vidal (1998, 1999, 2017) claims that a circumflex contour can appear in yes/no echo questions, which are used to express surprise at what the interlocutor has just said or to request clarification of what the interlocutor has just uttered (cf. also (Estebas-Vilaplana and Prieto 2010; Hualde and Prieto 2015, p. 378)). Torreira and Floyd (2012) find this contour in yes/no questions that serve the discourse function of signaling that the topic of the discourse is being followed up, or that the "course of action" is maintained. In our corpus, the absolute questions were of the genuine information-seeking type, which, in Castilian Spanish, have been reported to have a final rising configuration. To allow a more direct comparison with central varieties of Peninsular Spanish, we recorded seven speakers from Madrid in conversations of the same type as those in the Basque Country. The results showed that two-thirds of the information-seeking absolute interrogatives ended in a rising configuration (a percentage that rose to 84% for five of the seven speakers). Thus, the relative frequencies of rising and falling configurations were roughly the opposite in Basque Spanish and Madrid Spanish.

This influence could be understood in diachronic terms as a historical transfer by native Basque speakers to their Spanish, followed by a consolidation of falling contours as a characteristic of Basque Spanish. Other northern varieties of Spanish, such as those spoken in Galicia, Asturias, and Cantabria, present falling final contours in absolute interrogatives, especially in non-urban areas. However, these falling contours are different from those found in Basque Spanish. For a detailed comparison, the reader is referred to Elordieta and Romera (2020a).<sup>1</sup>

To explain this variation, we tested several social factors, namely the attitudes towards the Basque language and the Basque ethnolinguistic group and the degree of contact with Basque. Our results showed that speakers' attitudes explained a large part of the prosodic convergence of Spanish and Basque in the two cities analyzed (Elordieta and Romera 2020a). Speakers who showed more positive attitudes towards Basque and the Basque-speaking group also produced higher rates of Basque prosodic features. The degree of contact was also a determining factor in explaining the convergence of prosodic features. Together, the degree of contact with Basque and the attitudes towards this language explained almost 80% of the variation.

The question that now arises is whether the situation is different in smaller towns where a vernacular variety of Basque has always been dominant. First, it is worth investigating whether a higher degree of contact with Basque and the Basque ethnolinguistic group in non-urban towns leads to an even higher use of final falls in yes/no questions. Second, if the population in non-urban towns has positive attitudes towards Basque and the Basque ethnolinguistic group, it would be interesting to know whether this also leads to higher percentages of falling nuclear contours in absolute interrogatives. Addressing these questions is the goal of the present paper.

#### **2. Previous Study on the Prosody of Spanish in Contact with Basque and the Influence of Social Factors**

#### *2.1. Final Falling Contours in Absolute Interrogatives in Bilbao and San Sebastian*

In Elordieta and Romera (2020a), we conducted a study on the intonation of information-seeking absolute interrogatives in Spanish as spoken in Bilbao and San Sebastian. We collected data through sociolinguistic interviews (Silva-Corvalán 2001) from 12 speakers of different linguistic profiles: Spanish monolinguals, L1 Spanish/L2 Basque bilinguals, and L1 Basque/L2 Spanish bilinguals. There were six females and six males, all between 35 and 55 years old and with at least secondary education.

The study revealed that in Bilbao and San Sebastian, 79% of all information-seeking yes/no questions (136 of a total of 172) had final configurations with a rising–falling circumflex contour. This contour can be transcribed in the autosegmental–metrical annotation system as L+(¡)H\* (H)L%, that is, a rising pitch accent with the peak on the stressed syllable followed by a drop in tone in the final syllable. The pitch reached in the stressed syllable may exceed the level reached in the rest of the sentence, hence the upstep diacritic '¡'. In addition, the high tonal level may be maintained in the final syllable of the interrogative sentence and may then fall even more abruptly towards the end, hence the possible presence of the high tone H in the boundary tone (subject to intra- and inter-speaker variation). Figure 1 shows an intonation contour of an absolute interrogative sentence in Basque Spanish, produced by a male bilingual speaker from San Sebastian with Spanish as his native language.

<sup>1</sup> There is a growing literature on the presence of prosodic features of one language in another language with which it is in contact. For a comprehensive bibliography, which includes studies on Spanish in contact with other languages in the Iberian Peninsula and in the Americas, the reader can consult Elordieta and Romera (2020a).

**Figure 1.** F0 contour of an absolute interrogative statement in Basque Spanish by a male L1 Spanish/L2 Basque speaker from San Sebastian.

Final rising or sustained pitch configurations are found in 21% of the information-seeking yes/no questions (36 interrogatives). Of these, only 10 interrogatives present the L\* H% configuration of Castilian Spanish (6% of the total number of absolute questions); the remaining 26 have a rising nuclear accent, L+(¡)H\* (¡)H%.
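The percentages for Bilbao and San Sebastian follow directly from the token counts given in the text (136 final falls and 36 final rises out of 172 questions, 10 of the rises being L\* H%). A minimal sketch of the computation; the variable names are illustrative and not part of any annotation scheme used in the study.

```python
# Counts for Bilbao and San Sebastian as reported in the text
falls = 136          # final falls, L+(¡)H* (H)L%
rises_l_h = 10       # final rises with the Castilian L* H% configuration
rises_other = 26     # remaining final rises, L+(¡)H* (¡)H% (36 - 10)

total = falls + rises_l_h + rises_other  # 172 information-seeking yes/no questions
rises = rises_l_h + rises_other          # 36 final rises or sustained contours

for label, n in [("final falls", falls), ("all final rises", rises), ("L* H% rises", rises_l_h)]:
    print(f"{label}: {n}/{total} = {n / total:.1%}")
```

This prints 79.1%, 20.9% and 5.8%, which round to the 79%, 21% and 6% given in the text.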

The finding that only 21% of information-seeking absolute interrogatives end in rising nuclear contours in Basque Spanish contrasts with the traditional description of neutral information-seeking absolute interrogatives in Castilian Spanish (central and southern varieties). These are characterized primarily by rising final contours in which the final stressed syllable of the statement presents a low tonal value followed by a rise in the last syllable (see, among others, Navarro Tomás (1944), Quilis (1993), Face (2008), Hualde (2005), Estebas-Vilaplana and Prieto (2010), Henriksen (2010), Henriksen and García-Amaya (2012), and Hualde and Prieto (2015)). In Sp\_ToBI, this contour is transcribed as L\* H%.

Given that falling circumflex tones in absolute interrogatives are typical of Basque (cf. (Elordieta 2003; Gaminde et al. 2016; Robles-Puente 2012; Elordieta and Hualde 2014)), Elordieta and Romera (2020a) attributed the high frequency of falling circumflex tones in the Spanish spoken in the Basque Country to an influence from Basque (cf. also (Robles-Puente 2012; González and Reglero 2021)). Figure 2 shows an intonation contour of an absolute interrogative statement in Gipuzkoan Basque (example taken from (Elordieta and Hualde 2014, p. 457)).

**Figure 2.** F0 contour of an absolute interrogative statement in Gipuzkoan Basque by a female speaker (example from (Elordieta and Hualde 2014, p. 457)).

Several studies have reported the existence of falling intonation patterns in absolute interrogatives in Central Castilian Spanish. These interrogative sentences are not genuine information-seeking questions, but rather have pragmatic connotations of echo, imperative, or confirmatory questions, in which the speaker is attributing the content of the interrogative to another person (cf. (Escandell Vidal 1998, 1999, 2017; Estebas-Vilaplana and Prieto 2010; Hualde and Prieto 2015; Henriksen et al. 2016), among others). These are annotated by the above-mentioned sources as L+H\* L%, with a rise in pitch on the nuclear syllable above the level of other previous high tones (hence also annotated as L+¡H\* L% by (Torreira and Floyd 2012)).

In spontaneous speech in Madrid Spanish, Torreira and Floyd (2012) claim that circumflex tones may be even more common than rising tones, which suggests that the most neutral intonation patterns are not necessarily the most common in conversational speech (cf. (Hualde and Prieto 2015)). Torreira and Floyd (2012) mention several disparate pragmatic and discourse contexts where the circumflex contour occurs, but they propose the generalization that this type of contour is mainly used as a "topic follow-up", and secondarily as a signal that the speaker is "maintaining the course of action". In the corpus analyzed by these authors, these interrogatives serve the following discourse functions: responding to a previous question, providing receipt of news, initiating a repair, checking the listener's attention during a statement, or providing a pre-announcement during a statement.

Henriksen et al. (2016) also observed that rising tones are rare in spontaneous speech among speakers of Manchego Spanish, a variety of Castilian Spanish spoken to the south of Madrid. Inspired by Escandell Vidal (1998), the authors associated the rising contours with statements in which the content of the question is attributable to the speaker (in other words, a genuine information-seeking question). Falling contours are more common in interrogative sentences in which the content of the question can be attributed to another person, be it the speaker's conversation partner or another, external party.

The absolute interrogatives analyzed by Elordieta and Romera (2020a) in Basque Spanish, however, corresponded to a genuine search for information. They occurred in the context of a semi-directed interview or conversation in which the interviewer asked questions that sought information about the interviewee that was unknown to the interviewer. Thus, the comparison remains legitimate: the unmarked pattern for neutral, information-seeking absolute interrogatives is a final rise in Castilian Spanish, but a final fall in Basque Spanish. In any case, to settle the issue, Elordieta and Romera (2020b) analyzed the nuclear configurations of absolute interrogatives produced by seven speakers of Madrid Spanish, applying the same methodology as in Elordieta and Romera (2020a) so that the data would be directly comparable. The main result of Elordieta and Romera (2020b) was that in Madrid Spanish, 66.3% of information-seeking questions, roughly two-thirds of the total, ended in rising contours. In Basque Spanish, Elordieta and Romera (2020a) had found only 21% of information-seeking absolute interrogatives ending in a rising contour. That is, whereas final falls are the norm in Basque Spanish (79%), final rises are the norm in Madrid Spanish (only 33.7% final falls). Final falling configurations are so pervasive in Basque Spanish that no subject produced fewer than 64% rising–falling nuclear contours, and two of the twelve subjects produced this type of contour in all their interrogatives, i.e., 100% final falls, with no final rises at all.

#### *2.2. The Role of Degree of Contact with Basque and Linguistic Attitudes towards Basque on the Prosody of Spanish in Contact with Basque in Urban Areas*

In principle, one might hypothesize a correlation between the frequency of occurrence of final falling contours and the degree of knowledge of Basque. However, as shown in Table 1 below, Elordieta and Romera (2020a) found no significant differences in the frequency of occurrence of falling or rising contours in absolute interrogatives depending on the linguistic profile of the subjects (i.e., whether they were monolingual speakers of Spanish, L1 Spanish/L2 Basque speakers, or L1 Basque/L2 Spanish speakers). Nor were there differences depending on the city of origin (Bilbao or San Sebastian) or the gender of the speakers.


**Table 1.** Numbers and percentages of information-seeking yes/no interrogatives with rising–falling and rising final contours (Falls and Rises, respectively) for each linguistic profile in San Sebastian and Bilbao.

These data indicate that, in every group, at least 70% of interrogatives ended in a fall. Although the monolingual group produced fewer falling contours than the other groups, no statistical correlation was found with the linguistic profiles (chi-square = 12.000, *p* = 0.285 for Bilbao; chi-square = 6.000, *p* = 0.306 for San Sebastian).
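The degrees of freedom of these chi-square tests are not reported. As a sanity check, the upper-tail probability of a chi-square statistic can be computed with the Python standard library alone; df = 10 (Bilbao) and df = 5 (San Sebastian) are our assumption, chosen because they reproduce the reported *p*-values. This is a sketch of the closed-form survival function for integer degrees of freedom, not the software used in the original analysis.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: start from df = 1, where P(X > x) = erfc(sqrt(x/2)), then add
    # sqrt(2/pi) * exp(-x/2) * x**((2k-1)/2) / (1*3*...*(2k-1)) for k = 1..(df-1)/2
    sf = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # k = 1 term
    for k in range(1, (df - 1) // 2 + 1):
        sf += term
        term *= x / (2 * k + 1)  # turn the k-th term into the (k+1)-th term
    return sf

# Reported statistics; the df values are our assumption (not given in the text)
print(f"Bilbao:        p = {chi2_sf(12.0, 10):.3f}")
print(f"San Sebastian: p = {chi2_sf(6.0, 5):.3f}")
```

Under these assumed degrees of freedom, the sketch prints *p* = 0.285 and *p* = 0.306, matching the reported values.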

Rather, Elordieta and Romera (2020a) found a stronger relationship between the frequency of occurrence of final falling configurations and individual social factors, namely (a) the attitudes that each individual has towards Basque and the Basque ethnolinguistic group, and (b) the degree of contact of each individual with Basque and the Basque ethnolinguistic group. With respect to attitudes, all speakers in both cities expressed positive attitudes towards the Basque language and the Basque group. A total of 75.1% of the speakers showed very positive attitudes and only 24.9% showed not-so-positive attitudes. However, it was precisely the speakers with not-so-positive attitudes who produced fewer final falls (between 64% and 75% of falling interrogatives), as opposed to speakers more inclined towards Basque, whose production of falls was higher (77% to 100%). Even so, the attitudinal factor could only account for roughly 50% of the variation (R² = 0.466, F(1, 10) = 8.721, *p* = 0.01). In contrast, the degree of contact accounted for 70% of the variation. Speakers with lower contact values produced lower percentages of final falling contours (between 64% and 80%), while speakers with higher contact values produced between 83% and 100% final falls.

The degree of contact also seems to be a relevant factor in explaining the linguistic behavior of certain L1 Basque speakers whose attitudinal value is positive but whose percentage of interrogative falling circumflex contours was lower than expected for that attitudinal value. Substantial exposure not only to Basque Spanish but also to other Peninsular Spanish varieties, for instance through business trips to other parts of Spain, might lead these speakers to produce more interrogatives with the final rising contours of those varieties.

Finally, when both social factors were combined, 80% of the differences in the percentages of circumflex contours could be accounted for. That is, speakers who had closer contact with Basque or with speakers of Basque and who had favorable attitudes towards Basque and speakers of Basque presented higher percentages of final falling intonational contours (R² = 0.807, F(2, 9) = 18.844, *p* = 0.001). Therefore, although the degree of contact alone may account for a higher percentage of the production of falling contours than attitudes alone, the combination of both factors provided a better explanation of the results.
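The reported F statistics can be checked against the R² values. Assuming ordinary least-squares models with an intercept and n = 12 speakers (consistent with the reported degrees of freedom F(1, 10) and F(2, 9)), the overall F statistic is F = (R²/k)/((1 − R²)/(n − k − 1)) for k predictors. A minimal sketch; the function name is our own.

```python
def f_from_r2(r2: float, n: int, k: int) -> float:
    """Overall F statistic of an OLS regression with an intercept.

    r2: coefficient of determination; n: number of observations;
    k: number of predictors (excluding the intercept).
    """
    df_model, df_resid = k, n - k - 1
    return (r2 / df_model) / ((1.0 - r2) / df_resid)

# n = 12 speakers, matching the reported degrees of freedom
f_attitude = f_from_r2(0.466, n=12, k=1)   # attitudes only
f_combined = f_from_r2(0.807, n=12, k=2)   # attitudes + degree of contact
print(f"F(1, 10) = {f_attitude:.2f}")  # reported: 8.721
print(f"F(2, 9) = {f_combined:.2f}")   # reported: 18.844
```

The sketch gives 8.73 and 18.82; the small differences from the reported values are what one would expect from R² being rounded to three decimals.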

#### **3. Spanish Prosody and Social Factors in Non-Urban Basque-Speaking Areas**

#### *3.1. Methodology*

For the present study, we recorded speakers from two rural or non-urban small towns: Lekeitio, in the same province as Bilbao, and Ibarra, in the same province as San Sebastian. Their populations as of 2020 were 7227 and 4306, respectively. The criteria for speaker selection were the same as those followed for Bilbao and San Sebastian. There were six speakers in each town (i.e., 12 in total), and they belonged to the same three linguistic profiles as in Bilbao and San Sebastian: Spanish monolinguals, L1 Spanish/L2 Basque bilinguals, and L1 Basque/L2 Spanish bilinguals. Gender, age, and level of education were also taken into account. Accordingly, six women and six men aged between 35 and 55 and with a medium–high education level were recorded. (All subjects gave their informed consent for inclusion before they participated in the study. The ethics approval code for the research is CEISH/115/2012/ELORDIETA ALCIBAR.)

The speakers were recorded using the same sociolinguistic interviews as in Bilbao and San Sebastian (Silva-Corvalán 2001). Two interviews were conducted with each experimental subject. In the first interview, the interviewers asked the subjects questions, which yielded declarative utterances from the subjects. The questions were divided into three modules. The first module contained questions on the subjects' degree of knowledge of the two languages in contact, Basque and Spanish. The second module had questions on the degree of use of each of these languages. In the third module, the subjects were asked about their attitudes towards Basque and the Basque ethnolinguistic group: for instance, whether they thought that speaking Basque improved a person's social image, whether, in their opinion, knowing and speaking Basque was useful in their personal and professional lives, or whether they thought that Basque should be taught as a compulsory subject. The questions were written on sheets of paper as bulleted topics rather than as full questions, so that the interviewer would pose the questions in as natural a style as possible rather than as read speech.

The second interview was one in which the subjects took the role of interviewers and asked their interlocutors (i.e., the interviewers of the first part) the same questions that they had been asked. In this way, a number of absolute and partial interrogative utterances (i.e., yes/no and wh-questions, respectively) were recorded from the subjects. The interviewers were speakers of Basque Spanish and members of the community, and had been trained to conduct the interviews. This fostered a situation in which our experimental subjects could feel more at ease with the activity, talking to a person who speaks the same variety of Spanish as theirs, which could favor more natural production from the subjects. The interviewers also helped to select speakers who fulfilled the social and linguistic profiles described above.

The recordings took place in quiet rooms at the speakers' homes or workplaces in order to facilitate an optimal level of confidence and comfort for the speaker. The interviews were recorded with a Tascam DR-100 digital recorder using its built-in omnidirectional microphone, pointed towards the subjects but able to capture the speech of both participants. The audio was recorded with a sampling rate of 44,000 Hz in wav format. The conversations were also recorded on video with a Sony video camera mounted on a tripod, with the objective of analyzing the speakers' level of (dis)comfort, relaxation, or nervousness while answering the questions about their attitudes towards Basque and the Basque ethnolinguistic group. In all, 9 h and 25 min of conversations were recorded for the 12 subjects, with an average of 47 min per subject. This is remarkably similar to our recordings in Bilbao and San Sebastian, where we collected 9 h and 20 min of conversations for 12 speakers, with an average of 46 min per speaker. A total of 360 declarative utterances, 155 absolute interrogatives, and 201 partial interrogatives were segmented, an amount similar to that for Bilbao and San Sebastian (albeit with fewer absolute interrogative utterances). As in Bilbao and San Sebastian, we collected more declarative utterances than interrogatives because speakers produce more utterances when answering than when asking. On average, we collected 30 declarative utterances, almost 13 absolute interrogatives, and 17 partial interrogatives per speaker.

As in the study of Bilbao and San Sebastian, only neutral or information-seeking absolute interrogatives were considered in this article, that is, questions without any pragmatic bias on the part of the speaker uttering them. In other words, the interrogative sentences corresponded to a genuine search for information. This choice was motivated by the fact that these are the sentence types whose intonational contours differ most markedly between Castilian Spanish and Basque. Hence, any intonational features in yes/no interrogatives in the Spanish variety of the Basque Country that differ from typical Castilian Spanish features and resemble those of Basque are more easily discernible. Such questions commonly end in rising contours in Castilian Spanish (cf. (Face 2008; Estebas-Vilaplana and Prieto 2010; Hualde and Prieto 2015; Henriksen et al. 2016; Elordieta and Romera 2020b)) and in falling contours in Basque (cf. (Elordieta 2003; Gaminde et al. 2016; Robles-Puente 2012; Elordieta and Hualde 2014; Eguskiza et al. 2017)). A higher frequency of final falling intonational configurations would therefore suggest an influence from Basque. In this regard, Elordieta and Romera's (2020b) study on information-seeking absolute interrogatives in Madrid Spanish is informative, as it uses the same methodology as Elordieta and Romera (2020a), so the results are directly comparable. In this variety of Castilian Spanish, two-thirds of the information-seeking yes/no questions end in a rising contour, and thus only one-third end in a falling contour.

With respect to the two social factors analyzed, degree of contact and attitudes, all answers were coded both qualitatively and quantitatively. From a quantitative point of view, each speaker was given two values, one for the degree of contact they maintained with the Basque language and the other for the attitude they showed towards the Basque language and the Basque ethnolinguistic group. Following studies in the field, each response was rated on a scale of 1 to 3, 1 being the lowest value and 3 the highest. The mean of these ratings was then calculated for each speaker and used as an index. We called these indices the contact value and the attitudinal value, respectively (Elordieta and Romera 2020a, p. 25).
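The index construction described above amounts to averaging a speaker's coded responses. The sketch below illustrates it with hypothetical ratings for a single speaker; the item groupings and values are invented for illustration and are not data from the study, and the function name is our own.

```python
from statistics import mean

def social_index(ratings: list[int]) -> float:
    """Average a speaker's coded responses on the 1-3 scale (1 = lowest, 3 = highest)."""
    if not ratings:
        raise ValueError("at least one coded response is required")
    if any(r not in (1, 2, 3) for r in ratings):
        raise ValueError("each rating must be 1, 2 or 3")
    return float(mean(ratings))  # float() because mean() of ints can return an int

# Hypothetical coded answers for one speaker (illustrative only, not study data)
contact_ratings = [3, 2, 3, 3, 2]   # hypothetical contact-related items
attitude_ratings = [3, 3, 2, 3]     # hypothetical attitude-related items

contact_value = social_index(contact_ratings)
attitudinal_value = social_index(attitude_ratings)
print(f"contact value = {contact_value:.2f}, attitudinal value = {attitudinal_value:.2f}")
```

With these made-up ratings, the speaker's contact value is 2.60 and their attitudinal value is 2.75. Both indices stay on the original 1 to 3 scale, which keeps speakers comparable even if they answered different numbers of items.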

Finally, although the number of tokens analyzed may seem low, it must be borne in mind that conversational speech is not straightforward to analyze. Overlaps, distortions, and interruptions in natural speech hinder segmentation. Prosodic annotation is also complicated by uncontrolled phonological processes of resyllabification and by the phonetic lenition of consonants and vowels frequently observed in informal, fast speech. Moreover, the phonetic analysis was combined with a sociolinguistic analysis of the 12 speakers. In the end, the size of the dataset is roughly similar to that for San Sebastian and Bilbao, which allows for a direct comparison.

#### *3.2. Final Intonational Configurations in Absolute Interrogatives in Lekeitio and Ibarra*

Of the 155 absolute questions in Lekeitio and Ibarra, we analyzed 137. We had to discard 18 interrogatives because they were not genuine information-seeking absolute interrogatives: some were disjunctive (e.g., 'do you like X or do you prefer Y?'), and others were uttered without a verb, as not-full-fledged questions (e.g., 'place of origin?'). In the two towns, 93.4% of all information-seeking yes/no questions ended in final falling contours (128 out of 137). The percentage of falls was slightly higher in Lekeitio than in Ibarra, but the difference was small and not significant: 95.5% in Lekeitio (64 final falls out of 67 interrogatives) and 91.4% in Ibarra (64 final falls out of 70 interrogatives). Only 6.6% of all information-seeking yes/no questions ended in rising contours (9 out of 137), the most frequent being L+(¡)H\* (¡)H%, that is, a rising nuclear accent with a peak on the nuclear syllable followed by a further rise on the final syllable, sometimes reaching a very high level, hence the upstep diacritic. We found only one case of the contour traditionally reported as the most frequent one for Castilian Spanish, L\* H%, that is, a low tone on the nuclear syllable followed by a high boundary tone.
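The town-level and pooled percentages follow from the counts given in the text. A minimal sketch of the arithmetic; the dictionary layout is our own.

```python
# Counts for Lekeitio and Ibarra as reported in the text: (final falls, questions analyzed)
towns = {"Lekeitio": (64, 67), "Ibarra": (64, 70)}

for town, (falls, n) in towns.items():
    print(f"{town}: {falls}/{n} falls = {falls / n:.1%}")

# Pooled over both towns
falls_total = sum(f for f, _ in towns.values())   # 128
n_total = sum(n for _, n in towns.values())       # 137 (155 collected minus 18 discarded)
rises_total = n_total - falls_total               # 9
print(f"Both towns: {falls_total}/{n_total} falls = {falls_total / n_total:.1%}")
print(f"Both towns: {rises_total}/{n_total} rises = {rises_total / n_total:.1%}")
```

This reproduces the 95.5%, 91.4%, 93.4% and 6.6% reported above.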

Of all the information-seeking interrogatives, 53.3% were of the rising–falling circumflex type found in San Sebastian and Bilbao, that is, a rising nuclear accent with a peak on the tonic syllable (L+H\*) followed by an L% or HL% boundary tone. The latter is distinguished from plain L% because the high tone level of the nuclear accent is maintained up to the middle of the final syllable of the word (also final in the utterance) before falling to a low pitch level. In many instances, an upstepped pitch level was observed on the accentual H\* tone, which can be transcribed in a Tones and Break Indices (ToBI) model as L+(¡)H\*. Thus, the general shape of the most common nuclear contour in information-seeking interrogatives in Lekeitio and Ibarra is L+(¡)H\* (H)L%. As noted above, this is also the most common type of contour in San Sebastian and Bilbao (69.5%, cf. Elordieta and Romera 2020a). Figure 3 shows a pitch track of an information-seeking yes/no question with an L+H\* L% contour uttered by a male Spanish monolingual speaker from Lekeitio.

**Figure 3.** F0 contour of an absolute interrogative statement in Basque Spanish, by a male monolingual Spanish speaker from Lekeitio.

In several instances of final falling contours, the nuclear accent was not clearly rising, and an L% boundary tone followed. We chose the label H\* L% for this type of contour, which was found in 38.7% of all the information-seeking absolute interrogatives. Figure 4 illustrates an example, uttered by a female L1 Basque speaker from Ibarra. The high F0 level at the beginning of the syllable with the nuclear accent (i.e., the syllable *ke* in the word *euskera* 'Basque') is a microprosodic effect of the voiceless plosive [k].<sup>2</sup>

**Figure 4.** F0 contour of an absolute interrogative utterance in Basque Spanish by a female L1 Basque/L2 Spanish speaker from Ibarra.

Finally, Figure 5 illustrates an F0 contour of an utterance with a final rising contour by a female L1 Spanish speaker from Lekeitio.<sup>3</sup>

<sup>2</sup> The lengthening of word-final vowels is very common in spontaneous speech. In Figure 4, the final vowel of the complementizer *que* 'that' is written with two vowels to indicate that it is lengthened.

<sup>3</sup> An anonymous reviewer asks whether the syntactic complexity of the interrogative utterances may influence their intonational contour. Since the utterance in Figure 5 (with a final rising configuration) is shorter and syntactically simpler than those in Figures 1, 3 and 4 (with final falling configurations), the reviewer asks whether we found a correlation between the length and syntactic complexity of the utterances and their falling or rising final contours. There is no such correlation. On the one hand, our data contain many short utterances with final falls, such as *¿Estás casado?* 'Are you married?', *¿Tienes hijos?* 'Do you have children?', *¿Te gusta Lekeitio?* 'Do you like Lekeitio?', or *¿Te has fijado?* 'Have you noticed?'. On the other hand, the very few utterances with final rises include long ones such as *¿Y la persona que viva aquí, debería o tendría que hablar en euskera?* 'And the person that lives here, should (s)he or would (s)he have to speak in Basque?'.

**Figure 5.** F0 contour of an absolute interrogative utterance in Basque Spanish by a female L1 Spanish/L2 Basque speaker from Lekeitio.

#### *3.3. Social Factors in Ibarra and Lekeitio*

The finding that 93.4% of information-seeking absolute interrogatives in Ibarra and Lekeitio end in falling contours can be compared with the 79% of yes/no questions of the same type that end in falling contours in Bilbao and San Sebastian, as found by Elordieta and Romera (2020a). Within the Basque Country, then, falling configurations in information-seeking yes/no questions seem to be more common in small non-urban towns than in cities. The difference is even larger when compared with Madrid Spanish, where only 33.7% of information-seeking yes/no questions end in falls, as recently found by Elordieta and Romera (2020b). These results suggest that the influence of Basque prosody is stronger in small towns than in cities.

As in Bilbao and San Sebastian (cf. Elordieta and Romera 2020a), language dominance does not seem to affect the occurrence of final falls or rises. Table 2 shows the percentages of final falling configurations by speakers' linguistic profile: monolingual Spanish, L1 Spanish, and L1 Basque. Although L1 Spanish speakers appear to use final falling configurations less frequently than the other two groups, the cross-tabulation test did not return a significant difference (χ<sup>2</sup> = 1.262; *p* = 0.532). It is noteworthy that monolingual speakers produced more falling configurations than bilingual speakers.

**Table 2.** Numbers and percentages of information-seeking yes/no interrogatives with rising–falling and rising final contours (Falls and Rises, respectively) for each linguistic profile in the two towns combined: Lekeitio and Ibarra.


Tables 3 and 4 show the frequencies of occurrence of final falls and rises in absolute interrogatives in Lekeitio and Ibarra, respectively. In Lekeitio, L1 Spanish bilinguals produce fewer falls than the other groups, but the differences are not statistically significant (χ<sup>2</sup> = 2.844; *p* = 0.241). In Ibarra, there is no difference between groups. It is rather telling that monolingual Spanish speakers in Lekeitio produced final falling configurations in 100% of their interrogatives, even more often than L1 Basque speakers.
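Both cross-tabulations compare three linguistic profiles against two outcomes (falls vs. rises), so each test has (3 - 1) × (2 - 1) = 2 degrees of freedom. For 2 degrees of freedom, the upper-tail probability of the χ<sup>2</sup> distribution has the closed form p = exp(-χ<sup>2</sup>/2), which makes it easy to check the reported p-values against the reported statistics. The Python sketch below assumes a standard Pearson χ<sup>2</sup> test of independence; the text does not name the exact test variant, and the function names are our own.

```python
import math

def degrees_of_freedom(rows: int, cols: int) -> int:
    """Degrees of freedom of an r x c contingency table."""
    return (rows - 1) * (cols - 1)

def chi2_sf_df2(statistic: float) -> float:
    """Upper-tail p-value of a chi-square statistic with exactly 2 degrees of freedom.

    With df = 2 the chi-square distribution is an exponential distribution
    with mean 2, so P(X >= x) = exp(-x / 2).
    """
    return math.exp(-statistic / 2)

# 3 linguistic profiles (Mon, L1Sp, L1Bas) x 2 outcomes (falls, rises)
dof = degrees_of_freedom(3, 2)

# (reported chi-square, reported p), taken from the text
reported = {
    "Both towns (Table 2)": (1.262, 0.532),
    "Lekeitio (Table 3)": (2.844, 0.241),
}

for label, (chi2, p_reported) in reported.items():
    p = chi2_sf_df2(chi2)
    print(f"{label}: df = {dof}, chi2 = {chi2}, p = {p:.3f} (reported: {p_reported})")
```

Both recomputed p-values round to the reported ones (0.532 and 0.241), so the statistics and p-values are mutually consistent. For comparison, the critical value at α = 0.05 with 2 degrees of freedom is -2 ln 0.05 ≈ 5.99, well above either statistic.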

As in Bilbao and San Sebastian, gender is not a significant factor. Males produced 71 information-seeking absolute interrogatives, of which 67 ended in a falling configuration and four in a rising contour (94.4% and 5.6%, respectively). Females produced 61 final falls and five final rises (92.4% and 7.6%, respectively). This similarity across genders holds in each town. Table 5 below shows the percentages of falling and rising contours according to the speakers' gender.
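No test statistic is given for gender. As one hedged way to check the claim from the counts in the paragraph above (males: 67 falls, 4 rises; females: 61 falls, 5 rises), a two-sided Fisher exact test is a reasonable choice, because the expected number of rises per gender is below 5, where the χ<sup>2</sup> approximation becomes unreliable. The sketch below is our own pure-Python implementation using only `math.comb`; it is not the analysis the authors report running.

```python
from math import comb

def hypergeom_pmf(k: int, total: int, successes: int, draws: int) -> float:
    """P(X = k) for a hypergeometric variable: `draws` items taken from `total`,
    of which `successes` are marked."""
    return comb(successes, k) * comb(total - successes, draws - k) / comb(total, draws)

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Conditions on the row and column totals and sums the probabilities of
    all tables that are no more likely than the observed one.
    """
    total = a + b + c + d
    row1 = a + b  # first-row total (here: interrogatives produced by males)
    col2 = b + d  # second-column total (here: final rises)
    observed = hypergeom_pmf(b, total, col2, row1)
    lo = max(0, row1 - (total - col2))
    hi = min(row1, col2)
    probs = [hypergeom_pmf(k, total, col2, row1) for k in range(lo, hi + 1)]
    # A small relative tolerance guards against floating-point ties.
    p = sum(prob for prob in probs if prob <= observed * (1 + 1e-9))
    return min(p, 1.0)

# Rows: males, females; columns: final falls, final rises (counts from the text).
p_gender = fisher_exact_two_sided(67, 4, 61, 5)
print(f"Gender, two-sided Fisher exact p = {p_gender:.3f}")
```

With these counts the p-value is well above 0.05, consistent with the statement that gender is not a significant factor. In practice, a library routine such as SciPy's `scipy.stats.fisher_exact` would normally be preferred over a hand-rolled version.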

**Table 3.** Numbers and percentages of information-seeking yes/no interrogatives with rising–falling and rising final contours (Falls and Rises, respectively) for each linguistic profile in the town of Lekeitio.


**Table 4.** Numbers and percentages of information-seeking yes/no interrogatives with rising–falling and rising final contours (Falls and Rises, respectively) for each linguistic profile in the town of Ibarra.


**Table 5.** Percentage of information-seeking yes/no interrogatives with falling and rising final contours according to speakers' gender.


Overall, our data indicate that in Lekeitio and Ibarra, falling interrogatives are the norm among the speakers interviewed. Table 6 below shows that for 75% of the speakers (9 out of 12), 90–100% of their information-seeking absolute interrogatives had final falling contours (these speakers are highlighted in gray). For 17% of the speakers (2 out of 12), 86–87.5% of these interrogatives had falling contours, and only one speaker had 78% final falling configurations. That is, the frequency of occurrence of final falling intonational contours ranges from 78% to 100%, with three-fourths of the speakers at 90–100%. In fact, half the speakers in Ibarra and Lekeitio (6 out of 12, cf. Table 6) produced final falling tonal configurations in all their information-seeking absolute interrogatives. These data indicate that the falling pattern is widespread in the area and that differences among speakers in the production of interrogatives are minimal.

In spite of this, we wanted to contrast these results with those for the cities and to see to what extent social factors play similar roles, i.e., whether the degree of contact with, and attitudes towards, the Basque language and the Basque ethnolinguistic group could still account for this minimal variation. The presence of Spanish in the cities of Bilbao and San Sebastian is stronger than in smaller, non-urban towns, where a vernacular variety of Basque has always been dominant. Hence, we wanted to know whether a higher degree of contact with Basque and the Basque ethnolinguistic group in non-urban towns leads to a higher use of final falls in yes/no questions. Along the same lines, regarding the attitudinal factor, which proved relevant in explaining the variation in Bilbao and San Sebastian, we wanted to know whether positive attitudes also led to higher percentages of falling nuclear contours in absolute interrogatives in Lekeitio and Ibarra.


**Table 6.** Percentage of interrogative sentences ending in falling contours according to the speakers' language profile.<sup>4</sup>

The degree of contact with Basque correlated with the speakers' linguistic profile (*p* = 0.008) (cf. Table 7, where the degree of contact is labeled "contact value"). On a scale from 1 to 3 (1 being the lowest degree of contact and 3 the highest), monolingual speakers scored between 1.22 and 1.44, while bilingual speakers (L1Sp-L2Bas; L1Bas-L2Sp) scored between 1.67 and 2.33. Monolingual speakers acknowledged having more limited contact with Basque: although they partially understood the language, and their in-laws, some friends, and people at work could occasionally address them in Basque, their interactions took place only in Spanish. All of them had tried to study Basque at some point in their adult lives. Official schooling in Basque started in the 1990s, and since these speakers were aged 45–55, they had received formal education only in Spanish. All of them emphasized the effort and difficulty that studying Basque as an adult involved for them.
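This section reports contact values on a 1–3 scale but does not restate how they are computed. Purely as an illustration, assuming (hypothetically) that the contact value is the unweighted mean of several questionnaire items each scored from 1 to 3, a composite score could be computed as in the Python sketch below. The function name and item responses are hypothetical and are not the study's data. We note only that the reported values 1.22, 1.44, 1.67, and 2.33 coincide, to two decimals, with 11/9, 13/9, 15/9, and 21/9, which would be compatible with averaging nine items, although the text does not confirm this.

```python
from statistics import mean
from typing import Sequence

SCALE_MIN, SCALE_MAX = 1, 3  # 1 = lowest degree of contact, 3 = highest

def contact_value(item_scores: Sequence[int]) -> float:
    """Hypothetical composite: unweighted mean of item scores on the 1-3 scale,
    rounded to two decimals like the reported values."""
    if not item_scores:
        raise ValueError("at least one item score is required")
    for score in item_scores:
        if not SCALE_MIN <= score <= SCALE_MAX:
            raise ValueError(f"score {score} is outside {SCALE_MIN}-{SCALE_MAX}")
    return round(mean(item_scores), 2)

# Hypothetical nine-item responses (NOT data from the study).
example_low_contact = [1, 1, 1, 2, 1, 1, 2, 1, 1]   # mean 11/9
example_high_contact = [2, 3, 2, 3, 2, 3, 2, 2, 2]  # mean 21/9

print(contact_value(example_low_contact), contact_value(example_high_contact))
```

The same averaging would apply to the attitudinal values, which are reported on the same 1–3 range; whether the study weights items differently is not stated in this section.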


**Table 7.** Degree of contact according to the speakers' language profile and gender.

However, as shown in Table 8 below, unlike in the urban areas, the degree of contact was not a relevant factor in explaining the differences in the production of falling interrogatives (*p* = 0.1). Three monolingual speakers and one L1Sp speaker whose degree of contact ranged between 1.22 and 1.44 produced all their interrogatives (100%) with a falling contour. In contrast, the L1Sp speaker with the highest degree of contact with Basque (2.33) produced only 78% of her interrogatives with a falling contour.

<sup>4</sup> In Tables 6–10, language profiles are abbreviated as follows: Mon for Monolingual, L1Sp for L1 Spanish, and L1Bas for L1 Basque.


**Table 8.** Percentage of interrogative sentences ending in falling contours according to degree of contact and speakers' language profile.

With respect to the other social factor relevant in urban areas, the attitudinal component, certain differences were found between monolingual and bilingual speakers (L1Sp and L1Bas). As shown in Table 9, the former had slightly less favorable attitudes towards Basque and the Basque ethnolinguistic group (1.67–2.00), while the latter showed very favorable attitudes (2.33–2.56). Monolingual speakers considered Basque not very useful for them professionally or in their daily life, and thought that languages such as English should be promoted as much as Basque. The bilingual groups, on the other hand, considered Basque useful professionally and personally, and supported its being a language spoken by all inhabitants of the Basque Country. These differences, nonetheless, were not statistically significant (*p* = 0.200).

**Table 9.** Attitudinal values according to the speakers' language profile.


However, no relation was found between the speakers' attitudes and their production of falling contours in information-seeking absolute interrogatives (*p* = 0.300). Unlike in urban areas, where attitudes clearly influenced the production of falling patterns (cf. Elordieta and Romera 2020a), this factor does not explain the small differences in non-urban areas. That is, higher attitudinal values did not lead to higher occurrences of final falling contours. Three monolingual speakers with attitudinal values below 2.00 had final descending contours in 100% of their information-seeking yes/no questions, while bilingual speakers with attitudinal values above 2.00 showed lower percentages (cf. Table 10 below). It seems, therefore, that the production of falling intonational patterns in non-urban areas is conditioned by factors other than those operating in urban areas.


**Table 10.** Percentage of interrogative sentences ending in falling contours according to attitudinal value and speakers' language profile.

#### **4. Two Factors: Type of Variety and Language as Index of Identity in Urban vs. Non-Urban Areas**

As mentioned before, Elordieta and Romera (2020a) showed that two social factors were relevant for explaining the variation among speakers in the frequency of occurrence of final falling intonational configurations in information-seeking absolute interrogatives in urban areas (Bilbao and San Sebastian). These factors were the degree of contact with Basque and the attitudes towards this language and the Basque ethnolinguistic group. That study found that a high degree of contact with Basque and a positive attitude towards this language and the Basque ethnolinguistic group were associated with a higher percentage of falling nuclear contours, which are typical of Basque but not of Castilian Spanish.

In the present study, however, no correlation was found between the production of descending nuclear contours and social factors in non-urban areas (the small towns of Ibarra and Lekeitio). We suggest that the explanation lies in two fundamental aspects: first, the lack of real variation among speakers in the production of falling nuclear contours, and second, as a consequence, the impossibility for this feature to represent either of the ethnolinguistic groups in these areas.

Regarding the first aspect, in the non-urban towns of Lekeitio and Ibarra, there are very small differences among speakers in the intonational patterns of information-seeking absolute interrogatives. As shown in Elordieta and Romera (2020a), in the cities of San Sebastian and Bilbao there were larger differences among speakers (some speakers had 63% falling intonational patterns, while others reached 100%). In Ibarra and Lekeitio, however, the variation was minimal. The production of falling interrogatives ranged between 85% and 100% (except for one speaker, who had 78%), and half of the speakers had 100%. Figure 6 below shows these differences between urban and non-urban populations.

**Figure 6.** Percentage of interrogative sentences ending in a falling contour in urban and non-urban areas.

This lack of variation is characteristic of a linguistic change that has become established (Blas-Arroyo and Lahoz 2018). Some synchronic social factors may help us establish that falling intonational contours at the end of information-seeking absolute questions are a settled feature of Basque Spanish in non-urban areas, where Basque has historically been the dominant language of use.

First, the generational group interviewed (35–55 years old) is the one subject to the greatest social pressure to accommodate to a standard variety (Labov 2001). Information-seeking absolute interrogatives in central, standard Castilian Spanish are described as ending in rising nuclear configurations, as already stated in the introductory section and in Section 2.1 of the present paper (cf. the references there). Our speakers show the opposite pattern, that is, (rising–)falling intonational contours at the end of such utterances. This feature thus seems to be very stable in their variety. One could test the production of other generational groups, but younger and older generations (i.e., speakers aged 18–30 and those over 60) do not have such a marked impact on the evolution of language change (Labov 2001). We therefore doubt that these groups could have substantial effects on the evolution of this feature.

Second, no gender differences were found in the production of falling yes/no questions. This reinforces the idea that such a feature is a settled one that does not respond to social variation.

A third argument supporting the view that falling intonational contours in absolute interrogatives are a stable feature of the variety of Spanish in non-urban towns is the historical and ongoing dominance of Basque. In the cities of Bilbao and San Sebastian, the percentage of people for whom Spanish is the dominant language is much higher than in Ibarra and Lekeitio. In social terms, this means that the Spanish ethnolinguistic group is larger in Bilbao and San Sebastian than in Basque-dominant towns such as Lekeitio and Ibarra. In these towns, there is little variation in the type of Spanish spoken: it corresponds mainly to the vernacular variety spoken by L1 Basque speakers, the majority group in the area. The Eustat (2018) data on the language used at home in 2016 reflect the overwhelming use of Spanish in the urban areas, as opposed to the dominance of Basque in the non-urban locations. As can be seen in Table 11 below, Spanish is the dominant home language in Bilbao and San Sebastian, whereas in Ibarra and Lekeitio the use of Basque at home amounts to almost 70%.<sup>5</sup>


**Table 11.** Number of speakers who used Basque at home in 2016 (Eustat 2018).

The variety of Spanish spoken in the Basque Country is the result of long contact between these two languages (Elordieta and Romera 2020a). However, Spanish in urban areas results not only from the Spanish spoken by Basque speakers in the area, but also from the Spanish spoken by different groups who arrived in the Basque Country from other parts of Spain in the late 19th century and the 20th century (Zallo and Ayuso 2009). Rural areas, on the other hand, received fewer newcomers, and Spanish speakers never outnumbered Basque speakers. Therefore, the Spanish spoken in non-urban areas derives mostly from the variety of Spanish spoken by L1 Basque speakers, with a strong presence of Basque features. As we pointed out in Romera and Elordieta (2013), the transfer of features from one language to another can occur not only by direct influence, but also through an indirect process. When L1 Basque speakers communicated with the Spanish-speaking populations arriving in the area, they used a variety of Spanish containing features of their L1 (Basque). In order to communicate, the Spanish speakers in turn accommodated their variety to the Spanish of the L1 Basque speakers, and as a result, L1 Basque Spanish became the common variety spoken by all. Consequently, Basque linguistic features are more consolidated in the Spanish spoken in non-urban areas, where Spanish speakers are less numerous.

<sup>5</sup> The data reported here correspond to the use of Basque at home. We are aware that these data might differ from the use of Basque in other contexts. Unfortunately, the only other data currently available concern overall knowledge of Basque, and only for municipalities of more than 40,000 inhabitants (Eustat 2018); although these data narrow the differences between the two languages, Spanish is still dominant in urban areas (Bilbao 61.18% and San Sebastian 57% for the population over 16).

This process accounts for the absence of variation in the prosodic pattern of interrogatives in Lekeitio and Ibarra. The falling interrogative pattern is an almost entirely consolidated feature, which no longer differentiates ethnolinguistic groups in the area. All speakers in these towns use it alike, and therefore it is no longer a marker of a social group. Rather, it is an indexical feature of geographical location (Chambers and Trudgill 1998). This explains the absence of correlation with social factors, such as linguistic profile, gender, degree of contact, or linguistic attitudes, since the feature has become characteristic of the Spanish variety of the area. In urban areas, on the other hand, the nuclear configurations of absolute interrogatives show more variation (cf. Figure 6 above), and there is a correlation with social factors. There, the feature is not only a geographical index, but also a marker that attaches the speaker to either the Spanish or the Basque ethnolinguistic group. Since linguistic features can be perceived as representative of a social group, they can also be used to express the ethnolinguistic identity of the person who uses them.

The concept of identity has been widely debated across different theoretical frameworks. There is, however, broad agreement in distinguishing between individual identity and social or collective identity. The former refers to individuals' view of themselves, whereas the latter entails the definition of the subject as belonging to a group (Bucholtz and Hall 2005; Spencer-Oatey 2007). We claim that social identity is involved in the explanation of our data. Following Butler (1990), Bucholtz and Hall (2005), and Spencer-Oatey (2007), among others, we consider that social identity is expressed, constructed, and shown to others through concrete ways of speaking and acting.

The identities put into play are adapted to the relationships at hand, and the features that individuals project of themselves are emphasized or minimized according to the context. Linguistic forms are a primary way of expressing identity. As Tejerina (1999), Echeverria (2003), and Baxok et al. (2006) point out, language is the main element on which Basque identity is built in the Basque Country. In urban areas, where the group of Basque speakers is smaller and there is a large group of Spanish speakers, the use of linguistic features associated with Basque has a clear indexicalizing value that serves to claim Basque identity. The choice of such features has a clear differentiating function, and it is made precisely by those speakers whose attitudes towards Basque and the Basque ethnolinguistic group are more positive (Elordieta and Romera 2020a). Those who use these features clearly align themselves with the Basque language and culture, and adopt them in order to be identified as Basque.

In contrast, non-urban areas share a variety of Spanish that already has a strong presence of Basque features, and all speakers adhere to it regardless of their linguistic profile. Therefore, the Spanish variety ceases to differentiate one group from the other. Identity-based affiliation in these areas cannot be approached in the same terms as in urban areas. Basque linguistic features in Spanish cannot serve as an indexicalizing element of Basque identity, and therefore the speakers' linguistic profile, degree of contact with Basque, and linguistic attitudes cannot correlate with them. All the speakers interviewed in these areas presented a strong Basque identity, and differences in identity are probably expressed at other levels (Spanish, European, etc.) (Azurmendi and Bourhis 1998). We leave this point for future research.

In summary, the linguistic features associated with the Basque language are essential for claiming identity in urban areas, and therefore we find a positive relationship between their use and positive attitudes. By contrast, in non-urban areas, the linguistic features of Basque do not serve an indexicalizing function of Basque identity, since all speakers use them equally. This explains their lack of correlation with the social factors analyzed in this study.

#### **5. Conclusions**

In Elordieta and Romera (2020a), we found that an average of 79% of information-seeking absolute interrogatives ended in circumflex (rising–)falling intonational configurations in the varieties of Spanish spoken in the Basque cities of Bilbao and San Sebastian in northern Spain. That study also revealed a correlation between the frequency of occurrence of final falling contours and two social factors: degree of contact with Basque and attitudes towards this language and the Basque ethnolinguistic group. A high degree of contact with Basque and a positive attitude towards this language and the Basque ethnolinguistic group were associated with a higher percentage of falling nuclear contours. The present study aimed to extend our knowledge of the intonational characteristics of Basque Spanish by focusing on data from two non-urban areas in the Basque Country, the towns of Lekeitio and Ibarra. The research question was whether the diachronic and synchronic presence of vernacular varieties of Basque in these small towns may lead to different results in the frequency of occurrence of final falling intonational contours and in their correlations with social factors.

Two main findings can be reported from the present study. On the one hand, a higher average frequency of final falling intonational contours at the end of information-seeking absolute interrogatives was found in the non-urban towns of Ibarra and Lekeitio compared to Bilbao and San Sebastian (93.4% vs. 79%). The same tonal configuration as in Bilbao and San Sebastian was observed: L+(¡)H\* (H)L%. On the other hand, no correlation was observed between the production of descending nuclear contours and social factors. We suggested that the explanation of these differences between urban and non-urban areas lies in the lack of real variation among speakers in the production of falling nuclear contours. The variation is minimal, ranging from 85% to 100% final falling contours. Indeed, half of the speakers had 100% final falling configurations. The generational group that was interviewed (35–55 years old) should suffer the greatest social pressure to accommodate to a standard variety (Labov 2001), but these speakers hardly produced rising final contours as in Castilian Spanish. It is hence apparent that falling intonational contours at the end of information-seeking absolute questions are a solid and well-established feature of Basque Spanish in non-urban areas. We argue that the pervasive presence of a Basque intonational feature, such as a final fall in absolute interrogatives, must be due to the historical dominance of Basque as a language of use in non-urban areas. That is, L1 Basque Spanish (heavily influenced by Basque) became the common variety in non-urban areas. In urban areas, in contrast, Spanish is the dominant language. The variety of Spanish there shows influence from Basque, but other varieties of Castilian Spanish also came to be present with the arrival of immigrants from other parts of Spain.

In urban areas, the nuclear configurations of absolute interrogatives show more variation, and there is a correlation with social factors. Linguistic features can be perceived as representative of a social group and can be used to express the ethnolinguistic identity of the person who uses them, understood in social or collective terms (cf. Butler 1990; Bucholtz and Hall 2005; Spencer-Oatey 2007, among others). Indeed, Tejerina (1999), Echeverria (2003), and Baxok et al. (2006) hold that language is the main element on which Basque identity is built in the Basque Country. That is, apart from being an indexical feature of geographical location (Chambers and Trudgill 1998), the use of falling or rising final configurations in information-seeking absolute questions is also a marker that associates the speaker with either the Spanish or the Basque ethnolinguistic group.

In rural or non-urban areas, on the other hand, the almost exclusive prevalence of final falls in information-seeking yes/no questions makes it impossible to associate the frequency of such contours with either of the ethnolinguistic groups in these areas. All speakers in these towns use falling contours alike, and therefore final falls in these sentences are no longer a marker of a social group. This explains the absence of correlation with social factors, such as linguistic profile, gender, degree of contact, or linguistic attitudes.

**Author Contributions:** Conceptualization, M.R. and G.E.; methodology, M.R. and G.E.; validation, M.R. and G.E.; formal analysis, M.R. and G.E.; investigation M.R. and G.E.; resources, M.R. and G.E.; data curation, M.R. and G.E.; writing—original draft preparation, M.R. and G.E.; writing—review and editing, M.R. and G.E.; visualization, M.R. and G.E.; supervision, M.R. and G.E.; project administration, M.R. and G.E.; funding acquisition, M.R. and G.E. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work has been funded by the Ministry of Science and Innovation (grant number FFI2016-80021-P), the Basque Government (grant number IT1396-19) and the University of the Basque Country (grant number GIU18/221).

**Acknowledgments:** We are indebted to our speakers, without whom this work would not exist. We also wish to thank Varun DC Arrazola for help with the analysis of the data, and two anonymous reviewers for their positive and constructive comments. This paper also benefitted from valuable feedback provided by audiences at the following two conferences: *Monterey Bay Applied Linguistics Symposium 2019*, held at the University of California, Santa Cruz, 17 May 2019, and *I ALFALito "Dinámicas lingüísticas de las situaciones de contacto"*, held at the Universidad Autónoma de Madrid, 28–30 October 2019.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


Eustat. 2018. Available online: https://www.eustat.eus/elementos/ele0014700/poblacion-de-la-ca-de-euskadi-por-el-municipio-de-residencia-segun-el-sexo-y-la-lengua-hablada-en-casa/tbl0014755\_c.html (accessed on 11 October 2020).


Hualde, José Ignacio. 2005. *The Sounds of Spanish*. Cambridge: Cambridge University Press.


Zallo, Ramón, and Mikel Ayuso. 2009. *Conocer el País Vasco: Viaje al Interior de su Cultura, Historia, Sociedad e Instituciones*. Bilbao: Servicio Central de Publicaciones del Gobierno Vasco.

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Asymmetry and Directionality in Catalan–Spanish Contact: Intervocalic Fricatives in Barcelona and Valencia**

#### **Justin Davidson**

Department of Spanish and Portuguese, University of California, Berkeley, CA 94720, USA; justindavidson@berkeley.edu

Received: 2 September 2020; Accepted: 11 November 2020; Published: 13 November 2020

**Abstract:** Multilingual communities often exhibit asymmetry in directionality by which the majority language exerts greater influence on the minority language. In the case of Spanish in contact with Catalan, the asymmetry of directionality, favoring stronger influence of Spanish as a majority language over Catalan, is complicated by the unique sociolinguistic statuses afforded to different varieties of Catalan. In order to empirically substantiate the social underpinnings of directionality in language contact settings, the present study examines the variable voicing and devoicing of intervocalic alveolar fricatives in Spanish, Barcelonan Catalan, and Valencian Catalan as processes that are historically endogenous and equally linguistically motivated in both languages. Intervocalic fricatives in both languages were elicited using a phrase-list reading task, alongside sociolinguistic interviews for attitudinal data, administered to 96 Catalan–Spanish bilinguals stratified by gender, age, and language dominance in Barcelona and Valencia, Spain. Patterns of sociolinguistic stratification consistent with community-level changes in progress favoring either Catalan-like voicing or Spanish-like devoicing varied by community, with a stronger influence of Catalan on Spanish in Barcelona and Spanish on Catalan in Valencia. These asymmetries, corroborated by attitudinal differences afforded to Catalan and Spanish in Barcelona and Valencia, ultimately reinforce the role of social factors in language contact outcomes.

**Keywords:** multilingualism; agentivity; directionality; fricative (de)voicing; Catalan–Spanish contact; sociophonetics

#### **1. Introduction**

Observations of asymmetry and directionality in language contact effects have long been addressed in linguistics research, both at the level of the individual multilingual speaker and at the broader level of the multilingual speech community. With regard to individual-level effects, crosslinguistic influence between a speaker's first language (henceforth L1) and second language (henceforth L2) is characterized by unequal (i.e., asymmetric) effects whereby the L1 more strongly influences the L2 (i.e., directionality of L1 to L2) (Winford 2005, p. 373). Indeed, various phonological models of production and perception of L2 speech (cf. Best 1995; Best and Tyler 2007; Escudero 2005; Flege 1995) posit that L1 categories directly mediate the variable acquisition of L2 categories; together, these models attempt to account for the persistence of an L2 accent despite relatively early and even prolonged exposure to and use of the L2 (among many others, Bosch et al. 2000; Pallier et al. 1997; Flege 2002; Flege et al. 1997, 1995, 2006; Flege and Munro 1994; Guion et al. 2000). At the level of the multilingual speech community, asymmetry and directionality have been characterized along a probabilistic hierarchy of contact influence whereby a majority language is likely to exert greater linguistic influence on a minority language as a result of an array of typical social differences across L1-speaker groups and the languages themselves, such as population size (e.g., a greater number of L1 speakers of the majority language), sociopolitical status (e.g., official status and linguistic capital afforded to the majority language), sociocultural status (e.g., L1 speakers of the majority language as socioeconomically and culturally dominant), and language attitudes (e.g., more positive associations of power and linguistic vitality afforded to the majority language) (Thomason 2001, 2010; Thomason and Kaufman 1988).

Though the empirical investigation of crosslinguistic or contact influence has traditionally centered on cases of L1 to L2 directionality (or source language agentivity (Van Coetsem 2000)) or majority language to minority language directionality at the levels of the individual speaker and the greater speech community, respectively, evidence of L2 to L1 directionality (or recipient language agentivity (Van Coetsem 2000)) and minority language to majority language directionality is robust. At the level of the individual speaker, for example, Flege (1987) found that French–English and English–French bilinguals developed a merged L1–L2 category with respect to the voice onset time (henceforth VOT) of /t/, resulting, respectively, in a partially English-like L1-French /t/ and a partially French-like L1-English /t/. Parallel cases regarding the VOT of English and Italian voiced stops by Italian–English and English–Italian bilinguals, and the VOT of English stops (in addition to the first and second formant frequencies of select vowels) by L1-English L2-Korean bilinguals, are reported in MacKay et al. (2001) and Chang (2012), respectively, and have ultimately been argued to evidence systematic phonetic interactions between L1 and L2 categories in the shared phonetic sound space (Flege 2002). At the level of the speech community, L2 influence on an L1 is most predominantly documented with respect to lexical borrowing, or the innovation of loanwords (Winford 2010).
Cases of L2 influence in non-lexical domains, or structural borrowing (see, for example, Sanchez 2008), have been posited to be less common (Thomason and Kaufman 1988), highly constrained by the languages' grammars (Silva-Corvalán 1986), or perhaps altogether unattested (Poplack and Levey 2010).<sup>1</sup> Accordingly, to better address these asymmetries with regard to bidirectional (i.e., L1 to L2 and L2 to L1) contact effects, the present study explores a unique case of sociophonetic variation among bilingual speakers of Catalan and Spanish from communities of distinct sociolinguistic status and language attitudes, operationalized with respect to measures of linguistic vitality and (c)overt associations of power and solidarity. This design, together with the selection of a phonetic variable equally motivated to appear in either language, permits an innovative analysis of the social underpinnings of community-level linguistic variation and change in multilingual communities.

#### **2. Catalan and Spanish in Barcelona and Valencia**

The sociopolitical histories of Spanish and Catalan involve centuries-old contact between the two languages, ultimately culminating in an 18th-century shift from the previous state of societal monolingualism in Catalan (as a national language) to the declaration of Spanish as the sole language of the state and, indeed, the compulsory acquisition of Spanish through public education in the 19th century (Vallverdú 1984, pp. 19–21; Vila-Pujol 2007, pp. 62–63). Spanish hegemony over Catalan peaked during General Francisco Franco's fascist dictatorship, from 1939 until his death in 1975, during which legislation was actively passed to eliminate or otherwise Castilianize all non-Spanish institutions and to outlaw Catalan and other non-Spanish languages in the public sphere (Newman et al. 2008, p. 307; Turell Julià 2000, p. 47; Vallverdú 1984, p. 24; Vila-Pujol 2007, p. 64). The restoration of Catalan as a co-official language in the Autonomous Communities of Valencia, Catalonia, and the Balearic Islands came as a product of Spain's 1978 Democratic Constitution, shortly after which (in 1983) the Law of Linguistic Normalization (in Catalonia) and the Use and Teaching of Valencian Act (in Valencia) restored Catalan as a vehicle for public education (Huguet 2006, p. 150; Newman et al. 2008, pp. 306–7; Vann 1999, pp. 317–18).

<sup>1</sup> The polemic status of structural borrowing is rooted in competing viewpoints regarding language-internal (or endogenous) and language-external (or contact-induced) factors, which fall outside the scope of the present paper. See Thomason (2008) and references therein for a fuller discussion of these arguments.

Despite the restoration of Catalan as a language of (co-)official status, the sociolinguistic trajectories of Catalan in Catalonia and Valencia have diverged considerably. In Catalonia, thanks in part to ample efforts on the part of the local government and media to consistently promote Catalan's strong expansion throughout the public and legislative sectors (Pradilla 2001, pp. 63–65), Catalan is readily characterized as the language of local political and economic power, with Spanish being associated with the lower socioeconomic class and immigrant communities (Siguan 1988, p. 454; Sinner 2002, p. 161). A longitudinal series of language attitude studies using the matched guise technique (Woolard 1984, 1989, 2009, 2011; Woolard and Gahng 1990; Newman et al. 2008) since the 1980s has shown that positive associations of the Catalan language, and even of a Catalanized accent in Spanish, commonly index a bilingual, expressly Catalonian identity, tied overtly and covertly to attributes of solidarity in the community (Davidson 2019). Census data from 2011 for the city of Barcelona show that self-reported competences in Catalan for understanding, reading, speaking, and writing are 95%, 79%, 72%, and 53%, respectively (Institut d'Estadística de Catalunya 2014); these figures have steadily increased since the 1980s and reflect the considerable linguistic vitality of this minority language (Pradilla 2001, p. 62).

The status of Valencian Catalan, on the other hand, contrasts rather directly with that of Barcelonan Catalan. Since 1995, efforts to restore the administrative and ideological status of Catalan to match (or even surpass) that of Spanish in Valencia have been actively curtailed by a series of conservative political party leaders who have aligned themselves with a group of pro-Spanish Valencian elites that gained considerable power and wealth during the Franco regime (Casesnoves Ferrer 2010, pp. 479–80; Casesnoves Ferrer and Sankoff 2004, p. 2; Pradilla 2001, pp. 68–69). A highly successful propaganda campaign was launched against (Catalonian) Catalan, based on the fear that the growing Catalonian independence movement would subsume the Valencian state. Beyond disparaging ties to Catalonia and its speakers, this campaign additionally positioned Valencian as a language completely unrelated to (Catalonian) Catalan, which served to fuel a pro-Valencian (and specifically anti-Catalan) movement that adopted Spanish and the nation-state, rather than Valencian, as its symbols of anti-Catalan-ness (Casesnoves Ferrer 2010, p. 480; Pradilla 2001, pp. 69–70).<sup>2</sup> Under this campaign, Valencian has rarely been used in administrative contexts, and the once-thriving *Canal 9* Valencian TV station was shut down in 2012 (Pradilla 2001, p. 69). Matched guise research in Valencia has found that whereas positive, local affiliations of solidarity were originally (in 1998) afforded to Valencian, in 2008 they were instead afforded to Spanish in the capital city of Valencia (Casesnoves Ferrer 2010, p. 486).
The corresponding 2011 census figures for self-reported competence in Valencian Catalan in the city of Valencia are 89% (understanding), 61% (reading), 48% (speaking), and 61% (writing) (Generalitat Valenciana 2011). Compared with the Barcelona figures, they lag behind most notably in speaking competence.
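The comparison between the two cities can be made explicit as percentage-point differences. The short Python sketch below uses only the census figures cited above. The dictionary and function names are illustrative. It shows that speaking has the largest gap, and that writing is the one skill where Valencia's figure is higher.

```python
# Self-reported 2011 census competences (%), as cited in the text
barcelona = {"understanding": 95, "reading": 79, "speaking": 72, "writing": 53}
valencia = {"understanding": 89, "reading": 61, "speaking": 48, "writing": 61}


def competence_gaps(a, b):
    """Percentage-point gap (a minus b) for each skill reported in both sources."""
    return {skill: a[skill] - b[skill] for skill in a if skill in b}


gaps = competence_gaps(barcelona, valencia)
largest = max(gaps, key=gaps.get)

for skill, gap in gaps.items():
    print(f"{skill}: {gap:+d} points")
print("largest gap:", largest)
```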

Accordingly, Barcelona and Valencia present two unique sociolinguistic and sociopolitical realities for the same language contact pairing between Catalan as a minority language and Spanish as a majority language. Linguistic differences between these contact settings are not unilaterally determined by their distinct social contexts. Even so, comparing the two settings facilitates an empirical assessment of how these social differences contribute to linguistic outcomes with respect to the directionality and asymmetry of contact influence.

<sup>2</sup> Anecdotally, the visual landscapes of modern Valencia and Barcelona are quite telling. From my travels in 2018, the hanging of a *senyera* (the Catalan flag of nationhood and independence) off one's balcony has become an extremely prevalent practice in Barcelona. In Valencia, the analogous Valencian flag can only rarely be found, hidden amidst a sea of (national) Spanish flags adorning the balconies of the city's thoroughfares.

#### **3. Alveolar Fricatives in Spanish and Catalan**

North-Central Peninsular Spanish features an apical-alveolar voiceless /s/, articulated with a gesture of the tongue tip toward the alveolar ridge (Hualde 2014, p. 147; Martínez Celdrán and Planas 2007, p. 110; Quilis 1981, pp. 234–35). In monolingual Spanish varieties that do not exhibit aspiration or deletion of /s/ in pre-consonantal positions, such as North-Central Peninsular Spanish (e.g., Barcelonan Spanish and Valencian Spanish), two allophones of /s/, namely voiceless [s] and voiced [z], are prescriptively found in complementary distribution via regressive assimilation of voicing to the following consonantal segment. Before voiced (semi)consonants, /s/ is realized as [z] (e.g., *rasgo* [ráz.ɣo] 'feature'; *mis hierbas* [miz.ʝéɾ.βas] 'my herbs'), whereas in all other contexts, /s/ is produced as [s] (e.g., *rasco* [rás.ko] 'I scratch'; *casa* [ká.sa] 'house'; *monos* [mó.nos] 'monkeys') (Hualde 2014, pp. 154–55; Morgan 2010, p. 248). Accordingly, monolingual Spanish productions of [z] outside the context of a following voiced (semi)consonant (e.g., in intervocalic contexts in particular) are prescriptively disallowed:

"*La s sonora aparece únicamente, en nuestra lengua, en posición final de sílaba, precediendo inmediatamente a otra consonante sonora; en cualquier otra posición su presencia es anormal y esporádica*" [The voiced /s/ in our language appears solely in syllable-final position immediately preceding another voiced consonant; in any other position, its presence is abnormal and sporadic]. (Navarro Tomás 1918, p. 83)

In contrast to Spanish, Catalan features two apical-alveolar fricative phonemes, voiceless /s/ and voiced /z/. This phonemic voicing contrast is active word-initially and word-medially, producing minimal pairs such as *zel* 'zeal' [zɛ́ɫ] / *cel* 'sky' [sɛ́ɫ] and *pesar* 'to weigh' [pə.zá] / *passar* 'to pass' [pə.sá]. Critically, this phonemic voicing contrast is neutralized word-finally, resulting in [s] or [z] depending on the voicing feature of the following segment (that is, the voicing neutralization of word-final Catalan alveolar fricatives (and, in fact, all Catalan sibilants) resolves by means of anticipatory assimilation). When followed by a voiced segment, such as a vowel, the word-final fricative is systematically voiced (e.g., *gos* [s] 'dog'; *gos estrany* [z] 'strange dog') (Hualde 1992, pp. 371–72, 393–94; Hualde and Prieto 2014, p. 109; Recasens 2014, pp. 239–40; Wheeler 2005, pp. 147–49, 162).

Voiced intervocalic fricatives in Catalan therefore result from word-initial /z/, from word-medial /z/, and from voicing assimilation of prevocalic word-final /s/ and /z/ (or archiphoneme /S/). This sets up an interesting pair of opportunities for bidirectional contact influence, contingent on syllable position. With respect to syllable-initial contexts, productions of Spanish *pesar* 'to weigh' or *casa* 'house' as [pe.záɾ] and [ká.za] by an L1-Catalan speaker could evidence the transfer of a Catalan phoneme (/z/) into Spanish. Conversely, productions of Catalan *pesar* 'to weigh' or *casa* 'house' as [pə.sá] and [ká.sə] by an L1-Spanish speaker could evidence the substitution of Spanish /s/ for Catalan /z/, potentially eliminating the phonemic voicing contrast in Catalan. With respect to word-final contexts, the production of Spanish *las albas* 'the dawns' as [la.zál.βas] by an L1-Catalan speaker, or the production of Catalan *les albes* 'the dawns' as [lə.sál.βəs] by an L1-Spanish speaker, would constitute a case of largely phonetic, rather than phonemic, transfer (i.e., the respective transfer of a Catalan or Spanish phonotactic voicing rule, which would not create or eliminate any phonological contrasts).<sup>3</sup>
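The prescriptive distributions described for the two languages can be summarized as a small decision function. The Python sketch below is a simplification under stated assumptions. The segment sets are a hypothetical, non-exhaustive inventory. Word-medial Catalan fricatives are assumed to be intervocalic. The function name is illustrative. The sketch encodes only the norms, not the variable productions that are examined later.

```python
# Simplified, hypothetical segment classes (not an exhaustive inventory)
VOWELS = {"a", "e", "i", "o", "u", "ə", "ɛ", "ɔ"}
VOICED_CONSONANTS = {"b", "d", "g", "m", "n", "l", "r", "ɾ", "ʝ", "β", "ð", "ɣ"}


def prescriptive_realization(language, phoneme, position, next_segment=None):
    """Return "s" or "z" for an alveolar fricative under the prescriptive norms.

    language: "spanish" or "catalan"
    phoneme: "s" or "z" (Spanish has only /s/)
    position: "initial", "medial" (assumed intervocalic), or "final" in the word
    next_segment: the following segment, or None at the end of an utterance
    """
    next_is_voiced = next_segment in VOWELS or next_segment in VOICED_CONSONANTS
    if language == "spanish":
        if phoneme != "s":
            raise ValueError("Spanish has no /z/ phoneme")
        # Regressive voicing assimilation applies only before voiced consonants,
        # so intervocalic /s/ (word-medial or word-final) stays voiceless.
        return "z" if next_segment in VOICED_CONSONANTS else "s"
    if language == "catalan":
        if phoneme not in ("s", "z"):
            raise ValueError("expected /s/ or /z/")
        if position in ("initial", "medial"):
            # The contrast is maintained: the surface form matches the phoneme.
            return phoneme
        if position == "final":
            # Contrast neutralized; anticipatory assimilation voices the
            # fricative before any voiced segment, vowels included.
            return "z" if next_is_voiced else "s"
        raise ValueError(f"unknown position: {position!r}")
    raise ValueError(f"unknown language: {language!r}")


print(prescriptive_realization("spanish", "s", "final", "a"))   # las albas
print(prescriptive_realization("catalan", "s", "final", "a"))   # les albes
```

Under these norms, the word-final pair *las albas* / *les albes* differs only in the application of the voicing rule. This is why transfer in that context would be phonetic rather than phonemic.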

Notably, although the phonological voicing contrast between Catalan /s/ and /z/ is a feature of the prescriptive, standardized academy norms for both Barcelonan Catalan (Julià i Muné 2008, pp. 66–67) and Valencian Catalan (Real Acadèmia de Cultura Valenciana 2000; Acadèmia Valenciana de la Llengua 2006, p. 29), select oral vernaculars of Barcelonan and Valencian Catalan have been characterized as having lost the voicing contrast in favor of exclusively voiceless intervocalic alveolar sibilants. In a sociophonetic investigation of *xava* Catalan, a Barcelonan sociolect originally associated with the L1-Spanish-speaking working class, Ballart (2013, p. 145) finds that L1-Catalan speakers realize /z/ as [s] 15% of the time, compared with a 58% rate of [s] production among L1-Spanish speakers. For Valencian Catalan, the regional vernacular known as *apitxat* is similarly characterized as lacking voiced /z/ (Prieto 2004, p. 216; Moll 2006, p. 109). The Valencian Academy of Language deems this feature *no recomanable* ('not recommendable') (Acadèmia Valenciana de la Llengua 2006, p. 29). Ultimately, prescriptive academy norms do not accurately reflect real language use. The existence of *xava* and *apitxat* therefore does not hinder the present investigation of intervocalic fricative production in Barcelona and Valencia. Instead, these vernaculars are indicative of the pervasive reality of sociolinguistic variation even at the phonological level, which I aim to expressly link to select social and linguistic factors. Indeed, it is unlikely that *apitxat* Catalan exhibits a truly categorical absence of /z/, despite dialectological entries that insist on its absence in this variety. It is more likely, as Ballart (2013) attests for *xava* Catalan, to exhibit socially and linguistically conditioned variability.

<sup>3</sup> Though Catalan /z/ is sometimes framed as a novel L2 category for L1-Spanish learners to acquire (Carrera-Sabaté et al. 2016, p. 48), the existence of Spanish [z] before voiced consonants suggests that, rather than a case of foreign category acquisition, the present study instead entails the acquisition of novel phonotactic structure, wherein [z] is to appear in non-Spanish contexts (e.g., syllable-initially (contrastive with /s/) and prevocalic word-finally (non-contrastive with /s/)).

The selection of intervocalic fricatives in Catalan and Spanish for the present study is motivated by the status of the variable voicing and devoicing of Romance fricatives as "natural" and "unremarkable" processes, both historically and synchronically (Hualde and Prieto 2014, p. 111). The voicing of intervocalic /s/ to [z] can be characterized as a product of lenition. Within a framework of gestural phonology (cf. Browman and Goldstein 1991), it can be modeled as a reorganizing or even undershooting of the glottal gestures (e.g., vocal fold abduction) necessary to restrict voicing for [s] while permitting it for the adjacent vowels. As for the devoicing of /z/, maintaining a turbulent airstream for sufficient strident frication and maintaining voicing are aerodynamically opposed demands. This conflict can be resolved through the loss of voicing (Hualde and Prieto 2014, p. 111; see also Ohala 1983, pp. 201–2). The voicing and devoicing of intervocalic sibilants in Romance (e.g., Latin /kása/ > Old Spanish /káza/ > Modern Spanish /kása/ (Penny 2002, pp. 98–103)) therefore constitute variable processes that are equally endogenously motivated in Catalan and Spanish. As a result, any differences in the directionality and asymmetry of contact influence observed in the present case of Catalan–Spanish contact can more confidently be attributed to non-linguistic (i.e., social) factors.

#### **4. Research Methodology**

The subject population for this study consists of 96 Catalan–Spanish bilinguals, stratified equally by gender (male vs. female), age (18–30 vs. 45–60), language profile (L1-Catalan vs. L1-Spanish), and community (Barcelona vs. Valencia). This research was approved by the UC-Berkeley IRB under protocol # 2016-06-8891. Following the Variationist Sociolinguistic framework (Labov 2001; Tagliamonte 2012), gender stratification is a social constraint that is highly relevant for investigating L1 and L2 differences in the use of an overtly proscribed variant (as is the case for the variable realization of each of Catalan /z/ and Spanish /s/). Under this framework, female speakers are likely to use variants with overt negative social stigma less than their male counterparts in cases of stable variation or ongoing change from above.<sup>4</sup> In the same vein, age is included in order to assess potential change in progress through generational differences, following the apparent time construct (Bailey 2004; Ballart et al. 1991; Chambers 2004). Notably, when applying this methodological construct, patterns of social stratification (especially age and gender) observed in synchronic data are interpreted as evidence of possible diachronic trends (i.e., eventual language change which, in the present, is characterized as a potential change in progress), with the understanding that "not all variability and heterogeneity in language structure involves change; but all change involves variability and heterogeneity" (Weinreich et al. 1968, p. 188).

<sup>4</sup> Changes from above and changes from below, following Labov (2001, pp. 272–74, 279), respectively refer to the community-wide, gradual adoption of a linguistic variant that either is or is not overtly prescribed. Accordingly, the adoption of Spanish [s] and/or Catalan [z] would constitute a change from above, whereas the adoption of Spanish [z] and/or Catalan [s] would constitute a change from below.

With regard to language profile, participants in the present study are grouped according to first language (matched with parents' L1 and the language in the home so as to avoid complications with using the labels "L1" and "L2" with early simultaneous bilinguals (e.g., L1A–L1B)) and self-reported current estimates of typical language use, since, as was previously discussed, functional or practical bilingualism in both languages is widespread in both communities. Table 1 displays the general distribution of the 96 speakers recruited for this study.
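The stratified design can be checked with a little arithmetic. The Python sketch below assumes that "stratified equally" means a fully crossed design with the same number of speakers in every cell. Balanced margins alone would not strictly guarantee this, so the equal-cell reading is an assumption. The factor labels mirror the text. The variable names are illustrative.

```python
from itertools import product

# Four binary stratification factors, as listed in the text
factors = {
    "gender": ["male", "female"],
    "age": ["18-30", "45-60"],
    "profile": ["L1-Catalan", "L1-Spanish"],
    "community": ["Barcelona", "Valencia"],
}
TOTAL_SPEAKERS = 96

# Every combination of factor levels forms one cell of the design
cells = list(product(*factors.values()))
per_cell, remainder = divmod(TOTAL_SPEAKERS, len(cells))

print(f"{len(cells)} cells, {per_cell} speakers per cell, remainder {remainder}")
```

Under this assumption, each of the 16 cells holds 6 speakers.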



Five test instruments were administered to each of the 96 participants. The first test instrument is a sociodemographic questionnaire containing 22 questions used to screen participants according to the social criteria outlined in Table 1.

The second and third test instruments are a pair of recorded phrase-list readings in Catalan and Spanish that elicit self-monitored speech. In each language, subjects were asked to read aloud, using their best Catalan or Spanish pronunciation, a series of 60 target words (all cognates across the languages) with intervocalic Spanish /s/, intervocalic Catalan /z/, and prevocalic word-final Catalan /S/. Target items were stratified according to two linguistic factors across the languages, namely word position (word-medial vs. prevocalic word-final) and syllable stress (unstressed vs. stressed). Word position was included to assess phonotactic variability in each language. In Catalan, the word-medial context is the site of the phonemic voicing contrast, whereas in the prevocalic word-final context, voicing results from phonemic neutralization and anticipatory assimilation. The inclusion of syllable stress is motivated by the concept of local hyper-articulation for stressed syllables. Under this notion, the speaker may reduce otherwise expected effects of gestural overlap with a neighboring segment across stressed syllables, because these syllables have longer durations and allow the speaker to better time articulatory gestures independently of one another (Browman and Goldstein 1991; Hualde 2014, p. 251). More concretely, fricative tokens in a syllable with nuclear stress would be expected to be the most resistant to voicing. Across stressed syllables, the speaker has a greater opportunity to coordinate the vocal fold abduction for voiceless [s] successfully relative to the vocal fold adduction gesture of the adjacent nuclear vowel.
Token stratification according to word position and stress yielded four cells (word-medial, stressed: *hombre casado*/*home casat* 'married man'; word-medial, unstressed: *cosa gigante*/*cosa gigant* 'huge thing'; prevocalic word-final, stressed: *compras agua* / *compres aigua* 'you buy water'; prevocalic word-final, unstressed: *las amigas*/*les amigues* 'the friends') of 15 tokens each (per language), which were mixed amongst a set of 60 distractor tokens in each language that did not contain intervocalic fricatives.

The fourth and fifth test instruments consist of a pair of 20-min sociolinguistic interviews, one in Catalan and one in Spanish, in which participants were asked to discuss their opinions on questions of language identity, the status of Spanish and Catalan in their communities, and issues of linguistic vitality for each language. The interviews accordingly elicited attitudinal data to corroborate sociolinguistic and sociopolitical differences between Catalan varieties in the two communities of study.

Each participant was recorded individually during one experimental session lasting approximately one hour. Because bilinguals produced Spanish and Catalan speech during a single session, the session was strictly divided into two parts, an L1 portion followed by an L2 portion, in order to limit the effects of language mode (Grosjean 2001). The sociodemographic questionnaire was given in each participant's L2, after the L1 tasks (interview and subsequent phrase-list reading) and before the L2 tasks (interview and subsequent phrase-list reading). This provided a buffer of approximately 15 min between language tasks to allow participants to switch from their L1 to their L2. Participants were recorded using a Samson SE50 head-mounted condenser microphone and a Zoom H4n digital recorder (sampling at 44,100 Hz) in an empty classroom at the *Universitat de Barcelona* or *Universitat Pompeu Fabra*, or in a private office at the *Universitat de València*.

To calculate voicing durations for intervocalic fricative tokens, fricative boundaries were segmented manually in Praat. Left and right boundaries for each segment were marked using both the waveform and the spectrogram, at the zero crossing in the waveform closest to the first and last signs of aperiodic noise (File-Muriel and Brown 2011, pp. 227–28; Rohena-Madrazo 2015, pp. 298–99). Once the intervocalic fricatives were segmented, voicing was measured as the proportion of each fricative segment that exhibited all three of the following: a fundamental frequency (that is, a pitch track), a voice bar at the bottom of the spectrogram, and glottal pulses. The viewing window was exactly twice the duration of the fricative segment and centered on it (Campos-Astorkiza 2014, p. 21; Gradoville 2011; Hualde 2014, pp. 48–53; Rohena-Madrazo 2015, pp. 298–99; Schmidt and Willis 2011, p. 6; Torreira and Ernestus 2012).<sup>5</sup> Example spectrograms illustrating less voiced and more voiced realizations of intervocalic fricative tokens in Catalan and Spanish appear as Figures 1–4.
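The voicing measure can be approximated computationally by counting analysis frames. The measurements in this study were made by manual inspection in Praat, so the Python sketch below is only an illustration and does not use Praat or any Praat API. It rests on two assumptions. First, each frame has already been annotated for the three cues. Second, a frame counts as voiced only when all three cues co-occur. The `Frame` type and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One analysis frame; time is in integer milliseconds to avoid float drift."""
    time_ms: int
    has_pitch: bool      # a pitch (F0) value is tracked at this frame
    has_voice_bar: bool  # low-frequency voice bar visible in the spectrogram
    has_pulses: bool     # glottal pulses detected


def viewing_window(start_ms, end_ms):
    """Window twice the segment's duration, centred on the segment."""
    half = (end_ms - start_ms) / 2
    return start_ms - half, end_ms + half


def voicing_proportion(frames, start_ms, end_ms):
    """Share of frames in [start_ms, end_ms) that show all three voicing cues."""
    if end_ms <= start_ms:
        raise ValueError("segment end must follow its start")
    inside = [f for f in frames if start_ms <= f.time_ms < end_ms]
    if not inside:
        raise ValueError("no frames fall inside the segment")
    voiced = sum(1 for f in inside if f.has_pitch and f.has_voice_bar and f.has_pulses)
    return voiced / len(inside)


# Hypothetical 5 ms frames: all three cues present only before 50 ms
frames = [Frame(t, t < 50, t < 50, t < 50) for t in range(0, 100, 5)]
print(viewing_window(20, 80))              # (-10.0, 110.0)
print(voicing_proportion(frames, 20, 80))  # 0.5
```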

**Figure 1.** Younger L1-Catalan female rendition of *la ca*/*z*/*a petita* ('the little house') in Valencian Catalan (~9% voiced).

<sup>5</sup> The manual calculations of segments' proportions of voicing were verified with Praat's voice report automated algorithm, though gross discrepancies between the manual calculation and voicing report were resolved in favor of manual calculation, following Gradoville (2011, pp. 69–71).

**Figure 2.** Younger L1-Spanish male rendition of *caminarà*/*S*/ *aquí* ('you will walk here') in Barcelonan Catalan (~100% voiced).

**Figure 3.** Younger L1-Spanish female rendition of *chicas aburridas* ('bored girls') in Valencian Spanish (~6% voiced).

**Figure 4.** Older L1-Catalan male rendition of *bebía*/*s*/ *alcohol* ('you drank alcohol') in Barcelonan Spanish (~100% voiced).

The phrase-list reading tasks in Catalan and Spanish each elicited 5760 intervocalic fricative tokens, yielding 11,520 tokens in total. The relatively few tokens with notable speaker disfluencies (principally pauses between words for prevocalic word-final fricatives) were discarded from analysis, leaving 5654 Catalan tokens and 5635 Spanish tokens. Figure 5 presents a kernel density plot of all fricatives' voicing proportions per language, which shows a bimodal distribution of voicing proportions.
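The token counts follow from the design: 96 speakers, four word-position-by-stress cells, and 15 tokens per cell in each language. The Python sketch below reproduces the elicited totals and the share of tokens retained after removing disfluencies. It uses only the counts reported in the text. The variable names are illustrative.

```python
SPEAKERS = 96
CELLS = 4             # word position (2) x stress (2)
TOKENS_PER_CELL = 15  # per language

elicited_per_language = SPEAKERS * CELLS * TOKENS_PER_CELL
elicited_total = 2 * elicited_per_language  # Catalan + Spanish

# Tokens remaining after disfluent tokens were discarded (from the text)
retained = {"Catalan": 5654, "Spanish": 5635}
discarded = {lang: elicited_per_language - n for lang, n in retained.items()}

for lang, n in retained.items():
    share = n / elicited_per_language
    print(f"{lang}: {discarded[lang]} discarded, {share:.1%} retained")
```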

**Figure 5.** Kernel density plot of Catalan and Spanish intervocalic fricative voicing proportions.

A test of bimodality was conducted in R (R Core Team 2020) using the 'modes' package. The package calculates a bimodality coefficient for each language's distribution of voicing proportions, ranging from 0 (completely unimodal) to 1 (completely bimodal). Coefficients greater than 0.555 indicate a bimodal distribution, and coefficients less than or equal to 0.555 indicate a unimodal distribution. The coefficients for Catalan and Spanish fricatives were 0.736 and 0.638, respectively. Both languages therefore show bimodal distributions that favor productions with voicing proportions near either 0% or 100%, with substantially fewer tokens in the middle range of the proportional continuum. Interpreting these modes as articulatory targets for either voiceless [s] or voiced [z], the data were subsequently coded categorically as either [s] for voicing proportions from 0% through 20%, or [z] for voicing proportions from 80% through 100%. Tokens with intermediate voicing proportions were excluded. This categorical treatment of the bimodal voicing distributions, in line with Campos-Astorkiza (2014), yielded a grand total of 4732 Catalan fricatives and 4578 Spanish fricatives for subsequent statistical analysis (or ~49 Catalan tokens and ~48 Spanish tokens per speaker).
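The bimodality check and the categorical coding can both be sketched in Python. This is a minimal sketch that assumes the commonly used finite-sample definition of the coefficient: BC = (G1² + 1) / (G2 + 3(n − 1)² / ((n − 2)(n − 3))), where G1 is sample skewness and G2 is sample excess kurtosis. It is not the 'modes' package's own code, whose internals may differ. The cut-off 5/9 (≈ 0.555) is the coefficient of a uniform distribution. The function names and sample values are hypothetical. Proportions are expressed on a 0 to 1 scale rather than in percent.

```python
import math

# 5/9 is the bimodality coefficient of a uniform distribution (commonly rounded to 0.555)
BIMODAL_THRESHOLD = 5 / 9


def bimodality_coefficient(x):
    """Finite-sample bimodality coefficient of a list of numbers."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least four observations")
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    if m2 == 0:
        raise ValueError("the coefficient is undefined for zero variance")
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    # Bias-corrected sample skewness (G1) and sample excess kurtosis (G2)
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    excess_kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return (skew ** 2 + 1) / (excess_kurtosis + correction)


def code_voicing(proportion, s_max=0.20, z_min=0.80):
    """Code a voicing proportion (0 to 1) as "s", "z", or None (excluded)."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("voicing proportion must lie between 0 and 1")
    if proportion <= s_max:
        return "s"
    if proportion >= z_min:
        return "z"
    return None  # intermediate voicing: not assigned to either target


# Hypothetical voicing proportions, not data from this study
sample = [0.0] * 40 + [0.05] * 10 + [0.5] * 5 + [0.95] * 10 + [1.0] * 35
bc = bimodality_coefficient(sample)
print(f"BC = {bc:.3f}, bimodal: {bc > BIMODAL_THRESHOLD}")

coded = [code_voicing(p) for p in sample]
kept = [c for c in coded if c is not None]
print(f"kept {len(kept)} of {len(sample)} tokens")
```

Discarding the intermediate band is what reduces the analyzable totals below the number of measured tokens.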

#### **5. Results**

#### *5.1. Intervocalic Alveolar Fricative Production*

Two mixed-effects logistic regression models (one for the Barcelonan data and one for the Valencian data) were fitted in R (R Core Team 2020). Each used voicing ([s] vs. [z]) as the dependent variable with treatment contrasts. The models tested for fixed effects of three linguistic factors (language (Spanish vs. Catalan), word position (medial vs. prevocalic word-final), and stress (stressed vs. unstressed)) and three social factors (language profile (L1-Spanish vs. L1-Catalan), gender (male vs. female), and age (older vs. younger)). Interaction terms between language profile, language, and each of the other independent variables were included. These terms assessed whether any of the remaining effects varied significantly according to language and/or according to whether the language was the L1 or L2 of each speaker. Individual speaker and token (or word) were included as random effects in both models. The alpha level was manually adjusted to 0.025 to guard against Type I errors.
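The model structure can be written as a single equation, assuming that the speaker and token random effects are random intercepts only. The text does not mention random slopes, so this is a sketch under that assumption and not a transcription of the fitted models. Here $i$ indexes speakers, $j$ indexes tokens (words), and $\mathbf{x}_{ij}$ collects the treatment-coded fixed effects and their interactions:

$$
\operatorname{logit}\Pr\!\left(y_{ij} = [\mathrm{z}]\right) = \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_i + w_j,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad w_j \sim \mathcal{N}(0, \sigma_w^2)
$$

Under this reading, the adjusted alpha of 0.025 equals 0.05 divided by the two models. That is consistent with a Bonferroni-style correction, although the text does not name the procedure.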

The results of each logistic mixed-effects regression appear in Tables 2 and 3 (note that positive and negative β coefficients respectively indicate greater or lesser log-odds of [z] production relative to the intercept). Given the complex nature of these models, I shall elaborate on them separately, offering additional information and post-hoc analyses as necessary for each finding.
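Because the coefficients are on the log-odds scale, they can be converted into predicted probabilities and odds ratios. The Python sketch below shows that conversion. The coefficient values are hypothetical and are not taken from Tables 2 or 3. With treatment contrasts and interaction terms, a single coefficient describes a shift relative to the intercept condition, with all other factors held at their reference levels.

```python
import math


def inv_logit(eta):
    """Convert log-odds to a probability, avoiding overflow for large |eta|."""
    if eta >= 0:
        return 1 / (1 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1 + z)


# Hypothetical coefficients, NOT values from Tables 2 or 3
intercept = -2.0     # log-odds of [z] for the reference (intercept) condition
beta_catalan = 3.0   # hypothetical shift in log-odds for language = Catalan

p_ref = inv_logit(intercept)
p_catalan = inv_logit(intercept + beta_catalan)
odds_ratio = math.exp(beta_catalan)  # multiplicative change in the odds of [z]

print(f"P([z]) reference: {p_ref:.3f}")
print(f"P([z]) Catalan:   {p_catalan:.3f}")
print(f"odds ratio:       {odds_ratio:.2f}")
```

A positive β raises the odds of [z] and a negative β lowers them. The size of the change in probability depends on where the intercept lies on the logistic curve.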


**Table 2.** Summary of the mixed-effects logistic regression model fitted to Barcelonan fricatives.

\* The intercept is older, L1-Spanish males producing stressed, word-medial fricatives in Spanish.



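**Table 3.** Summary of the mixed-effects logistic regression model fitted to Valencian fricatives.

\* The intercept is older, L1-Spanish males producing stressed, word-medial fricatives in Spanish.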

To begin, I focus on the attested social constraints on Catalan and Spanish [z] production in Barcelona and Valencia. With respect to language profile, Tukey post-hoc analyses<sup>6</sup> were performed on the significant two-way interactions between language profile and language in both communities. In Barcelona and Valencia, [z] production was significantly favored in Catalan over Spanish (*p* < 0.0001 for each community) and by L1-Catalan speakers over L1-Spanish speakers (*p* < 0.0001 for each community). The effect of language profile, however, was stronger for Catalan fricatives than for Spanish fricatives (*p* < 0.0001 for each community). Figures 6 and 7 visualize these differences in Barcelona and Valencia, respectively, and additionally depict the observed categorical favoring of [z] in the Catalan of L1-Catalan speakers in Barcelona. In all subsequent figures, three asterisks denote comparisons for which *p* < 0.0001, and two asterisks denote comparisons for which *p* < 0.001.

<sup>6</sup> Post-hoc analyses were conducted using the 'emmeans' package (which automatically uses a logit response scale when applied to logistic regression models) with Tukey adjustment of *p*-values for multiplicity.

**Figure 6.** Effects of language profile and language on Barcelonan fricative production.

**Figure 7.** Effects of language profile and language on Valencian fricative production.

With respect to gender, Tukey post-hoc analyses on the significant three-way interaction in Barcelona and Valencia between gender, language profile, and language revealed unique stratifications for each community. For Barcelonan bilinguals, whereas [z] production is favored by females over males in Spanish (for L1-Spanish speakers, *p* < 0.0001; for L1-Catalan speakers, *p* < 0.0001), in Catalan, the parallel gender effect is exclusively present for L1-Spanish speakers (*p* < 0.0001), since L1-Catalan speakers display a categorical favoring of [z] across genders (*p* > 0.999). For Valencian bilinguals, whereas no significant gender stratification is attested in Spanish (for L1-Spanish speakers, *p* = 0.749; for L1-Catalan speakers, *p* = 0.768), in Catalan, [z] is favored by males exclusively for L1-Spanish speakers (for L1-Spanish speakers, *p* < 0.0001; for L1-Catalan speakers, *p* = 0.816). These stratifications are visualized for Barcelona and Valencia in Figures 8 and 9, respectively.

**Figure 8.** Effect of gender as mediated by language profile and language on Barcelonan fricative production.

**Figure 9.** Effect of gender as mediated by language profile and language on Valencian fricative production.

With respect to age, Tukey post-hoc analyses on the significant three-way interaction in Barcelona and Valencia between age, language profile, and language revealed unique stratifications for each community. For Barcelonan bilinguals, whereas [z] production is favored by younger speakers over older speakers in Spanish (for L1-Spanish speakers, *p* < 0.0001; for L1-Catalan speakers, *p* < 0.0001), in Catalan, the parallel age effect is exclusively present for L1-Spanish speakers (*p* < 0.0001), since L1-Catalan speakers display a categorical favoring of [z] across age groups (*p* > 0.999). For Valencian bilinguals, whereas no significant age stratification is attested in Spanish (for L1-Spanish speakers, *p* = 0.682; for L1-Catalan speakers, *p* = 0.704), in Catalan, [z] is exclusively favored by older, L1-Spanish bilinguals (for L1-Spanish speakers, *p* < 0.0001; for L1-Catalan speakers, *p* = 0.757). These stratifications are visualized for Barcelona and Valencia in Figures 10 and 11, respectively.

**Figure 10.** Effect of age as mediated by language profile and language on Barcelonan fricative production.

**Figure 11.** Effect of age as mediated by language profile and language on Valencian fricative production.

As regards linguistic constraints on Barcelonan and Valencian intervocalic fricative production, Tukey post-hoc analyses of the two significant two-way interactions, between word position and language profile and between word position and language, revealed parallel trends in both communities. In Barcelona, Spanish [z] production is significantly favored in prevocalic word-final contexts over word-medial contexts (for L1-Spanish speakers, *p* < 0.0001; for L1-Catalan speakers, *p* < 0.0001), whereas Catalan [z] production is not constrained by word position (for L1-Spanish speakers, *p* = 0.739; for L1-Catalan speakers, *p* > 0.999). In Valencia, likewise, Spanish [z] production is significantly favored in prevocalic word-final contexts over word-medial contexts (for L1-Spanish speakers, *p* < 0.0001; for L1-Catalan speakers, *p* < 0.0001), whereas Catalan [z] production is not constrained by word position (for L1-Spanish speakers, *p* = 0.684; for L1-Catalan speakers, *p* = 0.703). These stratifications are visualized for Barcelona and Valencia in Figures 12 and 13, respectively, which additionally illustrate the near-categorical absence of [z] tokens in Spanish word-medial contexts across all bilingual participants.

**Figure 12.** Effect of word position as mediated by language profile and language on Barcelonan fricative production.

**Figure 13.** Effect of word position as mediated by language profile and language on Valencian fricative production.

Lastly, with respect to stress, a significant three-way interaction between stress, language profile, and language was attested for Barcelonan bilinguals, whereas a main effect of stress was obtained for Valencian bilinguals. Tukey post-hoc analyses on the significant three-way interaction revealed that [z] production is favored in unstressed over stressed contexts in Barcelonan Spanish (for L1-Spanish speakers, *p* < 0.001; for L1-Catalan speakers, *p* < 0.001), whereas in Barcelonan Catalan the parallel stress effect is present only for L1-Spanish speakers (*p* < 0.0001), since L1-Catalan speakers categorically favor [z] in both stressed and unstressed contexts (*p* > 0.999). In Valencia, [z] production is similarly favored in unstressed over stressed contexts, independent of language and language profile (*p* < 0.001 for all comparisons). This constraint is visualized for Barcelona and Valencia in Figures 14 and 15, respectively.

**Figure 14.** Effect of stress as mediated by language profile and language on Barcelonan fricative production.

**Figure 15.** Effect of stress as mediated by language profile and language on Valencian fricative production.

#### *5.2. Language Attitudes*

While the majority (95%) of all participants expressed appreciation for the existence of bilingualism and co-officiality in their respective communities, differences in language attitudes between Barcelona and Valencia largely related to speakers' views toward actively using and promoting Catalan. For example, whereas 81% of Barcelonan participants expressed a desire for their (eventual, hypothetical) children to learn and use Catalan, only 40% of Valencian participants expressed the same desire. As even the Valencian L1-Catalan speakers reported using Spanish more than Catalan in their daily lives (refer back to Table 1), it is perhaps unsurprising that the majority (60%) of Valencian participants considered it perfectly acceptable to live in Valencia without knowing Catalan at all, and that, while it would be nice if their children learned the language, they would not communicate with them predominantly in Catalan.

With regard to language and identity, 100% of Barcelonan participants expressed an association between being Catalan and either understanding the Catalan language or having an appreciation for it. For example, one of the younger female (L1-Catalan) participants noted that "there are many Catalans that choose not to use Catalan, but at least they can understand it and appreciate its presence."

In contrast, 33% of Valencian participants indicated that Valencian identity was tied more strongly to Spanish than to Valencian Catalan, serving to distinguish Valencia from Catalonia: "We put out [on our balconies] Spanish flags and use Spanish to show that in Valencia, we don't reject Spanish like the Catalans do" (Older L1-Spanish Female). Valencian identity as a question of anti-Catalan-ness (via the support of Spanish), rather than as one of Valencian Catalan, is additionally evidenced in the derogatory labeling of overtly pro-Valencian-language individuals as *catalanistas* 'Catalan nationalists': "I've been to Barcelona before, and if you go into a store and speak in Catalan, they either respond in Catalan or in Spanish, but you don't have to change how you speak. Here in Valencia, if you walk into a store speaking Valencian, they'll usually ask you to switch to Spanish, and if you refuse, you're seen as a *catalanista*" (Younger L1-Catalan Male).

When asked whether Catalan and Valencian were two different languages, 98% of Barcelonan participants said no, affirming that they are dialects of the same language. In Valencia, however, 27% considered Valencian a language independent of Catalan. Barcelonan participants were wholly unaware of any conflict over whether Barcelonan Catalan and Valencian Catalan are separate languages, instead noting that outsiders (non-Catalonians) sometimes wrongly consider Catalan a dialect of Spanish. Valencian participants, in contrast, readily situated the Catalan–Valencian debate within local Valencian politics, noting that it is more a point of contention for politicians than for the Valencian public.

#### **6. Discussion**

The patterns of social and linguistic stratification attested for the voicing of intervocalic fricatives in Barcelonan and Valencian Catalan and Spanish are consistent with distinct directionalities and asymmetries of contact influence across these two communities.<sup>7</sup> First, with respect to Barcelonan Catalan and Spanish, evidence of Catalan's phonetic influence on Spanish in the form of (prescriptively) non-standard [z] comes from the stratification by language profile, whereby Spanish [z] was favored by L1-Catalan speakers over L1-Spanish speakers. Notably, across both speaker profiles, Spanish [z] production was nearly categorically constrained by word position: Spanish [z] appeared almost exclusively in the prevocalic word-final context rather than in word-medial contexts, the site of the phonemic voicing contrast in Catalan. Though the lenition of intervocalic Spanish /s/ (to [h] or [Ø]) in monolingual varieties has similarly been found to be favored word-finally over onset contexts (cf. Hualde and Prieto 2014; Chappell and García 2017; Torreira and Ernestus 2012), the magnitude of the word position effect observed here, categorical for L1-Catalan speakers and near-categorical for L1-Spanish speakers, has not been attested for monolingual Spanish varieties. Moreover, a matched guise study of Barcelonan Spanish [z] by Davidson (2019, p. 67) reveals that this feature is covertly associated with Catalan bilingualism within the local bilingual speech community. Taken together with the linguistic stratification by stress (favoring [z] in unstressed contexts), Barcelonan Spanish [z] illustrates a confluence of endogenous and contact-induced constraints. The additional social stratification attested for Barcelonan Spanish [z], namely its favoring by younger female speakers, is consistent with a change in progress from below (cf. Labov 2001). In the prevocalic word-final context, younger L1-Catalan females produced [z] at a rate of 74%, which, given the self-monitored nature of the elicited production task, likely underestimates [z] production in more casual and spontaneous contexts. Accordingly, younger L1-Catalan females lead the production of Barcelonan Spanish [z], using it as a majority variant in prevocalic word-final position.

<sup>7</sup> For the assessment of contact effects, I adopt Thomason's (2001, 2008, 2010) more flexible treatment of contact-induced innovation as any case in which a linguistic variant is predicted to be more likely to have arisen in a setting of language contact than in a non-contact setting. This prediction is operationalized through sensitivity to specific linguistic and/or social constraints consistent with source language agentivity (e.g., a variant's use being mediated by bilingualism and/or language dominance, cognate status with the source language, or any other non-monolingual-like constraint). Accordingly, language contact need not be the only (or even the principal) source of a feature's use for it to be considered contact-induced.

With regard to intervocalic fricatives in Barcelonan Catalan, Spanish contact influence can similarly be inferred from the stratification by language profile, whereby Catalan [s] (in place of prescriptively expected [z] from /z/ and /S/) was favored by L1-Spanish speakers over L1-Catalan speakers. Indeed, L1-Catalan speakers categorically favored Catalan [z] over [s], suggesting that, at least in more closely self-monitored (or less spontaneous) speech, the phonemic voicing distinction in Catalan is fully maintained. For L1-Spanish speakers, additional social stratification by age and gender suggests a possible change in progress from above (cf. Labov 2001), with the gradual adoption of prescriptively normative [z] led by younger female speakers, who use [z] as a majority variant at a frequency of 68%. Though unconstrained by word position, voicing rates in Catalan (by L1-Spanish speakers) are higher in unstressed contexts, indicating a contribution of phonetic-level lenition to the phonological variability of this voicing contrast.

In comparing intervocalic fricative production across Barcelonan Catalan and Spanish, these findings reveal an intriguing asymmetry. Whereas the sociolinguistic stratification of (prevocalic word-final) Spanish [z] indicates an advancing contact variant whose adoption is led by L1-Catalan speakers, the analogous Catalan [s] instead shows signs of gradual abandonment in favor of [z] among L1-Spanish speakers. Comparing the production frequencies of Spanish (prevocalic word-final) [z] and Catalan [s] by the younger female leaders of each language profile, Spanish [z] is used at over twice the rate of Catalan [s] (74% vs. 32%, respectively). Among younger female L1 speakers of each language, L1-Spanish speakers produce Spanish prevocalic word-final [z] at a frequency of 39%, whereas Catalan [s] is unattested (0%) among L1-Catalan speakers. Accordingly, in Barcelona, the influence of Catalan on Spanish appears considerably stronger than the influence of Spanish on Catalan, though both directions of effect are present insofar as both contact variants are favored by L1 speakers of the contact language (i.e., source language agentivity (Van Coetsem 2000)).

With respect to intervocalic fricative production in Valencian Catalan and Spanish, bidirectional contact influence can similarly be observed in the usage patterns of Spanish [z] and Catalan [s]. The influence of Catalan on Spanish is attested in the stratification of Spanish [z] by language profile, with L1-Catalan speakers favoring [z] over L1-Spanish speakers. As in Barcelonan Spanish, word position in Valencian Spanish was a near-categorical constraint across both speaker profiles, effectively barring [z] in the word-medial context, the site of phonemic voicing in Catalan. Unlike in Barcelona, however, no significant social stratification by age or gender was obtained for Valencian Spanish, which suggests that Spanish [z] (used by L1-Catalan and L1-Spanish speakers with frequencies of 25% and 10%, respectively, in the prevocalic word-final context) is not currently undergoing active adoption or change in the community.

As regards the influence of Spanish on Valencian Catalan, Catalan [s] was again favored by L1-Spanish speakers over L1-Catalan speakers, though, in contrast to Barcelonan Catalan, Valencian Catalan [s] is the majority variant even for L1-Catalan speakers, who notably self-report greater use of Spanish than Catalan in their daily lives (see Table 1). Additional social stratification by age and gender was exclusive to L1-Spanish speakers, favoring Catalan [s] in the speech of younger female speakers, consistent with a change in progress from below (cf. Labov 2001). Younger L1-Spanish females produced [s] at an overall rate of 93%, suggesting that in less self-monitored speech, [s] is likely (near-)categorical, in line with dialectological descriptions of *apitxat* as lacking the voicing contrast (Prieto 2004; Moll 2006). Still, L1-Catalan speakers produced Valencian Catalan [z] roughly one-third of the time, underscoring that the *apitxat* variety, like any linguistic variety, inherently involves sociolinguistic variability. Lastly, stress effects favoring [z] in unstressed contexts (paralleling Barcelonan Catalan) highlight the role of phonetically lenitive processes in the variability of a phonological voicing contrast.

As was the case for Barcelonan bilinguals, intervocalic fricative production in Valencian Catalan and Spanish shows a crosslinguistic asymmetry. Beyond stratification by language profile, no sociolinguistic correlates were obtained for Spanish [z], whereas for Catalan [s], younger L1-Spanish females lead their older male counterparts in the adoption of this feature. A comparison of usage frequencies between Spanish prevocalic word-final [z] by L1-Catalan speakers (25%) and Catalan [s] by younger L1-Spanish females (93%) illustrates the greater influence (by a factor of nearly four) of Spanish on Catalan in this community. Among L1 speakers of each language, L1-Spanish speakers produce Spanish prevocalic word-final [z] at a rate of 10%, while L1-Catalan speakers use Catalan [s] at a rate of 67%. Accordingly, in Valencia, the influence of Spanish on Catalan appears considerably stronger than the influence of Catalan on Spanish, though both directions of effect are present insofar as both contact variants are favored by L1 speakers of the contact language (i.e., source language agentivity (Van Coetsem 2000)).

Ultimately, the aforementioned findings evidence a case of opposing contact asymmetries across the bilingual communities of Barcelona and Valencia. Operationalized as differential magnitudes between the production of Spanish (prevocalic word-final) [z] and Catalan [s] by L1-Catalan speakers and L1-Spanish speakers, respectively, the influence of Barcelonan Catalan on Barcelonan Spanish is stronger by a factor of approximately two, whereas in Valencia, the influence of Spanish on Catalan is stronger by a factor of approximately four. As the voicing of Spanish /s/ to [z] is just as articulatorily motivated as the devoicing of Catalan /z/ (or /S/) to [s] (Hualde and Prieto 2014, p. 111), differences in the strength of directionality between Catalan as a minority language and Spanish as a majority language can be more transparently linked to the distinct social realities of each language in each community. In Barcelona, the present sociolinguistic interview data corroborate prior claims (cf. Siguan 1988; Sinner 2002) that Catalan is in a position of equal (if not greater) linguistic and social capital than Spanish. Barcelonan speakers in the present investigation readily articulated their esteem of Catalan as part of an expressly bilingual Catalonian identity (corroborating covert attitudes to the same effect in Davidson (2019)), with most advocating for its continued maintenance (if not predominance) amongst subsequent generations of Catalonians. For Catalonians, the active adoption of Spanish [z] is accordingly a " ... linguistic resource available to [speakers] in their variety of Spanish as another ethnolinguistic and ideological assertion besides language choice" (Vann 2007, p. 271), the directionality of which (i.e., the greater adoption of Spanish [z] than Catalan [s]) notably mirrors the community's active ideological embrace of Catalan.
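The operationalization above reduces to a quotient of two production rates. As a minimal sketch, assuming the asymmetry is simply the ratio of the two percentages the text pairs for each community (the helper `asymmetry_ratio` is hypothetical and is not the analysis code used in this study), the "approximately two" and "approximately four" figures can be reproduced in Python as follows:

```python
def asymmetry_ratio(rate_a: float, rate_b: float) -> float:
    """Return how many times larger rate_a is than rate_b.

    Both arguments are production rates in percent (0-100).
    Hypothetical helper for illustration, not the study's analysis code.
    """
    if not (0 <= rate_a <= 100 and 0 <= rate_b <= 100):
        raise ValueError("rates must be percentages between 0 and 100")
    if rate_b == 0:
        raise ValueError("ratio is undefined when the comparison rate is 0%")
    return rate_a / rate_b


# Barcelona: Spanish prevocalic word-final [z] by younger L1-Catalan females (74%)
# vs. Catalan [s] by younger L1-Spanish females (32%).
barcelona = asymmetry_ratio(74, 32)

# Valencia: Catalan [s] by younger L1-Spanish females (93%)
# vs. Spanish prevocalic word-final [z] by L1-Catalan speakers (25%).
valencia = asymmetry_ratio(93, 25)

print(f"Barcelona, Catalan-on-Spanish influence: {barcelona:.2f}x")  # 2.31x
print(f"Valencia, Spanish-on-Catalan influence: {valencia:.2f}x")    # 3.72x
```

The zero check matters for the second Barcelonan comparison (39% vs. 0% among L1 speakers of each language), where no finite ratio exists; such a comparison is better reported as the two raw rates.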

In Valencia, in contrast, speakers in the present investigation largely expressed general apathy toward the use and preservation of Valencian Catalan, tied in part to a social stigma of being too pro-Catalan. The predominant view of Valencian as not particularly essential for everyday life in Valencia, coupled with the sizeable minority (33%) of informants who affirmed Spanish as the primary language expressive of Valencian identity, accordingly patterns with the directionality favoring the adoption of Catalan [s] over Spanish [z]. While I do not claim the greater social stratification and use of Valencian Catalan [s] compared to Valencian Spanish [z] to be a singular, direct consequence of the greater hegemonic distance between Spanish and Catalan in Valencia, the stronger contact influence of Spanish on Valencian Catalan can nonetheless be understood as a probabilistically conditioned outcome of social factors in this community, including population size, sociopolitical status, sociocultural status, and language attitudes (Thomason 2001, 2010; Thomason and Kaufman 1988), all of which favor Spanish over Catalan in this community. Though both linguistic and social factors are posited to contribute to language variation and change, the present case study, specifically as concerns two equally endogenously motivated changes (i.e., Spanish [z] and Catalan [s]), demonstrates how distinct social contexts can probabilistically favor distinct linguistic outcomes.

#### **7. Conclusions**

The present study aimed to explore intervocalic fricative production as a variable feature of Catalan–Spanish contact in two unique communities of Catalan–Spanish bilingualism in order to address questions of directionality and asymmetry of contact influence between them. The unique asymmetries of influence between Catalan and Spanish across Barcelona and Valencia were linked to the asymmetric sociopolitical and sociolinguistic relationships between the languages in each community, which probabilistically condition contact influence at the level of the greater speech community. Accordingly, the social context of language contact plays an essential role in the dynamics of linguistic variation and change in contact settings, in addition to the linguistic and cognitive factors often investigated regarding contact effects at the level of the individual bilingual speaker.

**Funding:** This research was funded by an Invitational Research Grant (*Atracció de Talent*) sponsored by the *Vicerectorat d'Investigació i Política Científica* (*Servei d'Investigació*) of the *Universitat de València*.

**Acknowledgments:** This research would not have been possible without the generous hospitality and support provided by Ferran Robles and Maria Labarta Postigo (*Universitat de València*), Antonio Torres Torres, Gemma de Blas (*Universitat de Barcelona*), and *Mireia Trench-Parera* (*Universitat Pompeu Fabra*), as well as Clara Cervera and the Martori family. I am additionally grateful for the helpful comments and insights provided by two anonymous reviewers, in addition to feedback from the members of the UC Davis Symposium on Language Research.

**Conflicts of Interest:** The author declares no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


Morgan, Terrell. 2010. *Sonidos en contexto: Una introducción a la fonética del español con especial referencia a la vida real*. New Haven: Yale University Press.

Navarro Tomás, Tomás. 1918. *Manual de pronunciación española*. Madrid: Imprenta de los Sucesores de Hernando.


Prieto, Pilar. 2004. *Fonètica i fonologia. Els sons del català*. Barcelona: Editorial UOC.

Quilis, Antonio. 1981. *Fonética acústica de la lengua española*. Madrid: Gredos.


Tagliamonte, Sali. 2012. *Variationist Sociolinguistics: Change, Observation, Interpretation*. Oxford: Wiley-Blackwell.


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Shared or Separate Representations? The Spanish Palatal Nasal in Early Spanish/English Bilinguals**

**Sara Stefanich <sup>1,\*</sup> and Jennifer Cabrelli <sup>2</sup>**


Received: 3 September 2020; Accepted: 22 October 2020; Published: 2 November 2020

**Abstract:** The purpose of this study is to examine phonetic interactions in early Spanish/English bilinguals to determine whether they have established a representation for the Spanish palatal nasal /ñ/ (e.g., /kañon/ *cañón* 'canyon') that is separate from the similar, yet acoustically distinct, English /n+j/ sequence (e.g., /kænjən/ 'canyon'). Twenty heritage speakers of Spanish completed a delayed repetition task in each language, in which a set of disyllabic nonce words was produced in a carrier phrase. English critical stimuli contained an intervocalic /n+j/ sequence (e.g., /dɛnjɑ/ 'denya') and Spanish critical stimuli contained intervocalic /ñ/ (e.g., /deña/ 'deña'). We measured the duration and formant contours of the following vocalic portion as acoustic indices of the /ñ/~/n+j/ distinction. The duration data and formant contour data alike show that early bilinguals distinguish between Spanish /ñ/ and English /n+j/ in production, indicating the maintenance of separate representations for these similar sounds and thus a lack of interaction between systems for bilinguals in this scenario. We discuss these discrete representations in comparison to previous evidence of shared and separate representations in this population, examining a set of variables that are potentially responsible for the attested distinction.

**Keywords:** heritage bilingualism; early bilingualism; Spanish; English; phonology; phonetics; speech production

#### **1. Introduction**

An overarching question in the field of bilingual phonology concerns the levels at which, and the degree to which, a bilingual's phonetic and phonological systems interact, and how these interactions can be modelled within a theory of bilingual grammar. Bilingualism comes in many different forms, one of which is heritage speaker bilingualism. Herein, the term "heritage speaker" (HS) "refer[s] to any bilingual whose [first language] L1 (HL) was learned primarily at home as a minority language and whose [second language] L2 was learned primarily outside the home as the societal (majority) language" (Chang 2020, p. 2). As a result of this acquisition trajectory, heritage speakers' HL and majority language (ML) typically differ with regard to age and context of acquisition, frequency and context of usage, formal education, proficiency, and dominance (which often shifts from the HL to the ML once speakers reach school age), among other factors. These between-language differences provide a unique testing ground for examining how such factors modulate the nature of phonetic and phonological interactions in the bilingual mind.

Empirical investigations into the nature and degree of these interactions in heritage speaker phonologies have increased over the last decade (see Chang 2020 for a comprehensive review), and a survey of the growing body of research indicates that production patterns in the heritage language often lie between those of late L2 learners of the heritage language who are L1 speakers of the majority language (henceforth, L2ers) and those of L1 speakers of the heritage language who acquired the L2 as adults (henceforth, L1ers). Preliminary evidence suggests that segmental phenomena might be less vulnerable than suprasegmental phenomena to ML influence, albeit with substantial individual variability given the heterogeneity of HSs' language experience. Production data in the ML, on the other hand, although very limited, show a clearer pattern: production is typically indistinguishable from that of L1ers, particularly at the segmental level (e.g., Barlow 2014; Mayr and Siddika 2018; McCarthy et al. 2013). In the HL, much of the production research to date has examined segmental phenomena, with attention given to the representation of analogous sounds found in monolingual varieties of the ML and HL. That is, researchers have sought to determine whether HSs' production aligns with baseline production data from L1ers or shows influence from the ML. Outcomes vary between HL data that skew towards an L1 baseline (e.g., Chang et al. 2009, 2011; Lein et al. 2016) and data that do not (e.g., McCarthy et al. 2013; Ronquest 2012). This variability has been attributed to factors such as speaker generation and sociocultural factors (e.g., Nagy and Kochetov 2013), dominance (e.g., Amengual 2016, 2018; Shea 2019; Simonet 2014), proficiency (e.g., Shea 2019), age of ML acquisition (e.g., Barlow 2014; Cheng 2019), relative similarity between HL and ML sound(s) (e.g., Godson 2003, 2004; Yao and Chang 2016), and whether testing took place in monolingual or bilingual mode (e.g., Amengual 2018; Simonet 2014; Simonet and Amengual 2020), among other things.

Most empirical studies report data solely from heritage speakers' HL, which prevents a direct comparison between HL and ML production that would allow for the verification of distinct HL and ML representations. While comparisons between heritage and baseline data are valuable in their own right, comparisons between the HL and ML make an important contribution to our understanding of heritage speaker phonology in that they reveal how the HS's two phonologies interact. Specifically, we can determine for a crosslinguistic pair of sounds whether a speaker's system includes separate representations used in ML versus HL production or a single representation shared by both ML and HL production. The few studies that report direct comparisons suggest that heritage speakers maintain distinct representations in a shared phonetic space, even when the sound pair under investigation is considered similar, but acoustically distinct, in baseline varieties of the HL and ML (e.g., Amengual 2018; Chang et al. 2009, 2011; Knightly et al. 2003).

The studies that have compared HL and ML productions have tested one-to-one analogous sound correspondences between the HL and ML. In the current study, however, we examine a different crosslinguistic scenario, specifically the production of nasal sounds by heritage speakers of Spanish in the Midwest US. While the inventory of monolingual Spanish contains the palatal nasal phoneme /ñ/ (e.g., *cañón* 'canyon' /kaˈñon/), the inventory of monolingual English does not. However, an approximation exists in the form of the heterosyllabic phoneme sequence /n+j/ (e.g., 'canyon' /ˈkæn.jən/), which can be distinguished acoustically from the complex segment /ñ/ via duration and formant trajectories (e.g., Bongiovanni 2019).<sup>1</sup> Herein, we ask whether bilingual speakers of Spanish as the HL and English as the ML rely on distinct representations when producing these sounds in Spanish mode versus English mode.

Data from L1 English/(late) advanced L2 Spanish learners (Stefanich and Cabrelli 2016) have shown that these learners' productions patterned together in English and Spanish modes, and that this apparent shared category aligned neither with baseline (L1) Spanish data nor with the baseline English data provided by beginner L2 Spanish learners. That is, learners did not appear to create a novel L2 category when producing nonce words that were presented to them auditorily as /ñ/; Stefanich and Cabrelli (2016) considered this shared intermediate representation a potential reflection of L2 influence on an early-established L1 representation. This finding aligns with the hypothesis that "similar" L2 sounds that have an analogue in the L1 will be less salient in the input and, in turn, that the learner will be less likely to create a novel L2 category for them (Flege 1995; Flege and Bohn 2020; but cf., e.g., van Leussen and Escudero 2015, whose revised Second Language Linguistic Perception (L2LP) model predicts that similar sounds will be less difficult than different sounds). In light of these L2 data, we examine herein whether early bilinguals' data align with those of their advanced L2 counterparts, or whether these speakers' qualitative and quantitative differences in language experience yield separate representations in Spanish versus English mode.

<sup>1</sup> A note on notation: although category representations are often given in the literature using square brackets, we use slashes when referring to phonemic inventories and representations in the speaker's grammar.

After an overview of the relevant nasal consonant inventories of Spanish and English and their acoustic properties, we present the research question and predictions specific to this crosslinguistic scenario. Then, we detail the methods and the results, followed by a discussion. The results from a delayed repetition task administered in separate Spanish and English modes suggest that early bilinguals rely on distinct representations in each mode; the acoustic data indicate that they produce a complex segment in Spanish mode versus a two-segment sequence in English mode. This outcome thus suggests a lack of interaction between systems for these bilinguals in this case, despite the crosslinguistic proximity between English /n+j/ and Spanish /ñ/. We discuss these discrete representations in comparison to previous evidence of merged versus separate representations in this population and examine the variables that are potentially responsible for the distinction.

#### *1.1. Nasal Consonants in Spanish and English*

Spanish has three nasal phonemes that contrast by place of articulation in syllable onset position: bilabial /m/, alveolar /n/, and alveolopalatal /ñ/ (Díaz-Campos 2004; Recasens 2013) (1).


The palatal nasal /ñ/ is the least frequent phoneme in Spanish (Melgar de González 1976) and is a complex segment composed of an alveolar nasal element followed in succession by a palatal glide element (Martínez Celdrán and Planas 2007; Massone 1988) posited to be phonologically associated with the nasal segment (e.g., Colina 2009).

Although English lacks a phonemic palatal nasal (the inventory is limited to /m n ŋ/), a similar but heterosyllabic /n+j/ sequence is found in words such as *canyon*, *onion*, and *lanyard* (2).

2. *canyon* /ˈkænjən/; *onion* /ˈʌnjən/; *lanyard* /ˈlænjɹ̩d/.

While there are no published data on the acoustic properties of the English /n+j/ sequence to inform the acoustic parameters that distinguish the complex segment /ñ/ from the discrete segments of /n+j/, a similar (albeit tautosyllabic) sequence is found in Spanish in the surface form of words such as *uranio* /uɾanjo/ 'uranium' and has been investigated acoustically. In Spanish, both /n+j/<sup>2</sup> and /ñ/ are composed of a nasal element and a palatal glide element; /ñ/ is a single complex segment in which the glide element is said to be "partial" (versus a "full" element in /n+j/; Martínez Celdrán and Planas 2007). In /n+j/, by contrast, the "full" glide element is an independent segment. Phonologically, the sequence is hypothesized to differ from /ñ/ in that the glide element in /n+j/ is associated with the following vowel, forming a complex nucleus (e.g., Colina 2009); we assume this to be the case for English /n+j/ as well.

<sup>2</sup> We employ this phonemic notation following Bongiovanni (2019), recognizing that the glide in this sound sequence in Spanish is not phonemic and that this notation conflates phonetic and phonological representations.

Despite their commonalities, the pair has been found to be distinguished acoustically in word pairs such as *uranio* /uɾanjo/ 'uranium' and *huraño* /uɾaño/ 'unsociable' (see Bongiovanni 2019 for a review of acoustic and articulatory evidence). In Bongiovanni's (2019) study of /n+j/ and /ñ/ production in Buenos Aires Spanish, an analysis of the vocalic portion<sup>3</sup> following the nasal consonant supported the phonological association of the glide with the nasal consonant in /ñ/ (i.e., ñV) and of the glide with the vowel nucleus in /n+j/ (i.e., njV). Specifically, the gestural differences in the timing and degree of lingual-palatal contact reported in studies such as Recasens (2013) were acoustically evident in formant contour trajectories (i.e., the rise of F2 and the decrease in F1 in /n+j/), in the timing at which the F1 minimum and F2 maximum were reached (predicted to be earlier for /ñ/), and in the duration of the vocalic portion (predicted to be longer in /n+j/ given the glide's status as part of a complex nucleus). Although no crosslinguistic research has examined /n+j/ in English versus Spanish, given the heterosyllabic nature of English /n+j/ versus the tautosyllabic /n+j/ in Spanish, it is reasonable to predict that the glide will be even more clearly associated with the following vowel in English. These predicted differences are visible in the spectrograms and waveforms in Figures 1–3, which are taken from a participant's productions of the nonce item 'denya' in English mode (Figure 1) and 'deña' in Spanish mode (Figure 2), with 'dena' in Spanish mode (Figure 3) as a point of comparison. In the current study, we follow Bongiovanni (2019) and measure duration and formant contours as correlates of the phonological association of the glide element. As she notes, reporting data from both measures allows the reliability of each measure to be confirmed and avoids overgeneralizing from a single measure.
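To make the timing measure concrete, the following Python sketch locates, for a single token, the proportional time at which F1 reaches its minimum and F2 its maximum. It is an illustration under stated assumptions, not Bongiovanni's (2019) procedure: the trajectories are assumed to have been sampled at equally spaced proportional time points, and the helper name `extremum_timing` is hypothetical. Under the account above, /ñ/ tokens would be expected to yield earlier values than /n+j/ tokens.

```python
from typing import Sequence, Tuple


def extremum_timing(f1: Sequence[float], f2: Sequence[float]) -> Tuple[float, float]:
    """Return (% of vocalic portion at F1 minimum, % at F2 maximum) for one token.

    Hypothetical helper: assumes f1 and f2 were sampled at the same n equally
    spaced points, with the k-th point (0-based) at (k + 1) / n of the interval.
    """
    if len(f1) != len(f2) or len(f1) == 0:
        raise ValueError("f1 and f2 must be non-empty and of equal length")
    n = len(f1)
    i_f1_min = min(range(n), key=lambda k: f1[k])  # F1 valley (first one if tied)
    i_f2_max = max(range(n), key=lambda k: f2[k])  # F2 peak (first one if tied)
    return 100 * (i_f1_min + 1) / n, 100 * (i_f2_max + 1) / n


# Toy trajectories in Hz (invented for illustration, not study data).
print(extremum_timing([500, 400, 450, 600, 700], [1800, 2000, 1900, 1700, 1500]))  # prints (40.0, 40.0)
```

Comparing these percentages across conditions is one simple way to operationalize "reached earlier"; the mean or median per condition could then be compared.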

**Figure 1.** Waveform and spectrogram of a participant's production of /dɛnjɑ/ 'denya' in English mode.

**Figure 2.** Waveform and spectrogram of a participant's production of /deña/ 'deña' in Spanish mode.

<sup>3</sup> In light of the unreliability of acoustic analysis of nasal consonants (see, e.g., Fujimura 1962, cited in Bongiovanni 2019, p. 4), Bongiovanni (2019) limited her analysis to the following vocalic portion.

**Figure 3.** Waveform and spectrogram of a participant's production of /dena/ 'dena' in Spanish mode.

#### *1.2. Research Question and Predictions*

The research question that drives this study is the following: do heritage speakers of Spanish evidence distinct representations in their productions of /ñ/ when in Spanish mode and /n+j/ when in English mode? This is an exploratory question with three possible outcomes. The first is that differences between language modes in duration and/or formant contours will reveal that these speakers maintain distinct representations. If these differences pattern with the acoustic parameters associated with /ñ/ versus /n+j/, such an outcome would suggest implicit knowledge of the single complex segment /ñ/ in Spanish as distinct from the segment sequence /n+j/ in English. However, it is entirely possible that speakers will rely on cues other than those reported in the baseline literature, as seen in work on within-language contrasts (e.g., Amengual 2016). The second is that there are no between-language acoustic differences and that the duration and formant contour data skew towards the acoustic description of /n+j/ (i.e., a two-segment sequence rather than a complex segment). The third, like the second, is a lack of between-language differences, but with data that pattern with the acoustic description of /ñ/. In the latter two cases, it will be necessary to consider what might drive the privileged status of one representation over the other. In terms of predictions, while this is, to our knowledge, the first study to compare a complex segment with a two-segment sequence, we can look to the limited research that directly compares HL and ML segmental data. As noted in Section 1, when only HL data are available, it is difficult to draw strong conclusions about the interaction of the HL and ML without ML data as a point of comparison. We can, of course, predict that if HSs are baseline-like in the HL, then, given that they are typically baseline-like in the ML (see Chang 2020, p. 10 for discussion), they have two representations.
Our question, however, is not how close the ML or HL production is to a baseline, but rather whether the speakers' production patterns differ in English mode versus Spanish mode. As mentioned above, the few studies that directly compare HL and ML data indicate distinct representations in a shared phonetic space, which yields the prediction that the HSs in the present study will distinguish acoustically between /ñ/ and /n+j/ in production. If they do not, we predict that their production will skew towards the reported English pattern, given that (a) the HSs are largely English dominant and (b) overall, segmental differences in the ML when compared to the ML "norm" are small and variable, with no clear evidence to date that any measured differences are perceivable (Chang 2020, p. 10).

#### **2. Materials and Methods**

#### *2.1. Participants*

Twenty Spanish/English bilinguals participated in this study. At the time of the study, all the participants were undergraduate students living in the Chicagoland area. The participants ranged in age from 18 to 25 (*M* = 21.05, *SD* = 1.47). All the participants reported learning Spanish before the age of 3 (*M* = 0.25, *SD* = 0.79) and English before the age of 8 (*M* = 3.30, *SD* = 2.73). Specifically, six participants reported learning Spanish and English since birth, thirteen reported learning Spanish before English, and one reported learning English before Spanish. We estimate that most of these participants are second-generation HSs, as approximately 85% of the Spanish HSs at the institution where the data were gathered are second-generation speakers (Potowski 2020). The participants reported that, in any given week, they use more English than Spanish with friends and at school/work but more Spanish than English with family (Table 1).


**Table 1.** Mean percent of language use by domain.

As a proxy for language dominance, the participants completed the Bilingual Language Profile (BLP, Birdsong et al. 2012), a bio-linguistic questionnaire which uses the participants' responses to provide a language dominance score on a scale of −218 (Spanish dominant) to 218 (English dominant), with "0" indicating a "balance" between the two languages.<sup>4</sup> The majority of our participants scored on the English side of the scale (*n* = 17), with a range of scores from −22.7 to 88.6 (*M* = 43.56, *SD* = 35.35); the three participants who scored on the Spanish side of the scale fell very close to the balanced zero point. As part of the BLP, the participants rated their Spanish and English proficiency in speaking, understanding, reading, and writing on a scale from 1 (not very well) to 6 (very well) (Table 2). In addition to self-rated proficiency, the participants completed a 50-item written Spanish proficiency assessment composed of portions of the Diploma of Spanish as a Foreign Language (DELE) and Modern Language Association (MLA) assessments commonly administered in heritage research (e.g., Keating et al. 2016; Leal et al. 2015). Our participants averaged a score of 35.50 (*SD* = 7.80) on the written assessment. Dominance and written proficiency were found to be weakly negatively correlated (r(18) = −0.32, *p* = 0.175).

**Table 2.** Self-reported proficiency (scale 1–6).


Heritage speaker populations have been shown to be heterogeneous in terms of language experience and use (e.g., Montrul and Polinsky 2019), and the sample in the current study is no exception. We acknowledge the heterogeneity of these Spanish/English bilinguals in terms of age of acquisition, proficiency, language use, and language dominance and address a number of these factors as they relate to the outcomes in our discussion (Section 4).

#### *2.2. Materials and Procedure*

The experiment consisted of Delayed Repetition Tasks (e.g., Trofimovich and Baker 2006) in English mode and Spanish mode. Each task included 40 trials (10 critical, 10 control, 20 distractor). Each trial was composed of a target nonce word presented auditorily within the carrier phrase 'I'm saying \_\_\_ to you' in English and its equivalent *Digo X para ti* in Spanish. A 1000 ms silent pause was then followed by the spoken prompt "What are you saying to me?" in English or the equivalent *¿Qué me dices?* in Spanish, which prompted the participant to produce the original phrase. Items in both tasks had penultimate stress and were phonotactically licit in the respective language. Critical items followed a (C)CV<sub>1</sub>n.jV<sub>2</sub> (English) or (C)CV<sub>1</sub>.ñV<sub>2</sub> (Spanish) structure and were counterbalanced in each language with 10 control items containing the alveolar nasal /n/ in a (C)CV<sub>1</sub>.nV<sub>2</sub> structure.<sup>5</sup> Across critical and control conditions, V<sub>1</sub> was a mid or low vowel (/ɛ/ or /ɑ/ in English and /e/ or /o/ in Spanish); V<sub>2</sub> was /a/ in Spanish and /ɑ/ in English. The 20 distractors followed the same general (C)CV.CV structure as the control and critical items. The item composition in the two tasks is summarized in Table 3; the full set of stimuli is in Appendix A. English stimuli were recorded by a phonetically trained female native speaker of Midwest American English; Spanish stimuli were recorded by a phonetically trained female native speaker of Northern Peninsular Spanish.

<sup>4</sup> Following authors such as Birdsong (2016) and Solis-Barroso and Stefanich (2019), we recognize the gradient nature of the different dimensions of dominance and treat the variable as scalar rather than categorical.


**Table 3.** Composition of nonce stimuli in the English and Spanish delayed repetition tasks.

Trials were presented using E-prime 2.0 (Psychology Software Tools, Inc., Pittsburgh, PA, USA); audio stimuli were presented over Sennheiser HD-280 PRO (Sennheiser, Wedemark, Germany) headphones through a MOTU Ultralite mk3 interface (MOTU, Cambridge, MA, USA). Recordings took place in a sound-attenuated booth using a head-mounted Shure SM 10A (Shure Inc., Niles, IL, USA) dynamic microphone and a Marantz PMD 661 solid-state recorder (Marantz Corp., Kawasaki, Japan) at a 44.1 kHz sampling rate.

Data were collected in a single sitting that consisted of separate English-mode and Spanish-mode sessions; the order of the sessions was counterbalanced across participants. All the participants provided informed consent following University of Illinois at Chicago IRB protocol 2015-0040 prior to data collection. The English-mode session began with a 10 min interview with the participant to establish the language mode. The participants then completed the English delayed repetition task and the BLP. The Spanish-mode session consisted of a 10 min interview, the Spanish delayed repetition task, and the written proficiency assessment.

#### *2.3. Analysis*

#### 2.3.1. Acoustic Analysis

Following the literature presented in Section 1.1, this study examines the duration and formant contours of the vocalic portion following the nasal segment. To that end, sound files were segmented and analyzed in Praat (version 6.1.16; Boersma and Weenink 2019). The theoretical ceiling was 600 tokens, or 30 per speaker (10 Spanish critical, 10 English critical, 10 Spanish control). One participant's data were excluded from analysis due to a lack of discernible impressionistic difference between /n/ and /ñ/ in Spanish. A further 14 tokens were removed due to non-target productions (participants skipping, repeating, or producing different segments), creaky voice, or background noise, for a final total of 556 tokens.

<sup>5</sup> Spanish alveolar data are reported for contextual comparison; we have excluded the English alveolar data, as they are not relevant to the research question.
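The token counts reported in Section 2.3.1 can be checked with simple arithmetic; the snippet below only restates figures given in the text, and its variable names are illustrative.

```python
# Restates the token arithmetic from Section 2.3.1 (no new data).
N_PARTICIPANTS = 20
TOKENS_PER_SPEAKER = 10 + 10 + 10  # Spanish critical + English critical + Spanish control
N_EXCLUDED_PARTICIPANTS = 1        # no impressionistic /n/-/ñ/ difference in Spanish
N_REMOVED_TOKENS = 14              # non-target productions, creaky voice, background noise

ceiling = N_PARTICIPANTS * TOKENS_PER_SPEAKER
after_exclusion = ceiling - N_EXCLUDED_PARTICIPANTS * TOKENS_PER_SPEAKER
final_total = after_exclusion - N_REMOVED_TOKENS
print(ceiling, after_exclusion, final_total)  # prints 600 570 556
```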

During segmentation, the onset of the vocalic portion was determined by the visual presence of an abrupt change in formant structure and frequencies, and the offset was determined by a breaking up of the formant structure and a loss of energy and periodicity in the waveform (Ladefoged 2005). Following Bongiovanni (2019), boundaries between formant transitions or between the glide and the vowel /a/ were not marked.

Once segmentation was completed, measurements were extracted via scripts (Hirst 2012 for automatic duration measurements; McCloy and McGrath 2012 for semi-automatic formant measurements). Formant measurements were taken at 20 points within the vocalic portion (every 5%); 5.2% of the data were manually corrected where the script had evidently mistracked the formants.
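As a hedged illustration of proportional sampling, the sketch below computes the time stamps of 20 equally spaced measurement points within a segmented vocalic portion. The text does not say whether the script placed the points at 5% through 100% or at another set of 5% steps, so the placement used here is an assumption, and `measurement_times` is a hypothetical helper rather than part of the McCloy and McGrath (2012) script.

```python
from typing import List


def measurement_times(onset_s: float, offset_s: float, n_points: int = 20) -> List[float]:
    """Times (s) of n_points equally spaced formant measurements in a vocalic interval.

    Assumption: with n_points = 20 the points fall at 5%, 10%, up to 100% of
    the interval; the formant-extraction script used in the study may differ.
    """
    if offset_s <= onset_s or n_points < 1:
        raise ValueError("need offset > onset and at least one point")
    step = (offset_s - onset_s) / n_points
    return [onset_s + step * k for k in range(1, n_points + 1)]


times = measurement_times(1.250, 1.390)  # a hypothetical 140 ms vocalic portion
```

Sampling proportionally rather than at fixed millisecond offsets is what allows tokens of different durations to be compared point by point in the SSANOVA.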

#### 2.3.2. Statistical Analysis

For the duration of the vocalic portion, a linear mixed model (LMM) was fit to the data (measured in ms)<sup>6</sup> using the MIXED procedure in SPSS 26 (IBM Corp. 2019) with a fixed effect of language mode (English, Spanish). The random effects structure (RES) was the maximal structure supported by the data (Barr et al. 2013) and included by-subject and by-item random intercepts.

For the formant structure of the vocalic portion, we followed the analysis laid out by Bongiovanni (2019). The formant values were transformed to Bark units, and Smoothing Spline ANOVAs (SSANOVAs) were fit to the data (time points and the corresponding Bark values at each time point) in R, version 4.0.2 (R Core Team 2020), with the gss package. A smoothing spline fits a smooth curve to the observations, and the SSANOVA determines whether the curves in question are statistically different from one another (i.e., whether their confidence intervals overlap). As in previous research (e.g., Bongiovanni 2019; Kirkham 2017; Nance 2014; Simonet et al. 2008), we limit our report to the graphical representations of the SSANOVAs.
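The text does not state which Hz-to-Bark conversion was used. As one possibility, the sketch below implements Traunmüller's (1990) widely used approximation, z = 26.81f/(1960 + f) − 0.53, omitting the corrections that formula applies at the extremes of the scale; the function names are hypothetical.

```python
def hz_to_bark(f_hz: float) -> float:
    """Traunmüller's (1990) Hz-to-Bark approximation, without its end corrections.

    The corrections for values below 2 Bark and above 20.1 Bark are omitted;
    F1 and F2 values for these vowels and glides typically fall inside that range.
    """
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_difference(f_a_hz: float, f_b_hz: float) -> float:
    """Absolute distance in Bark between two formant values given in Hz."""
    return abs(hz_to_bark(f_a_hz) - hz_to_bark(f_b_hz))


print(round(hz_to_bark(1000.0), 3))  # prints 8.527
```

Working in Bark rather than Hz is what makes the 1 Bark JND comparison in Section 3.2 meaningful, since the scale approximates auditory frequency resolution.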

#### **3. Results**

#### *3.1. Duration Results*

A visual representation of the duration data is presented via the boxplot in Figure 4; as predicted, the vocalic portion of /n+j/ produced in English mode was longer than that of /ñ/ produced in Spanish mode. The LMM yielded a significant main effect of language (*F*(1,41.942) = 70.524, *p* < 0.001); a Bonferroni post-hoc comparison showed that the vocalic portion for English /n+j/ (*M* = 169.50 ms, SE = 5.13, CI [159.17, 179.84]) was longer than for Spanish /ñ/ (*M* = 107.39 ms, SE = 5.33, CI [96.63, 118.16], *p* < 0.001). Hedges' g, calculated from the raw means and standard deviations (English *M* = 137.75 ms, *SD* = 42.95 ms; Spanish *M* = 85.78 ms, *SD* = 17.92 ms), yielded a large effect size of 1.51 (following the benchmarks for within-subject comparisons in Plonsky and Oswald 2014). This outcome aligns with the predicted crosslinguistic difference and is indicative of distinct representations in Spanish and English.
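For reference, the sketch below computes Hedges' g for two independent groups from their means, standard deviations, and sizes, using the pooled standard deviation and the common small-sample correction. It is a sketch, not the authors' computation: the text benchmarks the effect for within-subject comparisons and does not report the group sizes behind the raw means, so it should not be assumed to reproduce the reported value of 1.51.

```python
import math


def hedges_g(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> float:
    """Hedges' g for two independent groups.

    Cohen's d computed with the pooled SD, multiplied by the common
    small-sample correction J = 1 - 3 / (4 * (n1 + n2) - 9).
    """
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * j


# Toy input: means 1 SD apart, 10 observations per group.
print(round(hedges_g(1.0, 1.0, 10, 0.0, 1.0, 10), 4))  # prints 0.9577
```

With large groups J approaches 1, so g and d converge; the correction matters mainly for small samples.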

<sup>6</sup> To determine the effect of individual differences in speech rate on the outcome, a separate model was fit to z-score-transformed data; the model yielded the same main effect of language (*F*(1,41.942) = 70.524, *p* < 0.001). For ease of interpretation, we report the duration data herein in ms.
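Footnote 6 does not specify how the z-scores were computed. The sketch below assumes, hypothetically, that each duration was standardized against the mean and sample standard deviation of all of that speaker's tokens, which is one common way to factor out individual speech rate; `zscore_by_speaker` is an illustrative name.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, List, Tuple


def zscore_by_speaker(rows: List[Tuple[str, float]]) -> List[float]:
    """Return each duration as a z-score relative to its own speaker's tokens.

    rows: (speaker_id, duration_ms) pairs; each speaker needs at least two
    tokens with non-identical durations so that the sample SD is non-zero.
    """
    by_speaker: Dict[str, List[float]] = defaultdict(list)
    for speaker, duration in rows:
        by_speaker[speaker].append(duration)
    # Per-speaker mean and sample standard deviation.
    params = {s: (mean(d), stdev(d)) for s, d in by_speaker.items()}
    return [(duration - params[s][0]) / params[s][1] for s, duration in rows]


# Toy data: a "slow" and a "fast" speaker end up on the same scale.
z = zscore_by_speaker([("s1", 100.0), ("s1", 200.0), ("s2", 10.0), ("s2", 30.0)])
```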

**Figure 4.** Duration of the following vocalic portion of /n+j/ produced in English mode and /ñ/ produced in Spanish mode. Note: "Nasal segment" refers to /ñ/ and /n+j/; diamonds represent duration means.

#### *3.2. Formant Structure Results*

Recall that, with SSANOVA, statistical significance is indicated by non-overlapping confidence intervals plotted around the data-generated formant curves. The acoustic differences between /ñ/ and /n+j/ are predicted to take the form of a lower F1 and a higher F2 for /n+j/ than /ñ/. Keeping these predictions in mind, the results of the SSANOVA are presented in Figure 5.

**Figure 5.** Smoothing Spline ANOVA of formant trajectories by nasal segment.

For F1, the confidence intervals of the /ñ/ and /n+j/ curves do not overlap between the 0% and 40% points; between the 40% and 100% points, they run adjacent to one another, with a slight overlap between 50% and 75%. For F2, there is no overlap between 0–20% and 30–80%, but the intervals for /ñ/ and /n+j/ overlap at two points (at roughly 25% and 85%), reflecting a steeper negative slope for /ñ/ than for /n+j/. These formant readings follow the predicted shapes, with a lower F1 and higher F2 for /n+j/ than for /ñ/, although the differences are small: at their most different, the curves differ by between 0.55 and 0.61 Bark. This difference falls below the assumed just-noticeable difference (JND) threshold of 1 Bark, and we return to the question of whether it is perceivable in the discussion. In contrast, the confidence intervals for Spanish /n/ do not overlap at all with those for /n+j/ or /ñ/, and the differences exceed the JND threshold. For Spanish /n/ versus English /n+j/, differences range from 1.15 to 3.17 Bark in F1 and from 1.61 to 3.68 Bark in F2 at their most different. For Spanish /n/ versus /ñ/, differences range from 1.00 to 3.01 Bark in F1 and from 1.48 to 3.16 Bark in F2 at their most different.
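The overlap criterion and the JND comparison can be stated procedurally. The sketch below assumes that the fitted curves and their confidence bands have already been evaluated on a common time grid in Bark (for example, from a fitted SSANOVA); it does not fit the splines itself, and the function names and toy values are hypothetical.

```python
from typing import List, Sequence

JND_BARK = 1.0  # just-noticeable difference threshold assumed in the text


def non_overlapping_points(lo_a: Sequence[float], hi_a: Sequence[float],
                           lo_b: Sequence[float], hi_b: Sequence[float]) -> List[int]:
    """Indices of time points at which the two confidence bands do not overlap."""
    return [i for i, (la, ha, lb, hb) in enumerate(zip(lo_a, hi_a, lo_b, hi_b))
            if ha < lb or hb < la]


def max_fit_difference(fit_a: Sequence[float], fit_b: Sequence[float]) -> float:
    """Largest absolute distance (in Bark) between two fitted curves."""
    return max(abs(a - b) for a, b in zip(fit_a, fit_b))


# Toy bands on a 3-point grid (invented values, in Bark).
print(non_overlapping_points([0, 0, 0], [1, 1, 1], [2, 0.5, 1.5], [3, 1.5, 2.5]))  # prints [0, 2]
print(max_fit_difference([0.5, 0.5, 0.5], [2.5, 1.0, 2.0]) > JND_BARK)  # prints True
```

Note that the two checks answer different questions: non-overlap speaks to statistical reliability, whereas the JND comparison speaks to the likely perceptual salience of the difference.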

#### **4. Discussion**

#### *4.1. Summary*

This study investigated the speech production of a group of Spanish heritage speakers with English as the ML to determine whether their production patterns are acoustically distinct when producing /ñ/ in Spanish mode versus /n+j/ in English mode. The between-mode differences, which we took to indicate separate representations in a shared phonetic space, were determined via two acoustic indices: (1) the duration of the vocalic portion following the nasal segment (hereafter, FV) and (2) the formant trajectories of the same FV. Acoustic differences were predicted to take the form of (a) a longer FV duration for /n+j/ than /ñ/ and (b) formant trajectories in which /n+j/ evidenced a lower F1 valley and a higher F2 peak than /ñ/, as indicated by non-overlapping formant contours.

The results from the duration analysis confirmed the prediction—the FV for /n+j/ was significantly longer than the FV for /ñ/. The results from the SSANOVA also fell in line with the expected predictions for the differences between the formant contours; the formant trajectory of the FV for /n+j/ diverged from that of /ñ/ for portions of the vowel and evidenced a lower F1 and higher F2. Taken together, these results suggest that this group of Spanish HS draws on distinct representations when producing /ñ/ in Spanish mode versus /n+j/ in English mode. That is, despite the similarities between /n+j/ and /ñ/, for these participants there is no evidence of interaction between the two phonological systems in this particular case.

#### *4.2. Separate Representations and Age of Acquisition*

The lack of interaction between phonological systems suggests that these early bilinguals had sufficient input to develop the representation for the sound they produce when in Spanish mode, despite the fact that /ñ/ is the least frequent phoneme in the Spanish inventory (Melgar de González 1976). Moreover, they appear to have maintained this representation even after (in most cases) switching dominance to the ML and developing a representation of /n+j/ that is evident in the English mode data. These data align with previous findings suggesting that phonological systems are less susceptible to interaction at the segmental level than at the suprasegmental level (see Chang 2020 for review). Our findings also add to those reporting distinct representations, within a shared phonetic space, of sounds that are similar but acoustically different (e.g., Amengual 2018; Chang et al. 2009, 2011; Knightly et al. 2003; but cf., e.g., Godson 2003; Kang et al. 2016).<sup>7</sup>

<sup>7</sup> One factor that may explain why the data do not evidence merged categories, such as those in the voiced stop data in Kang et al. (2016) and the acoustically similar vowel data in Godson (2003), is that some similar crosslinguistic pairs might be "easier" to keep separate. Recall from Section 1.1 that Spanish also has a /n+j/ sequence that contrasts with /ñ/ in pairs such as *uranio* /uɾanjo/ 'uranium' and *huraño* /uɾaño/ 'unsociable'. Although the only experimental data on this contrast that we are aware of are from Buenos Aires Spanish, in which there is a near-merger of /ñ/ and /n+j/, Bongiovanni (2019) found that, even in that case, although the participants did not accurately perceive the difference, their productions were acoustically distinct despite the contrast's low functional load. We posit that one possibility is that the early Spanish bilinguals in this study successfully developed these separate representations early on and that doing so facilitated the acquisition of the /n+j/ sequence in English. Comparisons of the /n+j/ productions in English versus Spanish mode will inform whether there is a single representation of the /n+j/ sequence or two, thus providing a more complete picture of the crosslinguistic relationship of these similar sounds.

Interestingly, however, the attested distinction was not found in the production of advanced L1 English/L2 Spanish learners (Stefanich and Cabrelli 2016) in a study that employed the same task. Instead, Stefanich and Cabrelli determined that the advanced L2ers' productions were representative of a merged/hybrid category that was used when producing /ñ/ in Spanish mode and /n+j/ in English mode. Because these advanced L2ers' data in English mode were different from a group of beginner L2ers' data (used as a proxy for the L2ers' L1 baseline in light of their minimal exposure to L2 input), Stefanich and Cabrelli (2016) posited the development of a hybrid category. Although the acoustic index used in that study was the duration of the nasal segment, which is not directly comparable to the present measures,<sup>8</sup> the contrast leads us to the following question: which factors might yield interaction between phonological systems for adult L2ers but not for HSs? A primary difference between these two groups of bilinguals lies in their age of acquisition (AoA).<sup>9</sup> While all the HS participants reported having acquired Spanish before the age of three and English before the age of eight, the L2ers' mean AoA for Spanish was 14.5 (*SD* = 4.21). A substantial body of research suggests a critical period for phonology around age 5 (e.g., Barlow 2014; Flege et al. 1999; Newport et al. 2001; Scovel 2000; but cf. work that posits a later critical or sensitive period, e.g., DeKeyser 2012, or none at all, e.g., Abrahamsson and Hyltenstam 2009), and Barlow (2014) indicates that L1 influence on the L2 is more likely after the cutoff age. For instance, in her analysis of Spanish/English laterals, Barlow (2014) found that late bilinguals (AoA > 6) showed evidence of the (English-like) allophonic distribution of [l]~[ɫ] in both English and Spanish, whereas early bilinguals (AoA < 5) showed it only in English. That is, the late bilinguals evidenced interaction between systems (influence from L2 → L1), whereas the early bilinguals did not. Given that this outcome patterns with our HS and L2 results, we suggest that age of acquisition is a good candidate predictor of whether interactions will occur. To confirm this hypothesis, however, we will need to compare groups with early versus late AoA that are matched (as closely as possible) in Spanish proficiency and dominance.

#### *4.3. Individual Variation*

While the group results indicate that these heritage speakers have separate representations, there is substantial variability in how this distinction is acoustically realized. These individual patterns could simply be noise in the sample, but it is worth considering whether factors previously reported to condition bilingual speech patterns might explain some of the attested variability. Specifically, we discuss dominance, proficiency, and individual differences related to perception.

Although this study was not designed to examine the effects of proficiency and dominance *a priori*, analyzing the individual participants' data against the measures of dominance and proficiency allows us to examine trends in the relationship between these measures and the FV duration and formant contours. Figures 6 and 7 illustrate the difference in duration (in ms) between /n+j/ and /ñ/ for each participant by proficiency score and dominance score, respectively.

<sup>8</sup> Reanalysis of the L2 data from Stefanich and Cabrelli (2016), which will include the measurement of the same acoustic indices (FV duration and formant contours), is in progress.

<sup>9</sup> Both groups of bilinguals are overall English dominant, strengthening our conclusion that language dominance alone does not determine the interaction (or lack thereof) between systems. Further, given that language dominance is thought to be fluid and changeable across a bilingual's lifespan (e.g., De Houwer 2011), it is unsurprising that dominance would not be a determining factor in system interaction.

**Figure 6.** FV duration difference by Spanish proficiency score.

**Figure 7.** FV duration difference by language dominance score.

Previous research on heritage phonology has found that proficiency and dominance in the HL can at least partially account for individual variation, including the relationship between measures of these constructs and the robustness of a distinction (at least for within-language contrasts; see, e.g., Amengual 2016 for dominance, and Shea 2019, who examined dominance and proficiency and found proficiency to be the stronger predictor). In the current scenario, in which we examine between-language distinctions, we might predict that bilinguals with lower Spanish proficiency scores would show smaller differences in FV duration and formant contours than those with higher scores. Regarding dominance, we might expect bilinguals who fall closer to the balanced zero point on the dominance scale to evidence greater distinctions than bilinguals who fall towards either end of the scale, whose representations would be predicted to skew towards the respective dominant language.

With respect to the Spanish language proficiency measure reported in Section 2.1, there does not seem to be a discernible pattern; the lack of a relationship visible in Figure 6 is supported by a weak negative correlation (r(18) = −0.11, *p* = 0.639). If proficiency as measured here played a larger role, we might expect to find a strong positive correlation whereby, as proficiency in the HL increases, so does the difference in duration, maximizing the distance between productions in English versus Spanish modes. This lack of a relationship also seems to be the case for the language dominance measure; the BLP score and duration difference are only weakly correlated (r(18) = −0.21, *p* = 0.387). Moreover, if we examine dominance as binary, the three Spanish-dominant participants did not produce durational differences that were substantially larger or smaller than the English-dominant participants, nor did we find any clustering of duration differences around the "balanced" point on the scale.
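For readers who wish to reproduce correlations of this kind, the sketch below computes Pearson's r together with its t statistic, t = r·√(df/(1 − r²)), where df = n − 2 is the number reported in parentheses as r(df). The two-tailed p value then follows from the t distribution with df degrees of freedom, which a statistics package would supply; it is left out here to keep the sketch within Python's standard library, and `pearson_r_t` is an illustrative name.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def pearson_r_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, int]:
    """Pearson r, its t statistic, and df = n - 2 (the value written as r(df))."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equally long samples with at least 3 pairs")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both samples must vary")
    r = sxy / math.sqrt(sxx * syy)
    df = n - 2
    if abs(r) >= 1.0:  # perfect correlation: t is unbounded
        return r, math.copysign(math.inf, r), df
    t = r * math.sqrt(df / (1.0 - r * r))
    return r, t, df


r, t, df = pearson_r_t([1, 2, 3, 4], [2, 4, 5, 4])  # toy values, not study data
```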

Turning to the formant contours, the individual SSANOVAs (Appendix B), unlike the duration differences, do show differences between participants, which can be grouped into four patterns. Of the 19 participants, eight participants' individual splines mimic the group-level pattern, i.e., a lower F1 and a higher F2 for /n+j/ than /ñ/ (Pattern 1). One participant diverges from the group pattern in F1 only, with /ñ/ having a lower F1 than /n+j/ (Pattern 2), and seven participants diverge in F2 only, with /ñ/ having a higher F2 than /n+j/ (Pattern 3). Lastly, three participants diverge from the group pattern in both F1 and F2; in this case, the pattern is the reverse of the group pattern, with a lower F1 and a higher F2 for /ñ/ than /n+j/ (Pattern 4). Thus, the majority either align with the group pattern or produce a more fronted<sup>10</sup> FV in /ñ/ than in /n+j/.<sup>11</sup> Figures 8 and 9 present the proficiency and dominance scores by grouped spline pattern to visualize any potential relationships between formant contour patterns and proficiency and dominance scores.
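The four patterns can be summarized as a sign-based rule. The sketch below is a simplification under a stated assumption: it classifies a participant from the mean Bark differences (/n+j/ minus /ñ/) in F1 and F2, whereas the patterns above were read from the individual splines and their confidence intervals; the function name is hypothetical.

```python
def spline_pattern(f1_diff: float, f2_diff: float) -> int:
    """Assign one of the four patterns from mean Bark differences (/n+j/ minus /ñ/).

    f1_diff < 0: /n+j/ has the lower F1 (group direction).
    f2_diff > 0: /n+j/ has the higher F2 (group direction).
    A difference of exactly zero is treated as not following the group direction.
    """
    f1_as_group = f1_diff < 0
    f2_as_group = f2_diff > 0
    if f1_as_group and f2_as_group:
        return 1  # mirrors the group-level pattern
    if f2_as_group:
        return 2  # F1 reversed only: /ñ/ has the lower F1
    if f1_as_group:
        return 3  # F2 reversed only: /ñ/ has the higher F2
    return 4      # both reversed: lower F1 and higher F2 for /ñ/
```

A rule of this kind ignores the degree of spline overlap, which is exactly the information footnote 11 notes varies across participants.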

**Figure 8.** Smoothing Spline pattern type by Spanish proficiency score.

Upon examination of Figures 8 and 9, it does not seem that either Spanish language proficiency or language dominance as measured in this study can explain the individual patterns. That is, participants with lower Spanish proficiency scores do not appear to have diverged from the group pattern more frequently than those with higher scores, or vice versa. Further, the three Spanish-dominant participants did not behave differently from the English-dominant participants. Thus, taking into account the individual analyses for both the FV duration and formant contours, there is no evidence that either the written Spanish proficiency score or the language dominance score explains the observed individual variation. Why not? With respect to the dominance score, there might not be sufficient variation between participants: recall that the BLP scale runs from −218 (Spanish) to 218 (English), whereas our participants' scores are concentrated between −22.7 and 88.6 (*M* = 43.56, *SD* = 35.35). Further, only three participants had scores on the Spanish side of the scale. Thus, given a wider range of dominance scores and a larger sample, clearer patterns might emerge with respect to language dominance. Finally, given that dominance can be such an elusive construct, a different proxy for dominance could reveal a relationship in the data that the BLP does not; Solis-Barroso and Stefanich (2019) evaluated a set of assessments that were completed by a single group of heritage Spanish bilinguals in Chicago and found different categorization patterns (dominant in Language A, dominant in Language B, "balanced") depending on the assessment, particularly for bilinguals who fell close to the balanced point of a scale.

<sup>10</sup> As pointed out by an anonymous reviewer, another interpretation of the higher F2 values could be that these speakers are producing a more constricted dorsopalatal realization, given that dorsopalatal constriction narrowing and F2 are positively correlated.

<sup>11</sup> As evident in Appendix B, while the individual data fall into the four patterns, there is variation in the degree of spline overlap (i.e., acoustic distance). Without a principled way to quantify acoustic differences between formant trajectories, however, we limit our discussion to the categorical patterns and include the individual SSANOVAs for readers' reference.

**Figure 9.** Smoothing Spline pattern type by dominance score.

For the Spanish proficiency score, unlike the dominance score, our participants show a wider range of scores from 23 to 46 (*M* = 35.50, *SD* = 7.80), and therefore one might expect to see a difference between those bilinguals with higher scores versus lower scores. However, in our case we see no such difference. Recall that proficiency was measured by a 50-item written measure, which we recognize is not ideal for several reasons, the most relevant of which are that (a) the focus of the study is speech production, and (b) a written assessment disadvantages heritage speakers without substantial formal education in the HL, particularly when the assessment is targeted at speakers of Peninsular Spanish rather than Mexican or US Spanish. Going forward, it would be ideal to employ a measure of oral proficiency (e.g., accent ratings, elicited imitation tests) or a variety of assessments that can be used to formulate a composite score or evaluated independently, such as the set of monologues, picture naming task, and vocabulary assessment used in Shea (2019). It could be the case that, with more direct measures of oral proficiency or a more global evaluation, patterns would more readily emerge in examining interactions between systems.<sup>12</sup>

<sup>12</sup> We also note that we only measured our participants' proficiency in the heritage language (here, Spanish), as our participants attended school in, and are dominant in, the majority language (here, English). Future research could also measure proficiency in the majority language in addition to that of the heritage language to see what patterns might surface.

A final consideration in accounting for the attested variation is perception. While we have measured variables that have been reported to acoustically distinguish /n+j/ and /ñ/, it remains to be verified whether these are the acoustic cues that these speakers attend to in the input, even in the cases in which the contour differs beyond the JND threshold. That is, it is entirely possible that these speakers (as a group, or individually) attend to different acoustic cues in the input and rely on those same cues to distinguish the two in production as well. Future research will examine bilingual perception of /ñ/ and /n+j/ in Spanish and English modes, comparing stimuli that vary in which cue(s) (single cues and combinations of cues) are manipulated and which are held constant. Once we know whether these bilinguals perceive the difference between /ñ/ and /n+j/ and which cues they attend to in the input, an experiment can be designed to isolate production of those cues to confirm separate representations for /n+j/ versus /ñ/.

#### *4.4. Conclusions*

In this paper, we have examined heritage Spanish speakers' crosslinguistic production patterns of /ñ/ in monolingual Spanish mode and /n+j/ in monolingual English mode to determine whether their phonological systems interact in this scenario. At the group level, we did not find any evidence of interaction and concluded that this group of early Spanish/English bilinguals maintains separate representations for /ñ/ and /n+j/, based on measures of duration and formant trajectories of the following vocalic portion taken from /ñ/ data in Spanish mode and /n+j/ data in English mode. Comparison with the L2 data suggests that age of acquisition is a likely predictor of interacting systems in this case, at least at the group level. We addressed individual variation in the sample by examining the relationships of duration and formant contours with dominance and proficiency, and did not find any clear explanatory trends. The next step in this line of investigation will thus be to determine why some bilinguals evidence formant trajectory patterns at the individual level that diverge from the group-level pattern. To that end, we highlighted the need to (a) replicate the study with a larger sample that spans a wider range of proficiency and dominance and (b) test perception to isolate the acoustic cues that are used to distinguish /ñ/ and /n+j/ in the input. It will also be valuable to directly compare the heritage data with non-heritage native speaker data in the HL to determine how the representations of these populations, who typically differ in dominance and input quantity/quality (among other factors), overlap. Finally, it will be of interest to determine whether language mode plays a role in the interaction of bilinguals' systems when it comes to these sounds. In the current study, we tested in monolingual modes in order to give participants the best possible chance of producing distinct segments.
However, research on language mode in HS has shown that testing in a bilingual versus a monolingual mode plays a role in both production and perception (e.g., Amengual 2018; Antoniou et al. 2012; Simonet and Amengual 2020). What would happen if we were to test these HS in a bilingual mode (which is common for this community, in which participants are able to, and often do, codeswitch between the languages)? Would we see evidence of interaction? If so, what would that tell us about the nature of these representations? Ultimately, the triangulation of data from various bilingual profiles in monolingual and bilingual testing modes will lead us further towards the goal of a holistic understanding of the nature of interacting systems in the bilingual brain.

**Author Contributions:** Conceptualization, S.S. and J.C.; methodology, S.S. and J.C.; software, S.S. (for data), J.C. (for statistical analysis); validation, S.S. and J.C.; formal analysis, S.S. and J.C.; investigation, S.S. and J.C.; resources, J.C.; data curation, S.S.; writing—original draft preparation, S.S. and J.C.; writing—review and editing, S.S. and J.C.; visualization, S.S. and J.C.; supervision, S.S. and J.C.; project administration, S.S. and J.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** We would like to thank David Abugaber for his assistance with the SSANOVA analyses as well as Leire Echevarria and Brian Rocca for their help with data collection. We also wish to thank Mark Amengual as the Guest Editor of this issue and the two anonymous reviewers for their valuable feedback.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

| Condition | Spanish mode | English mode |
|---|---|---|
| Critical: (C)CV.ña (Spanish), (C)CVn.ja (English) | reña [reña] | renya [ɹɛnjə] |
| | boña [boña] | bonya [bɑnjə] |
| | broña [bɾoña] | bronya [bɹɑnjə] |
| | droña [dɾoña] | dronya [dɹɑnjə] |
| | feña [feña] | fenya [fɛnjə] |
| | poña [poña] | ponya [pʰɑnjə] |
| | foña [foña] | fonya [fɑnjə] |
| | loña [loña] | lonya [lɑnjə] |
| | deña [deña] | denya [dɛnjə] |
| | beña [beña] | benya [bɛnjə] |
| Control: (C)CV.na | bena [bena] | benna [bɛnə] |
| | dena [dena] | denna [dɛnə] |
| | lona [lona] | lonna [lɑnə] |
| | fona [fona] | fonna [fɑnə] |
| | pona [pona] | ponna [pʰɑnə] |
| | fena [fena] | fenna [fɛnə] |
| | drona [dɾona] | dronna [dɾɑnə] |
| | brona [bɾona] | bronna [bɾɑnə] |
| | quena [kena] | renna [ɹɛnə] |
| | jona [xona] | bonna [bɑnə] |
| Distractor | nela [nela] | talla [tʰælə] |
| | neda [neð̞a] | tamma [tʰæmə] |
| | dera [deɾa] | tulla [tʰʌlə] |
| | gada [gað̞a] | bura [bɚə] |
| | meba [meβ̞a] | lekka [lɛkə] |
| | bera [beɾa] | meppa [mɛpə] |
| | doda [doð̞a] | maffa [mæfə] |
| | bora [boɾa] | ponka [pʰɑnkə] |
| | doba [doβ̞a] | cromma [kʰɹɑmə] |
| | gora [goɾa] | neppa [nɛpə] |
| | gera [geɾa] | zappa [zæpə] |
| | pada [pað̞a] | ficka [fɪkə] |
| | fala [fala] | vatta [væɾə] |
| | deda [deð̞a] | virta [vɚɾə] |
| | seba [seβ̞a] | zanta [zæntə] |
| | poba [poβ̞a] | thappa [θæpə] |
| | dola [dola] | thurpa [θɚpə] |
| | teba [teβ̞a] | drotta [dɹɑɾə] |
| | dela [dela] | vecka [vɛkə] |
| | bada [bað̞a] | stucka [stʌkə] |

**Table A1.** Stimuli.

*Languages* **2020**, *5*, 50

#### **Appendix B**


**Figure A1.** (**a**) Pattern 1: individual splines that follow the group pattern. (**b**) Pattern 2: individual splines that differ in F1 from the group pattern. (**c**) Pattern 3: individual splines that differ in F2 from the group pattern. (**d**) Pattern 4: individual splines that differ in F1 and F2 from the group pattern.

#### **References**


Potowski, Kim. 2020. University of Illinois at Chicago, Chicago, IL, USA. Personal communication, August 10.

R Core Team. 2020. *R: A Language and Environment for Statistical Computing*. Vienna: R Foundation for Statistical Computing, Available online: https://www.R-project.org/ (accessed on 31 October 2020).


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

## **Redefining Sociophonetic Competence: Mapping COG Differences in Phrase-Final Fricative Epithesis in L1 and L2 Speakers of French**

#### **Amanda Dalola <sup>1,\*</sup> and Keiko Bridwell <sup>2,\*</sup>**

<sup>1</sup> Department of Languages, Literatures & Cultures, University of South Carolina, Columbia, SC 29208, USA

<sup>2</sup> Department of Linguistics, University of Georgia, Athens, GA 30605, USA

**\*** Correspondence: dalola@mailbox.sc.edu (A.D.); keiko.bridwell@uga.edu (K.B.)

Received: 9 September 2020; Accepted: 2 November 2020; Published: 12 November 2020

**Abstract:** This article presents a study of center of gravity (COG) measures in phrase-final fricative epithesis (PFFE) produced by L1 and L2 speakers of Continental French (CF). Participants completed a reading task targeting 98 tokens of /i,y,u/ in phrase-final position. COG measures were taken at the 25%, 50% and 75% marks, normalized and submitted to mixed-effects linear regression. Results revealed that L2 speakers showed higher COG values than L1 speakers at low PFFE-to-vowel ratios at the 25%, 50%, and 75% marks. COG measures were then categorized into six profile types on the basis of their frequencies at each timepoint: flat–low, flat–high, rising, falling, rising–falling, and falling–rising. Counts of COG profiles were then submitted to multinomial logistic regression. Results revealed that although L1 speakers produced predominantly flat–low profile types at lower percent devoicings, L2 speakers preferred multiple strategies involving higher levels of articulatory energy (rising, falling, rising–falling). These results suggest that while L1 speakers realize PFFE differently with respect to phonological context, L2 speakers rely on its most common allophone, strong frication, in most contexts. As such, the findings of this study argue for an additional phonetic dimension in the construct of L2 sociophonetic competence.

**Keywords:** sociophonetics; competence; fricative epithesis; vowel devoicing; center of gravity; French; acquisition

#### **1. Introduction**

Phrase-final fricative epithesis (PFFE), also known in the literature as phrase-final vowel devoicing (PFVD), is a well-attested phenomenon in Continental French (CF) in which breath group-final vowels lose their voicing and produce a short burst of high-frequency aperiodic energy, akin to a fricative, e.g., *mais oui\_hhh* [mɛwi**ç**], *merci beaucoup\_hhh* [mɛʁsiboku**x**] (see Figure 1). The first linguistic account of this phenomenon described it as the emergence of "sharp, phrase-final whistles" (Fónagy 1989); subsequent research witnessed a split in nomenclature, with North American researchers often opting for a name focusing on voicing loss, "vowel devoicing/*dévoisement vocalique*" (Fagyal and Moisset 1999; Smith 2002, 2003, 2006; Martin 2004), and most European researchers preferring a name focusing on the emergence of the downstream fricative, "fricative epithesis/*épithèse (consonantique) fricative*" (Fagyal 2010; Candea 2012; Candea et al. 2013). Because the present study focuses on characterizing the spectral and durational qualities of the emergent fricative, we, as North American authors, have explicitly chosen to follow our European predecessors in adopting the term "fricative epithesis" for this discussion.

**Figure 1.** PFFE on the spectrogram: *venu* 'came.' The PFFE corresponds to the final, highlighted segment—characterized by the lack of a voicing band on the spectrogram and aperiodic energy on the waveform—which follows the articulation of the vowel [y], distinguished by its full formant structure on the spectrogram and periodic energy on the waveform.

In the first description of PFFE in the literature, Fónagy hypothesized not only that its characteristic phrase-final fricatives appeared immediately after vowels that had lost a portion of their voicing band, but also that the fricatives themselves might correspond phonetically to the host vowel in backness. Citing the *ich-Laut*/*ach-Laut* harmony phenomenon in standard German, in which the backness value of a voiceless fricative is selected by the backness value of its preceding vowel, he hypothesized that the fricatives epithesized after the high front vowels /i/ and /y/ in French would be more [ç]-like, i.e., front, than those appearing after high back /u/, which would be more [x]-like, i.e., back. This hypothesis was partially corroborated by Dalola (2015a), who examined measures of center of gravity (COG; the amplitude-weighted mean frequency of the spectrum) taken at the 1/4, 1/2 and 3/4 timepoints of PFFE fricatives produced by L1 CF speakers and found evidence of a three-way distinction in spectral energy at the first two timepoints; however, the spectral differences could not be characterized in terms of a simple [±back] contrast and did not persist into the second half of the segment.

#### *1.1. Phonological Predictors of PFFE*

The best-studied dimension of PFFE is undoubtedly its phonological distribution. Originally described as occurring in high vowels (Fónagy 1989), PFFE has since been documented in the full inventory of French vowels, including nasals (Smith 2006), though the highest rates are reported after the high vowels /i,y,u/ (Fagyal and Moisset 1999; Martin 2004; Smith 2003, 2006). In comparisons of reading passages, role-plays and impromptu conversation, PFFE has been found to occur at significantly higher rates in read (i.e., planned) speech (Fagyal and Moisset 1999; Dalola 2014), a finding perhaps explained by its higher rates of occurrence at the ends of both the intonation phrase and the declarative phrase (Fagyal and Moisset 1999; Smith 2003), where French sees the arrival of a low tone. Studies have also found an effect for the manner of articulation of the preceding consonant, such that preceding stops condition PFFE at a significantly higher rate than more sonorous manner types, as well as an effect for lexical frequency, such that more frequent lexical items are more likely to exhibit the phenomenon than less frequent ones (Dalola 2015b).

#### *1.2. Social Predictors of PFFE*

The social distribution of PFFE presents a complex series of macro- and micro-group associations. Early work often described PFFE as occurring in the speech of women (Fónagy 1989; Fagyal and Moisset 1999; Smith 2006); however, later work has reported the variable to be used at similar rates among both men and women (Candea 2012; Candea et al. 2013; Dalola 2014). Fagyal and Moisset (1999), who took a categorical approach to age, found the variable at its highest rates among their youngest (16–35) and oldest (61–85) groups; Dalola (2014), who operationalized age continuously (testing ages 13–83), reported that participants were more likely to use PFFE the older they were. From a socioeconomic standpoint, PFFE is often associated with the French middle class (*la bourgeoisie*) (Paternostro 2008; Fagyal 2010). Originally, the variable was associated with Parisians (Fagyal and Moisset 1999; Smith 2006; Fagyal 2010), though in recent years, it has also been documented in the speech of francophones from other metropolitan centers in France, namely Lyon and Strasbourg (Dalola 2014). Further afield, the variable has been described in the speech of French, Belgian and Canadian news anchors (Paternostro 2008; Candea et al. 2013); one study introduced intersectionality into this association by reporting it particularly among young, i.e., inexperienced, news anchors (Candea 2012). Given the lack of agreement on its social predictors, it is important to pursue research characterizing the PFFE variable itself.

#### *1.3. L2 Speakers and PFFE*

Given its salient phonetic energy and robust distribution among native francophone populations, it is perhaps unsurprising that PFFE, despite its status as a sociophonetic variable, is also readily employed by L2 French speakers (Dalola and Bullock 2017). Investigating L1 and L2 PFFE as produced in different genres of speech, Dalola and Bullock (2017) revealed subtle differences at every level of production. In terms of rates of use, L1 and L2 speakers performed similarly overall but were motivated by different genres of speech: L1 speakers used more PFFE in role-plays, while L2 speakers were more likely to use it when reading wordlists. In terms of duration of PFFE, i.e., the proportional length of the epithesized fricative relative to its host vowel, larger differences between speaker groups were documented: not only did L1 and L2 speakers produce PFFE segments that were statistically different in length (L1 PFFE length << L2 PFFE length), but each group also showed sensitivity to a different linguistic parameter. L1 speakers produced longer PFFEs in response to pragmatic shifts (indicated in the prompts to the role-plays), producing longer PFFEs in slower and formal speech, while L2 speakers produced longer PFFEs in response to task shifts, producing longer PFFEs in the wordlist task. Despite the various pragmatic and speaker group effects in this study, no effects of participant gender or age were found.

#### *1.4. Perception of PFFE*

Differences in L1 versus L2 production of PFFE ushered in a rigorous examination of potential speaker group differences in the variable's perception. Dalola (2016; in progress) reports significant differences in L1 and L2 perceptions of PFFE, namely that L2 speakers perceive it as a positive marker indexing "formality" and "trustworthiness," whereas L1 speakers perceive it variably, sometimes as a positive marker indexing "admirability" and sometimes as a negative marker indexing "emotional affect." In a matched-guise design analyzed with exploratory factor analysis (a technique related to principal component analysis that partitions out the shared variance of each variable from its unique and error variance to reveal the underlying factor structure; Osborne and Costello 2009), L2 participants in Dalola (2016; in progress) rated users of PFFE similarly for two separate groups of adjectives: *polite, well-educated, speaks clearly, speaks formally* (a category the author refers to collectively as traits of FORMALITY) and *confident, persuasive, I respect X, I trust X* (a category the author refers to collectively as traits of TRUSTWORTHINESS). L1 speakers also rated users of PFFE similarly for two separate groups of adjectives; however, the adjectival members of the groups were both more numerous and compositionally different, with one group including the adjectives *well-educated, professional, speaks clearly, polite, intelligent, patient, confident, persuasive, I trust X, I respect X, I believe what X says, I would like to speak like X* (a category the author refers to collectively as traits of ADMIRABILITY), and the second including the adjectives *aggressive, bourgeois, superficial, bossy, native French speaker, speaks with emotion* (a category the author refers to collectively as EMOTIONAL AFFECT). It should be noted that all the traits that make up the L2 category of FORMALITY are also present in the L1 category of ADMIRABILITY, and that the difference in category names reflects the author's desire to assign names that applied to the full collection of adjectives. No gender effects were found for the voices being rated; however, there was a significant gender effect among the raters, such that women were more likely to assign higher ratings overall.

#### *1.5. Motivation*

This article reports on production differences in spectral tendencies in PFFE among L1 and advanced L2 speakers of Continental French. Since PFFE is a sociophonetic marker in CF (Dalola 2014, 2016), it presents an interesting testing ground for comparing spectral values across native and non-native speakers. While previous work has reported production differences in rate and degree of devoicing between native and non-native French speakers (Dalola and Bullock 2017), it has yet to extend the comparison to the phonetic quality of the variable's emergent fricatives. Given the many known articulatory differences and false similarities in vowel production between French and English (the L1 of the non-native population in this and previous studies), it is reasonable to expect that articulatory issues may arise, even among advanced L2 speakers (Flege and Hillenbrand 1984; Flege 1985, 1987; among others). The goal of this study is, therefore, to examine and characterize the fricatives epithesized after devoiced vowels using measures of fricative-to-vowel ratio (FVR; length of the fricative divided by length of the full vowel) and center of gravity (COG; the amplitude-weighted mean frequency of the spectrum at designated timepoints during the fricative segment). The COG measures will then be used to create a multipoint spectral profile for each fricative, which will be classified into more general profile types capturing the overall increase, decrease or static tendency of energy during articulation. We will then use inferential statistics to examine the predictability of each profile type by speaker group, fricative-to-vowel ratio and vowel type. After presenting the results unique to each profile type, we will compare the most common profile types produced by each speaker group in order to assess the nature of any significant spectral differences between L1 and advanced L2 PFFE.

#### *1.6. Research Questions and Hypotheses*

The current study puts forth the following research questions:


Due to the exploratory nature of the study, predictions are not offered for the research questions, as previous work has not yet examined this aspect of the PFFE variable.

#### **2. Materials and Methods**

#### *2.1. Participants*

Forty speakers of CF participated in the experiment, of whom 31 were L1-French and nine were L1-English advanced L2-French. All participants were recorded in Paris or Strasbourg in France or in the United States. Among the L1 participants, 23 were women and eight were men, ranging in age from 20 to 66 years (mean = 38.4 years). All L1 speakers were L2 speakers of English, having studied it formally for four or more years and using it in interactions once a week or more. Among the L2 participants, five were women and four were men, ranging in age from 27 to 58 years (mean = 38.6 years). L2 speakers were classified as "advanced" because they had all lived in France for at least two years, had prepared or were preparing an upper-level degree in French and used French regularly in their careers. All L2 speakers were L1 speakers of American English. All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the University of South Carolina Institutional Review Board (USC IRB).

#### *2.2. Stimuli*

Inspired by the task and pragmatic effects findings of Dalola and Bullock (2017) and several studies' reports of PFFE's robustness among news anchors reading off teleprompters (Paternostro 2008; Candea 2012; Candea et al. 2013), participants were asked to complete a reading task that consisted of 106 single sentences containing 98 phrase-final tokens of /i,y,u/, occurring after all licit (C)C(C) onset sequences in one- to three-syllable real words in French (see Table 1 for a breakdown of consonant environments).



#### *2.3. Procedure*

Participants were presented with sentences one at a time on a MacBook Pro via Microsoft PowerPoint and told to read each one aloud, imagining they were reading a story to a native francophone listener. Participants were instructed to read each sentence twice and to repeat any trial from the beginning in the event of a disfluency. As they read aloud, participants were recorded with a head-mounted unidirectional cardioid microphone (Shure WH20) connected to a solid-state digital recorder (Marantz PMD 660), digitized at 44.1 kHz with 16-bit resolution. The task was completed under the direction of the L1-English advanced L2-French researcher; it was self-paced, and participants were given as much time as they needed to complete it.

#### *2.4. Acoustic Measurements*

From the resulting recordings, target vowels were identified, delimited, and labeled manually in Praat based on the spectrographic and time displays, beginning at the onset of voicing and formant structure and ending with the end of formant structure. Instances of PFFE were counted as part of the vowel and were included in the overall duration (measured in milliseconds). Each target vowel was then inspected for the presence of PFFE, which was labeled and measured for duration on a separate tier. A derived measure, the *fricative-vowel ratio (FVR)*, was calculated by dividing the length of frication by the length of the full vowel, i.e., the vocalic portion plus frication, as illustrated in Figure 2.

**Figure 2.** The segmentation of *mie* 'crumb.' With a full-vowel length of 287 ms and a fricative length of 171 ms, the FVR of this token is 59.6%.
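The FVR computation reduces to a single ratio. The sketch below (function name ours) uses the token in Figure 2 as a check; the range guards are our own addition, reflecting that frication is labeled inside the full-vowel interval.

```python
def fricative_vowel_ratio(fricative_ms, full_vowel_ms):
    """FVR = duration of frication / duration of the full vowel, where the
    full vowel spans the voiced portion plus the frication."""
    if full_vowel_ms <= 0:
        raise ValueError("full-vowel duration must be positive")
    if not 0 <= fricative_ms <= full_vowel_ms:
        raise ValueError("frication must fall within the full vowel")
    return fricative_ms / full_vowel_ms

# Token from Figure 2 (mie 'crumb'): 171 ms of frication in a 287 ms vowel
fvr = fricative_vowel_ratio(171, 287)
print(f"FVR = {fvr:.1%}")  # FVR = 59.6%
```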

A script targeting the devoicing tier divided each instance of PFFE into quartiles, labeling timepoints at the 25%, 50%, and 75% marks; a subsequent script measured center of gravity (COG) at each of these points (Erker 2010).
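The measurements themselves were made with Praat scripts (Erker 2010); the Python sketch below only illustrates the logic of locating the quartile timepoints in a labeled PFFE interval and computing a spectral center of gravity for a short frame. It assumes a rectangular window and weighting by squared spectral magnitude (power = 2, the default of Praat's *Get centre of gravity*). Frame length, windowing, and how frames are positioned around each timepoint are not reported here, so those choices and the function names are ours.

```python
import cmath
import math

def quartile_timepoints(start_s, end_s):
    """Times (s) at the 25%, 50%, and 75% marks of a labeled PFFE interval."""
    duration = end_s - start_s
    return [start_s + q * duration for q in (0.25, 0.50, 0.75)]

def center_of_gravity(frame, sample_rate, power=2.0):
    """Spectral center of gravity (Hz) of one analysis frame: the mean
    frequency of the one-sided DFT spectrum, weighted by |X(f)| ** power.
    Uses a plain O(n^2) DFT with a rectangular window, for clarity only."""
    n = len(frame)
    weighted_sum = total_weight = 0.0
    for k in range(n // 2 + 1):  # bins from 0 Hz up to the Nyquist frequency
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        weight = abs(x_k) ** power
        weighted_sum += (k * sample_rate / n) * weight
        total_weight += weight
    return weighted_sum / total_weight if total_weight > 0 else 0.0

# Usage: a 10 ms frame of a pure 5 kHz tone sampled at 44.1 kHz
fs = 44100
frame = [math.sin(2 * math.pi * 5000 * t / fs) for t in range(441)]
print(quartile_timepoints(1.00, 1.20))      # approximately 1.05, 1.10, 1.15 s
print(round(center_of_gravity(frame, fs)))  # 5000
```

For a pure tone all spectral energy sits in one bin, so the COG equals the tone's frequency; for broadband frication the COG summarizes where energy is concentrated, which is why it rises for fronter fricatives such as [ç].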

To control for effects of variation in vocal tract length, COG values were normalized (Shadle and Mair 1996), according to a technique adapted from Toda (2007), as shown in (1):

$$\text{COG}_{\text{norm}} = s_i \times \text{COG}_i \tag{1}$$

in which the speaker-dependent coefficient $s_i$ was calculated as in (2):

$$s_i = \text{COG}_{\text{avg}} / \text{COG}_i \tag{2}$$

where $\text{COG}_i$ refers to the average COG value of participant *i*, and $\text{COG}_{\text{avg}}$ refers to the average COG value across all participants. Henceforth, COG will be used to refer to this normalized center of gravity variable, and Hz to the normalized units used to quantify COG.
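Read as token-level scaling, equations (1) and (2) multiply each of a participant's raw COG values by the same factor, so that every participant's mean is mapped onto the overall mean. A minimal sketch follows; the function name and data are hypothetical, and taking the overall average as the mean of the per-participant means (rather than the mean over all tokens) is our assumption.

```python
from statistics import mean

def normalize_cog(tokens):
    """Scale each token's COG by s_i = COG_avg / COG_i (equations (1)-(2)).

    tokens: list of (participant_id, cog_hz) pairs.
    COG_i   = mean COG of participant i.
    COG_avg = mean COG across all participants (ASSUMED here to be the
              mean of the per-participant means).
    """
    by_participant = {}
    for pid, cog in tokens:
        by_participant.setdefault(pid, []).append(cog)
    participant_means = {pid: mean(v) for pid, v in by_participant.items()}
    cog_avg = mean(list(participant_means.values()))
    s = {pid: cog_avg / m for pid, m in participant_means.items()}
    return [(pid, s[pid] * cog) for pid, cog in tokens]

# Hypothetical data: speaker B's fricatives are uniformly twice as high as A's
tokens = [("A", 3000), ("A", 5000), ("B", 6000), ("B", 10000)]
print(normalize_cog(tokens))
# [('A', 4500.0), ('A', 7500.0), ('B', 4500.0), ('B', 7500.0)]
```

Because the scaling is multiplicative, a speaker whose fricatives are uniformly shifted upward (e.g., due to a shorter vocal tract) ends up with the same normalized values as a speaker with the same relative pattern.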

#### **3. Results**

#### *3.1. COG Statistical Treatment*

Out of 7942 tokens, participants produced a total of 4995 instances of vowels exhibiting PFFE, which formed the corpus for subsequent analysis.

Statistical analyses of COG were conducted in the statistical tool R (R Core Team 2017). Using *lmer()* from the package *lmerTest* (Kuznetsova et al. 2017), a mixed-effects linear regression model was fit for each timepoint, with *COG* as the dependent variable; *vowel*, *speaker group*, and *FVR* as independent variables; and *participant* treated as a random effect. Visualizations were generated using effects data from the package *effects* (Fox 2003; Fox and Weisberg 2019).

#### *3.2. COG Results*

**Timepoint 1 (25%).** The full model for COG of PFFE at Timepoint 1 is shown in Table 2. There was an interaction effect between *vowel* and *FVR*, such that higher PFFE-to-vowel ratios corresponded to higher COG values, with the identity of the vowel strongly affecting the rate of increase. As shown in Figure 3, the three vowels showed similar COG values as FVR approached 0%, but exhibited strong differences as FVR increased, with /i/ having a stronger rate of increase than /y/ or /u/. /y/ also showed slightly higher COG values than /u/ across all FVRs.


**Table 2.** Mixed-effects linear regression model for COG at Timepoint 1.

*p*-values: *p* < 0.1(.), *p* < 0.05 \*, *p* < 0.01 \*\*, *p* < 0.001 \*\*\*, *p* < 0.0001 \*\*\*\*.

**Figure 3.** COG by vowel and FVR at all timepoints.

An interaction was also observed between *speaker group* and *FVR*, as visualized in Figure 4, such that L2 speakers exhibited higher COG values than L1 speakers at all FVRs, but to a greater extent at lower percentages.

**Figure 4.** COG by speaker group and FVR at all timepoints.

No significant interaction was observed between *vowel* and *speaker group*. However, there was a main effect for *speaker group*, such that L2 speakers produced PFFE with COG values 547.06 Hz higher than those of L1 speakers. There was also a main effect for *vowel*, such that, after all interactions were taken into account, the COG values of /y/ were 402.4 Hz and 424.3 Hz higher than those of /i/ and /u/, respectively (although in the raw data, COG values of /i/ were significantly higher than those of /y/ and /u/). The intraclass correlation coefficient (ICC) of 0.23 suggests low similarity between measurements from the same participant, indicating high within-participant variability.
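Assuming the reported ICCs follow the usual definition for a model with a by-participant random intercept, each value is the participant variance divided by the sum of the participant and residual variances. Read this way, an ICC of 0.23 means that roughly a quarter of the variance not explained by the fixed effects lies between participants and the remainder within them. The sketch below, with a helper name of our own and hypothetical variance components, simply encodes that ratio.

```python
def icc_random_intercept(var_participant, var_residual):
    """ICC for a random-intercept model: the share of the variance left after
    the fixed effects that is attributable to between-participant differences."""
    total = var_participant + var_residual
    if total <= 0:
        raise ValueError("variance components must sum to a positive value")
    return var_participant / total

# Hypothetical variance components (not taken from the study's models)
print(icc_random_intercept(1.0, 3.0))  # 0.25
```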

**Timepoint 2 (50%).** A second mixed-effects linear regression model was fit for COG at Timepoint 2, with similar results, as shown in Table 3. As at Timepoint 1, there was an interaction effect between *vowel* and *FVR*, shown in Figure 3, such that higher PFFE-to-vowel ratios corresponded to higher COG values, with more pronounced differences between COG values at 0% and 100% for /i/ than /y/ or /u/.


**Table 3.** Mixed-effects linear regression model for COG at Timepoint 2.

*p*-values: *p* < 0.1(.), *p* < 0.05 \*, *p* < 0.01 \*\*, *p* < 0.001 \*\*\*, *p* < 0.0001 \*\*\*\*.

There was also a similar interaction between *speaker group* and *FVR*, such that L2 speakers exhibited higher COG values than L1 speakers at low FVRs (see Figure 4).

No significant interaction was observed between *vowel* and *speaker group*. However, there was a main effect for *speaker group*, such that L2 speakers produced PFFE with COG values 619.32 Hz higher than those of L1 speakers. There was also a trending main effect for *vowel*, such that, after all interactions were taken into account, the PFFE of /y/ had COG values 253.95 Hz higher than that of /i/. The ICC of 0.184 suggests low similarity between measurements from the same participant, indicating high within-participant variability.

**Timepoint 3 (75%).** A final model was fit for COG at Timepoint 3, with similar results to those at Timepoint 1 and Timepoint 2, as shown in Table 4. As at the previous two timepoints, there was an interaction effect between *vowel* and *FVR* (see Figure 3), such that higher PFFE-to-vowel ratios corresponded to higher COG values, with more pronounced differences between COG values at 0% and 100% for /i/ than /y/ or /u/.


**Table 4.** Mixed-effects linear regression model for COG at Timepoint 3.

*p*-values: *p* < 0.1(.), *p* < 0.05 \*, *p* < 0.01 \*\*, *p* < 0.001 \*\*\*, *p* < 0.0001 \*\*\*\*.

There was also a similar interaction between *speaker group* and *FVR*, such that L2 speakers exhibited higher COG values than L1 speakers at low FVRs (see Figure 4).

A trending interaction was observed between *vowel* and *speaker group* for /u/, such that L1 speakers produced PFFE with higher COG values than L2 speakers for this vowel. A main effect for *speaker group* was also present, such that L2 speakers produced PFFE with COG values 467.45 Hz higher than those of L1 speakers. Finally, there was a main effect for *vowel*, such that, after all interactions were taken into account, the PFFE of /y/ had COG values 331.13 Hz higher than that of /i/. The ICC of 0.133 suggests low similarity between measurements from the same participant, indicating high within-participant variability.

**All timepoints.** As shown in Figure 3, COG values exhibited a tendency to "level out" over time. At high FVRs, the average COG decreased from Timepoint 1 to Timepoint 3; at low FVRs, the average COG increased slightly over time.

#### *3.3. Profile Creation*

While COG values appeared to decrease over time, particularly when PFFE made up a larger proportion of the vowel, this observation was based on aggregate data, not the progression of fricative quality within individual tokens. To investigate this more granularly, a variable combining the three timepoints into a single contour was developed, which will subsequently be referred to as *profile*.

The variable of *profile* was operationalized as follows. Normalized COG values were first binned: since 98.7% of the data fell below 6000 Hz, the 0–6000 Hz range was split into three equal levels, *Low* (0–2000 Hz), *Medium* (2000–4000 Hz), and *High* (4000+ Hz). For each token, the levels at the three timepoints were then combined to form a three-letter profile designation describing the COG trajectory over the course of the frication (e.g., HML for a token progressing from high, to medium, to low).
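The binning step can be sketched as follows. The function names are ours, and because the text does not say which bin a value exactly on a boundary (2000 or 4000 Hz) belongs to, assigning it to the higher bin is an assumption.

```python
def cog_level(cog_hz):
    """Bin a normalized COG value (Hz) into Low, Medium, or High.
    ASSUMPTION: a value exactly on a boundary goes to the higher bin."""
    if cog_hz < 2000:
        return "L"
    if cog_hz < 4000:
        return "M"
    return "H"

def profile_designation(cog_25, cog_50, cog_75):
    """Three-letter designation from the COG at the 25%, 50%, and 75% marks."""
    return "".join(cog_level(c) for c in (cog_25, cog_50, cog_75))

print(profile_designation(4800, 3100, 1500))  # HML
```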

The resulting 27 designations (LLL, LLM, LLH, etc.) were then grouped into profile types according to the overall shape they represented: *flat*, *rising*, *falling*, *rise–fall*, and *fall–rise*. Because LLL appeared to represent a different type of PFFE (vowel devoicing resulting only in loss of voicing, as opposed to devoicing that produces an epithetic fricative), it was separated from the other members of the flat type and labeled *flat–low*; the remaining flat designations are referred to as *flat–high*. The resulting six levels were used in all subsequent analyses. The correspondence between letter designations and profile types is shown in Table 5.
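Because Table 5 is not reproduced in this text, the grouping can only be sketched under an assumption. The rule below, which is hypothetical and may differ from Table 5 in individual cases, reads each designation's shape off the direction of its two steps (timepoint 1 to 2, and 2 to 3). It splits LLL off as flat-low and, following the labels used in the results, treats the other flat designations as flat-high. Under this rule the 27 designations fall into exactly six types.

```python
from itertools import product

# Ordinal ranks for the three COG levels.
RANK = {"L": 0, "M": 1, "H": 2}


def profile_type(designation: str) -> str:
    """Map a three-letter designation such as 'LMH' to one of six profile types.

    Hypothetical rule: the type is read off the direction of the two steps
    (timepoint 1 -> 2 and timepoint 2 -> 3); Table 5 may group some cases differently.
    """
    a, b, c = (RANK[level] for level in designation)
    step1, step2 = b - a, c - b
    if step1 == 0 and step2 == 0:
        # LLL is split off; MMM and HHH are assumed to form flat-high.
        return "flat-low" if designation == "LLL" else "flat-high"
    if step1 >= 0 and step2 >= 0:
        return "rising"
    if step1 <= 0 and step2 <= 0:
        return "falling"
    # Remaining cases change direction: up then down, or down then up.
    return "rise-fall" if step1 > 0 else "fall-rise"


# Tally how the 27 possible designations distribute over the six types.
tally: dict[str, int] = {}
for letters in product("LMH", repeat=3):
    label = profile_type("".join(letters))
    tally[label] = tally.get(label, 0) + 1
print(sorted(tally.items()))
# [('fall-rise', 5), ('falling', 7), ('flat-high', 2), ('flat-low', 1), ('rise-fall', 5), ('rising', 7)]
```

A step-direction rule keeps the mapping transparent: a designation such as LLH counts as rising because it never falls, while LHL counts as rise-fall because its direction reverses.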


**Table 5.** Profile categorization.

#### *3.4. Profile Statistical Treatment*

All statistical analyses of *profile* were conducted in R (R Core Team 2017). Chi-square tests of the relationship between *profile* and each of the independent variables *vowel*, *speaker group*, and *FVR* were performed using *chisq.test()*. A multinomial logistic regression model was also fitted using *multinom()* from the package *nnet* (Venables and Ripley 2002), with *profile* as the dependent variable and *vowel*, *speaker group*, and *FVR* as independent variables. Visualizations were generated using effects data from the package *effects* (Fox and Hong 2009; Fox and Weisberg 2019).
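As an illustration of what the chi-square tests compute, the sketch below writes out the Pearson test of independence by hand. It is not the authors' R code. For tables larger than 2 × 2, R's *chisq.test()* applies no continuity correction, so the statistic matches this formula. The degrees of freedom, (rows − 1) × (columns − 1), explain the values reported below: six profile types by three vowels gives 10, by two speaker groups gives 5, and by five FVR bins gives 20. The counts in the example are invented for illustration. A p-value would then come from the upper tail of the chi-square distribution (e.g., `scipy.stats.chi2.sf(statistic, dof)`).

```python
# Pearson chi-square test of independence for an r x c contingency table,
# written out by hand to show what chisq.test() computes for tables larger than 2 x 2
# (R applies Yates' continuity correction only to 2 x 2 tables).


def chi_square_independence(table: list[list[float]]) -> tuple[float, int]:
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts (not the study's data): six profile types x three vowels.
counts = [
    [120, 300, 420],
    [260, 90, 20],
    [40, 30, 10],
    [180, 60, 25],
    [20, 15, 10],
    [15, 10, 10],
]
statistic, dof = chi_square_independence(counts)
print(dof)  # 10
```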

#### *3.5. Profile Results*

To determine whether the different vowels favored different profile types, a chi-square test was conducted on the variables *profile* and *vowel*. This revealed a significant effect of vowel on profile type (χ²(10) = 1555.2, *p* < 0.0001). As shown in Table 6, the PFFE of /i/ was most frequently realized with a high, flat COG (flat–high) or with a COG that began at high values and decreased over the course of the vowel (falling). /y/ and /u/, on the other hand, were most often realized with a low, flat PFFE (flat–low). This was particularly true for /u/, which was categorized as flat–low 84.8% of the time. The results of the chi-square test are visualized in a correlation plot in Figure 5.


**Table 6.** Profile distribution by vowel, expressed in % of each vowel.

To determine whether L1 and L2 speakers of French favored different profile types, another chi-square test was conducted on *profile* and *speaker group*. This test revealed a significant effect of speaker group (χ²(5) = 29.104, *p* < 0.0001): while both groups used a low, flat frication profile more frequently than any other type, L1 speakers used a flat–low profile more often than L2 speakers, and L2 speakers used a flat–high profile more often than L1 speakers (see Table 7). All other profile types appeared roughly equally frequent across speaker groups. The results of the chi-square test are visualized in a correlation plot in Figure 6.

**Figure 5.** Correlation plot showing residuals of profile type and vowel. Black circles represent positive correlation between variable level pairs, white circles represent negative correlation, and circle radius represents correlation strength.
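The circles in these correlation plots encode cell residuals from the chi-square tests. The sketch below assumes they are Pearson residuals, (observed − expected)/√expected, which R's *chisq.test()* returns as `$residuals`; the plots may instead have used standardized residuals (`$stdres`), which this text does not specify. A positive residual (black circle) marks a profile type that occurs more often than independence predicts for that variable level, and a negative one (white circle) marks one that occurs less often. The squared Pearson residuals sum to the chi-square statistic. The function name and counts are hypothetical.

```python
import math

# Pearson residuals, (observed - expected) / sqrt(expected), for each cell.
# Positive values mark cells with more tokens than independence predicts,
# negative values cells with fewer.


def pearson_residuals(table: list[list[float]]) -> list[list[float]]:
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        cells = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            cells.append((observed - expected) / math.sqrt(expected))
        residuals.append(cells)
    return residuals


# Hypothetical 2 x 2 example: rows = profile types, columns = speaker groups.
res = pearson_residuals([[10, 20], [20, 10]])
print([[round(r, 3) for r in row] for row in res])
# [[-1.291, 1.291], [1.291, -1.291]]
```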

**Table 7.** Profile distribution by speaker group, expressed in % of each group.

**Figure 6.** Correlation plot showing residuals of profile type and speaker group.

To conduct a chi-square test on *FVR*, its values were binned into five categories, each spanning 20%, as shown in Table 8. This test revealed a significant effect of FVR (χ²(20) = 665.92, *p* < 0.0001). When PFFE made up 60% or less of the vowel, flat–low profiles accounted for the majority of tokens. Flat–low was still the most common profile type at 60–80% PFFE, but flat–high and falling profiles were also frequent; at 80–100% PFFE, flat–high was the most common profile type. The results of the chi-square test are visualized in a correlation plot in Figure 7.
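The FVR binning can be sketched as below. The helper name is hypothetical. Two choices are assumptions not stated in the text: each bin's lower edge is inclusive (so exactly 60% falls in 60–80%), and an FVR of exactly 100% is placed in the top bin rather than in a sixth bin of its own.

```python
# Hypothetical helper assigning an FVR value (percentage of the vowel made up
# by PFFE) to one of five 20% bins. Lower edges are treated as inclusive and
# 100% is placed in the top bin; both choices are assumptions.


def fvr_bin(fvr_percent: float) -> str:
    if not 0 <= fvr_percent <= 100:
        raise ValueError("FVR must lie between 0 and 100%")
    # Integer bin index 0-4; min() keeps exactly 100% in the 80-100% bin.
    index = min(int(fvr_percent // 20), 4)
    lower = index * 20
    return f"{lower}-{lower + 20}%"


for value in (12.5, 59.9, 60.0, 100.0):
    print(value, fvr_bin(value))
```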


**Table 8.** Profile distribution by FVR, expressed in % of each bin.

**Figure 7.** Correlation plot showing residuals of profile type and FVR.

Results from a multinomial logistic regression, shown in Table 9, revealed significant interactions between each pair of the three variables. First, an interaction between *vowel* and *speaker group* was present, as shown in Figure 8, such that L1 and L2 speakers showed different distributions of profile types for /i/: relative to L1 speakers, L2 speakers showed lower rates of flat–low and higher rates of flat–high and falling, two profile types characterized by high initial energy. Additionally, L2 speakers produced more rising tokens than L1 speakers for /y/, indicating that their PFFE started at a low frequency but increased in intensity over its course, approximating the high-energy fricative.


**Table 9.** Multinomial logistic regression model for profile.

**Figure 8.** Profile types by vowel and speaker group.

An interaction between *vowel* and *FVR* was also present, such that as FVR increased, the distribution of profile types changed for /i/ and /y/, as shown in the conditional density plot in Figure 9. Starting at approximately 50% FVR (indicated in the figure by a dotted line), flat–low tokens greatly decreased and flat–high tokens greatly increased for /i/, and the proportion of flat–high and falling tokens slightly increased for /y/.

Finally, an interaction was present between *speaker group* and *FVR*, as shown in Figure 10. For FVRs ranging from approximately 40 to 100%, L1 and L2 speakers showed similar profile type distributions. From 0 to 40%, however, L1 speakers predominantly used a flat–low profile, while L2 speakers exhibited much greater variation.
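The effect plots for these interactions show predicted profile-type probabilities. The sketch below illustrates, under stated assumptions, how a baseline-category multinomial logit such as *nnet::multinom()* turns coefficients into such probabilities. Each non-reference profile type receives a linear predictor, the reference type is fixed at 0, and a softmax converts the predictors into probabilities that sum to 1. Including a speaker group × FVR term lets the effect of FVR differ between L1 and L2 speakers. The coefficient names and values are hypothetical and are not taken from Table 9. Choosing flat–low as the reference type and coding FVR as a proportion (0 to 1) are also assumptions. For reference, *multinom()* uses the first level of the response factor as the baseline.

```python
import math

# Baseline-category multinomial logit: each non-reference profile type k has a
# linear predictor eta_k, the reference type has eta = 0, and probabilities are
# a softmax over all types. Coefficient names and values are hypothetical.


def linear_predictor(coefs: dict[str, float], is_l2: int, fvr: float) -> float:
    """eta = b0 + b1*group + b2*FVR + b3*group*FVR (the interaction term)."""
    return (
        coefs["intercept"]
        + coefs["group"] * is_l2
        + coefs["fvr"] * fvr
        + coefs["group:fvr"] * is_l2 * fvr
    )


def profile_probabilities(
    model: dict[str, dict[str, float]], reference: str, is_l2: int, fvr: float
) -> dict[str, float]:
    scores = {reference: 0.0}
    for profile, coefs in model.items():
        scores[profile] = linear_predictor(coefs, is_l2, fvr)
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {p: math.exp(s - top) for p, s in scores.items()}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}


# Hypothetical coefficients for two non-reference types (reference: flat-low).
model = {
    "flat-high": {"intercept": -2.0, "group": 1.2, "fvr": 3.5, "group:fvr": -1.5},
    "falling": {"intercept": -1.5, "group": 0.8, "fvr": 2.0, "group:fvr": -1.0},
}
for is_l2 in (0, 1):
    probs = profile_probabilities(model, "flat-low", is_l2, fvr=0.3)
    print("L2" if is_l2 else "L1", {p: round(v, 3) for p, v in probs.items()})
```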

**Figure 9.** Profile types by vowel and FVR. Each plot shows the proportion of PFFE tokens belonging to each of the profile types at that combination of vowel and FVR. The dotted line at 50% indicates the approximate point at which the distributions of profile types for /i/ and /y/ shifted.

**Figure 10.** Profile types by speaker group and FVR. The dotted line at 40% indicates the approximate point at which the distribution of profile types shifted.

#### **4. Discussion**

#### *4.1. Vowel Findings*

The interaction between *vowel* and *FVR* revealed that not all high vowels in French exhibit PFFE uniformly in terms of its proportion relative to the length of the host vowel. Specifically, the spectral energy in PFFE occurring after /i/ and /y/ showed variable acoustic behavior when the vowel was devoiced to a degree of 50% or more: for /i/, flat–low tokens markedly decreased and flat–high tokens markedly increased, while for /y/, flat–high and falling tokens slightly increased, although flat–low continued to account for a high proportion of tokens. This suggests several acoustic tendencies for PFFE in French. First, the stark decrease in flat–low tokens and increase in flat–high tokens after /i/ indicate that higher-frequency, i.e., more salient, energy is associated with PFFE after /i/ at FVRs between 50 and 100%. Compared with the lower-frequency, i.e., less salient, COG profiles of the other vowels in the high vowel series (the vowel height at which this phenomenon is phonologically most robust), this suggests that /i/ devoiced in this way and to this extent may be the canonical realization of the PFFE sociophonetic variable for L2 speakers (Dalola and Bridwell 2019). This is an important assumption when considering L2 speakers' production and distribution of the variable, because this realization may serve as a sort of underlying representation of the phenomenon, which L2 speakers use as a default after other, more marked or phonetically dissimilar vowels, where articulatory differences between French and English may arise (Delattre 1964). Additionally, the slight increase in flat–high and falling tokens in PFFE after /y/ at FVRs between 50% and 100% likewise indicates a preference for higher-frequency, i.e., more salient, energy, both throughout the segment (flat–high) and at its onset (falling), but this is less notable given that the overall increase is smaller.

#### *4.2. Speaker Group Findings*

The interaction between *vowel* and *speaker group* revealed that L1 and L2 French speakers do not realize PFFE uniformly across vowel types. Specifically, the spectral energy in PFFE occurring after /i/ and /y/ showed variable acoustic behavior across speaker groups: for /i/, L2 speakers used lower rates of flat–low and higher rates of flat–high and falling (two profile types characterized by high initial energy) than L1 speakers, while for /y/, L2 speakers employed more rising tokens than L1 speakers, indicating a build in intensity over the course of PFFE to approximate a higher-energy fricative. Considered in concert with the vowel findings from this study, both of these findings suggest that L2 speakers may be hyperarticulating PFFE after /i/ and /y/, perhaps reflecting how they understand the phenomenon to sound as a stand-alone segment, i.e., as it does in its most salient form: after /i/ with an FVR between 50 and 100%. Given the relatively lower-energy profiles found in L1 PFFE after these same vowels, the patterning of this phenomenon across speaker groups can be accounted for by Lindblom's (1990) H&H Theory, in which speakers vary articulatory clarity according to the information needs of their listener. In this account, L1 speakers, influenced by phonological context and without a need to communicate sociophonetic information to their unknown, imagined L1 listener (Bell 1984), manifest a sort of hypospeech that is underarticulated and renders speech just intelligible enough to be recognized. L2 speakers, by contrast, wanting to realize the text accurately but also wishing to signal to their L1 listener their awareness of PFFE as a sociophonetic marker of polished French, manifest a sort of hyperspeech that not only renders optimally intelligible phonemic articulations but also overemphasizes certain phonetic features at the cost of greater articulatory effort.
This account is supported by previous work examining L1 versus L2 perceptions of the variable, which found that L2 speakers associate PFFE with features of TRUSTWORTHINESS and FORMALITY (Dalola 2016; in progress), the latter of which carries notable social capital for advanced L2 speakers who use the L2 in their daily and/or professional lives. Similar sociophonetic behavior has been found among white Southern Americans, who use hyperarticulated [hw], a phonetic behavior characterized by increased duration of the fricative portion of the segment, to index educatedness (Bridwell 2019).

The interaction between *FVR* and *speaker group* revealed that L1 and L2 French speakers do not exhibit PFFE uniformly in terms of its proportion relative to the length of the host vowel. Specifically, the spectral energy in PFFE occurring at different FVRs showed variable acoustic behavior across speaker groups: for FVRs ranging from approximately 40 to 100%, L1 and L2 speakers showed similar profile type distributions; for FVRs from 0 to 40%, however, L1 speakers predominantly used a flat–low profile, while L2 speakers exhibited much greater variation, including more higher-frequency, more salient profiles. This finding is another instance of L1 speakers favoring low-energy profiles in the articulation of PFFE, while L2 speakers exhibit more, and different, higher-energy profiles. This is not only consonant with the H&H Theory (Lindblom 1990) discussed above but also highlights the greater degree of variation within the L2 population, which may reflect varied exposure times and degrees of involvement in L1 communities where the use of sociophonetic variables, including PFFE, is robust (Dalola and Bullock 2017).

#### *4.3. Implications for Sociophonetic Competence*

In light of the findings from this study, we can now add a parameter to the definition of "sociophonetic competence" laid out by Dalola and Bullock (2017). Previous work on the sociophonetic variable of PFFE in CF has demonstrated that awareness of a sociophonetic variable in the L2 is not enough for L2 speakers to use it at rates or durations similar to those of their L1 counterparts, or even in the same types of pragmatic and phonological contexts. This study has identified an additional dimension of L2 mastery, namely the phonetic quality of use. Such mastery at the level of production would also imply a heightened sensitivity to the perception of these sound variations, allowing speakers to decode an additional layer of meaning in the L2. Previous sociophonetic work examining COG has demonstrated that it can work in concert with other parameters to index information about the speaker. Zimman (2017) found COG measures to be a marker of masculine voices when considered alongside f0, while Dalola and Bridwell (forthcoming) found COG measures, in conjunction with measures of intensity (loudness), to be a marker of L1- or L2-French speaker status. Taken together, these findings raise the possibility that COG values in French PFFE index not only speaker group status but also constructs of gender.

Whereas the acoustic energy of PFFE realizations seems to vary allophonically for L1 French speakers, that is to say, as predicted purely by phonological context and ease of articulation, it seems to vary sociophonetically and pragmatically for L2 speakers, that is to say, as conditioned by the desire for speakers to signal their sociophonetic awareness to native listeners at structurally and pragmatically acceptable moments.

#### *4.4. Future Directions*

Future studies will sample advanced L2 French populations more robustly and subdivide their level of advancedness via quantitative measures, e.g., the Bilingual Language Profile (Birdsong et al. 2012). This study originally had a more balanced sample across speaker groups, but we later elected to filter out individuals from the L2 category who did not meet our most stringent criterion (having lived in a francophone country for 2 or more years). In restricting the L2 group to the most "advanced" members of our sample, we hoped to obtain the clearest picture of whether there were any speaker group differences; in doing so, however, we diminished some of our predictive power. In addition, we propose that the current findings be tested via a series of perceptual studies that investigate the pragmatic values of PFFE with differing COG measures in both L1- and L2-French populations. In that way, we can isolate which phonetic components of PFFE contribute reliably to perceptual differences and which represent mere physiological variation. A subsequent analysis should compare the COG-motivated perceptual differences between L1- and L2-French speakers in order to test the production findings presented here. Additionally, since COG (a common spectral moment used to diagnose fricatives) was found to be a meaningful descriptor and predictor for fricatives epithesized after phrase-final vowels, the other spectral moments (standard deviation, skewness, and kurtosis) may also be relevant metrics in characterizing the PFFE variable.

#### **5. Conclusions**

The present study has investigated spectral production differences in PFFE as produced by L1 and L2 speakers of French. It suggests that, even at advanced levels of proficiency and with similar rates of use, L2 users do not necessarily realize or distribute the subphonemic properties of sociophonetic variables in nativelike or consistent ways. Future research would do well to examine the number and nature of these socially conditioned subphonemic variables further (Dalola and Bridwell, forthcoming; Dalola, forthcoming), with the ultimate goal of testing the perception of their variable forms among both speaker populations.

**Author Contributions:** Conceptualization, A.D.; methodology, A.D.; software, K.B.; validation, A.D. and K.B.; formal analysis, A.D. and K.B.; investigation, A.D.; resources, A.D. and K.B.; data curation, K.B.; writing—original draft preparation, A.D. and K.B.; writing—review and editing, A.D. and K.B.; visualization, K.B.; supervision, A.D.; project administration, A.D.; funding acquisition, NA. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

Bell, Allan. 1984. Language style as audience design. *Language in Society* 13: 145–204.


Paternostro, Roberto. 2008. Le dévoisement des voyelles finales. *Rassegna italiana di Linguistica applicata* 3: 129–58.

R Core Team. 2017. *R: A Language and Environment for Statistical Computing*. Vienna: R Foundation for Statistical Computing.


Zimman, Lal. 2017. Gender as stylistic bricolage: Transmasculine voices and the relationship between fundamental frequency and /s/. *Language in Society* 46: 339–70.

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **A Phonetic Account of Spanish-English Bilinguals' Divergence with Agreement**

**Laura Colantoni 1,\*, Ruth Martínez 1, Natalia Mazzaro 2, Ana T. Pérez-Leroux <sup>1</sup> and Natalia Rinaldi <sup>1</sup>**


Received: 17 September 2020; Accepted: 2 November 2020; Published: 11 November 2020

**Abstract:** Does bilingual language influence in the domain of phonetics impact the morphosyntactic domain? Spanish gender is encoded by word-final, unstressed vowels (/aeo/), which may diphthongize in word-boundary vowel sequences. English neutralizes unstressed final vowels and separates across-word vocalic sequences. The realization of gender vowels as schwa, due to cross-linguistic influence, may remain undetected if not directly analyzed. To explore the potential over-reporting of gender accuracy, we conducted parallel phonetic and morphosyntactic analyses of read and semi-spontaneous speech produced by 11 monolingual speakers and 13 early and 13 late Spanish-English bilinguals. F1 and F2 values were extracted at five points for all word-final unstressed vowels and vowel sequences. All determiner phrases (DPs) from the narratives were coded for morphological and contextual parameters. Early bilinguals exhibited clear patterns of vowel centralization and higher rates of hiatus than the other groups. However, the morphological analysis yielded very few errors. A follow-up integrated analysis revealed that /a/ and /o/ were realized as centralized vowels, particularly with [+Animate] nouns. We propose that bilinguals' schwa-like realizations can be over-interpreted as target Spanish vowels. Such variable vowel realization may be a factor in the vulnerability to attrition of gender marking in Spanish as a heritage language.

**Keywords:** Spanish-English bilinguals; gender; vowels; vowel centralization; vowel sequences

#### **1. Introduction**

Heritage speakers, i.e., bilinguals who were raised with a home language different from the dominant language of their community, exhibit various patterns of divergence and cross-linguistic influence. Studies of heritage speakers of Spanish document difficulties with gender agreement and concord in both adults (e.g., Montrul et al. 2008) and children (Gathercole 2002; Montrul and Potowski 2007; Morgan et al. 2013), which may remain at the level of performance (Montrul et al. 2014) or could lead to internal restructuring of the featural system, as proposed by Scontras et al. (2018) and Cuza and Pérez-Tattam (2016). Attrition of the gender system increases across bilingual generations (Martínez-Gibson 2011) and during the school years (Cuza and Pérez-Tattam 2016). Given the frequency of the configuration under consideration (a determiner followed by a noun) and the morphological transparency and robustness of the Spanish agreement system, one might speculate that input factors cannot fully account for the vulnerability of this domain beyond lexical errors with low-frequency, gender-opaque nouns. Notably, analyses of bilingual gender consider only some morphosyntactic or lexical factors and leave aside the potential explanatory value of other domains. We turn to the perception-production interface to explore an alternative explanatory source to current accounts of the bilingual vulnerability of Spanish gender.

Spanish agreement is primarily encoded by three vowels (/aeo/) appearing in unstressed word-final position, a context where English neutralizes vocalic contrasts. Word-final vowels followed by word-initial vowels are another locus of difference between English and Spanish: English tends to separate these sequences by inserting glottal stops or pauses (Davidson and Erker 2014), whereas Spanish diphthongizes or fuses vowels (e.g., Aguilar 2010). Is it possible, then, that heritage bilinguals' realization of Spanish vowels in absolute final position and in sequences introduces what Scontras et al. (2015) dubbed "incipient changes" in the input, which could eventually trigger changes in gender marking in subsequent cohorts of learners? Is it also possible that linguists working on bilingual morphosyntax have overlooked differences in the realization of these vowels when coding bilinguals' speech?

Our overarching goal is not to provide a definitive answer but to probe preliminary data on these two questions. We analyze read speech and narratives produced by adult early and late bilinguals (defined in terms of onset of exposure before vs. after adolescence, respectively). In our approach, we use phonetic analyses to characterize how phonetic variability could potentially impact heritage language acquisition, then explore how various lexical and morphosyntactic properties are associated with gender realization, and finally combine the phonetic analyses with accuracy ratings. First, we phonetically analyze word-final unstressed vowels to compare our results with previous studies (see Section 1.2) that found that early bilinguals centralize vowels compared with late bilinguals and monolinguals. Second, we analyze the noun phrases extracted from the narratives to determine the distribution of morphological realizations in bilinguals and to test whether early bilinguals differ from late bilinguals and monolinguals in their rate of errors with gender marking and concord (i.e., gender matching with related morphosyntactic categories). Third, and this is the key contribution of our study, we integrate our phonetic and morphosyntactic results, arguing that the findings cast some doubt on our ability to report agreement accuracy based solely on the researcher's auditory transcriptions. Finally, we use the insights from this dataset to propose a modular interaction hypothesis, which claims that language contact-related variability in one domain (phonetics-phonology) can have important consequences for the acquisition of other domains (grammar).
The testing of this latter hypothesis is a long-term goal of our research team and involves the design of an acquisition study that controls for the perception-production of the target vowels and the comprehension and production of agreement and concord. Before we delve into the analysis of our corpus, we pair previous findings from studies on the acquisition of gender with those of phonetic studies on the acquisition of vowels in Spanish-English bilinguals.

#### *1.1. Acquisition of Gender in Spanish-English Bilinguals*

Spanish encodes agreement as word-final affixes in the nominal system. Nouns and their dependents (determiners, adjectives, and pronouns) are marked (whether visibly or not) for gender. Gender marking can either have semantic value, as in *niña*/*niño*, 'girl/boy' or simply be a formal word-marker, as in *pala*, 'shovel' vs. *palo*, 'stick'. Spanish gender is considered transparent as most nouns overtly mark gender by means of the canonical vowel suffix. A few nouns do not directly reflect the gender feature: those ending in consonants, other vowels, or in mismatching vowels.

Monolingual and bilingual children show different patterns of development. Monolingual children learn noun agreement in spontaneous speech by the age of 2;0 (López Ornat 1997; Snyder et al. 2001). Later gender-assignment errors (as identified by the gender of the article; e.g., \**la mapa*, 'the-fem map') reflect a lack of lexical knowledge of the gender of opaque nouns and express biases towards phonological gender cues (Pérez-Pereira 1991). Errors in concord (matching gender across the noun phrase) are negligible in the preschool years (Castilla and Pérez-Leroux 2010). Bilingual children reach mastery much later (Barreña 1997; Eichler et al. 2013; Larrañaga et al. 2012), and early infant speech in bilinguals does not show the harmonic prenominal vowel patterns used by monolingual infants before they fully acquire determiners (e.g., *a vaca*, '(the-fem) cow'; Kuchenbrandt 2005). Elicited data allow a direct comparison of the attainment of noun-adjective agreement in monolingual and bilingual communities. Bedore and Leonard (2001) tested Mexican American five-year-olds in San Diego. These children, described as having minimal access to English, showed noun-adjective agreement accuracy of 76–91%. In contrast, same-age children tested in Mexico by Grinstead et al. (2008) were at ceiling. Morgan et al. (2013) found that five-year-old US bilinguals displayed few, but significantly more, gender substitution errors than their monolingual counterparts. Cuza and Pérez-Tattam (2016), by contrast, presented school-aged children with morphologically opaque nouns: bilinguals produced many more errors than same-age monolinguals and were delayed in both their knowledge of gender assignment and of concord. Beyond noting a range of patterns of performance across different bilingual populations, some studies link exposure to Spanish to fewer gender errors (Gathercole 2002). For older children, Montrul and Potowski (2007) found differences between monolinguals and bilinguals, and between simultaneous and sequential bilinguals.

More recent studies (Goebel-Mahrle and Shin 2020), however, failed to find differences between monolingual and bilingual heritage speakers, independently of their age, which ranged between 5 and 11. This was also the case in the word-repetition task included in Montrul et al.'s (2014) study. Although adult Spanish heritage speakers and L2 learners differed from the monolingual comparison group in the other two tasks in that study (a gender monitoring task and a grammaticality judgement task), they did not differ from monolinguals in the word-repetition task. While acknowledging that bilinguals are not a uniform group, we note that the results of the word-repetition task might not necessarily indicate higher production accuracy by bilinguals but might instead reflect the limitations of auditory transcription. From a performance point of view, gender is often highly predictable in context. If bilinguals produce a schwa-like vowel rather than the underlying vowel, a transcriber might code the item as correct, regardless of the quality of the target vowel. Inserting a schwa-like vowel allows the speaker to maintain the syllable structure of the target word while introducing acoustic ambiguity. In a predictable context, the transcriber perceives a vowel that cannot be clearly interpreted as a non-target vowel (e.g., *caperucita rojo*, 'Little Red Riding Hood.FEM red.MASC') and transcribes it as [a], in what can be interpreted as a case of *in dubio pro reo*. Why do we think this is the case? Because the literature reviewed below consistently shows that Spanish-English bilinguals exhibit clear patterns of centralization of their vocalic space. We believe that pairing the phonetic literature on the acquisition of vowels by Spanish-English bilinguals with the literature on the acquisition of gender by the same group of bilinguals should at least prompt us to ask whether the two sets of findings are compatible.

#### *1.2. Acquisition of Vowels in Spanish-English Bilinguals*

Spanish vowels (/aeiou/) vary little in quality and remain contrastive in stressed and unstressed positions (Hualde 2014). Although the literature reports variation in duration (e.g., Delattre 1965) and, to a lesser extent, in quality (Romanelli et al. 2018), Spanish does not show centralization to the extent that English does (Navarro Tomás 1970, p. 43). Spanish vowels are learned early, typically by the age of 2;0 (Goldstein and Pollock 2000; Schnitzer and Krasinski 1994). This contrasts with English, where infants begin with a centralized vowel space that expands over time (Gilbert et al. 1997; Kent and Murray 1982; Rvachew et al. 1996). Vowel space differentiation is reported for eighteen-month-olds (Rvachew et al. 2006), but the full English vocalic inventory is mastered later, by 3;0 (Stoel-Gammon and Sosa 2008; Stoel-Gammon and Pollock 2009). This is to be expected, given that English has a large vocalic inventory in stressed position (Ladefoged 2001). In unstressed position, the only frequent vowel is schwa [ə] (Rogers 2000). Thus, the unstressed vowel inventory is smaller in English than in Spanish (Hualde 2014).

Studies on the acquisition of Spanish vowels by bilinguals (Ronquest and Rao 2018) report consistent effects of cross-linguistic influence. Bilingual children centralize unstressed vowels (Gildersleeve-Neumann et al. 2009; Menke 2010),<sup>1</sup> although it is debated whether centralization happens before or after children enter the school system (Gildersleeve-Neumann et al. 2009). Differences in vowel realizations persist into adulthood (Rogers 2012; Ronquest 2016; Willis 2005). Cross-linguistic effects are also reported in perception. Studies of English learners of Spanish suggest that late bilinguals tend to confuse high-front with mid-front vowels (/i/ with /e/) and back-mid with low vowels (/o/ with /a/) (Morrison 2003, 2006). Unlike monolinguals, bilinguals rely on frequency cues rather than on duration (Fox et al. 1994). One perception study on adult heritage Spanish speakers (Mazzaro et al. 2016) found differences, but only in unstressed positions.

These results, however, refer to single vowels. A factor not yet considered is how phonological processes affect word-final vowels, how the two languages differ in this domain, and how they may interact. Spanish tends to fuse vowels across words. When both vowels are unstressed, as in *como alfajores* ('I eat cookies'), the higher vowel in the sequence is frequently reduced, with outcomes ranging from gliding to full deletion. Reduction processes apply equally to high and non-high vowels (Hualde et al. 2008; Vokic and Guitart 2009) across Spanish dialects (Alba 2006; Hutchinson 1974), but reduction is less frequent when one of the vowels is stressed (Colantoni and Hualde 2016; Hualde 2014). Because agreement vowels are in unstressed word-final position, they are likely to be modified in running speech. By contrast, vowel reduction across word boundaries is rare in English, given that speakers frequently insert glottal stops to separate across-word vocalic sequences (Davidson and Erker 2014). Thus, whereas Spanish prefers diphthongization or deletion, English realizes these sequences as hiatuses.

Crucially, for coarticulation to occur, words must belong to the same intonational phrase or the same prosodic unit. Because nouns and adjectives are prosodified together in Spanish (D'Imperio et al. 2005; Frota et al. 2007), we assume that agreement vowels are often coarticulated in the input. An important question is whether coarticulation affects bilingual patterns of acquisition of suffixal morphology. It has been shown that word edges play an important role in word recognition (Shoemaker and Rast 2013) and that coarticulation of consonants has a negative effect on lexical retrieval, as speakers struggle to compensate for coarticulated sounds (Mohaghegh 2016). We might thus expect a similar effect for vowels.

#### *1.3. Phonetics and Morphosyntax*

Research on the bilingual acquisition of Spanish morphology has explored multiple factors, such as frequency of use, nature of the input, and type of target form, but has not examined the phonetic properties of agreement, as previously pointed out by Silva-Corvalán (2014). Phonetic factors, however, are known to be important predictors of functional morphology in the child acquisition literature. Prosody is a common explanation for the omission of functional elements, such as articles and clitics (the Prosodic Bootstrapping hypothesis by Lleó and Demuth 1999; Guasti et al. 2008; Mateu 2015). A robust literature shows that phonology is key in morphological development. For example, accuracy in the production of English plural -s and third person -s depends on syllable (coda) structure (Ettlinger and Zapf 2011; Song et al. 2009; see also Bernhardt and Stemberger 1998 for an overview). Culbertson et al. (2019) show that children use phonological rather than semantic categories when acquiring gender. Phonetic reduction also appears to affect the overall course of development, in both comprehension and production. Miller and Schmitt (2010) show that the variable realization of final /s/ in Spanish affects the acquisition of both plurals and tense agreement (2Sg present). Children growing up in varieties where /s/ is maintained (i.e., not weakened or deleted) acquire these markers earlier, in both comprehension and production, than children growing up in varieties where /s/ is frequently aspirated or deleted. Thus, given evidence that phonetic characteristics of the input affect the acquisition of number (English, Spanish) and person (English), we hypothesize that phonetic variability induced by cross-linguistic influence in the phonetic domain will affect the acquisition of gender in Spanish-English bilinguals.

<sup>1</sup> Kehoe and Lleó (2017) report similar results for German-Spanish bilinguals.

#### *1.4. Research Questions and Hypotheses*

RQ1: What is the phonetic realization of word-final unstressed vowels and across-word vowel + vowel sequences in the three groups?

H1: We hypothesize that early bilinguals (EB) will show a higher rate of vowel overlap (single vowels) than the other two groups. We also hypothesize that monolinguals (M) and late bilinguals (LB) will tend to fuse vowels across words while EBs will tend to separate them.

RQ2: Are there more errors in gender agreement and concord in EBs than in the other groups?

H2: EBs will show a larger proportion of errors than the other two groups.

RQ3: Is vowel centralization being reported as accurate gender marking?

H3: EBs will show a higher rate of centralization when compared with the other two groups, and cases of vowel centralization will be labelled as accurate gender.

RQ4: Do contextual factors (i.e., predictability of the gender of the noun) predict vowel centralization?

H4: Highly predictable DPs are less likely to be fully specified phonetically as they offer redundant information.

#### **2. Materials and Methods**

#### *2.1. Participants*

A total of thirty-seven participants (*N* = 37) took part in the study: 13 early bilinguals (EBs), 13 late bilinguals (LBs) whose first language was Spanish, and 11 monolingual speakers serving as the comparison group. All bilingual participants were residents of El Paso, Texas, and were attending different classes at the University of Texas at El Paso. Participants in the (functionally) monolingual group were Spanish speakers with minimal exposure to and ability in English and were residents of Ciudad Juárez, Mexico. They were recruited either from a beginner-level English for Speakers of Other Languages (ESOL) course or through social networks.

Following previous research (Montrul 2011; Silva-Corvalán 2014), the EBs were second- or first-generation immigrants who acquired Spanish during childhood at home or in other natural contexts where a majority language (English) was spoken. They were either born and raised in the US or immigrated permanently to the US at or before the age of 12. The LBs were first-generation immigrants of Mexican background who arrived in the US after the age of 13 with a fully developed L1 grammar. Participants completed an adult language background questionnaire, which elicited information on place of birth, primary language of schooling, patterns of language use, etc. This questionnaire also elicited self-rated proficiency in English and Spanish for each of the four linguistic skills on a Likert scale ranging from basic/limited (1) to excellent/native (4). In addition to this self-rating, participants completed an independent proficiency task adapted from the *Diploma de Español como Lengua Extranjera* (DELE) (Cuza et al. 2013). Table 1 summarizes age, age of arrival in an English-speaking context (AOA), and length of residence in an English-speaking context (LOR) for all participant groups.

EBs scored an average of 40 on the DELE test, while LBs scored 45. Previous research using this methodology (Cuza et al. 2013; Montrul et al. 2003) classified participants who scored between 40 and 50 points (out of 50) as 'advanced' learners, those who scored between 30 and 39 as 'intermediate' learners, and those who scored between 0 and 29 as 'beginner' learners. In other words, although the proficiency score of the EB group is slightly lower than that of the other groups, all groups have a high overall level of proficiency in Spanish. The EB group included participants born and raised in the US and those who came to the US before the age of six (mean AOA = 2.5). Their self-rated proficiency in English was near native (3.73/4), while in Spanish it was good/fluent (2.9/4). Regarding their patterns of language use, most of the participants (62%) reported using both English and Spanish at home, but they used mainly English at work (75%) and in social situations (46%). Six participants (46%) felt more comfortable in English, and six (46%) felt equally comfortable in English and Spanish. Only one participant (8%) selected Spanish as the most comfortable language.
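The score bands cited above amount to a simple threshold rule. The Python sketch below encodes them as we read them (the helper name `dele_band` is hypothetical, not part of any published tool) and makes one detail easy to see: the EB mean of 40 sits exactly on the lower bound of the 'advanced' band.

```python
def dele_band(score: int) -> str:
    """Map a DELE-adapted score (out of 50) to the bands cited in the text.

    Hypothetical helper: 40-50 advanced, 30-39 intermediate, 0-29 beginner.
    """
    if not 0 <= score <= 50:
        raise ValueError("DELE-adapted scores range from 0 to 50")
    if score >= 40:
        return "advanced"
    if score >= 30:
        return "intermediate"
    return "beginner"


# Group means reported in this section
for group, mean_score in {"EB": 40, "LB": 45}.items():
    print(group, dele_band(mean_score))  # prints "EB advanced", then "LB advanced"
```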


**Table 1.** Participants' demographic information.

Late bilinguals (*n* = 13) included first-generation immigrants from Mexico (mean age at testing = 38; mean AOA = 23; mean LOR = 16). Their self-rated proficiency in Spanish was almost native (3.9/4), and in English it was good/fluent (2.9/4). Their DELE score was 45/50 (advanced proficiency). As for language use, the majority reported speaking more Spanish at home (77%) and in social situations (58%). At school, five participants used both English and Spanish and another five used only Spanish (42% each). At work, half of the participants used only English, and 40% used both English and Spanish. When asked which language they felt most comfortable in, the majority (77%) indicated Spanish.

The comparison group consisted of seven recent arrivals in El Paso, Texas, and four residents of Ciudad Juárez, Mexico (mean age at testing = 25; mean AOA = 20; mean LOR = 9 months). Although their AOA was earlier than that of the LBs (20 vs. 23), these speakers had learned English for a much shorter period of time (an average of 9 months). Most of the participants reported speaking more Spanish at home, at school, at work, and in social situations. They also reported feeling most comfortable speaking Spanish.

All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the University of Texas at El Paso IRB.

#### *2.2. Materials and Tasks*

To test our hypotheses, we analyzed reading data from "The North Wind and the Sun" and narratives of the folk tale "Little Red Riding Hood". To elicit the narrative, participants were shown the wordless pictures of a children's book based on Perrault's version of the tale. First, participants looked through the pictures as a refresher. When they felt ready, they recorded the narrative using the images as guidance. All sessions were conducted in a sound-treated room. Participants were recorded directly onto a laptop computer using Audacity 2.1.2 (available online: http://audacityteam.org/) and a Blue Snowball USB microphone. The speech was sampled at 44.1 kHz, and word-final vowels and vocalic sequences (/a e o ae ea oa/) were analyzed with PRAAT (Boersma and Weenink 2001).

#### *2.3. Analysis*

The data were subjected to two analyses. For the phonetic analysis, we used PRAAT to extract all tokens of /aeo/ in unstressed word-final position as well as all unstressed across-word vowel sequences whose first vowel was one of the three target vowels. We marked the onset of each vowel or vowel sequence at the beginning of the F1 increase and the offset either at the drop in intensity (pre-pausal vowels) or at the beginning of the F1 rise for cases in which the word-final vowel was immediately followed by a word beginning with a consonant. All measurements were taken at zero-crossings. From the reading, we extracted the vowels from nouns, adjectives, and verbs (single vowels: N = 1045; sequences: N = 287), whereas from the narrative, only vowels that were part of the noun phrase were extracted (single vowels: N = 2954; sequences: N = 1109).<sup>2</sup> Although formant values were automatically extracted at five points, we report values at the midpoint for single vowels (to minimize the effects of coarticulation with surrounding consonants) and values at all five points for sequences. Formant values were subsequently checked manually to inspect values that fell outside the ranges reported in previous studies for Spanish and Spanish-English bilinguals. Data were normalized using the Lobanov (1971) method, as adapted by Nearey (1977) and Adank et al. (2004). Single-vowel results were submitted to two complementary analyses aimed at computing the degree of overlap between vocalic spaces in each group. The first analysis computes Bhattacharyya's affinity scores, following Johnson (2015) and Strelluf (2016). Bhattacharyya's affinity measures the degree of overlap between two Gaussian distributions (Mak Brian 1996) and is considered to be less sensitive than other approaches to imbalances in the sample (Strelluf 2016), which are present in both tasks but particularly in the narrative. A score of 0 indicates that there is no overlap between vocalic spaces, whereas a score of 1 signals complete overlap. To calculate Bhattacharyya's affinity scores, we used the "kerneloverlap" function of the {adehabitatHR} R library (Calenge 2006). The second analysis, the computation of convex hulls as implemented by Haynes and Taylor (2014), allowed us to quantify the percentage of overlap between vowel pairs by calculating the smallest convex shape (polygon) that contains all the given data points (Haynes and Taylor 2014, p. 885).<sup>3</sup> Results are reported as percentages and were calculated with the package {phonR} (McCloy 2016).
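As a rough illustration of the normalization and affinity steps described above, the Python sketch below (i) applies Lobanov normalization as a plain per-speaker z-score of each formant and (ii) computes a Bhattacharyya coefficient between two vowel clouds. Two assumptions should be flagged: the sketch ignores the Nearey (1977) and Adank et al. (2004) adaptations, and it models each vowel as a bivariate Gaussian, whereas "kerneloverlap" in {adehabitatHR}, as we understand it, works from kernel density estimates. Its values are therefore not expected to reproduce those reported for the study; all function names and data are illustrative.

```python
import math
from statistics import mean, stdev


def lobanov(values):
    """Lobanov normalization: z-score one formant within one speaker."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def mean_cov(points):
    """Mean vector and sample covariance matrix of 2-D (F1, F2) points."""
    n = len(points)
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def bhattacharyya_affinity(a, b):
    """Bhattacharyya coefficient between two vowel clouds, each modeled as a
    bivariate Gaussian (0 = no overlap, 1 = identical distributions)."""
    (m1, s1), (m2, s2) = mean_cov(a), mean_cov(b)
    # Average the two covariance matrices element-wise
    s = [[(s1[i][j] + s2[i][j]) / 2 for j in range(2)] for i in range(2)]
    d = det2(s)
    inv = [[s[1][1] / d, -s[0][1] / d], [-s[1][0] / d, s[0][0] / d]]
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    # Squared Mahalanobis distance between the means under the averaged covariance
    maha = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    # Bhattacharyya distance: mean-separation term + covariance-mismatch term
    distance = maha / 8 + 0.5 * math.log(d / math.sqrt(det2(s1) * det2(s2)))
    return math.exp(-distance)


# Made-up, already-normalized (F1, F2) tokens for two vowel categories
a_tokens = [(1.2, -0.3), (1.0, -0.1), (1.4, -0.5), (1.1, 0.0), (1.3, -0.2)]
o_tokens = [(0.4, -1.2), (0.6, -1.0), (0.2, -1.4), (0.5, -0.9), (0.3, -1.1)]
print(round(bhattacharyya_affinity(a_tokens, o_tokens), 3))
```

The parametric form keeps the sketch dependency-free; a kernel-based estimate would follow the shape of each vowel cloud more closely when it is not elliptical.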

Vocalic sequences were labeled and transcribed by one of the authors and verified by another. Using the acoustic information available in the spectrogram, we distinguished sequences that were realized together (i.e., with no pause or prosodic break, such as a pitch reset, between the vowels) from those that were separated by a pause or a glottal stop. We labeled the former as 'diphthong' and the latter as 'hiatus'. This allowed us to clearly separate the sequences that cannot be coarticulated (those labeled as 'hiatus') from those that could be (those labeled as 'diphthong'). Thus, the sequences labeled as 'diphthong' may include hiatus realizations, i.e., sequences in which each vowel has frequency and duration values similar to those of single vowels (e.g., Aguilar 1999; Borzone de Manrique 1976; Colantoni and Limanni 2010).<sup>4</sup> To determine whether there were differences in formant frequencies, sequences labeled as diphthongs were acoustically analyzed for their formant trajectories. For the analysis of F1 and F2 trajectories within the unstressed sequences /ae oe ea/,<sup>5</sup> we ran Smoothing Spline ANOVAs (SSANOVAs) using the package {gss} (Gu 2014) to test for statistical differences across the three groups of speakers (EB, LB, M) at five intervals within each sequence. SSANOVAs create smoothing splines for each group by connecting mean data points through time, as well as 95% Bayesian confidence intervals represented by dotted curves above and below the splines. At the time points where the confidence interval curves do not intersect, the two splines are considered significantly different.

<sup>2</sup> Although no tokens had to be discarded from the reading task, we excluded 72 single-vowel tokens from the narrative that were produced with creaky voice.

<sup>3</sup> As explained by Haynes and Taylor (2014), these are measurements of overlap and not of statistical significance. As in Haynes and Taylor's study, we are not interested in statistical differences but in the extent to which vocalic spaces overlap, because a greater degree of overlap suggests that the vowels are articulated similarly.

<sup>4</sup> Hiatuses also differ from diphthongs in the duration and trajectory of formant transitions (e.g., Aguilar 1999).

<sup>5</sup> Only these sequences were analyzed because (i) the first vowel was unstressed and (ii) they appeared in both the reading task and the narrative.

To conduct the morphosyntactic analysis, all noun phrases (NPs) were extracted, for a total of 2445 analyzed tokens.<sup>6</sup> Determiner phrases (DPs) were coded for configuration (whether they contained a noun, a determiner or quantifier, or modifiers, as in (1)), realization of gender and number agreement in each relevant category (nouns, determiners, and adjectives), and overall concord patterns (match/mismatch between constituents as well as target realization for the head noun, as shown in (2)). All nouns were further analyzed for various semantic parameters to explore potential associations between form and meaning. These included the semantic type of noun in terms of categories pertaining to concreteness and animacy (3) and to individuation and countability (4). Additional coding included whether the noun began with a stressed [a], which, in Spanish singular feminine nouns, leads to an exceptional use of the masculine article (5), as in *el águila* (f) 'the eagle'.<sup>7</sup> Overtly marked gender is indicated in the glosses.

(1) DP configuration


(2) Agreement: match for agreement features (number or gender) with determiners and/or adjectives, and with target gender of the noun


(3) Noun semantic type: semantic features of the noun under analysis



All nouns were also coded for morphology and for their morphological relation to other noun lexemes. First, we isolated the final suffix or segment in order to determine whether gender was visibly expressed, that is, whether the noun contained the transparent word markers -a (f) and -o (m), a different transparent suffix such as -*ción*, which is uniformly feminine, or formally opaque final segments or suffixes (5). We then considered whether the noun entered a gender alternation, as in (6), and, if so, what the lexical relationship was to the other member of the alternation (7).

<sup>6</sup> Frozen expressions such as *Fin* and *Colorín Colorado* ('the end') and *lugar* ('place') in the prepositional locution *en lugar de* ('in lieu of') were excluded from the analysis.

<sup>7</sup> The morphosyntactic coding was conducted by one of the authors and verified by another. The error count was first checked by one investigator and then re-checked by a second author; discrepancies (N = 2) were resolved by a third author.

(5) Gender visibility


(6) Gender alternation: whether or not there exists an opposite word to pair with the noun in terms of gender and the nature of the lexical relationship between the alternants


Finally, we considered both structural and referential context to assess whether the gender of a given DP was predictable. Predictability was coded categorically (yes/no), and we annotated its source: syntactic (an informative article) or semantic/contextual (a known or previously mentioned referent). The purpose of this classification was to explore whether the predictability of the gender of a DP predicts vowel underspecification.
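The coding scheme can be expressed as a small decision rule. The Python sketch below is hypothetical in its details: it assumes, following the discussion of determiners in Section 3.2, that only the singular definite and indefinite articles count as syntactically informative, and only before a consonant-initial noun; it also checks syntax before context when both apply. The class and field names (`DP`, `previously_mentioned`, `known_entity`) are ours.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption: the gender-informative articles are the singular definite/indefinite forms
INFORMATIVE_ARTICLES = {"el", "la", "un", "una"}
VOWEL_LETTERS = set("aeiouáéíóú")


@dataclass
class DP:
    determiner: Optional[str]  # e.g. "la", "esa"; None for bare nouns
    noun: str
    previously_mentioned: bool = False  # same lexeme used earlier in the narrative
    known_entity: bool = False  # e.g. a main character on first mention


def starts_with_consonant(word: str) -> bool:
    """Orthographic check; Spanish h is silent, so a leading h is skipped."""
    w = word.lower().lstrip("h")
    return bool(w) and w[0] not in VOWEL_LETTERS


def gender_predictability(dp: DP) -> Tuple[bool, Optional[str]]:
    """Return (predictable, source), with source 'syntactic' or 'contextual'."""
    det = (dp.determiner or "").lower()
    # Vowel-initial nouns (e.g. el águila, una abuela) get no syntactic cue here
    if det in INFORMATIVE_ARTICLES and starts_with_consonant(dp.noun):
        return True, "syntactic"
    if dp.previously_mentioned or dp.known_entity:
        return True, "contextual"
    return False, None


print(gender_predictability(DP("la", "casa")))  # (True, 'syntactic')
print(gender_predictability(DP("una", "abuela", known_entity=True)))  # (True, 'contextual')
print(gender_predictability(DP("esa", "niña")))  # (False, None)
```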

Our final analysis combined the phonetic and the morphosyntactic results. The F1 and F2 values obtained for single word-final vowels in the narrative were Bark-transformed and combined with the morphosyntactic analysis.<sup>8</sup> To visualize the results, we created a new variable by subtracting F1 (Bark) from F2 (Bark). This allowed us to compare the degree of vowel centralization for each participant (the smaller the number, the more centralized the vowel). We then plotted F2–F1 against the predictability of each noun and displayed the results by vowel, organized by group. This visualization allowed us to explore whether nouns that entered into predictable alternations showed a smaller degree of centralization than nouns whose gender is not predictable. To determine whether predictability played a role in vowel centralization, we ran linear mixed-effects models with F2–F1 (in Bark) as the dependent variable, Vowel (/aeo/), Predictability (yes/no), and Group (Monolinguals, EB, LB) as independent variables, and Participant as a random factor. All statistics were calculated in RStudio (RStudio Team 2020).
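The dependent variable described above can be sketched in a few lines of Python. The text does not say which Hz-to-Bark formula was used; the sketch assumes Traunmüller's (1990) approximation, without the corrections that formula applies at the extremes of the scale, so absolute values may differ from those in the analysis. The mixed-effects models were fitted in R and are not reproduced here; the function names and formant values are illustrative.

```python
def hz_to_bark(f_hz: float) -> float:
    """Hz-to-Bark conversion (Traunmüller 1990 approximation; an assumed
    choice, with its low- and high-end corrections omitted)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def f2_minus_f1_bark(f1_hz: float, f2_hz: float) -> float:
    """Single-number summary per token: F2 (Bark) minus F1 (Bark)."""
    return hz_to_bark(f2_hz) - hz_to_bark(f1_hz)


# Illustrative formant values (Hz) for one hypothetical word-final token
print(round(f2_minus_f1_bark(650.0, 1300.0), 2))
```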

<sup>8</sup> The combined analysis only included data from the narrative because we are interested in comparing vowel quality and gender accuracy. The reading task, by contrast, does not clearly reflect participants' grammatical knowledge.

#### **3. Results**

We begin this section by summarizing the results of the phonetic analysis, which include the characterization of single vowels and vowel sequences. Section 3.2 presents the morphosyntactic analysis, and Section 3.3 combines the phonetic results obtained for single vowels with morphological predictability to explore whether vowel centralization may go undetected when accuracy in gender agreement is reported.

#### *3.1. Phonetic Analysis*

Figure 1a displays the normalized F1–F2 values obtained for single vowels in the reading task, whereas Figure 1b shows the results for the narrative. In both graphs, we observe some overlap in all three groups, particularly between the vocalic spaces for /a/ and /o/. In the reading task, although bilinguals show slightly more overlap than monolinguals, the patterns are rather similar across groups. In the narrative (Figure 1b), by contrast, both bilingual groups display a smaller vowel space for /a/, and EBs clearly show a greater degree of overlap between /a/ and /o/ realizations. This is important because these are the two vowels that are frequently used to encode gender.

**Figure 1.** Formant charts for /aeo/ in all groups (Early Bilinguals (EB), Late Bilinguals (LB) and Monolinguals (M)): (**a**) reading task; (**b**) narrative.

Table 2 summarizes two measurements of overlap for the data obtained from the reading passage (Bhattacharyya's affinity scores and the percentage of overlap calculated using convex hulls), whereas Table 3 displays the results for the narrative. As indicated in Section 2.3, these are complementary measurements: whereas the affinity score quantifies how similar the vocalic spaces are, the convex hulls quantify the area they share. In both cases, the larger the number, the greater the overlap. As mentioned, the goal of these measurements is to quantify overlap, not to test for statistically significant differences among groups.
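The convex-hull measure can be illustrated with a self-contained Python sketch: build the convex hull of each vowel's (F1, F2) tokens, intersect the two hulls, and express the shared area as a percentage. How the percentage is normalized is an assumption here (the sketch divides by the area of the smaller hull), since the text does not specify the denominator, and this is not the {phonR} code used in the study. All function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); > 0 if b lies left of the ray o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly):
    """Shoelace formula (0 for degenerate polygons)."""
    n = len(poly)
    if n < 3:
        return 0.0
    twice = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
                for i in range(n))
    return abs(twice) / 2


def line_intersection(p, q, a, b):
    """Point where segment p-q crosses the infinite line through a and b."""
    cp, cq = cross(a, b, p), cross(a, b, q)
    t = cp / (cp - cq)
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))


def clip_convex(subject, clipper):
    """Sutherland-Hodgman clipping of `subject` by the convex CCW polygon `clipper`."""
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inputs, output = output, []
        for j in range(len(inputs)):
            cur, prev = inputs[j], inputs[j - 1]
            cur_in, prev_in = cross(a, b, cur) >= 0, cross(a, b, prev) >= 0
            if cur_in:
                if not prev_in:
                    output.append(line_intersection(prev, cur, a, b))
                output.append(cur)
            elif prev_in:
                output.append(line_intersection(prev, cur, a, b))
    return output


def hull_overlap_percent(a_points, b_points):
    """Shared hull area as a % of the smaller hull (assumed normalization)."""
    ha, hb = convex_hull(a_points), convex_hull(b_points)
    smaller = min(polygon_area(ha), polygon_area(hb))
    if smaller == 0:
        return 0.0
    return 100 * polygon_area(clip_convex(ha, hb)) / smaller


square = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]  # interior point is ignored by the hull
shifted = [(x + 1, y) for x, y in square]
print(hull_overlap_percent(square, shifted))  # 50.0
```

Unlike the affinity score, this measure depends only on the outermost tokens of each category, which is one reason the two measures are best read together.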

These results show, first, a task effect and, second, a vowel-pair effect. The degree of overlap across pairs of vowels is lower in the reading task (Table 2) than in the narrative, and this is particularly evident in the values calculated using convex hulls. More careful articulation and, thus, less overlap is generally expected in a reading task. In the narrative, by contrast, the degree of overlap increases at different rates across groups (Table 3). LBs and particularly EBs double and sometimes triple (e.g., [a]–[o]) the degree of overlap between vocalic spaces when speaking spontaneously (see overlap %), which suggests that the distinction between some pairs of vowels may be weakening. This is clearly the case with the [a]–[o] pair and, to a lesser extent, with the [a]–[e] pair.<sup>9</sup> In turn, this tells us that EBs produce the most frequent gender-marking vowel pairs with the same quality about a third of the time.


**Table 2.** Affinity scores and proportion overlap (convex hulls) for /aeo/ in the reading task. Results displayed by group (Early Bilinguals (EB), Late Bilinguals (LB) and Monolinguals (M)).

**Table 3.** Affinity scores and proportion overlap (convex hulls) for /aeo/ in the narrative. Results displayed by group (Early Bilinguals (EB), Late Bilinguals (LB) and Monolinguals (M)).


We turn to the realization of sequences, which were analyzed separately from single vowels because their formant quality and trajectory largely depend on how they are realized. If realized as hiatuses, formant values should be similar to those obtained for single vowels, whereas values in diphthongs should differ from those of the corresponding single vowels (e.g., Borzone de Manrique 1976; Aguilar 1999). Thus, before conducting the acoustic analysis, it is important to distinguish sequences produced within a single syllable from those realized in different syllables (Figure 2). A preference for hiatuses will be interpreted as indicative of cross-linguistic influence, since across-word vowels in Spanish tend to be pronounced together (e.g., Aguilar 2010), which is not the case in English (Davidson and Erker 2014). The way vowels are syllabified across words in Spanish also has important implications for L1 acquisition because, as mentioned, vowel quality in diphthongs differs from vowel quality in singleton vowels. A higher proportion of diphthongs therefore means more noise in the signal for bilinguals and, consequently, greater difficulty in determining the quality of the vowel that marks gender. Results in Figure 2a show that, in the reading task, EBs had a larger proportion of hiatuses than the other groups, whereas all groups behaved similarly in the narrative (Figure 2b).
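The proportions plotted in Figure 2 are tallies over the labeled tokens. As a minimal sketch (the tuple layout, function name, and example labels are illustrative, not the study's data), the percentage of hiatus realizations per group and sequence could be computed as follows.

```python
from collections import Counter


def hiatus_percentages(labels):
    """labels: iterable of (group, sequence, realization) tuples, where
    realization is 'diphthong' or 'hiatus'.
    Returns {(group, sequence): percentage of hiatus realizations}."""
    totals, hiatuses = Counter(), Counter()
    for group, sequence, realization in labels:
        totals[(group, sequence)] += 1
        if realization == "hiatus":
            hiatuses[(group, sequence)] += 1
    return {key: 100 * hiatuses[key] / n for key, n in totals.items()}


example = [
    ("EB", "ae", "hiatus"), ("EB", "ae", "diphthong"),
    ("M", "ae", "diphthong"), ("M", "ae", "diphthong"),
]
print(hiatus_percentages(example))  # {('EB', 'ae'): 50.0, ('M', 'ae'): 0.0}
```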

To determine whether there were differences in the realization of sequences labeled as diphthongs, we compared formant trajectories across groups.<sup>10</sup> First, we analyzed the sequences produced by female speakers in the reading task. Male speakers' productions were not analyzed, given the small number of tokens obtained for this task (see Section 2.3) and the even smaller number of tokens in which vowels were not separated by a pause or a glottal stop. In terms of F1 trajectories, no statistically significant differences were observed across the three groups (Appendix A, Figure A1). In terms of F2 trajectories, the SSANOVAs revealed significant differences for all sequences: the EB group realized (i) /ae/ with significantly lower values than M and LB (Figure 3a), (ii) /oe/ with significantly lower values than LB (Figure 3b), and (iii) /ea/ with significantly lower values than M and LB (Figure 3c). The F2 trajectories realized by the M and LB groups did not differ significantly.

<sup>9</sup> Results regarding the vowel [e] in the narrative should be interpreted with caution, because this vowel represents only 10% of the total tokens.

<sup>10</sup> Note that only sequences that were realized as diphthongs in all groups were analyzed.

**Figure 2.** Percentage of diphthongs and hiatuses by type of vowel sequence and group (horizontal axis): (**a**) Reading task; (**b**) narrative.

**Figure 3.** Smoothing Splines ANOVAs (SSANOVAs) for non-normalized (female) F2 trajectories across five intervals produced by each group of speakers (M, EB, LB) in the reading task. (**a**) Sequence /ae/; (**b**) sequence /oe/; (**c**) sequence /ea/.

Second, to compare results across tasks, we analyzed the sequences produced by female and male speakers in the narrative separately. SSANOVAs by speaker gender using non-normalized values revealed two statistically significant differences, both within the sequence /ae/: (i) the female EB group realized F1 trajectories with significantly higher values than the female M group (Figure 4a); and (ii) the male EB group realized F2 trajectories with significantly lower values than the male M group (Figure 4b).<sup>11</sup>

**Figure 4.** SSANOVAs for non-normalized formant trajectories across five intervals within the sequence /ae/ produced by each group of speakers (M, EB, LB) in the narrative task: (**a**) F1 trajectories: female speakers; (**b**) F2 trajectories: male speakers.

Taken together, these results suggest that the EB group behaves differently from the M and LB groups, often centralizing F2 values in unstressed vowel sequences.

#### *3.2. Morphosyntactic Analysis*

The grammatical analysis showed high accuracy. Of a total of 2445 DP tokens extracted for analysis, we identified six clear gender errors (see (7)–(12) below), as well as a handful of number errors, mostly singular in lieu of plural, which will not be discussed here. Erroneous marking is indicated in bold.


We observed two additional, more ambiguous instances in which a demonstrative form could function either as a filler in a stranded phrase or as a demonstrative proper. In (13), below, we note that Speaker UT030 was a frequent user of the *este* filler. Nonetheless, there was no prosodic break separating the noun from the preceding demonstrative, so an analysis of these tokens as instances of determiner-noun disagreement cannot be ruled out.

<sup>11</sup> Non-significant results for female and male speakers are included in Figures A2 and A3, respectively.


This high accuracy rate is expected, given the high proficiency of these speakers, many of whom produced elaborate, lexically rich, and syntactically complex narratives that were, for the most part, seemingly error-free and without English intrusion. Nonetheless, the formal and contextual properties of gender expression in the DP deserve further examination. We ask two questions: How robustly do these DPs manifest gender agreement explicitly? And, given our phonetic results, how accurate will a coder or transcriber be at detecting that a given gender vowel has been centralized? If a gender form is highly predictable (because the lexical entry is contextually or syntactically predictable), a coder may well perceive a centralized vowel as the target. Given what is known about speech perception, we can expect a skilled listener to interpret a centralized vowel (*caperucit@*) as /a/ and to ignore the presence of the schwa. Even if we think we are paying attention, the evidence suggests otherwise. Listeners are known to perceive elements that are not present in the speech chain in order to repair phonotactically illicit sequences (e.g., Calabrese 2012; Durvasula and Kahng 2015a; Durvasula and Kahng 2015b; Hawkins 2010; Repp 1992); we might therefore expect them to readily attribute features to underspecified central vowels.

To further explore these patterns of accuracy, we consider the distribution of lexical and morphological types of noun, determiner, and adjective forms. Nouns were fairly evenly divided between those that entered into a gender alternation (52%, 1284 tokens) and those that did not; and, for this particular story, feminine nouns were almost twice as frequent as masculine nouns (1575 vs. 885 tokens). For most nouns, gender marking is an arbitrary classification with no semantic import. The main exceptions are nouns referring to human entities, some but not all animal classes (cf. *gallina* vs. *avispa*), and a handful of narrow subclasses of systematic alternations (*cerezo*/*cereza*). Nouns with human referents (*abuela*/*madre*/*caperucita*/*niña*/*cazador*) made up 42% of the data; animals (mostly *lobo*, 'wolf', with an occasional reference to cats in Caperucita's house) added another 12% to the count of semantically transparent gender alternations.

From a lexical perspective, over 80% of the noun tokens (2028 tokens) in the narratives had explicit gender marking (i.e., the transparent word markers *-o*/*-a*). Other nouns had either opaque word-final morphology (ending in *-e* or a consonant) or a transparent suffix (*-ción*, etc.). A potential loss of the gender system may surface either as (i) a switch in the word-marker vowel (e.g., saying *caperucito* or *abuelo* in lieu of *abuela*) or (ii) a mismatch between the noun and the determiner (*las alimentos*). A determiner-noun mismatch can potentially indicate loss of concord or agreement; more likely, however, it represents a lexical knowledge gap: the speaker may have misclassified the gender class to which a noun belongs. A study of gender agreement in French-English bilingual children by Nicoladis and Marchak (2011) supports this claim. Their data showed less attrition in concord (i.e., agreement between determiner and adjective) than in gender assignment (i.e., which determiner was associated with a noun). Bilingual children were significantly less accurate in gender assignment, beyond what their differences in lexical scores would explain. However, when the authors considered concord alone, there were no statistically significant differences between monolingual and bilingual children.

Determiners as a category were only partially informative. Bare nouns were common (26%, 629 tokens), and some determiners, such as possessives (*mi casa*) and quantifiers (*dos casas*), were uninformative for gender (14%, 347 tokens), so that about 40% of the analyzed tokens had no gender identification in the determiner. Among the determiner forms that inflect for gender (definites, indefinites, demonstratives), gender identifiability often depends only on the perceptibility of the gender vowel, as with demonstratives (*ese niño*/*esa niña*) and plural definites and indefinites. Only the singular definite and indefinite determiners (*un caso*/*una casa*) provide additional phonetic information beyond vowel quality, and only when the following word starts with a consonant (e.g., *una gata* but not *una\_abuela*). Table 4 shows the frequencies of all determiner types classified by the number of the noun. The frequencies of the more informative singular definite and indefinite forms are indicated in bold.


**Table 4.** Distribution of DPs by determiner and number (all tokens).

If we consider only the determiners marked for gender (definites, indefinites, and demonstratives) with nouns ending in the word markers -a and -o, we are left with 1156 tokens, or 47% of the nouns. On the assumption that a speaker or listener would ignore vowel quality for non-alternating nouns ([bok@] for *boca*) and only control the production/perception of contrasting vowels where it matters ([niñ@] would have to be resolved into *niño* or *niña*; similarly for [kas@] *caso*/*casa*), we could restrict our attention to alternating nouns. As shown in Table 5, only 791 tokens, or 32% of all data, contain nouns that mark gender and are accompanied by a determiner with visible gender.
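The shares reported in this section follow directly from the raw counts over the 2445 analyzed DPs. The short Python check below recomputes them; the labels and the helper name `share` are ours.

```python
TOTAL_DPS = 2445  # all analyzed DP tokens


def share(count: int, total: int = TOTAL_DPS) -> int:
    """Percentage of all DPs, rounded to the nearest whole number."""
    return round(100 * count / total)


counts = {
    "bare nouns": 629,
    "gender-uninformative determiners": 347,
    "no gender cue in the determiner (sum of the two above)": 629 + 347,
    "gendered determiner + -a/-o noun": 1156,
    "gendered determiner + alternating -a/-o noun": 791,
}
for label, n in counts.items():
    print(f"{label}: {n} tokens ({share(n)}%)")
```

The rounded values (26%, 14%, 40%, 47%, and 32%) match those given in the text.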


**Table 5.** Frequencies of determiner types associated with gendered nouns (reported for all gendered nouns and separately for only those that entered into an alternation).

So far, we have only discussed form. The picture is even starker if we take into account the specific context of this narrative and treat as highly detectable only those nouns that are not contextually predictable. As pointed out above, if someone murmurs *Caperucito roj@*, we are likely to hear *Caperucita roja* for the simple reason that there is only one *Caperucita*. In our story of choice, there are five main characters, all of them highly predictable from context. We know we are talking about an *abuela*, not an *abuelo*, and that there is no *loba* in the story. We first note that there is a strong association between the semantic type of noun and contextual status, as shown in Table 6. This is in the expected direction, with the five main characters (all animate: the wolf and the humans *Caperucita*, the grandmother, the mother, and the hunter) making up the bulk of the contextually given references. Table 6 cross-tabulates all nouns: columns classify nouns by whether or not there was a preceding article, and rows show how those nouns are distributed in relation to contextual information, separated by whether it consisted of previous mention, first mention of known entities, or neither type of identification. We restricted previous mention to subsequent repetition of the same lexeme; that is, a second mention of *Caperucita* counted as previous mention, but *la niña* referring to her did not.


**Table 6.** Cross-tabulation of semantic type of nouns against contextual status of the DP.

This leaves us with 24 gendered, alternating nouns whose gender was not predictable on the basis of either syntax (a preceding article) or context.
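The predictability criteria above can be summarized as a small decision procedure. A noun counts as gender-predictable if a preceding article shows its gender (a syntactic cue). It also counts as predictable if its referent is contextually given, either through previous mention with the same lexeme or through first mention of a known entity such as one of the five main characters. The sketch below is a hypothetical illustration of that logic, not the authors' coding procedure. The function names, the `KNOWN_ENTITIES` set, and the toy token list are all assumptions.

```python
# Hypothetical lexemes for the five main characters (contextually known entities).
KNOWN_ENTITIES = {"caperucita", "lobo", "abuela", "madre", "cazador"}

def is_gender_predictable(lexeme, article_shows_gender, mentioned_before):
    """Classify one noun token as gender-predictable (True) or not (False).

    lexeme: lower-cased noun lemma.
    article_shows_gender: True if a preceding article marks gender (syntactic cue).
    mentioned_before: set of lexemes already used in the narrative. Previous mention
        counts only when the same lexeme is repeated ("la niña" after "Caperucita"
        would not count as previous mention).
    """
    if article_shows_gender:
        return True   # syntactic cue
    if lexeme in mentioned_before:
        return True   # previous mention with the same lexeme
    if lexeme in KNOWN_ENTITIES:
        return True   # first mention of a known entity
    return False      # predictable from neither syntax nor context

def code_narrative(tokens):
    """tokens: list of (lexeme, article_shows_gender) pairs in narrative order."""
    seen, labels = set(), []
    for lexeme, has_article in tokens:
        labels.append(is_gender_predictable(lexeme, has_article, seen))
        seen.add(lexeme)
    return labels

# Toy narrative fragment (hypothetical).
story = [("caperucita", False), ("flor", False), ("flor", False), ("cesta", True)]
print(code_narrative(story))  # [True, False, True, True]
```

Only tokens labeled `False` that are also gendered and alternating would fall into the small residual set discussed in the text.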

#### *3.3. Combined Results*

The last step in this exploration is to combine what we found about the phonetic realization of these vowels and the morphological analysis. Thus, we will discuss here a way of combining the phonetic results obtained from the narrative elicitation task (the results of the reading task are left aside because read speech is not reflective of the underlying grammatical system) with the analysis of predictability presented in Table 7. Given the characteristics of our dataset (unbalanced number of vowels per category, distribution of predictable and unpredictable nouns both in the story and across speakers), this is very much a tentative proposal rather than a strong claim.

**Table 7.** Cross-tabulation of nouns (reported separately for all nouns and for gendered nouns in alternation) against contextual status of the DP.


As discussed in Section 2.3, to combine our results, we Bark-transformed the F1 and F2 values and calculated the F2–F1 difference to obtain a single value for each token. As a reminder, a small F2–F1 difference in Bark is interpreted as a sign of centralization. Our first analysis, displayed in Figures 5 and 6, presents the summary statistics for /a o/. <sup>12</sup> For each vowel, we calculated the mean difference for predictable and non-predictable nouns. Then, from each data point in our Excel file, we subtracted the mean value obtained for the opposite vowel with the same degree of predictability. For example, for a given token of the vowel [a] in *Caperucita*, we subtracted the mean obtained for all the [o] vowels in words like *lobo*. <sup>13</sup> If participants make no distinction between these unstressed vowels, the resulting values should be close to 0.

<sup>12</sup> We focus on these two vowels, given that we had a larger number of tokens than for /e/ and that 80% of the nouns that were marked for gender in the story had these vowels.

<sup>13</sup> The graphs also reflect the fact that there were many more predictable nouns than non-predictable nouns in this narrative, hence the difference in the number of data points in Figures 5a and 6a when compared to Figures 5b and 6b.
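The procedure above (Bark transformation, a per-token F2–F1 difference, and subtraction of the opposite vowel's mean within the same predictability condition) can be sketched in Python. This is a minimal sketch under stated assumptions. First, the Bark conversion uses Traunmüller's approximation, because the formula used in the study is not given in this section. Second, the means are pooled across groups within each predictability condition. Third, the final distance is taken as an absolute value, since the means reported for both Figure 5 and Figure 6 are positive. The dictionary keys are hypothetical field names.

```python
from statistics import mean

def hz_to_bark(f_hz):
    """Convert Hz to Bark (Traunmüller's approximation; an assumed choice)."""
    return 26.81 * f_hz / (1960 + f_hz) - 0.53

def f2_minus_f1(f1_hz, f2_hz):
    """Per-token F2-F1 distance in Bark; smaller values suggest a more central vowel."""
    return hz_to_bark(f2_hz) - hz_to_bark(f1_hz)

def opposite_vowel_scores(tokens):
    """For each /a/ or /o/ token, subtract the mean F2-F1 (Bark) of the opposite
    vowel in the same predictability condition; scores near 0 mean little
    distinction between the two vowels.

    tokens: list of dicts with keys 'vowel' ('a' or 'o'), 'predictable' (bool),
    'f1' and 'f2' (Hz). Both vowels must occur in every condition used.
    """
    rows = [(t["vowel"], t["predictable"], f2_minus_f1(t["f1"], t["f2"])) for t in tokens]
    cell_means = {}
    for vowel in ("a", "o"):
        for predictable in (True, False):
            values = [d for v, p, d in rows if v == vowel and p == predictable]
            if values:
                cell_means[(vowel, predictable)] = mean(values)
    opposite = {"a": "o", "o": "a"}
    # Absolute value is an assumption (the reported means for both vowels are positive).
    return [abs(d - cell_means[(opposite[v], p)]) for v, p, d in rows]

# Hypothetical formant values for one /a/ and one /o/ token in predictable contexts.
toy = [
    {"vowel": "a", "predictable": True, "f1": 700, "f2": 1300},
    {"vowel": "o", "predictable": True, "f1": 450, "f2": 900},
]
print([round(s, 2) for s in opposite_vowel_scores(toy)])
```

Computing the cell means once and then subtracting per token mirrors the spreadsheet workflow the text describes. It also leaves room to compute group means of the scores afterwards, as in Figures 5 and 6.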

**Figure 5.** F2–F1 in Bark for the vowel /a/: (**a**) values obtained for /a/ minus values obtained for /o/ in predictable contexts; (**b**) values obtained for /a/ minus values obtained for /o/ in non-predictable contexts.

Figure 5 shows that values obtained for EBs are closer to 0 (mean in predictable contexts = 1.95; mean in non-predictable contexts = 0.94) than for the other two groups (LB\_predictable = 3.09; LB\_non-predictable = 2.53; mono\_predictable = 3.02; mono\_non-predictable = 1.99). This means that EBs make less of a distinction between the two vowels. Values obtained for /o/ (Figure 6) display the same tendency, although the EB means are slightly higher than those obtained for /a/. <sup>14</sup> The analysis of both vowels thus suggests that the patterns are the opposite of those we had hypothesized; namely, the difference between the two contrasting vowels was larger in predictable than in non-predictable contexts.

<sup>14</sup> EB\_predictable = 2.15; EB\_non-predictable = 1.33; LB\_predictable = 3.08; LB\_non-predictable = 2.43; mono\_predictable = 2.91; mono\_non-predictable = 1.80.

**Figure 6.** F2–F1 in Bark for the vowel /o/: (**a**) values obtained for /o/ minus values obtained for /a/ in predictable contexts; (**b**) values obtained for /o/ minus values obtained for /a/ in non-predictable contexts.

The second type of analysis conducted on this combined dataset was a series of mixed effects models. In these models, the F2–F1 difference (in Bark) for each vowel token was the dependent variable, and Vowel (/a e o/), Language Group (M, EB, LB), and Predictability (yes, no) were the independent variables. Reference levels were /a/ for Vowel, M for Language Group, and no for Predictability. Table 8 reports the estimates, standard errors, and significance levels of a linear mixed effects model with Participant as a random effect and Vowel, Language Group, and Predictability of the noun gender as fixed effects (Winter 2020).
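With treatment (dummy) coding, each non-reference level gets its own indicator column. Each fixed-effect estimate in Table 8 is therefore read as a difference from the reference cell (/a/, monolinguals, unpredictable gender). The sketch below only builds such a fixed-effects design row for one token and adds up a linear predictor from a set of estimates. It does not fit a model and does not represent the Participant random effect; Winter (2020) covers fitting such models in R. The column-naming scheme and the helper functions are hypothetical.

```python
# Factor levels from the text; the first level listed is the reference level.
LEVELS = {
    "vowel": ["a", "e", "o"],
    "group": ["M", "EB", "LB"],
    "predictable": ["no", "yes"],
}

def treatment_code(row):
    """Return the fixed-effects design row (intercept first) for one token."""
    design = {"(Intercept)": 1}
    for factor, levels in LEVELS.items():
        reference = levels[0]
        for level in levels:
            if level != reference:
                # Indicator is 1 only when the token has this non-reference level.
                design[f"{factor}[{level}]"] = 1 if row[factor] == level else 0
    return design

def linear_predictor(design, estimates):
    """Fixed-effects prediction: sum of the estimates for the columns switched on."""
    return sum(estimates[name] * value for name, value in design.items())

row = {"vowel": "o", "group": "EB", "predictable": "yes"}
print(treatment_code(row))
# {'(Intercept)': 1, 'vowel[e]': 0, 'vowel[o]': 1, 'group[EB]': 1, 'group[LB]': 0, 'predictable[yes]': 1}
```

A token in the reference cell switches on only the intercept. The predicted F2–F1 value for an EB /o/ token in a predictable noun is therefore the intercept plus the three corresponding estimates.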

The combined analysis of morphosyntactic and phonetic results presented in this section reveals that all groups centralize vowels less in gender-predictable nouns than in non-predictable ones. Further analysis of predictability by formant distance (Figures 5 and 6) suggests that EBs showed a higher degree of centralization in /a o/ than the other groups.


**Table 8.** Results of a linear mixed effects model for the F2–F1 (Bark) with Participant as a random effect and Vowel, Language Group, and Predictability of the noun gender as fixed effects. Reference values: Vowel = [a]; Language Group = Monolinguals; Predictability = no.

#### **4. Discussion**

#### *4.1. Hypothesis Evaluation*

Our first research question and hypothesis targeted the phonetic realization of word-final unstressed singleton vowels and vowel + vowel sequences. We hypothesized that EBs would show a higher rate of overlap in the vocalic spaces of the three target vowels than the other two groups. The results confirmed this, particularly in the narrative. Figure 1b and Table 3 show that the vocalic spaces for the vowel pair that most often encodes gender (i.e., [a]–[o]) overlap in more than 30% of the tokens; that is, roughly one-third of these vowels are realized with the same quality. We also hypothesized that monolinguals and LBs would tend to fuse vowels across words, while EBs would tend to separate them. Figure 2 showed this to be the case only in the reading task. We further explored the quality (formant trajectories) of the sequences that were pronounced without a pause or a glottal stop to determine potential group differences that could be attributed to influence from English. Sequences yielded fewer tokens than single vowels, and most of those tokens were produced by female speakers, so we focused on female speakers in our statistics for the reading task. We showed that EBs indeed displayed signs of centralization in the realization of such sequences, since /ae/ and /ea/ were realized with similar trajectories and with lower F2 values than those obtained for the other groups. For the narrative, we analyzed both normalized and non-normalized values produced by female and male speakers. Although no statistically significant differences were observed in the former analysis, the latter revealed signs of centralization in EBs' realization of the sequence /ae/. Compared with their monolingual counterparts, male EB speakers produced this sequence with lower F2 values and female EB speakers with higher F1 values.

Our second question concerned the perceived errors in gender agreement and concord. Based on previous literature, we expected to find more errors in the EB group than in the other groups. However, very few errors were found in total (only six) and half of them were produced by EBs, which does not seem to support our hypothesis.

Our third research question was explored with data obtained from the narrative and addressed the issue of potential underreporting of gender errors. We hypothesized that centralized vowels would be coded as accurate gender marking. Two findings lead us to conclude that the hypothesis was confirmed. First, the phonetic characterization of vowels showed 32% overlap in the vocalic spaces for [a o] in EBs. Second, only three errors (examples 7–9) were perceived in this group. As we will discuss in the next section, we would have expected at least a larger number of cases to be labeled as questionable, but we ourselves did not label them as such.

Our final question concerned the results of the combined phonetic and morphosyntactic analysis. Here, we wanted to determine whether EBs would be more likely to mark gender (i.e., with both accurate marking and less vowel centralization) in DPs whose gender was predictable from neither syntax nor context. Our results did not support this hypothesis. Indeed, we found that all groups (and especially EBs) showed less centralization in predictable than in non-predictable contexts. In the next section, we discuss why this may be the case and contextualize our results in terms of past and future research.

#### *4.2. General Discussion*

The results of our phonetic analysis show differences in realization across groups, and these differences are larger in the narrative than in the reading task. In particular, we have shown that EBs tend to centralize /o/ and, as a consequence, show a high degree of overlap in the vocalic space for [a]–[o]. This is consistent with previous production studies (Gildersleeve-Neumann et al. 2009; Menke 2010; Rogers 2012; Ronquest and Rao 2018) that reported differences in the realization of single vowels between heritage speakers and Spanish monolinguals. The patterns reported here are also consistent with previous perception studies. These studies show that L1 English–L2 Spanish speakers tend to confuse /o/ with /a/ (Morrison 2003; Morrison 2006), and that Spanish heritage speakers tend to confuse vowels, but only in unstressed positions (Mazzaro et al. 2016). We have also seen that across-group differences are not restricted to single vowels; there were also differences in the realization of vowels across word boundaries. In the reading task, EBs had a higher proportion of hiatuses than the other groups and showed differences in the F2 trajectories of all the sequences analyzed (i.e., /ae oe ea/). Differences were also found in the narrative for the F1 (female speakers) and F2 (male speakers) trajectories of the /ae/ sequence. Overall, our results showed signs of centralization of the vocalic space, signaled by changes in F2 rather than in F1.

There are two important points to keep in mind. First, we have found differences between EBs and the other two groups in a community with a high level of bilingualism, where Spanish is omnipresent. All EBs had received or were receiving some education in Spanish and used Spanish daily. They also obtained DELE scores very close to those of LBs. Second, we are reporting results from an unbalanced dataset. Participants are not equally distributed across gender groups (which was partly addressed through normalization), and they did not contribute the same number of tokens to the narrative sample. Everybody produced the same number of vowels in the reading task. However, our main analysis concerns the narrative, to which each participant contributed a different number of tokens distributed across different lexical items.

The analysis of gender agreement and concord revealed very few errors, and half of these errors were generated by the early bilinguals. Beyond these few errors, we extracted three observations about the distribution of morphological types. These observations inform our understanding of the challenge of assessing the distribution of gender realizations. First, while most nouns (4 out of 5) are explicitly marked, only a subset of all nouns are accompanied by transparent determiners (3 out of 5). Second, for all transparent nouns and for most transparent determiners, the single cue to gender realization is the vowel: a centralized vowel might actually be ambiguous for accuracy, but vowel centralization is likely to remain undetected, particularly in predictable, non-contrastive contexts. Last, the vast majority of the NPs we analyzed were highly predictable in context and almost never contrastive.

If we now turn to our combined results, we showed in Table 3 that, in 32% of the /a o/ tokens produced by EBs, the vocalic spaces completely overlapped. That means that there should be 221 tokens (out of the 737 tokens of these two vowels) in which the transcriber should have expressed some doubts about the quality of the vowel. If we turn to Section 3.2, we see that we only labeled six cases as gender mismatch. Thus, one can say that, in coding our data, we might have over-reported gender accuracy in possibly as many as 215 tokens, and this only in reference to one of the vowel pairs. If we turn to the combined analysis, both Figures 5 and 6 showed that EBs tend to make smaller distinctions between /a/ and /o/, both in predictable and in non-predictable contexts. To sum up, despite the limitations inherent to this type of data (narrative elicitation) and to our specific dataset (*La Caperucita*), we believe that we have established preliminary grounds for what we have dubbed the *modular interaction hypothesis*, which proposes that changes in one domain (phonetics) can have consequences for the acquisition of other domains (morphosyntax). We have made a case for phonetics as both a methodological obstacle for assessment of bilingual gender grammars as well as a potential incipient factor in contact-induced grammar restructuring in bilinguals.

The results of the combined analysis also showed a main effect of predictability of noun gender; namely, predictable nouns had a significantly larger F2–F1 difference than unpredictable nouns. The distance between the formants is a proxy for vowel centralization, with smaller distances indicating more centralization. A larger distance in predictable nouns therefore runs contrary to our last hypothesis. However, if we return to the data in Table 6, we note that Predictability and Animacy were heavily overlapping categories; i.e., nearly all of our predictable nouns were animate, while few inanimate nouns were predictable. This offers an alternative interpretation: participants may be enhancing the marking of gender with animate nouns. The current data do not allow us to decide between these two interpretations.

We do not seek to emphasize the limitations of narrative or spontaneous data, or of previous studies. Nor do we argue that there are, or might be, more gender errors than those reported in the literature and/or in the present results. Our goal was to assess how far bilingual speakers can go in reducing the quality of the unstressed final vowel contrast before other speakers (including transcribers) detect that something is different or missing in the signal. Along the way, we have shown that, although all DPs are analyzed as possessing abstract gender features, only a portion of them show it visibly. Equally important, only a very small subset of these realized gender forms is not predictable from the syntax or the semantics. These results therefore suggest that future studies of bilingual gender marking should explicitly include phonetic analyses.

The implications go beyond a methodological point. We have examined narrative use of gender and found that, in general, gender is not overtly marked on many nouns, and that it is often a category that can be easily predicted from context. We have also shown evidence, as we predicted, that bilingual heritage adults are likely to produce agreement vowels that are less distinctly specified than those available in the environment of a child in a monolingual community. In a bilingual context with an ongoing contact-based phonetic shift, the signal for gender marking becomes increasingly opaque for the child learner. This contact-induced variation includes robust patterns of vowel centralization and a high degree of overlap in the vocalic space. Furthermore, we should not overlook the variation that results from coarticulation between word-final vowels and following vowel-initial words in monolingual speech. Our bilinguals separate vowel sequences in their production; this suggests that fused sequences in the speech of monolinguals are likely to remain opaque to them in perception.

Given this variability in the input, we speculate that, at some point, gender in contact varieties of Spanish could become French-like. In such a scenario, gender would still be marked in the syntax but would be more visible on the determiner than in the noun morphology. For some children, the disruption in the acoustic signal may lead to systems in which the underlying featural system has been reorganized. A more extreme possibility is that gender is no longer part of the agreement system and survives only as a vestigial lexical remnant. It is worth recalling the results of Cuza and Pérez-Tattam (2016), where, as in other studies, there were more errors with feminine targets than with masculine targets. The individual analysis of their data shows that about a quarter of the children did not produce a single instance of correct feminine agreement. Those children give the appearance of having reduced the system to the masculine default. Again, more data, particularly data with individual analyses, are needed for two purposes. The first is to explore the range of morphological reorganization processes that might characterize individuals in the North American bilingual context. The second is to study the potential link between morphosyntax and changes in vowel perception and representation.

Our current study allows us to re-contextualize some of the pre-existing literature. As sociolinguists hold, narratives are a useful tool for eliciting semi-spontaneous speech. The narrative analyzed in this paper has special characteristics; namely, it is a traditional folk tale widely known in the Western world. Thus, many lexical items are highly predictable, though not all. In Section 3.2, we showed that only a very small proportion of nouns have unpredictable gender. This suggests that, in most real-life situations, a bilingual could consistently insert schwas in word-final position without them being detected. This, in turn, could partly account for the asymmetries reported in studies such as Montrul et al. (2014), in which participants are accurate in production tasks but inaccurate in grammaticality judgment tasks. The analysis of the narrative, in combination with the study of the phonetic realization of vowels, shows small but systematic differences between the speech of EBs and that of the other groups. These small differences probably go undetected when these speakers interact with other bilingual and monolingual speakers. Even so, they are substantial enough to limit their perceptual skills and possibly, for some bilingual speakers, to prevent them from attaining the underlying representational system.

Our use of semi-spontaneous, narrative data introduced asymmetries and gaps in the data, which precluded certain analyses. At the same time, it gave us a better view of natural speech patterns than the reading data and allowed us to contextualize morphosyntactic analyses and moderate the conclusions from those analyses in light of the phonetic findings.

#### **5. Conclusions**

We have shown that early bilinguals differ in the phonetic realization of vowels when compared to monolinguals, as has been reported in previous studies. We argued that those differences have implications for the way in which we should analyze agreement errors (or the lack of them) in Spanish-English bilinguals. Even in our population of early bilinguals, who use Spanish daily, we have seen, on the one hand, low rates of reported errors in agreement, as identified by human coders, and on the other, frequent patterns of vowel centralization and a high degree of vowel overlap that cast doubt on informal analyses of gender marking. We argue that such phonetic reality provides the basis for established patterns of incipient restructuring in the morphological system of Spanish heritage bilinguals (as suggested by Scontras et al. 2018) as well as other, not yet fully documented possible impacts to the system.

Our final take-home message is simple. Misperception happens with single segments (e.g., Calabrese 2012; Durvasula and Kahng 2015a; Durvasula and Kahng 2015b; Hawkins 2010; Repp 1992; see also Ohala 1989, 1993) as well as with whole utterances. Speaking about mondegreens, i.e., misheard song lyrics, Nevins (2014) explains: "listeners might impose what they wish to hear in the songs, or perhaps, what they expect to hear, in perception". Spanish speakers predict and expect grammatical gender, so they hear gender vowels when, in fact, the bilingual speaker has produced a schwa. Thus, it is important to cross-analyze data across linguistic domains and to remember that we linguists, like all human beings, believe we hear what we already know.

**Author Contributions:** Conceptualization: L.C. and A.T.P.-L.; methodology and analyses: all contributors; data collection: N.M.; writing: all contributors; funding acquisition: L.C., N.M. and A.T.P.-L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by a Social Sciences and Humanities Research Council (SSHRC) Insight Grant from the Government of Canada, grant number 435-2020-0110.

**Acknowledgments:** We thank our participants for their time and M. Barreto and M. Lazzari for their assistance with the statistical analysis.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

**Figure A1.** SSANOVAs for non-normalized (female) F1 trajectories across five intervals within the sequences (**a**) /ae/; (**b**) /oe/; and (**c**) /ea/ produced by each group of speakers (M, EB, LB) in the reading task.


**Figure A2.** SSANOVAs for female formant trajectories across five intervals within the sequences (**a**) /ae/ (F2); (**b**) /ea/ (F1); (**c**) /ea/ (F2); (**d**) /oe/ (F1); and (**e**) /oe/ (F2) produced by each group of speakers (M, EB, LB) in the narrative task.


**Figure A3.** SSANOVAs for male formant trajectories across five intervals within the sequences (**a**) /ae/ (F1); (**b**) /ea/ (F1); (**c**) /oe/ (F1); and (**d**) /oe/ (F2) produced by each group of speakers (M, EB, LB) in the narrative task.

#### **References**


Ladefoged, Peter. 2001. *Vowels and Consonants: An Introduction to the Sounds of Language*. Oxford: Blackwell.


Willis, Erik W. 2005. An initial examination of southwest Spanish vowels. *Southwest Journal of Linguistics* 24: 185–98.

Winter, Bodo. 2020. *Statistics for Linguists: An Introduction Using R*. New York and London: Routledge.

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **(Divergent) Participation in the California Vowel Shift by Korean Americans in Southern California**

**Ji Young Kim <sup>1,\*</sup> and Nicole Wong <sup>2</sup>**


Received: 24 September 2020; Accepted: 2 November 2020; Published: 6 November 2020

**Abstract:** This study investigates the participation of Korean Americans in Los Angeles in the California Vowel Shift. Five groups of subjects participated in a picture narrative task: first-, 1.5-, and second-generation Korean Americans, Anglo-Californians, and (non-immigrant) Korean late learners of English. Results showed a clear distinction between early and late bilinguals. The first-generation Korean Americans and the late learners showed apparent signs of Korean influence, whereas the 1.5- and second-generation Korean Americans participated in most patterns of the California Vowel Shift. However, divergence from the Anglo-Californians was observed in the early bilinguals' speech. Similar to the late bilinguals, the 1.5-generation speakers did not systematically distinguish prenasal and non-prenasal /æ/. The second-generation speakers demonstrated a split-/æ/ system, but it was less pronounced than for the Anglo-Californians. These findings suggest that age of arrival has a strong effect on immigrant minority speakers' participation in local sound change. In the case of the second-generation Korean Americans, certain patterns of the California Vowel Shift were even more pronounced than for the Anglo-Californians (i.e., /ɪ/-lowering, /ɑ/-/ɔ/ merger, /ʊ/- and /ʌ/-fronting). Moreover, the entire vowel space of the second-generation Korean Americans, especially female speakers, was more fronted than that of the Anglo-Californians. These findings suggest either that second-generation Korean Americans may be at a more advanced stage of the California Vowel Shift than Anglo-Californians, or that the California Vowel Shift is on a different trajectory for these speakers. Possible explanations in relation to second-generation Korean Americans' intersecting gender, ethnic, and racial identities, and suggestions for future research are discussed.

**Keywords:** Korean Americans; California Vowel Shift; second language phonology; bilingualism; immigrant minority speakers; sound change

#### **1. Introduction**

Over the past few decades, research on second language (L2) phonology has provided empirical evidence that early bilinguals are generally more successful in acquiring L2 speech sounds than late bilinguals (Flege et al. 1995, 1997; Flege and MacKay 2011; Stevens 1999; Yeni-Komshian et al. 2000). Models of L2 phonology, such as Flege's (1995) Speech Learning Model (SLM) and Best and Tyler's (2007) Perceptual Assimilation Model for L2 learners (PAM-L2), posit that bilinguals' L1 and L2 phones interact in a common phonological space. Thus, the development of L2 sounds would depend on their perceived similarity to existing L1 sounds. That is, bilinguals would assimilate an L2 sound to an L1 sound if the two are perceived as identical or if the L2 sound is perceived as a deviant variant of the L1 sound. However, if an L2 sound is perceived as distinct from existing L1 sounds, bilinguals would create a new category. Early bilinguals tend to be successful at simultaneously maintaining language-internal and cross-linguistic contrasts (Chang et al. 2011). This is because they begin establishing L2 sounds while they are still in the process of acquiring language-general fine-grained acoustic features (Kuhl et al. 1992; Werker and Tees 1984). For late bilinguals, on the other hand, L2 sounds are introduced into an already established L1 sound system. Thus, influence from L1 speech sounds would occur to a larger extent for late bilinguals than for early bilinguals.

In the case of immigrant populations, first-generation speakers (i.e., late bilinguals) are prone to having a foreign accent in the societal language despite long residence in the host country (Baker and Trofimovich 2005). With respect to children of immigrants who are early bilinguals of their home language and the societal language, the situation becomes complicated. Some speakers do not show any signs of foreign accent in the societal language (Lloyd-Smith et al. 2020), while others demonstrate phonetic features that are different from local mainstream varieties. For instance, immigrant minority speakers who acquired the societal language natively may use phonetic features that are present in their parents' foreign-accented speech, regardless of whether they speak their parents' language (Fought 2003; Mendoza-Denton 1999; Mendoza-Denton and Iwai 1993; Tsukada et al. 2005). Thus, while neurological maturation associated with age of acquisition plays an important role in L2 pronunciation, there are various extralinguistic factors other than age of acquisition that contribute to the development of L2 speech sounds (e.g., quantity/quality of L2 input, relative use of L1/L2, language attitude, identity, speech register) (Jia and Aaronson 2003; Flege 1999; Zampini 2008).

#### *1.1. Ethnicity and Participation in Local Sound Change*

Since the foundational work on African American English by Labov (1972), ethnicity has been considered one of the key factors, along with age, gender, and social class, that condition language variation in a speech community (Boberg 2004). With regard to immigrant minority groups in North America from non-English-speaking backgrounds, native-born children tend to display less ethnic identification than their foreign-born parents (Hoffman and Walker 2010; Weinfeld 1985, pp. 71–77). Thus, apart from producing more native-like speech sounds in English, they demonstrate stronger assimilation to local mainstream norms. Nevertheless, studies have shown that even those who were born and raised in North America and speak English natively demonstrate some speech patterns that are distinct from the local mainstream varieties. For instance, Casillas and Simonet (2016) examined the production (and perception) of the English low vowels /æ/ and /ɑ/ by two groups of Spanish-English sequential bilinguals residing in Southern Arizona. The first group consisted of Mexican Americans born and raised in the US Southwest by Spanish-speaking parents from Northern Mexico (i.e., native-born). The second consisted of late English learners born and raised in Spanish-speaking countries who had moved to the US Southwest and lived there for around 10 years (i.e., foreign-born). Both groups acquired Spanish as their L1, but the native-born speakers became more dominant in English as they grew up, to the point that they were no longer able to communicate actively in Spanish. Spanish has only one low vowel, /a/. Southern Arizona English, by contrast, has two low vowels, /ɑ/ and /æ/, and the latter is lowered in non-prenasal contexts, similar to California English (see Section 1.2). Casillas and Simonet (2016) found that, although the bilinguals were able to distinguish the two vowel categories, their phonetic realizations of these vowels differed from the local mainstream norms. Both groups produced a more fronted /ɑ/ than English monolinguals, assimilating it to the Spanish central /a/. Regarding /æ/, the foreign-born speakers produced this vowel further back than the English monolinguals (i.e., assimilation to the Spanish central /a/), whereas the native-born speakers produced it higher (i.e., weaker /æ/-lowering). These findings suggest that, even after a shift to English occurs, the speech of native-born speakers may diverge from local mainstream norms in two ways: by demonstrating patterns that are traceable to their heritage language, or by participating to a lesser extent in the sound change of the local mainstream variety.

From a developmental point of view, it is important to note that immigrant minority speakers who no longer speak their heritage language, or have only passive knowledge of it, may use speech patterns that differentiate them from speakers of other ethnicities. According to Labov (2001, p. 506), "[a]ll speakers who are socially defined as white, mainstream, or Euro-American, are involved in [regional sound] changes to one degree or another." Certain patterns of regional sound change may also appear in the speech of some ethnic minority speakers whom American society defines as "non-white" (e.g., Black, Hispanic, Native American, Asian). However, it is unlikely that ethnic minority speakers converge with Anglo-Americans in all aspects of their speech (Labov 2001, p. 507). Rather, they often follow a different trajectory in regional sound change. For instance, studies have shown that US-born Latinos tend to resist two features that occur in many varieties of American English: prenasal /æ/-raising (i.e., a split between prenasal and non-prenasal /æ/) and /u/-fronting (Carter et al. 2020; Fought 1999; Roeder 2010; Thomas 2001). Carter et al. (2020) examined the speech of Miami-born Latinos whose parents had immigrated from various Latin American countries and found that their English /u/ was further back than that of Anglo-Americans. Moreover, although the Latino speakers distinguished prenasal and non-prenasal /æ/, their /æ/ in both environments was further back than that of their Anglo counterparts. This resistance may be due to influence from the Spanish low central vowel /a/ and high back vowel /u/. It has also been reported in the speech of Mexican Americans in other regions (California: Fought 1999, 2003; Michigan: Roeder 2010; Texas: Thomas 2000, 2001; Washington, DC: Tseng 2015). In some cases, however, these patterns are conditioned by social factors such as gender, social class, and group affiliation (Fought 1999; Roeder 2010; Tseng 2015).

Compared to African American English and Chicano English, little research has been conducted on English spoken by Asian Americans. However, studies have found that Asian Americans, like other ethnic minority speakers, show a combination of resistance and assimilation to local sound change (Cheng 2016; Hall-Lew 2009; Hall-Lew and Starr 2010; Hoffman 2010; Ito 2010; Lee 2000, 2016). For instance, Hall-Lew (2009) found that Chinese Americans in San Francisco participated in back vowel fronting and the low back vowel merger, two sound changes that characterize California English. At the same time, however, Chinese Americans in this region produce coda /l/-vocalization (e.g., pronouncing *cold* and *skill* as *code* and *skew*), which is most likely due to influence from Chinese phonology, in which syllable-final /l/ is absent (Hall-Lew and Starr 2010). This pattern appears even in English-monolingual speakers beyond the second generation. Ito (2010) examined the vowels of Hmong Americans in the Twin Cities area of Minnesota and found that 1.5-generation and second-generation speakers were accommodating to local /æ/-fronting but distinguished the low back vowels /ɑ/ and /ɔ/ more clearly than Anglo-Americans, who showed a trend toward near-merger.

With regard to Korean Americans, the target population of this study, Cheng (2016) demonstrated that Korean Americans in California participated in some aspects of the California Vowel Shift (e.g., /u/-fronting and the /ɑ/-/ɔ/ merger) to the same degree as Anglo-Americans, while in others their patterns were either more pronounced (e.g., /ʊ/-fronting) or less pronounced (e.g., the split between prenasal and non-prenasal /æ/) than those of the Anglo-Americans. Lee (2016) found that Korean Americans in Bergen County, New Jersey, which borders New York City, maintained the /ɑ/-/ɔ/ contrast and raised /ɔ/ in accordance with New York City English, but did not produce the New York City English split-/æ/ system (Labov et al. 2006). Although these studies did not attribute Korean Americans' divergence from the white regional norms to influence from Korean phonology, it is possible that their less pronounced split-/æ/ system is related to the Korean vowel system, which shows no comparable pattern. Thus, it is important to examine the Korean-accented English of late bilinguals, particularly that of first-generation Korean immigrants, to see whether Korean Americans' resistance to sound change in local mainstream varieties is related to their exposure to Korean and Korean-accented English.

#### *1.2. California Vowel Shift*

California English is widely heard by people in other regions through television and movies, and the speech styles of stereotypical Southern California personae portrayed in the media (e.g., Valley Girl and Surfer Dude) are often parodied (Pratt and D'Onofrio 2017). A good example is The Californians skits from NBC's late-night comedy show Saturday Night Live (SNL). Pratt and D'Onofrio (2017) analyzed the vowels produced by two characters in The Californians and found that the actors talked with more open and protruded jaws and lips to comedically portray the Valley Girl and Surfer Dude personae. These articulatory settings resulted in lower and more retracted front vowels and more fronted back vowels when the actors played these characters than when they played non-Californian characters. Although these performances are undoubtedly exaggerated, they reflect the vocalic changes that are underway in California, namely the California Vowel Shift.

Figure 1, created from data on millennial speakers reported in D'Onofrio et al. (2019)<sup>1</sup>, illustrates the vocalic changes involved in the California Vowel Shift. The California Vowel Shift is characterized by three main phenomena: (1) the low back merger of /ɑ/ (e.g., *bot*) and /ɔ/ (e.g., *bought*); (2) the lowering and retraction of the lax front vowels /ɪ/ (e.g., *bit*), /ɛ/ (e.g., *bet*), and /æ/ (e.g., *bat*); and (3) the fronting of the high and mid back vowels /u/ (e.g., *boot*), /ʊ/ (e.g., *book*), /o/ (e.g., *boat*), and /ʌ/ (e.g., *but*) (D'Onofrio et al. 2016; D'Onofrio et al. 2019; Hagiwara 1997; Hall-Lew 2009; Hall-Lew et al. 2015; Hinton et al. 1987; Kennedy and Grama 2012; Podesva et al. 2015). Following the pattern of General American English presented in The Atlas of North American English (Labov et al. 2006), prenasal /æ/ in California English is tensed, resulting in a split between tensed /æ/ in prenasal contexts (e.g., *ban*) and lowered /æ/ elsewhere (Eckert 2008). With regard to the back vowels /u/ and /o/, fronting is more advanced after a coronal consonant (e.g., *too* and *toe*), a high-F2 (i.e., fronting) environment, and is inhibited before a velarized coda /l/ (e.g., *cool* and *goal*), a low-F2 (i.e., retracting) environment (Hall-Lew 2011).
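To make the conditioning environments for /u/ and /o/ concrete, the following Python sketch classifies a UW or OW token by its neighboring phones. It is a hypothetical helper for illustration, not part of the study's pipeline: the coronal set is our assumption, phones are assumed to be ARPAbet labels of the kind a forced aligner outputs, and any following L is treated as a coda /l/ without syllabification, which is a simplification.

```python
from typing import Optional

# Assumed set of coronal consonants in ARPAbet; the text does not define one.
CORONALS = frozenset({"T", "D", "S", "Z", "N", "TH", "DH", "SH", "ZH", "CH", "JH"})


def strip_stress(phone: Optional[str]) -> Optional[str]:
    """Remove an ARPAbet stress digit, e.g. 'UW1' -> 'UW'; None stays None."""
    return phone.rstrip("012") if phone else phone


def back_vowel_environment(prev_phone: Optional[str], next_phone: Optional[str]) -> str:
    """Classify the context of a /u/ (UW) or /o/ (OW) token.

    'pre-lateral' is checked first because a following /l/ is expected to
    inhibit fronting even after a coronal onset (e.g. 'tool').
    """
    prev_p, next_p = strip_stress(prev_phone), strip_stress(next_phone)
    if next_p == "L":
        return "pre-lateral"
    if prev_p in CORONALS:
        return "post-coronal"
    return "other"
```

For example, *too* (T + UW, word-final) is classed as post-coronal, *cool* (K + UW + L) as pre-lateral, and *boot* (B + UW + T) as other.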

**Figure 1.** California Vowel Shift (adapted from D'Onofrio et al. (2019)).

While the California Vowel Shift has been understood as a chain shift affecting the front lax vowels /ɪ/, /ɛ/, and /æ/, the cause of the chain shift is under debate. Similar to the Canadian Vowel Shift, in which /ɪ/, /ɛ/, and /æ/ are lowered due to the merger of /ɑ/ and /ɔ/ (Clarke et al. 1995), the lowering of /ɪ/, /ɛ/, and /æ/ in California English may also be the result of a pull chain initiated by the /ɑ/-/ɔ/ merger. However, Kennedy and Grama (2012) found that some young California English speakers demonstrated the chain-shifted lowering of the front lax vowels while maintaining /ɑ/ in the traditional low central position in the vowel space. Moreover, while male and female speakers exhibited similar F1 values for /ɪ/ and /ɛ/, the female speakers produced higher F1 values (i.e., lower vowel height) for /æ/ than the male speakers. Since women are generally the leaders of linguistic change (Coates 1993; Labov 1990; Milroy and Milroy 1985; Trudgill 1972), this gender difference indicates that the lowering of /æ/ is the most recent step of the chain shift (Kennedy and Grama 2012). Thus, Kennedy and Grama (2012) suggested an alternative explanation: a push chain initiated by the lowering of /ɪ/, which led to the lowering of /ɛ/ and subsequently of /æ/. This process is likely to be independent of the /ɑ/-/ɔ/ merger, which in some cases occurs in the low central position of /ɑ/ (Kennedy and Grama 2012) and in other cases is not fully instantiated (Hall-Lew 2009).

<sup>1</sup> Figure 1 was created based on the data of millennial speakers reported in Table A1 in D'Onofrio et al. (2019). Note that BOOK-type tokens (i.e., /ʊ/) were not examined in D'Onofrio et al. (2019); thus, we added the fronting of /ʊ/ in Figure 1 based on previous studies of the California Vowel Shift (e.g., Podesva et al. 2015; Pratt and D'Onofrio 2017).

Chain shifts are claimed to occur in order to maintain enough phonetic distance between phonemes in the vowel space for them to remain perceptually distinct (D'Onofrio et al. 2019; Gordon 2011; Martinet 1952). If a phoneme moves within the vowel space, neighboring vowels move in turn. Thus, in order to identify the vowel that triggered the movement, it is important to examine the temporal progression of the chain shift in real or apparent time (D'Onofrio et al. 2019; Gordon 2011; Labov 2010, p. 145). That is, speakers of a given age group recorded in a more recent period, or younger speakers, should exhibit more advanced movements of the chain shift than speakers of the same age group recorded in an earlier period, or older speakers. D'Onofrio et al. (2019) conducted an apparent-time study comparing the vowels produced by speakers of four generations defined by birth year: Silent Generation (1928–1945), Baby Boomer (1946–1964), Generation X (1965–1980), and Millennial (after 1980). Results showed that, across the four generations, the speakers exhibited an overall reduction of dispersion, mainly in the F2 dimension (i.e., frontedness), with more advanced backing of front vowels and fronting of back vowels in younger generations. Most of these changes (i.e., the /ɑ/-/ɔ/ merger, backing of /ɛ/ and /æ/, and fronting of postcoronal /u/ and /o/) appeared between the Silent and the Baby Boomer generations, suggesting that the horizontal compression of the vowel space occurred contemporaneously. In subsequent generations, continued /æ/-backing and /ɑ/-/ɔ/ merger were observed, as well as additional changes involving /ɪ/-backing, non-postcoronal /u/-fronting, lowering of /ɛ/, /æ/, and /i/, and raising of /ɑ/ and /ʌ/. These findings indicate that, rather than the stepwise chain shift previously claimed, the California Vowel Shift seems to show holistic compression of the vowel space.
Phonologically speaking, this is contrary to the general tendency toward maximizing the phonetic space between phonemes as a means to maintain perceptual distinctiveness (Flemming 1996; Labov et al. 2006; Liljencrants and Lindblom 1972). Thus, D'Onofrio et al. (2019) proposed that the unexpected holistic compression at the root of the California Vowel Shift may be driven by speakers' projection of localized social meanings within a community (Eckert 1989; Fought 1999; Podesva 2011), not by purely phonological motivations. That is, it is possible that vowel space compression is achieved through speakers' manipulation of their articulatory settings (e.g., lowered jaw, protruded jaw and lips) (Pratt and D'Onofrio 2017) to index varied social meanings (e.g., young Californian, middle class membership, non-gang status, laid back, partier, urban, coastal) (D'Onofrio et al. 2019; Fought 1999; Podesva 2011; Podesva et al. 2015).
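One simple way to quantify the horizontal compression described above is to measure how far each vowel's mean F2 lies from the centroid of all vowel means. The sketch below illustrates that idea under our own assumptions (function name, input format, and the mean-absolute-deviation measure); it is not the dispersion measure used by D'Onofrio et al. (2019).

```python
from statistics import mean


def dispersion(vowel_means: dict[str, tuple[float, float]], axis: int = 1) -> float:
    """Mean absolute deviation of vowel means from their centroid on one axis.

    vowel_means maps a vowel label (e.g. "IY") to its mean (F1, F2) for one
    group, ideally in normalized units; axis=1 selects F2 (frontedness) and
    axis=0 selects F1 (height). Smaller values indicate a more compressed
    vowel space along that axis.
    """
    values = [point[axis] for point in vowel_means.values()]
    centroid = mean(values)
    return mean(abs(v - centroid) for v in values)
```

Applied to group-level vowel means, a younger generation with a smaller F2 dispersion than an older one would be consistent with horizontal compression, and calling the same function with `axis=0` gives the F1 counterpart for comparison.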

#### *1.3. Vowels in Korean and Comparison between Korean and American English Vowel Systems*

Modern South Korean has 7–8 monophthongs, /i, e, (ɛ), a, ʌ, o, ɨ, u/ (Jang et al. 2015; Kang 2014; Kwak 2003; Lee 2000; Lee and Ramsey 2011; Yang 1996)<sup>2</sup>. Due to the recent merger of the mid front vowels /e/ and /ɛ/, most likely caused by the raising of /ɛ/, many Koreans no longer distinguish these vowels (Baker and Trofimovich 2005; Kang 2014; Kwak 2003; Jang et al. 2015; Lee and Ramsey 2011; Yang 1996). Studies examining Korean vowel change in apparent time (Jang et al. 2015; Kang 2014) have shown that Korean /e/ and /ɛ/ are produced with overlapping F1 and F2 values across ages, except by some older speakers<sup>3</sup> who produced them distinctly, supporting the view that young-generation Koreans have a seven-vowel system with a single mid front vowel /e/ (Kwak 2003).

<sup>2</sup> In some cases, the front rounded vowels /y/ and /ø/ may additionally be observed in the speech of older-generation speakers, but in modern South Korean these sounds have mostly been replaced by the diphthongs [wi] and [we], respectively (Ahn and Iverson 2007; Kwak 2003; Jang et al. 2015).

<sup>3</sup> Kang (2014) specified these speakers as male speakers born before 1962 (i.e., birth-year-based), while in Jang et al. (2015), these speakers were male and female speakers in their 60s (i.e., age-based). Since the data in Jang et al. (2015) were collected between 2014 and 2015, we speculate that these speakers were born between 1945 and 1955.

Figure 2, created from data on Korean speakers in their 20s in Kang and Kong (2016)<sup>4</sup>, illustrates the Korean vowel space. In the high region of the vowel space, Korean has three vowels: /i/, /ɨ/, and /u/. While Korean /i/ is acoustically similar to the corresponding English high front tense vowel /i/, Korean /u/ is further back than the English high back tense vowel /u/ (Baker and Trofimovich 2005; Yang 1996; Yoon and Kim 2015). In fact, comparative studies of Seoul Korean and American English vowels have shown that English /u/ is acoustically very similar to Korean /ɨ/ (Baker and Trofimovich 2005; Yang 1996; Yoon and Kim 2015). Korean /ɨ/, especially in the Seoul dialect, is undergoing a change in progress in which younger-generation Koreans produce this vowel further front than older-generation Koreans (Jang et al. 2015; Kang 2014; Kang and Kong 2016; Lee et al. 2017)<sup>5</sup>. Thus, the overlap between Korean /ɨ/ and English /u/ may be due to the parallel fronting of Korean /ɨ/ (Jang et al. 2015; Kang 2014; Kang and Kong 2016; Lee et al. 2017) and of English /u/, which is observed in most North American dialects (Labov et al. 2006). Korean /u/ also exhibits fronting (Kang 2014; Lee et al. 2017), but not to the same extent as Korean /ɨ/ and English /u/.

**Figure 2.** Korean vowel space of young generation Koreans (adapted from Kang and Kong (2016)).

In the mid region of the vowel space, Korean has two vowels, /e/ and /o/ (or three vowels for speakers who do not exhibit the /e/-/ɛ/ merger). Unlike English /e/ and /o/, which are slightly diphthongized (i.e., [eɪ] and [oʊ]), Korean /e/ and /o/ are purely monophthongal. Korean /e/ is also acoustically distinct from the other English mid front vowel, /ɛ/, in that Korean /e/ is positioned higher and further front in the vowel space than English /ɛ/ (Baker and Trofimovich 2005). Rather than English /ɛ/, Korean /e/ is acoustically more similar to the English front lax vowel /ɪ/ (Baker and Trofimovich 2005). Both English /ɛ/ and /ɪ/ are undergoing lowering and retraction in General American English, except in the South (Labov et al. 2006). As for Korean /o/, this vowel is positioned very high in the vowel space, close to Korean /u/ (see Figure 2). Studies focusing on Seoul Korean have shown that the main distinction between Korean /o/ and /u/ lies in the front-back dimension (F2): Korean /u/ is further front than Korean /o/ (Jang et al. 2015; Kang and Kong 2016; Lee et al. 2017; Yoon and Kim 2015). These patterns were more clearly demonstrated in the speech of younger-generation Koreans than in that of older-generation Koreans (Kang 2014; Kang and Kong 2016; Lee et al. 2017)<sup>6</sup>. Thus, the vowel change in Seoul Korean can be explained as a chain shift initiated by /o/-raising, which led to the fronting of /u/ and /ɨ/ (Kang 2014; Kang and Kong 2016; Lee et al. 2017). In the low region of the vowel space, Korean has one low central vowel, /a/, which is further front than the corresponding English low back vowel /ɑ/ (Sohn 1999; Yang 1996). The Korean near-low back vowel /ʌ/ is further back than the corresponding English /ʌ/ (Yang 1996), which is centralized as [ɐ] in General American English, except in the Inland North (Labov et al. 2006).

<sup>4</sup> Figure 2 was created by averaging the data of male and female speakers in their 20s reported in Table 1 in Kang and Kong (2016).

<sup>5</sup> In a cross-dialectal study, Lee et al. (2017) found that speakers of other Korean dialects (i.e., South Jeolla, South Gyeongsang, and Jeju) showed patterns converging toward Seoul Korean.

<sup>6</sup> Despite dialectal differences, Lee et al. (2017) showed that /o/-raising in other Korean dialects follows patterns converging toward the Seoul dialect, similar to the fronting of /ɨ/ and /u/.

Yang (1996) explained the cross-linguistic differences between Korean and English vowels through Lindblom's theory of adaptive dispersion (Lindblom 1990; Lindblom and Engstrand 1989), which proposes that speakers maintain sufficient perceptual contrast between phonemes while balancing a tradeoff between articulatory economy and perceptual distinctiveness. Yang (1996) argued that the English vowel space is characterized by the vertical expansion of low vowels, resembling a rectangle, whereas the Korean vowel space is characterized by the horizontal expansion of high vowels, resembling a triangle. Using Lindblom's (1990, p. 21) formula of perceptual distance, Yang (1996) demonstrated that the distance between the two extreme high vowels /i/ and /u/ was larger for Korean than for English. Given that Korean has more vowels in the high region of the vowel space (i.e., /i, ɨ, u/) than English (i.e., /i, u/), English-like fronting of Korean /u/ would be restricted, since it would lead to an overlap with Korean /ɨ/ and cause perceptual confusion. English /u/, on the other hand, is free to move forward without encroaching upon the space of other vowels (Yang 1996). In the low region, Korean has only one vowel, /a/, while English has two, /æ/ and /ɑ/. Thus, Korean /a/ can be placed in the middle of the front-back dimension (i.e., at the corner of a regular triangle) without crowding into the space of other vowels, whereas English /æ/ and /ɑ/ must be placed apart to maintain sufficient space between them. Yang (1996) also found that the perceptual distance between /i/ and /e/ and between /u/ and /o/ was larger for English than for Korean, which is in line with the predictions of the theory of adaptive dispersion (Lindblom 1990; Lindblom and Engstrand 1989).
That is, the larger distance between the high and mid vowels in English is likely to be linked to the intervening lax vowels /ɪ/ and /ʊ/, which English has and Korean lacks (Yang 1996).
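The idea of comparing the same vowel pair across two vowel systems can be illustrated with a deliberately simplified stand-in for perceptual distance: the Euclidean distance between vowel means in an (F1, F2) Bark space. This is not Lindblom's (1990) formula, which is not reproduced here; the helper names are hypothetical, and formant means are assumed to be already converted to Bark.

```python
import math

Point = tuple[float, float]  # (F1, F2) vowel mean in Bark


def bark_distance(v1: Point, v2: Point) -> float:
    """Euclidean distance between two (F1, F2) vowel means given in Bark."""
    return math.dist(v1, v2)


def compare_pair(system_a: dict[str, Point], system_b: dict[str, Point],
                 pair: tuple[str, str]) -> tuple[float, float]:
    """Distance of one vowel pair (e.g. ('i', 'u')) in each of two vowel systems."""
    first, second = pair
    return (bark_distance(system_a[first], system_a[second]),
            bark_distance(system_b[first], system_b[second]))
```

With real Korean and English vowel means as `system_a` and `system_b`, a larger first value for `('i', 'u')` and a smaller one for `('i', 'e')` would mirror the direction of Yang's (1996) findings, although an unweighted Euclidean metric treats F1 and F2 differences as perceptually equivalent, which is itself a simplifying assumption.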

Since Korean lacks tense-lax vowel contrasts like English /i/-/ɪ/ and /u/-/ʊ/, Korean speakers often have difficulty distinguishing these vowels (Baker and Trofimovich 2005; Flege et al. 1997). Baker and Trofimovich (2005) examined the acoustic properties of Korean and English vowels produced by early and late Korean-English bilinguals with differing lengths of residence in the US (i.e., 1 year and 7 years). They found that the late bilinguals, regardless of length of residence, produced English /i/ and /ɪ/ and English /u/ and /ʊ/ as two single categories, which acoustically overlapped with Korean /i/ (= English /i, ɪ/) and Korean /u/ or /ɨ/ (= English /u, ʊ/). The late bilinguals also produced English /ɛ/ and /æ/ as a single category, but this category was dissimilar from Korean /e/ (merged with Korean /ɛ/). That is, the late bilinguals assimilated English /i, ɪ/ to Korean /i/ and English /u, ʊ/ to Korean /u/, while creating a new vowel category for the merged English /ɛ/-/æ/ (Baker and Trofimovich 2005). As for the early bilinguals, Baker and Trofimovich (2005) found that, regardless of length of residence, they produced English /i/ and /ɪ/ distinctly, and their English /i/ overlapped with their Korean /i/. For the other vowel contrasts, the early bilinguals who had resided in the US for one year showed patterns similar to those of the late bilinguals (i.e., merged English /u/-/ʊ/ and merged English /ɛ/-/æ/), whereas those with a longer length of residence successfully distinguished these contrasts. For the latter group, English /u/ overlapped with Korean /u/, but English /ʊ/ was produced as a separate category. Both early bilingual groups produced English /ɪ/ like Korean /e/.
Thus, the late bilinguals had three categories for the six English vowels /i, ɪ, u, ʊ, ɛ, æ/ (i.e., merged /i/-/ɪ/, merged /u/-/ʊ/, and merged /ɛ/-/æ/), the recently arrived early bilinguals had four categories (i.e., /i/, /ɪ/, merged /u/-/ʊ/, and merged /ɛ/-/æ/), and the early bilinguals with a longer length of residence distinguished all six vowels.

#### **2. The Present Study**

The goal of this study is to examine whether Korean Americans in Los Angeles participate in the California Vowel Shift. We focus on four vowel changes involved in the California Vowel Shift: (1) the lowering and retraction of the front lax vowels /ɪ/, /ɛ/, and /æ/; (2) the split between non-prenasal /æ/ and prenasal /æN/; (3) the merger of the low back vowels /ɑ/-/ɔ/; and (4) the fronting of /u/, /ʊ/, and /ʌ/. If Korean Americans do not exhibit these patterns, we explore whether their resistance to local sound change can be explained by influence from Korean phonology. To this end, we compare Korean Americans of three generations (first-generation, 1.5-generation, and second-generation) with Anglo-Californians and with Korean international students who are late bilinguals. We predicted that, due to age effects, first-generation speakers (i.e., late bilinguals) would demonstrate stronger influence from Korean phonology than 1.5- and second-generation speakers (i.e., early bilinguals) and would thus participate less in the California Vowel Shift. Among the early bilinguals, influence from Korean phonology, if any, was expected to appear to a lesser extent in the second-generation speakers than in the 1.5-generation speakers.

With respect to patterns reflecting influence from Korean phonology, we base our predictions on the findings for Korean-English late bilinguals (Baker and Trofimovich 2005) and on cross-linguistic differences between Korean and English vowels (Baker and Trofimovich 2005; Yang 1996; Yoon and Kim 2015). Regarding the lowering and retraction of /ɪ/, we predicted that, if influence from Korean phonology occurs, Korean Americans would merge this vowel with /i/ and thus would not participate in the lowering and retraction of /ɪ/. Moreover, Korean Americans would create a new single /ɛ/-/æ/ category in which /æ/ merges with /ɛ/ (Baker and Trofimovich 2005). Thus, when these vowels are examined separately, Korean Americans' /ɛ/ may show lowering and retraction, but their /æ/ would not, because it would be positioned higher in the vowel space due to the merger with /ɛ/. Additionally, Korean Americans would not exhibit a split-/æ/ system, because Korean has no equivalent phonological pattern. With regard to the /ɑ/-/ɔ/ merger, no Korean vowel acoustically overlaps with either of these vowels. The closest vowels are Korean /a/ (low central) and /ʌ/ (near-low back) (Baker et al. 2002; Trofimovich et al. 2011; Tsukada et al. 2005). Thus, we predicted that, if influence from Korean occurs, speakers would either assimilate both English /ɑ/ and English /ɔ/ to Korean /a/ (Outcome 1: participation in the /ɑ/-/ɔ/ merger, but further front than expected) or distinguish them by assimilating English /ɑ/ to Korean /a/ and English /ɔ/ to Korean /ʌ/ (Outcome 2: no participation in the /ɑ/-/ɔ/ merger). As for the fronting of /u/ and /ʊ/, Korean Americans would not participate, or would participate less, in the fronting of these vowels, because they would assimilate both vowels to Korean /u/ (Baker and Trofimovich 2005), which is fronted but not to the same extent as in English (Kang 2014; Yang 1996).
Similarly, Korean Americans' English /ʌ/ would be produced further back due to influence from Korean /ʌ/ (Yang 1996).

#### *2.1. Participants*

In total, 37 Korean Americans, 4 Korean international students, and 5 Anglo-Americans participated in the study. The Korean Americans were residents of Los Angeles County in Southern California and belonged to three immigrant generations: first generation (GEN1), 1.5 generation (GEN1.5), and second generation (GEN2). The language background of each group is presented in Table 1. The GEN1 group (N = 8; 4F, 4M) were Koreans born and raised in South Korea (Seoul: 6, Daegu: 1, Yeongju: 1)<sup>7</sup> who immigrated to the US as adults (25.8 years). They had spent an average of 26.8 years in California at the time of data collection and spoke both English and Korean on a daily basis (English: 53.7%, Korean: 46.3%). They reported that they learned English (L2) during middle school or high school (12.8 years) and rated their English proficiency as intermediate (2.7) on a 5-point Likert scale (5 = native). The GEN1.5 group (N = 4; 1F, 3M) were also Koreans born in South Korea (Seoul: 1, unspecified: 3), but they immigrated to the US in late childhood (10.5 years). Like the GEN1 group, the GEN1.5 speakers learned English as an L2 and had lived in the US for a long time (13.3 years). However, they used English (90%) much more frequently than Korean (10%) and rated their English proficiency (5 out of 5) as equal to or higher than that of their native Korean (4.3). The GEN2 group (N = 25; 15F, 10M) were Koreans who were either born in Los Angeles County (N = 19) or born in South Korea (N = 6; Seoul: 4, Daegu: 1, unspecified: 1) and moved to Los Angeles County at age 3 or younger. All of their parents were first-generation immigrants from Korea (Seoul and Gyeonggi: 13, Gyeongsang: 2, Jeolla: 1, Chungcheong: 1, mixed: 2, unspecified: 6)<sup>8</sup>. Seventeen of them were Korean-English early sequential bilinguals, while eight acquired both languages simultaneously. Like the GEN1.5 group, the GEN2 speakers used English most of the time (English: 83.2%, Korean: 16.8%), but unlike the GEN1.5 group they rated their English proficiency (5 out of 5) much higher than their Korean proficiency (2.9 out of 5). Both the GEN1.5 and the GEN2 groups reported that, growing up, Korean was the main language of communication at home.

<sup>7</sup> Both Daegu and Yeongju are cities in the North Gyeongsang region of South Korea.

In this study, we also included 4 Korean international students (KOR) as a baseline for Korean-accented English. The KOR speakers were born and raised in Seoul, South Korea, and came to the US to complete their undergraduate or graduate studies. These speakers were very similar to the GEN1 speakers in that they arrived in the US as adults (22.8 years), were late L2 learners of English (12.5 years), and used both English and Korean on a daily basis (English: 47.5%, Korean: 52.5%). However, compared to the GEN1 speakers, they had spent less time in the US (4 years). Thus, if phonetic transfer from L1 Korean to L2 English appears, it would be strongest in this group. Lastly, 5 Anglo-Americans (2F, 3M) participated in this study as a control group for California English. All of these speakers were born and raised in Southern California (2 in Los Angeles County, 3 in San Diego County).

All participants read and signed a written informed consent form before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki and the protocol was approved by the School of Literatures, Cultures, and Linguistics Institutional Review Board (SLCL-IRB) of the University of Illinois at Urbana-Champaign.


**Table 1.** Participants' language backgrounds (standard deviation is presented in parentheses).

#### *2.2. Data Collection and Analysis*

Participants' English speech data were collected during the spring and summer of 2012. In order to elicit different speech styles, we conducted a reading task (i.e., controlled speech) and a picture description task (i.e., narrative speech). For the reading task, the participants read aloud a passage from Aesop's fable *The North Wind and the Sun*, and for the picture description task, they narrated the story of the wordless picture book *Frog, Where Are You?* (Mayer 1969). In this study, we report only our findings from the narrative speech. Speech was audio-recorded using an AKG C520 head-mounted microphone and a Zoom H4n digital recorder at a sampling rate of 44.1 kHz and a sample size of 16 bits. The recordings were conducted in a quiet enclosed space at various locations in Los Angeles County (e.g., participants' homes, a furnished room in a church). For two KOR speakers and two CA speakers, the recordings were conducted in a sound-attenuated booth at a public university in Illinois (KOR: 2, CA: 1)<sup>9</sup> and in Arizona (CA: 1)<sup>10</sup>. After completing the tasks, the participants filled out a language background questionnaire.

<sup>8</sup> The areas are divided by region (indicated by the suffix *Do* in Korean), in which different varieties of Korean are spoken.

Participants' speech was orthographically transcribed in Praat TextGrids (Boersma and Weenink 2020), and forced alignment was performed using the Montreal Forced Aligner (McAuliffe et al. 2017), which generated a word tier and a phone tier. Using a Praat script, we extracted the F1 and F2 values at the midpoint of 9 English monophthongs /i, ɪ, ɛ, æ, ɑ, ɔ, ʌ, ʊ, u/, as well as the durations of these vowels. For convenience, we labeled the vowels with ARPAbet symbols: IY (= /i/), IH (= /ɪ/), EH (= /ɛ/), AE (= /æ/), AA (= /ɑ/), AO (= /ɔ/), AH (= /ʌ/), UH (= /ʊ/), and UW (= /u/). AE tokens were further divided into AE and AEN depending on whether they preceded a non-nasal or a nasal consonant. A total of 28,948 tokens were obtained. We considered only vowels with primary stress and excluded vowels produced in fillers (e.g., *um*) or monosyllabic function words (e.g., *in*). Any tokens that were misaligned, too short (<50 ms), or produced with creak, laughter, or background noise were excluded from the analyses. Moreover, to ensure reliable boundaries between the vowels and their neighboring sounds, we additionally excluded tokens following vowels, glides, or /r/, and tokens preceding vowels, glides, or liquids (Podesva et al. 2015). After this process, 2690 tokens remained (IY: 357, IH: 302, EH: 410, AE(N): 319, AA: 202, AO: 439, AH: 289, UH: 247, UW: 125). Raw F1 and F2 values (Hz) of these tokens were converted to the Bark scale (Traunmüller 1997) and then normalized using the Lobanov (1971) z-score procedure in the *phonR* package (McCloy 2016) in R (R Development Core Team 2020).
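The automatic parts of the token selection and the Hz-to-Bark-to-Lobanov chain described above can be sketched in Python as follows. This is a hypothetical re-implementation for illustration, not the authors' Praat/R pipeline: the `Token` record, the ARPAbet phone sets, and the use of the sample standard deviation are our assumptions; exclusions that require listening or word identity (misalignment, creak, laughter, noise, fillers, function words) are assumed to have been made beforehand; and the Bark conversion uses the common Traunmüller formula without its end corrections, which may differ in detail from the *phonR* implementation.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Optional

# ARPAbet vowel and glide labels (stress digits removed); assumed phone sets.
ARPABET_VOWELS = frozenset({"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
                            "EY", "IH", "IY", "OW", "OY", "UH", "UW"})
GLIDES = frozenset({"W", "Y"})
EXCLUDED_PREV = ARPABET_VOWELS | GLIDES | {"R"}       # token follows these
EXCLUDED_NEXT = ARPABET_VOWELS | GLIDES | {"L", "R"}  # token precedes these


@dataclass
class Token:
    speaker: str
    vowel: str                 # ARPAbet label with stress digit, e.g. "AE1"
    duration_ms: float
    prev_phone: Optional[str]  # None at an utterance edge
    next_phone: Optional[str]
    f1_hz: float
    f2_hz: float


def base(phone: Optional[str]) -> Optional[str]:
    """Strip an ARPAbet stress digit: 'AE1' -> 'AE'."""
    return phone.rstrip("012") if phone else None


def keep_token(tok: Token) -> bool:
    """Apply the automatic exclusion criteria (stress, duration, context)."""
    if not tok.vowel.endswith("1"):           # primary stress only
        return False
    if tok.duration_ms < 50:                  # too short
        return False
    if base(tok.prev_phone) in EXCLUDED_PREV:
        return False
    if base(tok.next_phone) in EXCLUDED_NEXT:
        return False
    return True


def hz_to_bark(f_hz: float) -> float:
    """Traunmüller's Hz-to-Bark formula, without low/high-end corrections."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def lobanov(values: list[float]) -> list[float]:
    """z-score values; needs >= 2 distinct values (sample SD, n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def normalize(tokens: list[Token]) -> dict[str, list[tuple[str, float, float]]]:
    """Filter tokens, convert to Bark, then z-score F1 and F2 per speaker."""
    by_speaker: dict[str, list[Token]] = {}
    for tok in tokens:
        if keep_token(tok):
            by_speaker.setdefault(tok.speaker, []).append(tok)
    result = {}
    for speaker, toks in by_speaker.items():
        z_f1 = lobanov([hz_to_bark(t.f1_hz) for t in toks])
        z_f2 = lobanov([hz_to_bark(t.f2_hz) for t in toks])
        result[speaker] = [(base(t.vowel), a, b) for t, a, b in zip(toks, z_f1, z_f2)]
    return result
```

Normalizing within each speaker is what makes Lobanov's procedure remove differences in vocal tract size between speakers, so the per-speaker grouping in `normalize` is the essential design choice; a speaker left with a single token after filtering would raise an error in `stdev` and would need to be dropped.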

All statistical analyses and data visualizations were conducted in R (R Development Core Team 2020). In this study, we examined the following patterns involved in the California Vowel Shift: (1) lowering and retraction of IH, EH, and AE, (2) AE-AEN split, (3) AA-AO merger, and (4) fronting of UW, UH, and AH. For the first three patterns (i.e., lowering/retraction, split, and merger), we analyzed both vowel height (F1) and frontedness (F2); for the last pattern (i.e., fronting), we analyzed only vowel frontedness (F2). For statistical analyses, we fitted linear mixed-effects models using the *lme4* package (Bates et al. 2015). All fixed effects were contrast-coded using simple coding, in which each level is compared to the reference level and the intercept is the grand mean. The fixed and random effects of the model used for each pattern are presented in the following section. The best-fitting model was selected through backward elimination, and model comparisons were done with likelihood ratio tests using the *anova()* function. Although examining gender variation was not the main purpose of the study, we included gender as a fixed effect in all models because of its important role in sound change (i.e., female speakers are generally the leaders of linguistic change) (Coates 1993; Labov 1990; Milroy and Milroy 1985; Trudgill 1972). The p-values were obtained via the Satterthwaite approximation in the *lmerTest* package (Kuznetsova et al. 2017). When significant interactions were found, we conducted post hoc pairwise comparisons using the *emmeans* package (Lenth 2020).
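Simple coding can be made explicit with a small sketch. For a factor with k levels, each contrast column takes the value (k - 1)/k for the level being compared and -1/k for every other level, so each coefficient estimates the difference between that level and the reference, while the zero column sums make the intercept estimate the grand mean. The helper below is hypothetical and only builds the matrix; in R the same matrix is commonly obtained as `contr.treatment(k) - 1/k`. Using CA as the reference level for group is our assumption, since the text does not state which level served as reference.

```python
from fractions import Fraction


def simple_coding(levels: list[str], reference: str) -> dict[str, list[Fraction]]:
    """Build a simple-coding contrast matrix.

    Returns one row per level and one column per non-reference level. Each
    column contrasts that level with the reference; because every column
    sums to zero, the intercept estimates the grand mean of the level means.
    """
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not in {levels}")
    k = len(levels)
    others = [lv for lv in levels if lv != reference]
    return {
        lv: [Fraction(k - 1, k) if lv == other else Fraction(-1, k) for other in others]
        for lv in levels
    }


# Hypothetical usage: the five groups with CA as reference, and binary gender.
group_codes = simple_coding(["CA", "GEN2", "GEN1.5", "GEN1", "KOR"], reference="CA")
gender_codes = simple_coding(["female", "male"], reference="female")
```

For gender, this yields -1/2 for female and 1/2 for male, so the gender coefficient is the male-female difference evaluated at the average of the two genders rather than for one of them.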

#### **3. Results**

#### *3.1. Phonemic Status of English Vowels*

Figure 3 shows the vowel space by group and gender, based on the mean normalized F1 and F2 values of each vowel. Before looking into the patterns of the California Vowel Shift, we first examined the phonemic status of the vowels in each group and checked whether the Korean Americans distinguished the vowel pairs IY-IH, EH-AE, UW-UH, and AH-AO. We analyzed the normalized F1 and F2 values to compare the height and frontedness of the vowel pairs, fitting linear mixed-effects models with vowel and gender as fixed effects and participant and item as random effects. Table 2 summarizes the statistical significance of the vowel contrasts in each group.

<sup>9</sup> The two KOR speakers and the CA speaker had spent 2 years, 2 years, and 1 year, respectively, in Illinois at the time of data collection.

<sup>10</sup> The CA speaker was temporarily visiting Arizona at the time of data collection, but was residing in Los Angeles County.


**Table 2.** Summary of the statistical results of English vowel contrasts.

\*\*\*: *p* < 0.001, \*\*: *p* < 0.01, \*: *p* < 0.05, n.s.: non-significant.

There was a clear distinction between the GEN2 and GEN1.5 speakers (i.e., early bilinguals) and the GEN1 speakers (i.e., late bilinguals). The former groups patterned like the CA speakers, and the latter group patterned like the KOR speakers. The CA speakers and the early bilinguals distinguished all four vowel contrasts using both vowel height and frontedness. The exception was the UH-UW contrast, which they distinguished by vowel height alone. In comparison, the late bilinguals distinguished only the front vowel contrasts, each with a single cue (vowel height for EH-AE, vowel frontedness for IY-IH). They failed to distinguish the back vowel contrasts UH-UW and AH-AO.

**Figure 3.** Normalized F1-F2 space by group and gender.

#### *3.2. Lowering and Retraction of IH, EH, and AE*

For the lowering and retraction of front lax vowels IH, EH, and AE, we examined the effects of group and gender on the F1 (vowel height) and the F2 (vowel frontedness) of each of these vowels. We performed linear mixed effects modeling with group (CA, GEN2, GEN1.5, GEN1, KOR), gender (female, male), and the interaction between group and gender as fixed effects and participant and item as random effects. For IH, we additionally included the following segment (nasal, non-nasal) as a fixed effect, given that studies have shown that this vowel becomes raised to IY before a nasal consonant (e.g., *thing*) (Hinton et al. 1987). We also included the nasality of the following segment as a fixed effect for the analysis of EH. As for AE, only tokens preceding a non-nasal consonant were examined. Further analysis of the effect of the nasality of the following segment on the realization of AE (i.e., AE-AEN split) will be presented in Section 3.3. The best-fitting models included random intercepts for subject and item without any random slope, except for the F1 of IH, which included a by-item random slope for gender.
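The model comparisons behind these best-fitting structures rest on a likelihood ratio test, which reduces to simple arithmetic on two nested models' log-likelihoods. The Python sketch below illustrates that arithmetic under stated assumptions and does not claim to show what *anova()* runs internally. The log-likelihoods are hypothetical, both models are assumed to be fit by maximum likelihood, and the chi-square tail probability comes from closed-form expressions valid for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series of correction terms
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def likelihood_ratio_test(loglik_reduced, loglik_full, df_diff):
    """LRT for nested models fit by maximum likelihood: returns (statistic, p-value)."""
    statistic = 2 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, df_diff)


# Hypothetical log-likelihoods: a model without vs. with a group x gender
# interaction (5 groups x 2 genders -> 4 extra parameters)
stat, p = likelihood_ratio_test(-1203.4, -1200.1, df_diff=4)
print(f"chi2(4) = {stat:.2f}, p = {p:.3f}")  # chi2(4) = 6.60, p = 0.159
```

In this hypothetical case, an improvement of 3.3 log-likelihood units on 4 degrees of freedom gives *p* ≈ 0.16, so backward elimination would drop the interaction. For random-effect terms, whose variances lie on the boundary of the parameter space, this chi-square reference is known to be conservative.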

Results for IH showed a main effect of group (GEN2) on both the F1 (β = 0.375, SE = 0.151, t = 2.48, *p* < 0.05) and the F2 (β = 0.269, SE = 0.117, t = 2.307, *p* < 0.05). This suggests that the GEN2 speakers produced IH significantly lower<sup>11</sup> and more fronted than the CA speakers (i.e., reference level). The GEN1 and KOR speakers also produced this vowel significantly more fronted than the CA speakers (GEN1: β = 0.76, SE = 0.137, t = 5.551, *p* < 0.001; KOR: β = 0.587, SE = 0.167, t = 3.509, *p* < 0.001). With regard to gender, the female speakers (i.e., reference level) overall produced IH significantly lower and more fronted than the male speakers (F1: β = −0.46, SE = 0.128, t = −3.582, *p* < 0.001; F2: β = −0.497, SE = 0.093, t = −5.343, *p* < 0.001). For the F2, we found a significant interaction between group (GEN2) and gender (β = −0.548, SE = 0.233, t = −2.355, *p* < 0.05). We also found an interaction approaching significance between group (GEN1) and gender (β = −0.504, SE = 0.272, t = −1.852, *p* = 0.071). That is, the gender difference was larger for the GEN2 and GEN1 speakers than for the CA speakers. Figure 4 presents the normalized F2 of IH, EH, and AE across groups and genders; higher F2 values indicate more fronted realizations. Post-hoc pairwise comparisons showed that the female and male CA speakers did not differ significantly. In contrast, the female GEN2 and female GEN1 speakers produced IH significantly more fronted than their male counterparts (GEN2: β = 0.814, SE = 0.09, t = 9.079, *p* < 0.001; GEN1: β = 0.769, SE = 0.165, t = 4.655, *p* < 0.01). Among the female speakers, the GEN1 and KOR speakers showed the most fronted realizations of IH. Across groups, the female GEN1 speakers produced significantly more fronted IH than all other groups except the KOR speakers (GEN1 vs. CA: β = −1.012, SE = 0.186, t = −5.435, *p* < 0.001; GEN1 vs. GEN2: β = −0.469, SE = 0.128, t = −3.65, *p* < 0.05; GEN1 vs. GEN1.5: β = −1.226, SE = 0.22, t = −5.584, *p* < 0.001). The female KOR speakers also produced more fronted IH than the CA speakers, although this difference was only marginally significant (β = −0.762, SE = 0.236, t = −3.23, *p* = 0.064). They also produced significantly more fronted IH than the GEN1.5 speakers (β = −0.976, SE = 0.261, t = −3.745, *p* < 0.05). Moreover, the female GEN2 speakers produced significantly more fronted IH than the GEN1.5 speakers (β = 0.758, SE = 0.195, t = 3.891, *p* < 0.05) and the CA speakers (β = −0.544, SE = 0.155, t = −3.502, *p* < 0.05). The F2 of the female GEN1.5 and CA speakers did not differ. Therefore, female speakers' IH frontedness can be summarized by the following order: KOR, GEN1 > GEN2 > GEN1.5, CA. No group difference was found among the male speakers, with one exception: the GEN1 speakers produced significantly more fronted IH than the GEN2 speakers (β = −0.513, SE = 0.14, t = −3.671, *p* < 0.05). Regarding the following segment, IH tokens preceding a nasal consonant (i.e., reference level) were overall significantly more fronted than those in other contexts (β = 0.183, SE = 0.084, t = 2.169, *p* < 0.05), but they did not differ in height.
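Under simple coding, a group × gender interaction coefficient is a difference of differences: the gender difference within one group minus the gender difference within the reference group (CA). The gender coefficient itself is the gender difference averaged over all five groups. The hypothetical Python sketch below shows this arithmetic on invented F2 cell means. In a mixed model, the same relations hold for the model's estimated cell means rather than raw averages, and the two can differ when the data are unbalanced.

```python
def gender_differences(cell_means, groups):
    """Male minus female cell mean within each group; keys are (group, gender)."""
    return {g: cell_means[(g, "male")] - cell_means[(g, "female")] for g in groups}


def simple_coded_gender_terms(cell_means, groups, reference):
    """Gender coefficient and group x gender interactions under simple coding.

    The gender coefficient is the male-minus-female difference averaged over
    groups; each interaction is a group's difference minus the reference's.
    """
    diffs = gender_differences(cell_means, groups)
    gender_coef = sum(diffs.values()) / len(groups)
    interactions = {g: diffs[g] - diffs[reference] for g in groups if g != reference}
    return gender_coef, interactions


groups = ["CA", "GEN2", "GEN1.5", "GEN1", "KOR"]
# Hypothetical normalized F2 cell means (not the study's data)
f2 = {
    ("CA", "female"): 0.1, ("CA", "male"): 0.0,
    ("GEN2", "female"): 0.7, ("GEN2", "male"): -0.1,
    ("GEN1.5", "female"): 0.0, ("GEN1.5", "male"): -0.2,
    ("GEN1", "female"): 1.0, ("GEN1", "male"): 0.2,
    ("KOR", "female"): 0.9, ("KOR", "male"): 0.4,
}
gender_coef, interactions = simple_coded_gender_terms(f2, groups, "CA")
print(round(gender_coef, 2))           # -0.48
print(round(interactions["GEN2"], 2))  # -0.7
```

A negative interaction value, as with the GEN2 and GEN1 coefficients reported above, means the male-minus-female difference is more negative than in the CA group. In other words, the female speakers' advantage in frontedness is larger in that group.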

With regard to EH, there was a main effect of group (GEN1) on both the F1 (β = −0.397, SE = 0.172, t = −2.311, *p* < 0.05) and the F2 (β = 0.306, SE = 0.11, t = 2.796, *p* < 0.01). This suggests that the GEN1 speakers produced EH significantly higher and more fronted than the CA speakers (i.e., reference level). The GEN2 speakers also produced EH significantly more fronted than the CA speakers (β = 0.249, SE = 0.09, t = 2.777, *p* < 0.05), but the two groups did not differ in vowel height. As with IH, the female speakers (i.e., reference level) produced EH lower and more fronted than the male speakers (F1: β = −0.777, SE = 0.116, t = −6.723, *p* < 0.001; F2: β = −0.605, SE = 0.074, t = −8.164, *p* < 0.001). Moreover, we found a significant interaction between gender and group (KOR) for the F1 (β = 0.932, SE = 0.406, t = 2.295, *p* < 0.05), which indicates that the gender difference was smaller for the KOR speakers than for the CA speakers. Post-hoc pairwise comparisons confirmed that the female and male KOR speakers' EH did not differ in vowel height. In contrast, the female CA speakers produced this vowel significantly lower than the male CA speakers (β = 1.034, SE = 0.258, t = 4.005, *p* < 0.05). A significant gender difference was also found for the GEN2 speakers (β = 1.017, SE = 0.117, t = 8.727, *p* < 0.001), and a marginally significant one for the GEN1 speakers (β = 0.737, SE = 0.223, t = 3.302, *p* = 0.054). The post-hoc tests also showed that the female GEN2 speakers produced EH with a vowel height similar to that of the female CA and female GEN1.5 speakers. However, they produced EH significantly lower than the female GEN1 (β = 0.639, SE = 0.179, t = 3.577, *p* < 0.05) and female KOR speakers (β = 0.774, SE = 0.23, t = 3.362, *p* < 0.05). No group difference in vowel height was found among the male speakers. For the F2, we found significant interactions between gender and all Korean groups except the GEN1.5 speakers (GEN2: β = −0.413, SE = 0.179, t = −2.308, *p* < 0.05; GEN1: β = −0.641, SE = 0.217, t = −2.954, *p* < 0.01; KOR: β = −0.533, SE = 0.261, t = −2.045, *p* < 0.05). This suggests that the gender difference was larger for these groups than for the CA speakers. Post-hoc pairwise comparisons confirmed that the female speakers in these groups produced EH significantly more fronted than the male speakers (GEN2: β = 0.623, SE = 0.074, t = 8.431, *p* < 0.001; GEN1: β = 0.851, SE = 0.143, t = 5.945, *p* < 0.001; KOR: β = 0.743, SE = 0.206, t = 3.6, *p* < 0.05). No gender difference was found for the CA speakers. The post-hoc tests also revealed that the female CA speakers produced EH significantly less fronted than the female GEN2 (β = −0.456, SE = 0.131, t = −3.489, *p* < 0.05) and female GEN1 speakers (β = −0.627, SE = 0.161, t = −3.888, *p* < 0.05). No group difference in F2 was found for the male speakers. Lastly, there was no main effect of the following segment on either measure.

<sup>11</sup> In fact, among the five groups, GEN2 speakers' IH was produced with the lowest vowel height. Results of the same model with the GEN2 speakers as the reference group (instead of the CA speakers) confirmed that, except for the GEN1.5 speakers, the GEN2 speakers produced IH significantly lower than the other groups (CA: β = −0.375, SE = 0.151, t = −2.48, *p* < 0.05; GEN1: β = −0.699, SE = 0.124, t = −5.622, *p* < 0.001; KOR: β = −0.603, SE = 0.17, t = −3.556, *p* < 0.001).

**Figure 4.** Normalized F2 across groups: (**a**) IH; (**b**) EH; (**c**) AE.

Regarding (non-prenasal) AE, as with EH, we found a main effect of group (GEN1) on both the F1 (β = −0.613, SE = 0.293, t = −2.092, *p* < 0.05) and the F2 (β = 0.557, SE = 0.183, t = 3.049, *p* < 0.01). This indicates that the GEN1 speakers produced AE significantly higher and more fronted than the CA speakers (i.e., reference level). Consistent with the findings for IH and EH, the female speakers (i.e., reference level) produced AE lower (β = −0.897, SE = 0.193, t = −4.656, *p* < 0.001) and more fronted (β = −0.423, SE = 0.12, t = −3.552, *p* < 0.01) than the male speakers. There was a significant interaction between gender and group (GEN1) for the F2 (β = −0.755, t = −2.07, *p* < 0.05), which indicates that the gender difference was larger for the GEN1 speakers than for the CA speakers. Post-hoc pairwise comparisons confirmed that the female GEN1 speakers, but not the female CA speakers, produced AE significantly more fronted than their male counterparts (β = 0.863, SE = 0.234, t = 3.684, *p* < 0.05). A significant gender difference was also found for the GEN2 speakers (β = 0.534, SE = 0.113, t = 4.714, *p* < 0.01). The post-hoc tests also showed that the female GEN1 speakers' AE was more fronted than that of the female CA speakers, although the difference was only marginally significant (β = −0.934, SE = 0.283, t = −3.301, *p* = 0.056). No group difference in F2 was observed for the male speakers.

#### *3.3. AE-AEN Split*

Figure 5 demonstrates the normalized F1 and F2 values of non-prenasal AE and prenasal AEN across groups. In order to test whether the height (F1) and the frontedness (F2) of these two vowel types were significantly different across groups and genders, we performed linear mixed effects modeling with vowel type (AE, AEN), group (CA, GEN2, GEN1.5, GEN1, KOR), gender (female, male), and the interactions among vowel type, group, and gender as fixed effects and participant and item as random effects. The best fitting models included random intercepts for subject and item with a by-item random slope for gender for the F1 and with a by-subject random slope for vowel for the F2.

**Figure 5.** AE and AEN across groups: (**a**) Normalized F1; (**b**) Normalized F2.

Results showed main effects of vowel type for both the F1 (β = −0.718, SE = 0.118, t = −6.097, *p* < 0.001) and the F2 (β = 0.515, SE = 0.097, t = 5.315, *p* < 0.001). This suggests that overall AEN was produced significantly higher and more fronted than AE (i.e., reference level). We also found significant interactions between vowel type and every Korean group in both measures. That is, the difference between AE and AEN was larger for the CA speakers than for the GEN2 (F1: β = 0.794, SE = 0.182, t = 4.356, *p* < 0.001; F2: β = −0.693, SE = 0.172, t = −4.021, *p* < 0.001), GEN1.5 (F1: β = 0.86, SE = 0.267, t = 3.222, *p* < 0.01; F2: β = −0.531, SE = 0.242, t = −2.19, *p* < 0.05), GEN1 (F1: β = 1.185, SE = 0.248, t = 4.783, *p* < 0.001; F2: β = −1.001, SE = 0.221, t = −4.529, *p* < 0.001), and KOR speakers (F1: β = 1.253, SE = 0.369, t = 3.391, *p* < 0.001; F2: β = −1.07, SE = 0.314, t = −3.407, *p* < 0.01). Post-hoc pairwise comparisons revealed that the CA and GEN2 speakers produced AEN significantly higher (CA: β = 1.537, SE = 0.188, t = 8.179, *p* < 0.001; GEN2: β = 0.743, SE = 0.096, t = 7.712, *p* < 0.001) and more fronted (CA: β = −1.174, SE = 0.169, t = −6.944, *p* < 0.001; GEN2: β = −0.481, SE = 0.086, t = −5.606, *p* < 0.001) than AE. The GEN1.5 speakers also showed slightly higher and more fronted AEN than AE, but the difference did not reach statistical significance (F1: β = 0.676, SE = 0.222, t = 3.041, *p* = 0.077; F2: β = −0.643, t = −3.333, *p* = 0.057). The late bilinguals (i.e., GEN1 and KOR) did not distinguish the two vowel types. The effect of gender on F1 and F2 was maintained in the combined AE-AEN data (F1: β = −0.943, SE = 0.178, t = −5.3, *p* < 0.001; F2: β = −0.561, SE = 0.132, t = −4.244, *p* < 0.001). For the F2, there was a marginally significant interaction between gender and group (GEN2) (β = −0.665, SE = 0.324, t = −2.051, *p* = 0.05). There was also a significant interaction between gender and group (GEN1) (β = −1.062, SE = 0.379, t = −2.803, *p* < 0.01). That is, the gender difference was larger for these speakers than for the CA speakers. According to the post-hoc pairwise comparisons, the gender difference was significant for both the GEN2 (β = 0.603, SE = 0.126, t = 4.793, *p* < 0.01) and the GEN1 speakers (β = 1, SE = 0.235, t = 4.256, *p* < 0.01). The CA speakers did not show any gender difference. We also found a three-way interaction among vowel type, group (GEN1.5), and gender, suggesting that the interaction between vowel type and group (GEN1.5) differed between female and male speakers. We examined this further by running separate models for each gender. A significant interaction between vowel type and group (GEN1.5) appeared in the male data (β = −1.114, SE = 0.198, t = −5.635, *p* < 0.001), but not in the female data. Given that we had only one female speaker in the GEN1.5 group, these results should be interpreted with caution.
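The three-way interaction can be read in the same difference-of-differences terms. Within each gender, the vowel type × group (GEN1.5) term is the GEN1.5 split (AEN minus AE) minus the CA split. This corresponds to what a separate per-gender model with the same coding would estimate. The three-way term is then the male value minus the female value. The Python sketch below uses invented F2 cell means purely for illustration, assumes simple coding for all three factors, and does not reproduce the reported estimates.

```python
def split(cell_means, group, gender):
    """AEN minus AE for one group and gender; keys are (group, vowel, gender)."""
    return cell_means[(group, "AEN", gender)] - cell_means[(group, "AE", gender)]


def vowel_by_group(cell_means, group, reference, gender):
    """Two-way vowel type x group term within one gender (group split minus reference split)."""
    return split(cell_means, group, gender) - split(cell_means, reference, gender)


def three_way(cell_means, group, reference):
    """Three-way term under simple coding: male two-way term minus female two-way term."""
    return (vowel_by_group(cell_means, group, reference, "male")
            - vowel_by_group(cell_means, group, reference, "female"))


# Hypothetical normalized F2 cell means (not the study's data)
f2 = {
    ("CA", "AE", "female"): 0.0, ("CA", "AEN", "female"): 1.2,
    ("CA", "AE", "male"): -0.2, ("CA", "AEN", "male"): 1.0,
    ("GEN1.5", "AE", "female"): 0.1, ("GEN1.5", "AEN", "female"): 1.0,
    ("GEN1.5", "AE", "male"): 0.0, ("GEN1.5", "AEN", "male"): 0.1,
}
male_term = vowel_by_group(f2, "GEN1.5", "CA", "male")      # about -1.1
female_term = vowel_by_group(f2, "GEN1.5", "CA", "female")  # about -0.3
print(round(three_way(f2, "GEN1.5", "CA"), 2))              # -0.8
```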

#### *3.4. AA-AO Merger*

Figure 6 demonstrates the normalized F1 and F2 values of AA and AO across groups. In order to test whether these two vowels are produced as one category, we performed linear mixed effects modeling with vowel (AA, AO), group (CA, GEN2, GEN1.5, GEN1, KOR), gender (female, male), and the interaction among vowel, group, and gender as fixed effects and participant and item as random effects. The best fitting models included random intercepts for subject and item with a by-subject random slope for vowel for the F1 and with a by-item random slope for gender for the F2.

**Figure 6.** AA and AO across groups: (**a**) Normalized F1; (**b**) Normalized F2.

Results showed a main effect of vowel on both the F1 and the F2, which suggests that overall AA (i.e., reference level) was produced significantly lower and more fronted than AO (F1: β = −0.291, SE = 0.094, t = −3.102, *p* < 0.01; F2: β = −0.289, SE = 0.073, t = −3.947, *p* < 0.001). A main effect of group (GEN2) was found for the F2 (β = 0.221, SE = 0.094, t = 2.354, *p* < 0.05). This indicates that overall the GEN2 speakers produced the vowels significantly more fronted than the CA speakers (i.e., reference level). For the F1, there was a marginally significant interaction between vowel and group (GEN1) (β = −0.419, SE = 0.208, t = −2.014, *p* = 0.055). Post-hoc pairwise comparisons revealed that the GEN1 speakers produced AA significantly lower than AO (β = 0.607, SE = 0.15, t = 4.051, *p* < 0.01), whereas the CA speakers did not distinguish these vowels. With regard to the F2, we found significant interactions between vowel and group (GEN2) (β = 0.171, SE = 0.072, t = 2.392, *p* < 0.05) and between vowel and group (GEN1) (β = −0.223, SE = 0.09, t = −2.468, *p* < 0.05). That is, the F2 difference between AA and AO was larger for the CA speakers than for the GEN2 speakers and smaller than for the GEN1 speakers. Post-hoc pairwise comparisons revealed that the CA speakers and the early bilinguals (i.e., GEN2, GEN1.5) produced AA and AO similarly. The late bilinguals, however, distinguished them (GEN1: β = 0.421, SE = 0.09, t = 4.659, *p* < 0.001; KOR: β = 0.417, SE = 0.127, t = 3.288, *p* < 0.05). Neither AA nor AO differed across groups in either measure, with one exception: AO differed between the GEN2 and GEN1 speakers (F1: β = 0.707, SE = 0.155, t = 4.549, *p* < 0.01; F2: β = 0.288, SE = 0.079, t = 3.629, *p* < 0.05). That is, the GEN2 speakers produced this vowel significantly lower and more fronted than the GEN1 speakers. The GEN2 speakers also produced AO more fronted than the CA speakers, although the difference only approached statistical significance (β = 0.306, SE = 0.093, t = 3.289, *p* = 0.061). Lastly, the female speakers produced the vowels significantly lower and more fronted than the male speakers (F1: β = −0.291, SE = 0.094, t = −3.102, *p* < 0.01; F2: β = −0.467, SE = 0.084, t = −5.55, *p* < 0.001).

#### *3.5. Fronting of UW, UH, and AH*

For the fronting of UW, UH, and AH, we examined the effects of group and gender on the frontedness (F2) of each of these vowels. We performed linear mixed effects modeling with group (CA, GEN2, GEN1.5, GEN1, KOR), gender (female, male), and the interaction between group and gender as fixed effects and participant and item as random effects. In the case of UW, since there was no male token in the data, we did not include the interaction between group and gender as a fixed effect. Additionally, we included the previous segment (coronal, non-coronal) as a fixed effect, since studies have shown that these vowels demonstrate more advanced fronting after coronal consonants than in other phonological environments (D'Onofrio et al. 2019; Podesva et al. 2015). The best-fitting models included random intercepts for subject and item without any random slope.

Results showed that group (GEN2) had an effect on the frontedness of UH (β = 0.581, SE = 0.171, t = 3.393, *p* < 0.01) and AH (β = 0.394, SE = 0.135, t = 2.922, *p* < 0.01). That is, compared to the CA speakers (i.e., reference level), the GEN2 speakers produced these vowels more fronted. No difference was found between the two groups in the production of UW. Lastly, regarding gender, the female speakers produced the three vowel types significantly more fronted than the male speakers (UW: β = −0.431, SE = 0.199, t = −2.159, *p* < 0.05; UH: β = −0.765, SE = 0.144, t = −5.318, *p* < 0.001; AH: β = −0.568, SE = 0.105, t = −5.415, *p* < 0.001). No significant interaction was found between group and gender in UH and AH. Moreover, no effect of previous segment was found in any of the vowels.

#### **4. Discussion**

#### *4.1. Effect of Age of Arrival on the Participation in Local Sound Change*

In this study, we examined the effect of age of arrival on Korean Americans' participation in the California Vowel Shift. We compared the speech of three generations of Korean Americans who clearly differed in their age of arrival in Los Angeles: first generation (GEN1) (i.e., adulthood), 1.5 generation (GEN1.5) (i.e., late childhood), and second generation (GEN2) (i.e., early childhood). We predicted that the GEN1 speakers would show signs of L1 Korean influence in their speech, despite their long residence in the US (26.8 years on average). In this respect, we expected them to resemble the Korean international students (KOR), who had spent less time in the US (4 years on average). We further predicted that the younger-generation Korean Americans (i.e., GEN1.5 and GEN2) would pattern more like the Anglo-Californians (CA) than the KOR speakers would. Finally, we predicted that influence from Korean phonology, if any, would appear to a lesser extent in the GEN2 speakers than in the GEN1.5 speakers.

We examined four main patterns of the California Vowel Shift: (1) lowering and retraction of IH, EH, and AE, (2) AE-AEN split, (3) AA-AO merger, and (4) fronting of UW, UH, and AH. For each vowel, we predicted the outcomes based on previous findings of Korean-English late bilinguals and cross-linguistic studies between Korean and English vowels (Baker and Trofimovich 2005; Baker et al. 2002; Trofimovich et al. 2011; Tsukada et al. 2005; Yang 1996; Yoon and Kim 2015).

With regard to IH, because Korean does not have a high front lax vowel, Korean-English late bilinguals tend to merge this vowel with IY, which is almost identical to the Korean /i/ (Baker and Trofimovich 2005; Yang 1996). Thus, we predicted that, if influence from Korean phonology occurs, Korean Americans would merge IH with IY and, therefore, would not participate in the lowering and retraction of IH. Results showed that the early bilinguals (i.e., GEN2 and GEN1.5) aligned with the CA speakers in that they distinguished the IY-IH contrast using both vowel height and frontedness. On the other hand, the late bilinguals (i.e., GEN1 and KOR) patterned similarly to each other; they used only vowel frontedness to distinguish the contrast. The GEN1 and KOR speakers did not completely merge IH with IY, which would have been a strong indication of Korean influence. Nevertheless, their IH approached IY in the vowel space (see Figure 3). Indeed, the late bilinguals showed smaller differences in average normalized F2 (i.e., vowel frontedness) between IY and IH (GEN1: 0.11, KOR: 0.44) than the CA speakers (1.08) and the early bilinguals (GEN2: 0.89, GEN1.5: 1.35). Moreover, we found that both the GEN1 and KOR speakers produced IH more fronted than the CA speakers. The GEN2 speakers also produced this vowel more fronted than the CA speakers. However, unlike in the case of the GEN1 and KOR speakers, we do not believe that this is due to influence from Korean phonology, because the GEN2 speakers also produced this vowel lower than the CA speakers. If the Korean /i/ had affected GEN2 speakers' IH, their IH would have been higher than CA speakers' IH. In fact, among the five groups, GEN2 speakers' IH had the lowest vowel height (see Footnote 11). Rather than by influence from Korean phonology, GEN2 speakers' divergence from the CA speakers could be explained by the nature of their vowel space, which was overall more fronted than that of the CA speakers.
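The per-group differences in average normalized F2 reported above can be computed directly from token-level measurements. The Python sketch below assumes a hypothetical token format (dictionaries with group, vowel, and normalized formant fields) and pools all tokens within a group. The paper does not state whether averages were taken over tokens or over speakers first, and the two can differ when speakers contribute different numbers of tokens.

```python
from collections import defaultdict
from statistics import mean


def vowel_pair_difference(tokens, vowel_a, vowel_b, measure):
    """Per-group mean of `measure` for vowel_a minus that for vowel_b (tokens pooled within group)."""
    values = defaultdict(list)
    for tok in tokens:
        values[(tok["group"], tok["vowel"])].append(tok[measure])
    groups = sorted({group for group, _ in values})
    return {
        g: mean(values[(g, vowel_a)]) - mean(values[(g, vowel_b)])
        for g in groups
        if (g, vowel_a) in values and (g, vowel_b) in values
    }


# Hypothetical normalized tokens (not the study's data)
tokens = [
    {"group": "CA", "vowel": "IY", "F2": 1.5},
    {"group": "CA", "vowel": "IY", "F2": 1.3},
    {"group": "CA", "vowel": "IH", "F2": 0.4},
    {"group": "GEN1", "vowel": "IY", "F2": 1.4},
    {"group": "GEN1", "vowel": "IH", "F2": 1.2},
    {"group": "GEN1", "vowel": "IH", "F2": 1.4},
]
diffs = vowel_pair_difference(tokens, "IY", "IH", "F2")
print({g: round(d, 2) for g, d in diffs.items()})  # {'CA': 1.0, 'GEN1': 0.1}
```

The same function, called with an F1 field and the vowels EH and AE, gives the kind of height differences discussed for those vowels.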

Regarding EH and AE, we made different predictions for the two vowels. Korean does not have any vowel that acoustically overlaps with either EH or AE. Even so, Korean-English bilinguals often identify both vowels as the Korean /e/ (Baker et al. 2002; Trofimovich et al. 2011), which is positioned higher in the vowel space (Baker and Trofimovich 2005; Yoon and Kim 2015). However, Korean-English bilinguals, especially late bilinguals, do not simply assimilate a merged EH-AE category to the Korean /e/. Instead, they tend to show two categories in the front mid/low region of the vowel space: the Korean /e/ and a single EH-AE category in which AE merges with EH (Baker and Trofimovich 2005). Therefore, we predicted that, if influence from Korean occurs, Korean Americans would merge AE with EH. Their EH might then show lowering and retraction, but their AE would not, because the merger with EH would place it higher in the vowel space. Contrary to our prediction, the Korean Americans in our study maintained the EH-AE contrast regardless of their age of arrival in the US. As in the case of IH, we found different patterns for the early and late bilinguals. The GEN2 and GEN1.5 speakers performed like the CA speakers in that they distinguished EH and AE using both vowel height and frontedness. On the other hand, the GEN1 and KOR speakers kept the EH-AE contrast using only one measure (i.e., vowel height).

Although the GEN1 speakers distinguished the EH-AE contrast, they produced both vowels higher and more fronted than the CA speakers, indicating that these speakers did not participate in the lowering and retraction of EH and AE. The GEN2 speakers also produced EH more fronted than the CA speakers, but no difference in vowel height was found between the two groups. As mentioned above, we believe that this is due to their overall more fronted vowel space. Participants' birth year was not the main focus of our study, but it is worth pointing out that the GEN1 speakers were overall older than the other groups (see Table 1). We formulated our predictions on the assumption that the merger between the Korean /e/ and /ɛ/ is established across most age groups (Kang 2014; Jang et al. 2015). It is possible, however, that the GEN1 speakers arrived in the US when the merger was still in progress. For instance, Yang (1996) showed that Korean male adults in the 1990s maintained the distinction between the Korean /e/ and /ɛ/, while Korean female adults produced them indistinguishably. This suggests that the Korean /e/-/ɛ/ merger was still in progress during this period and that female speakers led the change. Most of the GEN1 speakers in our study immigrated to the US during the 1970s and 1980s. After a long period away from Korea, the GEN1 speakers are unlikely to participate in sound changes in Korea that were still in progress or that were established after they left. In fact, linguistic conservatism has often been observed in diasporic communities (Johannessen and Laake 2015; Parodi 2014; Polinsky 2018). Thus, it is possible that the GEN1 speakers kept the Korean /e/-/ɛ/ contrast that they brought with them, which may have affected their production of (unmerged) EH and AE.

With regard to the KOR speakers, it is unlikely that they maintain the Korean /e/-/ɛ/ contrast, because these speakers were much younger than the GEN1 speakers and had left Korea recently (see Table 1). Comparing the difference in average normalized F1 (i.e., vowel height) between EH and AE across groups, the KOR speakers showed a smaller difference (0.42) than the CA speakers (0.75) and the early bilinguals (GEN2: 0.64, GEN1.5: 0.66). On the other hand, GEN1 speakers' vowel height difference between EH and AE (0.71) was comparable to that of the CA speakers and the early bilinguals. Thus, it appears that both the GEN1 and KOR speakers show influence from Korean phonology when producing EH and AE, but in different ways. The GEN1 speakers assimilate EH and AE to the Korean /e/ and /ɛ/, respectively. The KOR speakers, by contrast, acquire the EH-AE contrast using vowel height, but with a smaller height difference than the CA speakers, similar to the case of the IY-IH contrast. To confirm this, future research should examine GEN1 and KOR speakers' realization of both English and Korean vowels.

In this study, we also examined whether Korean Americans produce AE differently depending on the nasality of the following consonant (i.e., AE-AEN split). Korean does not show a systematic split between non-prenasal and prenasal vowels. Thus, we predicted that, if influence from Korean phonology occurs, Korean Americans would not distinguish AE and AEN. Our results showed that the late bilinguals (i.e., GEN1 and KOR) and the GEN1.5 speakers did not distinguish AE and AEN. The GEN2 speakers, on the other hand, distinguished AE and AEN using both vowel height and frontedness, but the difference between these two vowels was smaller than for the CA speakers. This finding suggests that the GEN2 speakers participate in the AE-AEN split, but to a lesser extent than the CA speakers.

Regarding the low back vowels AA and AO, Korean does not have a vowel that acoustically overlaps with either of these vowels. The closest vowels in Korean would be /a/ (low central) and /ʌ/ (near-low back) (Baker et al. 2002; Trofimovich et al. 2011; Tsukada et al. 2005). Thus, we predicted two possible outcomes if influence from Korean occurs. In Outcome 1, Korean Americans would assimilate both vowels to the Korean /a/, participating in the AA-AO merger but with more fronted realizations than expected. In Outcome 2, they would distinguish the two vowels by assimilating AA to the Korean /a/ and AO to the Korean /ʌ/, and would not participate in the AA-AO merger. Results showed that the late bilinguals (i.e., GEN1 and KOR) produced AA more fronted than AO. Additionally, the GEN1 speakers produced AA lower than AO. These findings suggest that the late bilinguals did not participate in the AA-AO merger, most likely because they assimilated AA to the Korean /a/ and AO to the Korean /ʌ/ (i.e., Outcome 2). These speakers also did not distinguish AO from AH, which is the closest English vowel to the Korean /ʌ/ (see Section 3.1). This supports the possibility that they assimilated AO to the Korean /ʌ/. The GEN2 and GEN1.5 speakers, in contrast, patterned like the CA speakers in that they showed an overlap between AA and AO. In the case of the GEN2 speakers, the AA-AO merger was even stronger than for the CA speakers, suggesting that the GEN2 speakers may be at a more advanced stage of the AA-AO merger. However, compared to the CA speakers, their productions were overall more fronted. Although these results seem to point toward Outcome 1 (i.e., assimilation to the Korean /a/), this is unlikely. Compared to the CA speakers, GEN2 speakers' AO was more fronted but not lower. If the GEN2 speakers had assimilated the merged AA-AO category to the Korean /a/, their AO would have been both more fronted and lower. As shown in Figure 3, AA is positioned in the low-back area of the GEN2 speakers' vowel space, whereas for the late bilinguals it is positioned in the low-mid area between AE and AO. If the GEN2 speakers had assimilated AA to the Korean /a/, which is what we believe happened in the speech of the late bilinguals, they would have shown patterns similar to those of the late bilinguals. Thus, as in the case of the front vowels, GEN2 speakers' overall fronted vowel space seems to be a more plausible explanation for their divergence from the CA speakers.

Lastly, with regard to the high back vowels UW and UH, Korean-English bilinguals tend to assimilate both vowels to the Korean /u/, which is further back (Baker and Trofimovich 2005). Thus, we predicted that, if influence from Korean phonology occurs, Korean Americans would merge UW and UH and produce them further back than the CA speakers. Similarly, they would produce AH further back than the CA speakers due to influence from the Korean /ʌ/. Our data showed that the late bilinguals produced UW and UH indistinguishably, suggesting a strong influence from Korean phonology. However, contrary to our expectations, they did not produce the merged UW-UH further back than the CA speakers. One possible explanation is that these speakers may have assimilated the UW-UH category to the Korean /ɨ/ rather than to the Korean /u/, since /ɨ/ is acoustically more similar to these vowels. Future research examining late bilinguals' combined L1 and L2 vowel space would help confirm this. Unlike the late bilinguals, the early bilinguals aligned with the CA speakers in that they maintained the UW-UH contrast using vowel height<sup>12</sup>. Compared to the CA speakers, only the GEN2 speakers showed more fronted UH and AH. As mentioned above, we believe that this is due to their more fronted vowel space.

Overall, our data showed a clear distinction between the GEN2 and the GEN1.5 speakers (i.e., early bilinguals), on the one hand, and the GEN1 speakers (i.e., late bilinguals), on the other, confirming an effect of age of arrival in the US on Korean Americans' realization of English vowels. Like the KOR speakers, the GEN1 speakers did not distinguish the front vowel contrasts IY-IH and EH-AE using the same strategies as the CA speakers, and they failed to maintain the back vowel contrasts UW-UH and AH-AO. Moreover, they did not participate in the California Vowel Shift, which is most likely due to influence from their L1 Korean. Unlike the late bilinguals, the early bilinguals successfully maintained the four vowel contrasts using the same phonetic strategies as the CA speakers. In addition, the CA speakers and the early bilinguals demonstrated a horizontally narrower and vertically more expanded vowel space than the late bilinguals (see Figure 3). This indicates that these speakers followed the linguistic trend of California English, which is characterized by a horizontal compression of the vowel space (D'Onofrio et al. 2019). However, the early bilinguals did not demonstrate complete convergence toward the CA speakers, especially when producing non-prenasal AE and prenasal AEN. While the GEN2 speakers distinguished the two vowel types, their split was less pronounced than that of the CA speakers. The GEN1.5 speakers, on the other hand, did not demonstrate a systematic distinction between AE and AEN, following the pattern of the late bilinguals. A less pronounced or absent AE-AEN split among Korean Americans has also been reported in other studies (Cheng 2016; Lee 2016). Since Korean does not have an AE-AEN split, this finding suggests that Korean phonology affects early bilinguals' production of AE and AEN and that the GEN1.5 speakers show a stronger influence from Korean phonology than the GEN2 speakers due to their later exposure to California English (i.e., an age effect).
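The strength of the AE-AEN split is compared only descriptively here ("less pronounced"). One simple way to put a number on it, offered as a minimal sketch and not as the authors' measure, is the Euclidean distance between the mean of non-prenasal AE and the mean of prenasal AEN in normalized F1 × F2 space. The token format and values below are hypothetical.

```python
import math

def mean_formants(tokens):
    """Average (F1, F2) over a list of normalized (F1, F2) tokens."""
    n = len(tokens)
    f1 = sum(t[0] for t in tokens) / n
    f2 = sum(t[1] for t in tokens) / n
    return f1, f2

def split_distance(ae_tokens, aen_tokens):
    """Euclidean distance between the AE and AEN category means (larger = stronger split)."""
    ae_f1, ae_f2 = mean_formants(ae_tokens)
    aen_f1, aen_f2 = mean_formants(aen_tokens)
    return math.hypot(ae_f1 - aen_f1, ae_f2 - aen_f2)

# Hypothetical z-scored tokens for one speaker (not data from this study)
ae = [(1.2, 0.3), (1.0, 0.5)]   # non-prenasal AE: lower, less fronted
aen = [(0.2, 0.9), (0.4, 1.1)]  # prenasal AEN: raised and fronted
print(round(split_distance(ae, aen), 3))  # → 1.0
```

A distance near zero would correspond to the absent split reported for the GEN1.5 speakers; a smaller positive distance than the CA group's would correspond to the GEN2 pattern. A distance between means ignores within-category spread, which overlap measures such as the Pillai score take into account.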

#### *4.2. Second-Generation Korean Americans' Divergent Participation in the California Vowel Shift*

In the case of the GEN2 speakers, apart from the less pronounced AE-AEN split, we found that these speakers additionally demonstrated an overall more fronted realization of the vowels than the CA speakers. Except for front vowel retraction, all the patterns of the California Vowel Shift examined in this study were observed in GEN2 speakers' speech (i.e., front vowel lowering, back vowel fronting, AE-AEN split, AA-AO merger). In fact, in certain aspects, the GEN2 speakers seemed to be in a more advanced stage of the California Vowel Shift than the CA speakers (i.e., IH-lowering, AA-AO merger, UH- and AH-fronting). These findings, along with the less pronounced AE-AEN split, are highly consistent with those of Korean Americans in Berkeley (Cheng 2016), which suggests that Korean Americans in Southern and Northern California may share similar patterns. Based on visual inspection of participants' vowel space in Figure 3, the GEN2 speakers seemed to demonstrate the narrowest vowel space across groups. Thus, it is possible that GEN2 speakers' fronted vowel space occurs in combination with more advanced horizontal compression than the CA speakers.

<sup>12</sup> It is noteworthy that the CA speakers and the early bilinguals did not use vowel frontedness to distinguish the UW-UH contrast, whereas they used both vowel height and frontedness when distinguishing other contrasts (see Table 2). We suspect that this is linked to the lack of a main effect of previous consonant on the production of UW and UH. The findings of this study differed from previous research (D'Onofrio et al. 2019; Hall-Lew 2009; Podesva et al. 2015) in that the production of UW and UH was not conditioned by the phonological context that encourages fronting (i.e., post-coronal position). Although our data do not have enough tokens in post-coronal position to further examine its effect on individual speakers, these findings seem to indicate that the fronting of UW and UH is well established among the CA speakers and the early bilinguals. A similar claim has been made by Hall-Lew (2009, 2011) that back vowel fronting is nearing completion in Northern California.

While further examination of GEN2 speakers' holistic vowel space (e.g., area and dispersion) should be carried out, there appears to be a link between GEN2 speakers' narrow vowel space and their pronounced back vowel fronting. Pronounced back vowel fronting has also been found in other Asian American groups. For instance, Hall-Lew (2009, 2011) demonstrated that Chinese Americans in San Francisco may be in a more advanced stage of back vowel fronting than Anglo-Californians. Similarly, Cheng (2016) found that, in addition to Korean Americans, South Asians also demonstrated more pronounced UH-fronting than Anglo-Californians. Thus, it is possible that some Asian Americans in California collectively participate more strongly in back vowel fronting than Anglo-Californians to express their pan-ethnic Asian American identity. According to Wei (1993, p. 1), being Asian American "implies that there can be a communal consciousness and a unique culture that is neither Asian or American, but Asian American." US-born Asian Americans often experience microaggressions challenging their American-ness due to phenotypic traits that are distinct from those of mainstream Americans (i.e., Anglo-Americans) (Lee 2019). The shared racialization experiences, which contradict the covert oppression exerted upon Asian Americans behind the model minority stereotype (e.g., docile, hard-working, good citizens) (Chou and Feagin 2010; Kawai 2005; Lee 2019), may lead some Asian Americans to overemphasize their American-ness using linguistic resources. In other words, the pronounced back vowel fronting observed in the GEN2 speakers may result from the speakers overcompensating for their perceived un-American-ness by taking the back vowel fronting of the California Vowel Shift even further than the CA speakers do.
This may eventually cause the front vowels to be pushed forward in order to maintain sufficient perceptual contrast between front and back vowels (Lindblom 1990; Lindblom and Engstrand 1989). Future research should examine the social meanings of back vowel fronting and the relationship between the degree of back vowel fronting and pan-ethnic Asian American membership across different Asian American groups, as well as its effect on the realization of front vowels.

It is important to note that, unlike the GEN2 speakers, the GEN1.5 speakers did not demonstrate pronounced back vowel fronting or an overall fronted vowel space. Extending the argument above, it is possible that the GEN1.5 speakers do not feel the need to overemphasize their American-ness through back vowel fronting in the same way as the GEN2 speakers, since GEN1.5 speakers often demonstrate a strong affiliation to Korean culture as part of their dual identity (Kim and Stodolska 2013). Thus, it is likely that GEN1.5 speakers identify themselves more strongly as Koreans or Korean Americans than as Asian Americans. Due to the small sample size (N = 4), however, it is premature to draw conclusions about GEN1.5 speakers' speech behaviors. Future research should include a balanced number of GEN1.5 and GEN2 speakers to test whether their English vowels systematically differ from each other and whether their pan-ethnic Asian American identity, in relation to their Korean or Korean American identity, affects their realization of English vowels.

Another possible explanation for GEN2 speakers' fronted vowel space is the social meaning associated with gender in Korean culture. Cross-linguistically, female speakers have higher fundamental frequency (F0) and formant frequencies than male speakers due to differences in their vocal anatomy (Escudero et al. 2009; Jacewicz et al. 2007; Pisanski et al. 2016; Simpson 2002; Yoon and Kim 2015). Thus, compared to male speakers, female speakers generally have a higher-pitched voice and produce vowels that are lower (i.e., higher F1) and more fronted (i.e., higher F2). In this study, we normalized participants' formant frequencies in order to examine gender effects on English vowel production while controlling for physiological differences between female and male speakers.
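This section does not name the normalization procedure. As a minimal sketch of how speaker normalization can remove anatomical scaling, the code below assumes Lobanov-style z-scoring, where each formant is standardized within a speaker. This is shown only as one common option and not necessarily the authors' method. The token format and values are hypothetical.

```python
from statistics import mean, stdev

def lobanov(tokens):
    """Z-score F1 and F2 separately within a single speaker's tokens.

    tokens: list of (F1_hz, F2_hz) pairs from one speaker (hypothetical format).
    Returns a list of (zF1, zF2) pairs.
    """
    f1s = [f1 for f1, _ in tokens]
    f2s = [f2 for _, f2 in tokens]
    m1, s1 = mean(f1s), stdev(f1s)
    m2, s2 = mean(f2s), stdev(f2s)
    return [((f1 - m1) / s1, (f2 - m2) / s2) for f1, f2 in tokens]

# Hypothetical speakers whose formants differ only by a uniform 20% scaling,
# a crude stand-in for a shorter vocal tract
speaker_a = [(300.0, 2200.0), (700.0, 1200.0), (400.0, 900.0), (550.0, 1800.0)]
speaker_b = [(1.2 * f1, 1.2 * f2) for f1, f2 in speaker_a]

norm_a = lobanov(speaker_a)
norm_b = lobanov(speaker_b)
# After normalization, the two speakers' tokens coincide up to floating-point error
print(all(abs(x1 - x2) < 1e-9 and abs(y1 - y2) < 1e-9
          for (x1, y1), (x2, y2) in zip(norm_a, norm_b)))
```

Because z-scores are unchanged by multiplying all values by a positive constant, a purely uniform anatomical difference disappears after normalization. Any gender difference that remains, such as a more fronted vowel space, then reflects how the vowels are positioned within each speaker's own space. Real vocal-tract differences are not perfectly uniform, so this is a simplification.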

As demonstrated in Figure 3, the vowel spaces of the CA female and male speakers largely overlapped in the front-back dimension<sup>13</sup>, whereas clear gender differences were observed across the Korean groups, especially among the GEN2 and the GEN1 speakers. That is, even after formant normalization, the Korean female speakers produced English vowels more fronted than the Korean male speakers. These findings suggest that Korean female speakers shift their vowel space forward as a way to express their femininity. Femininity is indexed differently across cultures. For instance, in American culture, women use creaky voice to enhance their female desirability (Pennock-Speck 2005; Yuasa 2010), whereas Japanese women use a high-pitched voice to sound cute, young, and charming (Van Bezooijen 1995; Ohara 1998; Yuasa 2010). Although to a lesser degree than in Japanese culture, Korean women also use a high-pitched voice to express femininity (Ohara 1998; Puzar and Hong 2018). A high-pitched voice is a characteristic of a performed winsomeness called aegyo, the cutified and infantilized figuration of femininity in Korean culture<sup>14</sup> (Puzar and Hong 2018). Due to the close relationship between F0 and formants, it is likely that performers of aegyo also produce a fronted vowel space. According to Pisanski et al. (2016), speakers across genders and cultures modulate their vocal tract length and F0 to imitate a physically large or small body size. That is, they shorten their vocal tract and increase their F0 to sound physically small and do the opposite to sound physically large. Thus, it is possible that the GEN2 female speakers move their vowel space forward to express Korean femininity. Here we would like to emphasize that, among the four Korean groups (i.e., GEN2, GEN1.5, GEN1, and KOR), gender differences surfaced most systematically in the vowels of the GEN2 and the GEN1 speakers and that the front region of GEN2 speakers' vowel space largely overlapped with that of the GEN1 speakers.
That is, GEN2 speakers' fronted vowel space reflects features that are present in both the CA speakers (i.e., horizontal compression and vertical expansion of the vowel space) and the Korean speakers (i.e., a more fronted vowel space among female than male speakers), particularly those of their parents' generation. Although the evidence from individual vowels disfavors the possibility of influence from Korean phonology, the findings suggest that the vowel space of the female GEN2 speakers is moving forward to align with the front region of the female GEN1 speakers. Studies have shown that children of immigrants who acquire the majority language natively may use ethnolectal features as additional linguistic resources to mark social meanings (e.g., association with ethnicity) (Cheshire et al. 2011; Clyne et al. 2001; Gnevsheva 2020). While ethnolectal features may originate from first-generation immigrants' foreign-accented speech, in the second generation they may be reallocated for sociolinguistic purposes (Clyne et al. 2001; Gnevsheva 2020; Hoffman and Walker 2010). Thus, it is possible that the female GEN2 speakers shift their entire vowel space forward to index their intersecting ethnic and gender identities (i.e., a cute and charming Korean female persona). This vowel space shift may occur independently or in combination with pronounced back vowel fronting to additionally express their pan-ethnic Asian American identity, as proposed above. Future research should examine intra-speaker variation in GEN2 speakers' vowel productions (e.g., style-shifting) to understand the social meanings of their fronted vowel space. Moreover, a perceptual study should be conducted to examine whether such social meanings are shared by the Korean American community.

<sup>13</sup> The only gender difference in the CA group was found in the vowel height of EH. That is, the CA female speakers produced EH lower than the male speakers did. Kennedy and Grama (2012) found similar results in that female and male Californians in Santa Barbara (Southern California) differed in the height of AE (i.e., women produced it lower than men) but did not show any significant difference in vowel frontedness. Since women in general are leaders of sound change (Coates 1993; Milroy and Milroy 1985; Labov 1990; Trudgill 1972), it appears that the lowering of the mid and low front vowels EH and AE is still in progress in California English, whereas changes in the front-back dimension (i.e., retraction of front vowels and fronting of back vowels) may be nearing stability for the CA speakers.

<sup>14</sup> According to Puzar and Hong (2018), aegyo is not a direct emulation of child behaviors but a performative repertoire of secondary infantilisation (Goffman 1979, pp. 72–77) used for various purposes (e.g., playfulness, seduction, negotiation, pleasing superiors). Thus, performers of aegyo, particularly young women, use this speech style to "negotiate the imbalance of power within patriarchal, androcentric and ageist/gerontocratic environments" (Puzar and Hong 2018). Similar concepts exist in other East Asian cultures, such as sajiao in China (Farris 1994) and kawaii in Japan (Brown 2011; Madge 1998).

#### **5. Conclusions**

In this study, we examined Korean Americans' participation in the California Vowel Shift. Although the first-generation Korean Americans had spent a much longer time in the US than the Korean international students, influence from Korean still persisted in their speech. On the other hand, Korean Americans who were born and raised in Los Angeles (i.e., second generation) or who came to the US during childhood (i.e., 1.5 generation) demonstrated most patterns of the California Vowel Shift. However, they diverged from the Anglo-Californians in their production of prenasal and non-prenasal /æ/. Like the late bilinguals, the 1.5-generation speakers did not systematically distinguish the two vowel types. The second-generation speakers demonstrated a split-/æ/ system, but the split was less pronounced than that of the Anglo-Californians. These findings suggest that age of arrival has a strong effect on immigrant minority speakers' participation in local sound change.

Our findings also showed that the second-generation Korean Americans, in particular the female speakers, demonstrated an overall more fronted realization of the vowels than the Anglo-Californians. Second-generation Korean Americans' fronted vowel space reflected features that were present in both the Anglo-Californians (i.e., horizontal compression and vertical expansion of the vowel space) and the Korean speakers, particularly those of the first-generation Korean Americans (i.e., more fronted vowel space among female than male speakers). These findings suggest that second-generation Korean Americans may shift their vowel space forward to express their intersecting gender, racial, and ethnic identities. Future research should examine the social meanings of the fronting of vowel space.

**Author Contributions:** Conceptualization, J.Y.K.; Data curation, N.W.; Formal analysis, J.Y.K.; Investigation, J.Y.K. and N.W.; Methodology, J.Y.K. and N.W.; Visualization, J.Y.K. and N.W.; Writing—original draft, J.Y.K.; Writing—review & editing, J.Y.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


Pisanski, Katarzyna, Emanuel C. Mora, Annette Pisanski, David Reby, Piotr Sorokowski, Tomasz Frackowiak, and David R. Feinberg. 2016. Volitional Exaggeration of Body Size through Fundamental and Formant Frequency Modulation in Humans. *Scientific Reports* 6: 34389.

Podesva, Robert J. 2011. The California Vowel Shift and Gay Identity. *American Speech* 86: 32–51.

Podesva, Robert J., Annette D'Onofrio, Janneke Van Hofwegen, and Seung Kyung Kim. 2015. Country Ideology and the California Vowel Shift. *Language Variation and Change* 27: 157–86.

Polinsky, Maria. 2018. *Heritage Languages and Their Speakers*. Cambridge: Cambridge University Press, vol. 159.


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Interlingual Interactions Elicit Performance Mismatches Not "Compromise" Categories in Early Bilinguals: Evidence from Meta-Analysis and Coronal Stops**

**Joseph V. Casillas**

Department of Spanish and Portuguese, Rutgers University, New Brunswick, NJ 08904, USA; joseph.casillas@rutgers.edu

**Abstract:** Previous studies attest that some early bilinguals produce the sounds of their languages in a manner that is characterized as "compromise" with regard to monolingual speakers. The present study uses meta-analytic techniques and coronal stop data from early bilinguals in order to assess this claim. The goal was to evaluate the cumulative evidence for "compromise" voice-onset time (VOT) in the speech of early bilinguals by providing a comprehensive assessment of the literature and presenting an acoustic analysis of coronal stops from early Spanish–English bilinguals. The studies were coded for linguistic and methodological features, as well as effect sizes, and then analyzed using a cross-classified Bayesian meta-analysis. The pooled effect for "compromise" VOT was negligible (β = −0.13). The acoustic analysis of the coronal stop data showed that the early Spanish–English bilinguals often produced Spanish and English targets with mismatched features from their other language. These performance mismatches presumably occurred as a result of interlingual interactions elicited by the experimental task. Taken together, the results suggest that early bilinguals do not have "compromise" VOT, though their speech involves dynamic phonetic interactions that can surface as performance mismatches during speech production.

**Keywords:** compromise VOT; voice timing; bilingualism; performance mismatches; dynamic phonetic interactions

#### **1. Introduction**

Though early bilinguals with ample experience in their first (L1) and second (L2) languages are believed to show "monolingual-like" L2 speech production (Rao and Ronquest 2015), research on bilingual language modes (Grosjean 2001) has shown that cross-linguistic interactions are strengthened in bilingual contexts (Olson 2013; Simonet 2014) and can lead to production and perception that differ from those of monolinguals. A crucial question is how bilingual speakers handle the production and perception of acoustically similar segments across their languages in unilingual and bilingual settings, i.e., whether the two systems are kept separate (see Magloire and Green 1999) or whether there is a compromise in the acoustic characteristics of the sounds of the two languages (see Caramazza et al. 1973). A body of research dating back to the 1970s shows that some early bilinguals produce the sounds of their languages in a manner that has been characterized as "compromise", "intermediate" or "merged" with regard to monolingual distributions (e.g., Flege and Hillenbrand 1984; Flege 1991; Flege and Eefting 1987b; Sundara et al. 2006, among many others). The present study takes a systematic look at the cumulative evidence for "compromise" speech production in early simultaneous and sequential bilinguals. This study begins by describing the nature of "compromise" categories and reviewing the literature supporting their existence. Next, it considers how "compromise" categories may arise under current models of bilingual phonology and describes some methodological concerns. Afterwards, it employs meta-analytic techniques to assess the extant literature and concludes by presenting an alternative account of "compromise" categories using production data from coronal stops.

**Citation:** Casillas, Joseph V. 2021. Interlingual Interactions Elicit Performance Mismatches Not "Compromise" Categories in Early Bilinguals: Evidence from Meta-Analysis and Coronal Stops. *Languages* 6: 9. https://doi.org/10.3390/languages6010009


Received: 28 October 2020; Accepted: 29 December 2020; Published: 4 January 2021


**Copyright:** © 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### *1.1. Background and Motivation*

A recurring finding in the bilingual speech literature is that even early bilinguals, both simultaneous and sequential, can display production, perception, and lexical processing that differs from monolinguals.<sup>1</sup> To be more specific, this finding is often framed in terms of "compromise" or "intermediate" phonetic categories in the bilingual production of stop contrasts, mainly because the acoustic properties of the segments are proposed to lie somewhere between those of the two languages. In the present work, a "compromise" category is defined as one that is not target-like with regard to some acoustic property. This particular line of research has focused on voice timing. Voice timing in stops can be determined by a number of parameters, the most common of which is voice-onset time (VOT, Lisker and Abramson 1964). VOT refers to the duration of the interval between the release of the stop and the onset of voicing of the following segment. VOT realizations include phonetically voiced stops, in which voicing begins before the release (i.e., lead VOT), as well as phonetically voiceless, lag stops, in which voicing begins after the release (i.e., short- and long-lag VOT, approx. 0–30 ms and 30+ ms, respectively). Many languages have two-way or three-way oppositions in which lead VOT, short-lag VOT, and long-lag VOT are mapped to phonologically voiced and voiceless categories. French, for example, contrasts voiced and voiceless stops with lead VOT and short-lag VOT, respectively. English, on the other hand, contrasts voiced and voiceless stops with short- and long-lag VOT, respectively.
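The three VOT ranges and their language-specific mappings described above can be sketched as a small classifier. This is a minimal illustration that assumes a hard 30 ms short/long-lag boundary. The text gives this value only as approximate, so the boundary is a tunable parameter here. The mapping table covers only the French and English patterns named above and simplifies real distributions. For example, English voiced stops can also be prevoiced.

```python
def classify_vot(vot_ms: float, short_lag_max_ms: float = 30.0) -> str:
    """Bin a VOT value (ms) into lead, short-lag, or long-lag.

    Negative VOT means voicing begins before the release (lead VOT).
    The short/long-lag boundary is approximate and adjustable.
    """
    if vot_ms < 0:
        return "lead"
    if vot_ms <= short_lag_max_ms:
        return "short-lag"
    return "long-lag"

# Typical VOT bin for each phonological category in the two languages named above
# (a simplification: real productions vary around these targets)
TYPICAL_BIN = {
    ("French", "voiced"): "lead",
    ("French", "voiceless"): "short-lag",
    ("English", "voiced"): "short-lag",
    ("English", "voiceless"): "long-lag",
}

def matches_typical(language: str, voicing: str, vot_ms: float) -> bool:
    """True if a token falls in the bin typically used for its category."""
    return classify_vot(vot_ms) == TYPICAL_BIN[(language, voicing)]

print(classify_vot(-85.0), classify_vot(12.0), classify_vot(70.0))
```

The same 70 ms token is target-like for English /t/ but not for French /t/. This is why the two languages' voiceless stops are acoustically distinct even though both are phonologically "voiceless".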

The notion of "compromise" categories appears to have been coined by Williams (1980) in reference to Williams (1977), a study on Spanish–English bilinguals' production of bilabial stops. Williams (1980) stated, "The apparent development of compromise VOT targets in both perception and production for bilinguals and second-language learners may reflect a true convergence over time of the acoustic phonetic features of the two languages instead of the development of two separate phonetic systems" (p. 213). Subsequently, the term became commonplace in the bilingual speech literature (e.g., Flege and Hillenbrand 1984; Antoniou et al. 2010, 2011; Bullock et al. 2006; Chang et al. 2011; Flege 1991; Flege and Eefting 1987b, 1988; Gabriel et al. 2016; Jones 2020; Kehoe et al. 2004; Kiliç 2018; Kilpatrick 2004; Lein et al. 2016; Llama and López-Morelos 2016; López 2012; Morgan 2011; Sundara et al. 2006).

One explanation for the notion that bilinguals display "compromise" categories is offered by the Speech Learning Model (SLM, Flege 1995). The SLM posits that the ability to learn speech sounds is maintained throughout life. Novel phones are stored in long-term memory as phonetic categories. Importantly, L1 and L2 phonetic categories interact because they are assumed to share the same phonological space. L2 sounds are unconsciously and automatically linked with neighboring L1 phonetic categories via a mechanism Flege (1995) refers to as "equivalence classification". According to the equivalence classification, linked phones that are perceived as phonetically similar may result in a single, merged category used for both languages. If this is the case, the model posits that the L1 sound may assimilate to the L2 sound, resulting in both L1 and L2 productions that are intermediate in the phonetic space with regard to monolingual categories. If, on the other hand, the linked phones are perceived as phonetically dissimilar, it is more likely that a new phonetic category will be formed. In this situation, the model predicts that the L2 category can dissimilate from nearby sounds in order to maintain phonetic contrast. Thus, both scenarios can lead to "compromise" categories in bilingual speech.

Though they did not use the term "compromise" VOT, evidence for the phenomenon in early bilinguals dates back to a landmark study by Caramazza et al. (1973). This study examined voiceless stop production and perception in early French–English bilinguals who had acquired English before the age of seven. Caramazza et al. (1973) compared the bilinguals' performance on a perceptual identification task and a reading task with that of monolingual English and French speakers and found that, when the bilinguals were in French mode, they produced French /ptk/ no differently than the French monolingual control group. However, when they were tested in English mode, they produced English /ptk/ with less aspiration than the monolinguals. Caramazza et al. (1973) concluded that the French–English bilinguals were "more closely aligned" with the French monolinguals, presumably because they had learned English sequentially.

<sup>1</sup> The present work takes a broad view, similar to that of the studies referenced herein, of what constitutes early bilingualism. Specifically, an early bilingual is operationalized as one who was exposed to an additional language before the age of 12. This study also distinguishes between simultaneous and sequential bilingualism, where the former refers to an individual who acquires their languages at the same time and the latter refers to an individual who acquires an L2 after the L1.

The results from Caramazza et al. (1973) suggest that sequential learners who consistently use both languages over a long period of time may still produce stops in a way that differs from monolingual speech. An SLM account of these data might propose that category formation was blocked because of the equivalence classification. This would imply that the phones were still linked, though the bilinguals' L1 categories did not differ from those of monolingual controls and, therefore, did not assimilate or merge.

In another seminal study in this literature, Flege and Eefting (1987b) examined the production of Spanish stops in numerous groups of Puerto Rican Spanish–English sequential bilingual children and adults with different linguistic backgrounds. Like French stops, Spanish stops contrast lead VOT with short-lag VOT. The bilinguals were compared with age-matched monolingual controls for both languages. Of particular relevance to the present work are two early bilingual groups that Flege and Eefting (1987b) referred to as earlier childhood bilinguals (ECB) and later childhood bilinguals (LCB). Both the ECB and LCB groups comprised adults who had learned English before the age of seven, but the ECB group was born in the U.S. mainland, or moved there shortly after birth, whereas the LCB group still lived in Puerto Rico. Both groups produced English /ptk/ with less aspiration than monolingual English speakers. Flege and Eefting (1987b) concluded that the early bilinguals, even those living in the U.S. since early childhood, were not able to produce English /ptk/ "authentically", suggesting that their "intermediate" productions lent support to the equivalence classification hypothesis. In other words, they continued to associate the Spanish and English stops as realizations of the same phonetic category.

The results from Caramazza et al. (1973) and Flege and Eefting (1987b) suggest that sequential learners will differ from monolinguals because of the equivalence classification, but "compromise" VOT has also been documented in simultaneous bilinguals (see Sundara et al. 2006; Fowler et al. 2008; Kupisch and Lleó 2017; Lein et al. 2016, among others). For instance, in a more recent study, Fowler et al. (2008) analyzed the voiceless stops of English–French simultaneous and early, sequential bilinguals by comparing their production with monolingual controls. The simultaneous bilinguals produced French /ptk/ with longer VOT than monolingual French speakers, and English /ptk/ with shorter VOT than monolingual English speakers. The effects were greater in the early, sequential bilinguals who produced stops that were characterized as even more intermediate in both languages. Though the SLM was designed with sequential learners in mind, Fowler et al. (2008) proposed that, for the simultaneous bilinguals, the different phones were not merged but still "cognitively identified with one another" because of the equivalence classification (p. 650).

The "compromise" VOT literature also shows significant between-study variability, as some studies do not find "compromise" categories in bilingual production. For instance, Flege (1991) analyzed early and late Spanish–English bilinguals' production of Spanish and English /t/ in utterance-initial and utterance-medial positions. Importantly, the voiceless coronal is produced with short-lag VOT in Spanish and long-lag VOT in English. Flege (1991) found that, in both positions, the late bilinguals produced English /t/ with "compromise" VOT, but the early bilinguals' production did not differ from that of the monolingual speakers. Thus, it seems that some early bilinguals are able to establish separate phonetic categories and maintain separation between their languages, even when the segments are acoustically similar. Nevertheless, bilinguals do seem to display more variability in their production of stops. For example, a recurring finding is that bilinguals whose L1 is a 'true voicing' language, that is, one that contrasts pre-voiced stops with short-lag stops (e.g., Spanish, French), tend to produce English voiced stops with pre-voicing at higher rates than monolingual English speakers (e.g., MacLeod and Stoel-Gammon 2005; Flege and Eefting 1987b; Hazan and Boulakia 1993). Much like monolingual stop production (Chodroff and Wilson 2017), bilingual stop production also shows structured co-variation between stop categories (Chodroff and Baese-Berk 2019). This raises the possibility that variable voiced stop production might also be structured in other ways, on dimensions other than VOT.

#### *1.2. Motivating Meta-Analysis*

Given the discrepancies in the literature regarding the reliability of "compromise" categories, it is imperative to consider all possible alternative explanations. One notable issue regarding voice timing that bears on this line of research is related to the measure of VOT itself. Many studies show that VOT is modulated by linguistic factors, such as place of articulation (Cho and Ladefoged 1999), word position (Antoniou et al. 2010), lexical stress (Casillas et al. 2015), and speech rate (Magloire and Green 1999), in monolingual and bilingual speech. For instance, faster speech is associated with shorter VOT and slower speech with longer VOT, though the size of the effect may be language specific. Unfortunately, the large majority of studies of stop contrasts in bilinguals do not take speech rate into account. One method of doing so, proposed by Stölten et al. (2015), is to use a relative measure of VOT by calculating the proportion of the stop + vowel sequence that VOT occupies. This measure, relative VOT, has proven to be a more accurate, fine-grained metric that controls for possible speech-rate confounds and provides more precise between-group comparisons.
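Relative VOT can be illustrated with a short function. This is a minimal sketch under stated assumptions: the "stop + vowel sequence" is taken to be an optional closure interval plus VOT plus the following vowel, and only lag (non-negative) VOT is handled. The exact segmentation used by Stölten et al. (2015) may differ, and the token durations are hypothetical.

```python
def relative_vot(vot_ms: float, vowel_ms: float, closure_ms: float = 0.0) -> float:
    """Proportion of the stop + vowel sequence occupied by VOT.

    Assumption: sequence duration = (optional) closure + VOT + following vowel.
    Only lag VOT (vot_ms >= 0) is handled in this sketch.
    """
    if vot_ms < 0:
        raise ValueError("this sketch only handles lag VOT (vot_ms >= 0)")
    total = closure_ms + vot_ms + vowel_ms
    if total <= 0:
        raise ValueError("sequence duration must be positive")
    return vot_ms / total

# Hypothetical English /t/ tokens: the same syllable at a slower and a faster rate
slow = relative_vot(vot_ms=80.0, vowel_ms=120.0)
fast = relative_vot(vot_ms=60.0, vowel_ms=90.0)
print(slow, fast)  # both 0.4, although absolute VOT differs by 20 ms
```

In this toy example, a 20 ms difference in absolute VOT would look like a group difference if the two tokens came from different groups. The relative measure shows it is fully explained by speech rate.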

In addition, bilingual language modes (Grosjean 2001) represent another factor that must be considered. Bilingual production/perception can vary (1) according to the mode (unilingual, bilingual) in which it is tested (e.g., Antoniou et al. 2010; Gonzales and Lotto 2013), and (2) as a function of the expectations the bilingual has about the communicative context (e.g., Gonzales et al. 2019; Lozano-Argüelles et al. 2020; Yazawa et al. 2019). These facts underscore the need to take special care when designing experiments so as to avoid confounds related to language modes.

Further methodological concerns include statistical power and sample size. Most studies in the social sciences test for small effects (Ellis 2010; cf. Plonsky and Oswald 2014), include small sample sizes, and are therefore underpowered (see Brysbaert 2020). The importance of this fact should not be overlooked, as an underpowered study is more likely to commit a type II error (false negative) and contributes to a literature with lower positive predictive value. Because underpowered studies should frequently fail to detect even real effects, the prevalence of significant findings related to "compromise" categories may be indicative of publication bias.
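The link between small samples, small effects, and low power can be made concrete with a quick calculation. The sketch below uses the normal approximation to the power of a two-sided, two-sample comparison with standardized effect size d and equal group sizes. It is an illustration, not the procedure used in the study. Exact t-test power is slightly lower, particularly for small n.

```python
from statistics import NormalDist

def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided, two-sample test for effect size d."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5  # expected test statistic under the alternative
    # Probability of landing beyond either critical value
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

def n_for_power(d: float, target: float = 0.8, alpha: float = 0.05) -> int:
    """Smallest per-group n whose approximate power reaches the target."""
    n = 2
    while approx_power(d, n, alpha) < target:
        n += 1
    return n

# Power with 20 participants per group, and the n needed for 80% power
for d in (0.2, 0.5):
    print(d, round(approx_power(d, 20), 2), n_for_power(d))
```

With 20 participants per group, a small effect (d = 0.2) is detected only rarely, and reaching 80% power requires several hundred participants per group. This is why a literature dominated by significant results from small samples warrants suspicion.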

Finally, advances in computational power have also led to more robust analytic techniques at the disposal of the speech researcher. Consider, for instance, multilevel modeling. These models provide two clear advantages: (1) they allow for partially pooled parameter estimates that are less affected by influential data points, and (2) they obviate the need to pool over subject and item repetitions in repeated measures designs. The combination of (1) and (2) helps to evade pseudoreplication in the phonetic sciences and reduces the likelihood of committing type II errors (Winter 2011).
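The partial pooling mentioned in point (1) can be shown in its simplest form. In this minimal sketch, each subject's mean is shrunk toward the grand mean in proportion to how little data the subject contributes. The within-subject variance (`sigma2`), between-subject variance (`tau2`), and grand mean are assumed known here, whereas a fitted multilevel model estimates them from the data. All values are hypothetical.

```python
def partial_pool(means, ns, grand_mean, sigma2, tau2):
    """Shrink each subject's mean toward the grand mean.

    sigma2: within-subject (token-level) variance, assumed known
    tau2:   between-subject variance, assumed known
    A subject's mean gets weight n / sigma2; the grand mean gets weight 1 / tau2.
    """
    prior_w = 1.0 / tau2
    shrunk = []
    for ybar, n in zip(means, ns):
        data_w = n / sigma2
        shrunk.append((data_w * ybar + prior_w * grand_mean) / (data_w + prior_w))
    return shrunk

# Hypothetical VOT means (ms): two subjects with the same raw mean
# but very different numbers of tokens
est = partial_pool(means=[100.0, 100.0], ns=[2, 40], grand_mean=60.0,
                   sigma2=400.0, tau2=100.0)
print([round(e, 1) for e in est])  # → [73.3, 96.4]
```

The subject with only two tokens is pulled strongly toward the group value, while the well-sampled subject barely moves. This is the sense in which partially pooled estimates are less affected by influential data points. Because every token enters the model directly, there is also no need to average repetitions beforehand.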

#### *1.3. The Present Study*

The body of evidence suggesting "compromise" categories in bilingual stop production may thus be fraught with confounds related to the primary outcome measure, as well as with methodological issues related to language modes, power, sample size, and analytic techniques. This raises the question of whether variable bilingual stop production may be better accounted for by some other phenomenon, such as dynamic phonetic interactions associated with language activation, rather than by "compromised" underlying representations. All of the above motivates the need to assess the cumulative evidence via meta-analytic techniques, and the present study aimed to address this need. Meta-analysis offers a principled method for assessing a body of research by using independent observations to derive an average effect size and, thus, to draw an overall conclusion regarding the direction and magnitude of real-world effects. The "compromise" category literature would particularly benefit from meta-analysis because a large amount of research and theory building has been based on the early findings.
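The basic logic of deriving an average effect size from independent observations can be sketched with the simplest pooled estimate, a fixed-effect (inverse-variance) average in which each study is weighted by the reciprocal of its squared standard error. This is a deliberately simplified stand-in: the model actually used here (Section 2.1.4) is a Bayesian multilevel model that also estimates between-study heterogeneity. The function name and the example numbers are hypothetical.

```python
import math


def inverse_variance_pool(effects, standard_errors):
    """Fixed-effect (common-effect) pooled estimate and its standard error.

    Each study is weighted by 1 / SE^2, so more precise studies count more.
    Between-study heterogeneity is ignored in this simplified form.
    """
    if len(effects) != len(standard_errors) or not effects:
        raise ValueError("need one standard error per effect size")
    weights = [1.0 / se**2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * g for w, g in zip(weights, effects)) / total
    return pooled, math.sqrt(1.0 / total)


if __name__ == "__main__":
    # Invented effect sizes (SMD) and standard errors for two studies.
    print(inverse_variance_pool([0.2, -0.4], [0.1, 0.2]))
```

With these invented values, the more precise study (SE = 0.1) receives four times the weight of the other, so the pooled estimate sits much closer to its effect size.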

In order to assess the "compromise" category literature, the present study addressed the following questions:


In what follows, the present project responds to the aforementioned research questions by presenting a meta-analysis of the "compromise" VOT literature. Subsequently, this study provides an alternative account to the notion of "compromise" categories in early bilinguals using data from coronal stops.

#### **2. Meta-Analysis**

#### *2.1. Method*

#### 2.1.1. Study Identification and Screening

The analysis employed a variety of techniques to locate relevant primary studies, focusing first on amassing a large study pool and later on filtering out redundant or unusable records. The first step involved searching library-housed online databases using various combinations of relevant keywords. The terms 'compromise categories', 'merged categories', 'mixed categories', and 'intermediate categories' were searched individually and in combination with 'early learners', 'early bilinguals', or 'simultaneous bilinguals', as well as 'VOT' or 'voice-onset time'. The databases included ERIC, ScienceDirect, Linguistics and Language Behavior Abstracts, PsycINFO, ProQuest Dissertations and Theses, and FirstSearch (see the supplementary materials for more details on the results from each search). Ancestry studies and studies citing the primary studies were also obtained via searches in Google and Google Scholar. When potential studies were not available through the aforementioned resources, authors were contacted directly. There were 153,860 records identified through database searching and 27 additional ancestry studies identified through Google and Google Scholar. After removing duplicates and irrelevant hits, the study pool contained 148 records.

#### 2.1.2. Eligibility

The 148 full-text articles and dissertations were assessed for eligibility. To be eligible, a study had to (1) include simultaneous and/or early bilinguals (age of acquisition, AoA, before 12 years) who were adults at the time of testing, (2) examine voiceless stops in a language pair with a two-way stop voicing contrast, and (3) include a monolingual comparison group.<sup>2</sup> Data from both languages were included in the meta-analysis whenever the participants were simultaneous bilinguals and control comparisons were available. For sequential learners, the second, sequentially learned language was compared with controls, as "compromise" phonetic categories are more commonly reported for the L2. Languages with three-way contrasts (namely, Korean) were excluded because these contrasts typically involve other parameters, i.e., pitch (see Holliday 2015). Finally, the study pool was limited to analyses of voiceless stops because the majority of the records identified involved English, which allows both short-lag and lead VOT for phonologically voiced stops.

<sup>2</sup> An anonymous reviewer duly notes the fact that there is currently debate surrounding the use of monolingual control populations in bilingual research (see Sakai 2018). The present meta-analysis included monolingual controls as part of the search criteria because it best reflects the practices of research on "compromise" categories and thus led to the largest number of potential studies for the dataset.

The assessment resulted in a dataset of 68 studies that appeared to meet the aforementioned inclusion criteria. The studies spanned 6 decades, from the 1970s to the present, and came from a variety of sources, including journal articles, book chapters, MA and PhD theses, conference proceedings, and unpublished manuscripts. Forty-eight studies were discarded for a number of reasons: (1) they examined a different phenomenon (k = 15), (2) they did not include a control group for comparison (k = 16), (3) data were missing (k = 10), (4) they duplicated data presented in a prior study (k = 2), or (5) they examined a three-way contrast (k = 5). Requests for missing or unreported data were sent via email; one response (with data) was received out of 10 requests. The search process led to a final dataset comprising 20 studies with 37 independent comparisons and a pooled participant sample size of 641. The average age cut-off used to classify participants as early bilinguals was 4.53 years (SD = 2.84). The majority of the usable studies were journal articles (n = 16), followed by MA/PhD theses and conference proceedings (n = 4). The usable studies spanned 5 decades, with 3 from the 1980s, 4 from the 1990s, 4 from the 2000s, 8 from the 2010s, and 1 from 2020.

#### 2.1.3. Coding

Each study was coded for linguistic and methodological features and effect sizes. The effect size was a standardized mean difference (SMD), specifically Hedges' g. The linguistic features included the stop category (/p/, /t/, or /k/), lexical stress (stressed or unstressed syllable), and word position (initial, medial). The methodological features included analytic strategy (*t*-test, ANOVA, or LME) and pooling method for stops (individual evaluations of each segment versus averaging over combinations of segments). Effect sizes were calculated primarily from reported means and standard deviations (or standard errors).<sup>3</sup> Ultimately, word position was excluded as a moderator, as there were not enough studies that included this factor.
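For readers who want to reproduce effect sizes of this kind, the sketch below computes Hedges' g (the pooled-SD standardized mean difference multiplied by the usual small-sample correction J = 1 − 3/(4df − 1)) and a common large-sample approximation to its standard error, plus a helper that converts a reported standard error of the mean back into an SD. The sign convention (bilingual minus monolingual), the function names, and the example values are assumptions.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """Recover a standard deviation from a reported standard error of the mean."""
    return se * math.sqrt(n)


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Bias-corrected SMD (group 1 minus group 2) and its approximate standard error."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    g = j * d
    se_g = math.sqrt(j**2 * var_d)
    return g, se_g


if __name__ == "__main__":
    # Hypothetical VOT means (ms): bilinguals (group 1) versus monolinguals (group 2).
    print(hedges_g(50.0, 10.0, 10, 40.0, sd_from_se(2.0, 25), 10))
```
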

#### 2.1.4. Statistical Analysis

A cross-classified Bayesian meta-analysis was conducted by fitting the study data with the multilevel regression model formulated below:

$$\begin{aligned} \text{SMD}_{i} & \sim \text{Normal}(\theta_{i}, \sigma_{i} = \text{se}_{i}) \\ \theta_{i} & \sim \text{Normal}(\mu, \tau) \\ \mu & \sim \text{Normal}(0, 1) \\ \tau & \sim \text{HalfCauchy}(0, 1) \end{aligned} \tag{1}$$
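Because θ_i can be integrated out analytically, the simplest version of Equation (1), without moderators or the pooling-method grouping factor, can be illustrated with a brute-force grid approximation instead of Hamiltonian Monte Carlo: each observed SMD_i is then Normal(μ, √(se_i² + τ²)). The sketch below evaluates that likelihood and the priors in Equation (1) on a grid of (μ, τ) values. The grid bounds and function names are assumptions and the example effect sizes are invented; this is a didactic stand-in, not the brms model reported here.

```python
import numpy as np


def grid_posterior(smd, se, mu_max=3.0, tau_max=3.0, n_grid=401):
    """Grid approximation to p(mu, tau | data) for Equation (1).

    theta_i is integrated out, so each SMD_i ~ Normal(mu, sqrt(se_i**2 + tau**2)).
    Priors follow Equation (1): mu ~ Normal(0, 1), tau ~ HalfCauchy(0, 1).
    The grid bounds are assumptions; moderators and grouping factors are omitted.
    """
    smd = np.asarray(smd, dtype=float)
    se = np.asarray(se, dtype=float)
    mu = np.linspace(-mu_max, mu_max, n_grid)
    tau = np.linspace(1e-3, tau_max, n_grid)
    M, T = np.meshgrid(mu, tau, indexing="ij")

    # Log priors, up to additive constants.
    log_post = -0.5 * M**2 - np.log1p(T**2)

    # Add each study's marginal log-likelihood.
    for y, s in zip(smd, se):
        var = s**2 + T**2
        log_post += -0.5 * np.log(var) - 0.5 * (y - M) ** 2 / var

    post = np.exp(log_post - log_post.max())  # stabilize before normalizing
    post /= post.sum()
    return mu, tau, post


def posterior_mean_mu(smd, se):
    """Posterior mean of the pooled effect mu, marginalizing over tau."""
    mu, _, post = grid_posterior(smd, se)
    return float(np.sum(mu * post.sum(axis=1)))


if __name__ == "__main__":
    # Invented study-level effect sizes and standard errors.
    print(posterior_mean_mu([-0.4, 0.1, 0.3, -0.2], [0.3, 0.25, 0.4, 0.35]))
```
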

Effect size (SMD) was the outcome variable, and lexical stress (stressed, unstressed) and analytic strategy (LME, other) were included as population-level effects (i.e., fixed effects). The likelihood of the outcome variable was assumed to be Gaussian. Individual studies and stop pooling methods were group-level effects (i.e., random effects). Population-level effects were deviation coded (lexical stress: stressed = 0.5, unstressed = −0.5; analytic strategy: LME = 0.5, other = −0.5) such that the posterior distribution of model estimates for each effect provided an assessment of effect size. The model included regularizing, weakly informative priors (Gelman et al. 2017), which were normally distributed and centered at 0 with a standard deviation of 1 for all population-level parameters. A half-Cauchy prior with location 0 and scale 1 was used for *τ*. Finally, the model was fit with 4000 iterations (2000 warm-up), and Hamiltonian Monte Carlo sampling was carried out with 4 chains distributed across 4 processing cores. The analysis was conducted in R (R Core Team 2019, version 4.0.3) and was fit using Stan (Stan Development Team 2018) via the R package brms (Bürkner 2017). More information regarding Bayesian data analysis is available in the supplementary materials. All supplementary analyses, as well as the data, code, and experimental materials necessary to reproduce the analyses reported in this article, are available at: https://osf.io/un45x/.

<sup>3</sup> In the case of one study, effect size was calculated from the reported degrees of freedom and F-value of a one-way ANOVA. For three studies, standard deviations were not reported, but the manuscript included boxplots. In these cases, the median and interquartile range were derived from the boxplots via WebPlotDigitizer (Rohatgi 2020) and then used to calculate the means and standard deviations (see Wan et al. 2014). All figures used for these approximations are available in the supplementary materials.
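The two conversions mentioned in footnote 3 can be sketched as follows. For a median m and quartiles q1 and q3 from n observations, Wan et al. (2014) estimate the mean as (q1 + m + q3)/3 and the SD as (q3 − q1) / [2Φ⁻¹((0.75n − 0.125)/(n + 0.25))]. For a one-way ANOVA with only two groups, F = t², so d = √F · √(1/n1 + 1/n2), with the sign taken from the direction of the group means. The function names are hypothetical, and the example values do not come from any study in the dataset.

```python
import math
from statistics import NormalDist


def mean_sd_from_quartiles(q1, median, q3, n):
    """Wan et al. (2014)-style mean and SD estimates from the median and IQR."""
    mean = (q1 + median + q3) / 3
    z = NormalDist().inv_cdf((0.75 * n - 0.125) / (n + 0.25))
    sd = (q3 - q1) / (2 * z)
    return mean, sd


def d_from_two_group_f(f_value, n1, n2):
    """Unsigned Cohen's d from a two-group one-way ANOVA (F = t^2, df1 = 1).

    The sign must be recovered from the direction of the reported group means.
    """
    t = math.sqrt(f_value)
    return t * math.sqrt(1 / n1 + 1 / n2)


if __name__ == "__main__":
    # Hypothetical boxplot readings (ms) for a group of 12 speakers.
    print(mean_sd_from_quartiles(55.0, 62.0, 71.0, 12))
    print(d_from_two_group_f(4.0, 10, 10))
```
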

#### *2.2. Results*

Summaries of the posterior distribution of the meta-analytic model are provided in Figures 1 and 2 (see supplementary materials for summaries in table format). Averaging over lexical stress and analytic strategies, the pooled estimate of the standardized mean difference (SMD) was small and negative (β = −0.132, HDI = [−0.708, 0.468]). As reflected in Figure 1A, the posterior distribution of the effect size estimate is wide and encompasses plausible values on both sides of a point null of 0. In short, there is not compelling evidence in support of a difference in VOT between bilinguals and monolingual controls. The estimates of variability from group-level effects *pooling method* and *individual studies* are plotted in Figure 1B. One can see that *individual studies* were a considerable source of variability. Figures 1C and 2 illustrate estimate uncertainty in comparison with the overall pooled effect. The moderator (subgroup) effects were negligible. Specifically, lexical stress had no effect on group differences (β = −0.14, HDI = [−0.766, 0.483]), nor did analytic strategy (β = 0.182, HDI = [−0.657, 1.035]), though mixed effects models tended to narrow the gap between group difference estimates, as illustrated by the positive β-parameter (see Figure 1D).

**Figure 1.** Summary of the posterior distribution of the meta-analytic model. Panel (**A**) plots the pooled estimate of effect size (SMD) on the horizontal axis and the between-study standard deviation (τ) on the vertical axis. Lighter colors illustrate higher density areas (i.e., more plausible values) and the point represents the posterior median. Panel (**B**) illustrates estimates of variance from group-level effects and Panel (**C**) provides a sub-category summary of variability as a function of pooling method. The vertical lines show the posterior median (solid) and 95% credible interval (dashed) of the pooled effect. Panel (**D**) summarizes the posterior distributions of the overall effects of the lexical stress and analytic strategy moderators.

**Figure 2.** Summary of posterior model estimates for individual studies (vertical axis) and the effect size (SMD, horizontal axis). White points represent posterior medians and horizontal bars capture the ± 95% and 80% credible intervals, which are also printed along the right vertical margin. The vertical shaded rectangle illustrates ± 95%, 80%, and 50% credible intervals around the posterior median of the pooled effect.

#### *2.3. Interim Discussion*

Using a variety of techniques to locate relevant research on bilingual stop production, a dataset of 20 studies with 37 independent, usable bilingual-to-monolingual comparisons was collected. The studies were coded for linguistic and methodological features, and effect sizes were calculated in order to estimate, via meta-analysis, the cumulative effect in the literature comparing bilingual voiceless stop production to that of monolingual controls. The results of the meta-analysis suggest that the cumulative effect in the literature is negligible and highly uncertain. The pooled estimate of the present dataset is −0.132 standard deviations, 95% HDI = [−0.708, 0.468]. Traditional standards classify an effect size of 0.20 as small, 0.50 as medium, and 0.80 as large (see Cohen 2013; Ellis 2010). Plonsky and Oswald (2014) suggest even more stringent standards for L2 research (small: d = 0.40, medium: d = 0.70, large: d = 1.00). In this dataset, the posterior probability that the average effect size meets or exceeds Plonsky and Oswald's (2014) threshold for a small effect is 0.18. Why, then, has a substantial portion of the literature built theory around the "compromise" category claim?

One possibility is publication bias. Figure 3 provides a funnel plot of the unpooled dataset, plotting standard error as a function of effect size (SMD). If the literature on bilingual stop production suffered from publication bias, one would expect to see individual studies (points) dispersed asymmetrically around the pooled estimate. This does not appear to be the case, as studies fall on both sides of the pooled estimate (the vertical black line).
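A funnel plot can be complemented by a formal asymmetry check. The sketch below implements an Egger-style regression of each study's standardized effect (g / SE) on its precision (1 / SE); an intercept far from zero relative to its standard error signals small-study asymmetry. This test is not reported in the source and is offered only as an illustrative, swapped-in technique; the function name is hypothetical, and with few studies such tests have limited power.

```python
import numpy as np


def egger_test(effects, standard_errors):
    """Egger-style regression of standardized effect on precision.

    Returns (intercept, intercept_se). An intercept far from 0 relative to
    its standard error suggests small-study asymmetry in the funnel plot.
    """
    g = np.asarray(effects, dtype=float)
    se = np.asarray(standard_errors, dtype=float)
    if g.size < 3:
        raise ValueError("need at least three studies")
    y = g / se                                         # standardized effects
    X = np.column_stack([np.ones_like(se), 1.0 / se])  # intercept + precision
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (y.size - 2)              # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(coef[0]), float(np.sqrt(cov[0, 0]))


if __name__ == "__main__":
    # Invented Hedges' g values and standard errors.
    print(egger_test([-0.3, 0.1, -0.2, 0.4, -0.1], [0.45, 0.30, 0.50, 0.60, 0.35]))
```
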

**Figure 3.** Funnel plot of effect sizes (Hedges' g, horizontal axis), standard error, and power (vertical axis). Power for individual studies was calculated using the pooled estimate (−0.132 SMD, vertical dashed line) from the meta-analytic model. Background color illustrates power (darker colors represent more power).

If the "compromise" category effect in bilingual speech were real, one is left to ask why it was not borne out in the meta-analysis. One possibility, and a glaring shortcoming of the extant literature, is that all of the studies analyzed are underpowered (median power = 5.80%). The conventional benchmark for power in psychology is 80% (see Ellis 2010), and none of the studies included in the present analysis met this standard (see the right margin of Figure 3). The lack of power in this literature likely results from small sample sizes. If we again assume that "compromise" categories exist in bilingual speech and that the effect is small (0.40, following Plonsky and Oswald's suggestion for L2 research), a hypothetical study would need to include 99 participants *per group* to have an 80% chance of detecting the effect with alpha set at 0.05. The number is even larger if we use the pooled estimate of the present study under the same conditions: 896 participants, again, *per group*.
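The power and sample-size figures above can be approximated with the normal approximation for a two-sided, two-sample comparison of standardized means. The sketch below is a hedged illustration with hypothetical function names; it is not necessarily the routine used to produce Figure 3. Under this approximation, d = 0.40 requires 99 participants per group, matching the figure above, and d = 0.132 requires roughly 900, close to the reported 896; the small gap presumably reflects rounding of the pooled estimate or the exact routine used.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate two-sided power of a two-sample comparison (normal approximation)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    ncp = abs(d) * math.sqrt(n_per_group / 2)  # noncentrality of the test statistic
    return Z.cdf(ncp - z_crit) + Z.cdf(-ncp - z_crit)


def n_per_group(d, power=0.80, alpha=0.05):
    """Participants per group needed to reach the target power."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return math.ceil(2 * ((z_crit + z_power) / abs(d)) ** 2)


if __name__ == "__main__":
    print(n_per_group(0.40), n_per_group(0.132))
    print(round(power_two_sample(0.132, 10), 3))  # a small hypothetical study
```

With only a handful of participants per group, power at the pooled estimate is barely above α, which is why the studies in Figure 3 cluster in the low-power region.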

Another possible factor contributing to the low overall power in the extant literature is measurement error. While measuring VOT is rather straightforward, the majority of the early research in this area required researchers to average VOT values for individuals over items, as well as over repetitions of items. Consequently, the values used in statistical analyses may misrepresent the actual productions of the participants. If, for example, a participant produces the voiced stops of English variably, with and without pre-voicing, a mean value for this participant might not accurately represent a prototypical stop from either distribution. Given the structured variability observed in monolingual (Chodroff and Wilson 2017) and bilingual (Chodroff and Baese-Berk 2019) stop production, it seems reasonable to assume that there is inherent structure manifested in other ways as well. Advancing technology has made the costly computations involved in partial pooling methods more accessible to researchers, and more recent studies accordingly use more powerful analytic strategies, such as multilevel models. The present meta-analysis took analytic strategies into account and found no effect, though this may be explained by the fact that the majority of the studies included did not use partial pooling (k = 33 out of 37 independent comparisons).

It is important to note that the meta-analysis presented here is limited in several non-trivial ways. First, the pool of studies excluded clearly relevant, seminal research due to uncontrollable circumstances (e.g., missing data). Second, the criteria for inclusion necessarily limited the sample to voiceless stop production from early/simultaneous bilinguals speaking languages with two-way contrasts. While this filtering facilitated controlling possibly confounding factors, it raises questions regarding what data from other sources can contribute to the cumulative research on "compromise" categories in bilinguals. If, for example, "compromise" categories are not real in early and simultaneous bilinguals, what, then, can explain the observed variability in bilingual stop production? The following section considers coronal stop data from highly proficient Spanish–English early bilinguals in order to explore the interaction between language-specific voice-timing and place of articulation differences when both languages are activated in bilingual mode.

#### **3. Production of Coronal Stops**

This section presents an acoustic analysis of coronal stop data from early Spanish–English bilinguals in order to explore alternative explanations for "compromise" categories. The bilinguals completed a delayed shadowing task in which both languages were highly activated in order to elicit bilingual language mode. While English contrasts /d/ and /t/ through short-lag VOT and long-lag VOT, respectively, Spanish realizes the same voicing distinction through lead VOT and short-lag VOT. Importantly, the coronal stops of each language also differ in place of articulation: Spanish coronal stops are described as dental, whereas English coronal stops are described as alveolar (Casillas et al. 2015).

#### *3.1. Method*

#### 3.1.1. Participants

The dataset included 33 participants between the ages of 18 and 23, all of whom were female. There were 17 bilingual participants and 16 monolingual controls. Of the monolingual participants, eight were native Spanish speakers, born and raised on the island of Majorca, Spain. The remaining eight were native English speakers, born and raised in the US Southwest.

The Spanish–English bilinguals came from Southern Arizona and Northern Mexico. They were raised in Spanish-speaking families and were schooled mostly in English in Southern Arizona. They reported using English and Spanish daily, both in the classroom and with their friends and relatives. The bilingual group completed the Bilingual Language Profile (BLP, Gertken et al. 2014) in order to assess language dominance. The BLP calculates a weighted average of language dominance based on the individual history, use, proficiency, and attitudes of the bilinguals with regard to their languages. The measure ranges from −218 to 218, with values near the extremes implying dominance in one of the languages and values close to 0 indicating balanced bilingualism. In the present study, Spanish was arbitrarily assigned to positive values. Figure 4 plots language dominance (Panel A) and language use and proficiency data (Panel B) derived from the BLP. The bilingual group had a mean dominance score of 2.08 (SD = 40.42), suggesting rather balanced bilingualism (Panel A). Participants who reported using Spanish more often also tended to report being more proficient in that language, and the same held for English (Panel B).

**Figure 4.** Bilingual Language Profile data. Panel (**A**) illustrates the distribution of BLP dominance scores. Panel (**B**) shows language proficiency as a function of language use. Both measures come from self-report data and are plotted in standardized units.

#### 3.1.2. Materials

There were 48 target words (English: k = 24; Spanish: k = 24) that contained voiced and voiceless coronal stops in word-initial position. For each language, 12 targets began with /d/ and 12 began with /t/, equally divided between stressed and unstressed syllables (see supplementary materials). All stops were followed by a low vowel (/a/ for Spanish and /æ, ɑ/ for English).

The participants completed a delayed repetition task in which they heard the target words presented in a carrier phrase ("x is the word" or the Spanish equivalent "x es la palabra"). The auditory stimuli were recordings of six male native speakers: three native English speakers and three native Spanish speakers. These recordings served as the auditory stimuli that the participants repeated out loud in the delayed repetition task. Words not containing coronal stops served as distractors (k = 20). Praat was used to present the sentences auditorily in random order, and the participants were asked to listen to the entire sentence and then repeat it out loud at their own pace.

The monolingual English speakers and bilinguals were recorded in a sound-attenuated booth. The monolingual Spanish speakers were recorded in a quiet classroom on the campus of the *Universitat de les Illes Balears* in Majorca, Spain. The monolingual English speakers were recorded in English and the monolingual Spanish speakers were recorded in Spanish. The Spanish–English bilinguals were recorded in both of their languages in a single session, with all English and Spanish items presented in a single, randomized block in order to activate both languages. Each participant produced 24 target words per language × 3 repetitions, for a total of 3600 tokens (16 monolinguals × 72 tokens + 17 bilinguals × 144 tokens). Eighty-one items (2.25%) were discarded due to mispronunciations or extraneous noise, leaving 3519 tokens. A Shure SM10A dynamic head-mounted microphone with a Sound Devices MM-1 microphone pre-amplifier captured the acoustic signal, which was saved to a Marantz PMD660 digital speech recorder. The signal was digitized at 44.1 kHz with 16-bit quantization.

#### 3.1.3. Measurements

The audio files were low-pass filtered at 11.025 kHz. Synchronized waveform and spectrographic displays were used to mark the onset of modal voicing and of the stop burst, as well as the offset of the first vowel. The onset of voicing was operationalized as the upward zero-crossing of the first periodic pattern in the oscillogram, and the offset of the vowel was marked at the downward zero-crossing of the final periodic pattern. VOT was calculated as the difference (in ms) between the onset of modal voicing and the onset of the burst. Relative VOT was calculated as the ratio between VOT and the total duration of the stop-vowel sequence. Spectral moments were calculated from a 6 ms window beginning at the onset of the burst; specifically, kurtosis was extracted from the spectral envelope, which ranged from 60 Hz to 11.025 kHz.
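One way to compute burst spectral moments is to treat the normalized power spectrum of the analysis window as a probability distribution over frequency and take its mean, variance, skewness, and kurtosis. The sketch below does this for a 6 ms window starting at the burst, restricted to 60 Hz to 11.025 kHz as described above. The function name, the Hamming taper, the use of the power (rather than magnitude) spectrum, and the subtraction of 3 (excess kurtosis) are assumptions; the original measurements may have used different settings.

```python
import numpy as np


def burst_spectral_moments(signal, fs, burst_onset_s, win_ms=6.0,
                           fmin=60.0, fmax=11025.0):
    """Spectral centroid, variance, skewness, and excess kurtosis of a burst window."""
    start = int(round(burst_onset_s * fs))
    n = int(round(win_ms / 1000 * fs))
    frame = np.asarray(signal[start:start + n], dtype=float)
    if frame.size < n:
        raise ValueError("analysis window runs past the end of the signal")
    frame = frame * np.hamming(n)                # taper (assumed)
    power = np.abs(np.fft.rfft(frame)) ** 2      # power spectrum
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)
    p = power[band] / power[band].sum()          # spectrum as a distribution
    f = freqs[band]
    centroid = np.sum(f * p)
    variance = np.sum((f - centroid) ** 2 * p)
    skewness = np.sum((f - centroid) ** 3 * p) / variance ** 1.5
    kurtosis = np.sum((f - centroid) ** 4 * p) / variance ** 2 - 3
    return centroid, variance, skewness, kurtosis


if __name__ == "__main__":
    fs = 44100
    t = np.arange(int(0.05 * fs)) / fs
    tone = np.sin(2 * np.pi * 3000 * t)                      # peaked spectrum
    noise = np.random.default_rng(0).standard_normal(t.size)  # flat spectrum
    print(burst_spectral_moments(tone, fs, 0.01)[3])
    print(burst_spectral_moments(noise, fs, 0.01)[3])
```

Applied to a synthetic 3 kHz tone and to white noise, the tone's peaked spectrum yields a much higher kurtosis than the flat noise spectrum, which is the property exploited when comparing the burst spectra of dental and alveolar coronals.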

#### 3.1.4. Statistical Analysis

The bilingual voice-timing data were analyzed using Bayesian multilevel regression models. Specifically, separate models were fit for VOT and relative VOT as a function of voicing (voiced, voiceless) and language (English, Spanish). Fixed effects were deviation coded (voicing: voiced = 0.5, voiceless = −0.5; language: English = 0.5, Spanish = −0.5), so that the posterior distribution provided an assessment of effect size for each predictor. Item repetition was included as a continuous predictor and was centered to have a mean of 0. The random effects structure included a by-subject intercept with a random slope for voicing, as well as a by-item intercept. The model included regularizing, weakly informative priors (Gelman et al. 2017), which were normally distributed and centered at 0 for all population-level parameters. The standard deviation was set at 50 and 3 for the VOT and relative VOT models, respectively. The models were fit with 4000 iterations (1000 warm-up), and Hamiltonian Monte Carlo sampling was carried out with 4 chains distributed across 4 processing cores. For each model, a region of practical equivalence (ROPE) was established around a point null value of 0 (see Kruschke 2018) using the following formula:
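The deviation (±0.5) coding described above has a convenient property: in a balanced 2 × 2 design, each main-effect coefficient equals the difference between the two level means averaged over the other factor, and the interaction coefficient equals a difference of differences. The sketch below checks this with ordinary least squares on four hypothetical cell means; it is not the Bayesian multilevel model fitted here, and the factor labels and values are illustrative only.

```python
import numpy as np


def deviation_code(levels, positive):
    """Map a two-level factor to +0.5 (the `positive` level) and -0.5 (the other)."""
    return np.where(np.asarray(levels) == positive, 0.5, -0.5)


def fit_2x2(y, voicing, group):
    """OLS fit of y on two deviation-coded factors and their interaction."""
    v = deviation_code(voicing, "voiced")
    g = deviation_code(group, "bilingual")
    X = np.column_stack([np.ones_like(v), v, g, v * g])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return dict(zip(["intercept", "voicing", "group", "voicing:group"], coef))


if __name__ == "__main__":
    # Hypothetical English VOT cell means (ms), one value per cell.
    y = [12.0, 68.0, 18.0, 80.0]
    voicing = ["voiced", "voiceless", "voiced", "voiceless"]
    group = ["bilingual", "bilingual", "monolingual", "monolingual"]
    print(fit_2x2(y, voicing, group))
```

With these made-up cell means, the voicing coefficient is −59 ms (voiced minus voiceless, averaged over groups), the group coefficient is −9 ms (bilingual minus monolingual), the interaction is 6 ms, and the intercept is the grand mean of 44.5 ms.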

$$\frac{\mu_1 - \mu_2}{\sqrt{\frac{\sigma_1^2 + \sigma_2^2}{2}}} \tag{2}$$

For all models, median posterior point estimates are reported for each parameter of interest, along with the 95% highest density interval (HDI), the percentage of the HDI contained within the ROPE, and the maximum probability of effect (MPE). For statistical inference, a posterior distribution for a parameter β with 95% of its HDI falling outside the ROPE and a high MPE (i.e., a value close to 1) was taken as compelling evidence for a given effect. Again, the analyses were conducted in R (R Core Team 2019, version 4.0.3) and models were fit using Stan (Stan Development Team 2018) via the R package brms (Bürkner 2017).
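The three posterior summaries used for inference can be computed directly from MCMC draws. The sketch below takes the 95% HDI as the narrowest interval containing 95% of the sorted draws, computes the share of draws inside that HDI that also fall inside a given ROPE, and computes the MPE as the larger of the posterior probabilities that the parameter is positive or negative. This mirrors, in simplified form, the logic of common tools such as the bayestestR package, but it is an illustrative assumption rather than the authors' code; the function names and the ROPE bounds in the example are hypothetical.

```python
import numpy as np


def hdi(draws, prob=0.95):
    """Narrowest interval containing `prob` of the posterior draws."""
    x = np.sort(np.asarray(draws, dtype=float))
    n = x.size
    k = int(np.ceil(prob * n))               # draws inside each candidate interval
    widths = x[k - 1:] - x[: n - k + 1]      # width of every candidate interval
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]


def percent_in_rope(draws, rope, prob=0.95):
    """Share of the draws inside the HDI that also fall inside the ROPE."""
    lo, hi = hdi(draws, prob)
    x = np.asarray(draws, dtype=float)
    inside_hdi = x[(x >= lo) & (x <= hi)]
    return float(np.mean((inside_hdi >= rope[0]) & (inside_hdi <= rope[1])))


def max_prob_effect(draws):
    """Maximum probability of effect: max(P(beta > 0), P(beta < 0))."""
    x = np.asarray(draws, dtype=float)
    return float(max(np.mean(x > 0), np.mean(x < 0)))


if __name__ == "__main__":
    # Simulated posterior draws for a hypothetical coefficient.
    draws = np.random.default_rng(1).normal(-7.0, 3.0, 8000)
    print(hdi(draws), percent_in_rope(draws, (-2.0, 2.0)), max_prob_effect(draws))
```
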

#### *3.2. Results*

Figure 5 plots the VOT data as a function of group and stop category. Looking across the horizontal axis, one can observe distinct distributions for voiced and voiceless stops. In comparing the three groups along the vertically faceted panels, it becomes particularly clear that the bilinguals produce the coronal stops similarly to the monolingual controls in English and Spanish; however, upon close inspection, we can see that the bilingual group produces more pre-voiced /d/ tokens in English than the monolingual English speakers, as well as more short-lag /d/ tokens than monolingual Spanish speakers. When pooled together to calculate by-subject averages, tokens such as these could skew the measurement, fostering the notion that the bilinguals produce some segments with intermediate values that do not correspond with prototypical monolingual values of either language. The analysis of the voice-timing data did not provide compelling evidence that the bilingual productions differed from monolingual controls.

**Figure 5.** Voice-onset time (horizontal axis) and Kurtosis (vertical axis) as a function of speaker group and stop category. Colored points represent raw data. White points represent category medians.

#### 3.2.1. VOT

For Spanish, phonologically voiced stops had shorter VOT than voiceless stops (β = −38.865, HDI = [−46.555, −31.766], ROPE = 0, MPE = 1), but VOT did not vary as a function of group (β = 0.78, HDI = [−6.977, 8.604], ROPE = 0.844, MPE = 0.579), nor did the two factors interact (β = 1.213, HDI = [−4.842, 8.058], ROPE = 0.898, MPE = 0.646). For English stops, voiced stops also had shorter VOT than voiceless stops (β = −31.743, HDI = [−37.77, −25.363], ROPE = 0, MPE = 1). Averaging over stop category, the analysis suggested that bilingual VOT was slightly shorter than that of the monolingual English speakers (β = −7.005, HDI = [−13.015, −0.951], ROPE = 0.19, MPE = 0.988), though approximately 19% of the most plausible estimates fell within our *a priori* established region of practical equivalence. There was no evidence of a group × phoneme interaction (β = −5.161, HDI = [−11.02, 0.18], ROPE = 0.41, MPE = 0.966).

#### 3.2.2. Relative VOT

The more stringent relative VOT metric tells a similar story. For Spanish, voiced stops comprised a larger proportion of the stop-vowel sequence than voiceless stops (β = 0.1, HDI = [0.071, 0.127], ROPE = 0, MPE = 1), but the stop-vowel ratio did not vary between groups (β = −0.001, HDI = [−0.018, 0.015], ROPE = 0.945, MPE = 0.576), nor was there a two-way interaction (β = −0.004, HDI = [−0.028, 0.017], ROPE = 0.792, MPE = 0.658). For English, voiced stops comprised a smaller proportion of the stop-vowel sequence than voiceless stops did (β = −0.101, HDI = [−0.119, −0.083], ROPE = 0, MPE = 1). Averaging over stop category, there was no difference between groups (β = 0.004, HDI = [−0.013, 0.02], ROPE = 0.898, MPE = 0.697), and, finally, the two factors did not interact (β = 0.01, HDI = [−0.004, 0.025], ROPE = 0.699, MPE = 0.911).

The complete summary of the posterior distributions of all models is available in Table 1. As a point of comparison, the VOT and relative VOT data were also fit using a 2 × 2 repeated measures ANOVA under a null-hypothesis significance testing (NHST) frequentist framework by averaging over items and item repetitions. The VOT model, but not the relative VOT model, suggested a group × stop category interaction for the English data. The complete model summaries for all analyses are available in the supplementary materials.


**Table 1.** Summary of the posterior distribution modeling voiceless responses as a function of VOT, context, z-LexTALE, and order. The table includes posterior means, the 95% HDI, the percentage of the HDI within the ROPE, and the maximum probability of effect (MPE).

#### 3.2.3. Bilingual Performance Mismatches

Taken together, the aforementioned analyses do not provide compelling evidence that Spanish–English bilinguals produce Spanish and English coronal stops in a manner that robustly differs from their monolingual counterparts. That being said, there does appear to be a qualitative difference between the bilinguals and the monolingual controls in terms of variability. Specifically, on occasion, the bilinguals appear to produce English /d/ with pre-voicing, as well as Spanish /d/ and English /t/ with short-lag VOT. In order to analyze this further, the data were subset based on these mismatched VOT properties and the relationship between voice timing (relative VOT) and place of articulation (Kurtosis of the stop burst) was explored.

The scatter plots in Figure 6 show the mismatched targets for /d/ (Panel A) and /t/ (Panel B) as a function of target language. The vertical axis is kurtosis (standardized units) and the horizontal axis is relative VOT (standardized units). Spanish monolingual coronals show higher kurtosis values than English monolingual coronals, a difference that reflects the place of articulation contrast between Spanish (dental) and English (alveolar) coronals (see Casillas et al. 2015; Sundara et al. 2006).

**Figure 6.** Summary of performance mismatches for /d/ targets (Panel **A**) and /t/ targets (Panel **B**) in bilinguals. In both panels, English targets are triangles and Spanish targets are squares. The horizontal axis represents relative VOT and the vertical axis represents kurtosis in standardized units.

The plots can be interpreted using the quadrants specified by the vertical and horizontal dotted lines. Points to the left of the vertical line are associated with lower VOT values and points to the right with higher VOT values.<sup>4</sup> Points above the horizontal dotted line have burst characteristics consistent with a dental place of articulation (POA), while points below it have burst characteristics consistent with an alveolar POA. The colors of the points indicate the type of mismatch, which could be VOT, POA, or both VOT and POA (grey points are tokens that were produced as expected and met the filtering criteria). Of particular interest are the VOT/POA mismatches in the upper-left and lower-right quadrants of both plots.

The majority of the mismatched productions occurred in phonologically voiced target items (Panel A). One can observe Spanish targets that were realized as short-lag stops with more alveolar bursts (purple squares), as well as English items that were produced with pre-voicing and more dental bursts (purple triangles). The phonologically voiceless target items led to fewer mismatches (Panel B), though there are instances of English targets with short-lag VOT and more dental bursts, as well as Spanish targets with long-lag VOT and more alveolar bursts. The number of mismatches produced was not associated with language dominance or any of the self-report measures collected in the BLP (see supplementary materials for more details).

#### **4. Discussion**

The present study included two primary analyses: a meta-analysis of "compromise" VOT, and an analysis of coronal stop data from a delayed shadowing production experiment. The results of the meta-analysis suggested that the pooled estimate of the cumulative effect size was small, and just as likely to be positive as negative. The model considered linguistic as well as methodological factors. There was no evidence that lexical stress was a relevant moderator of "compromise" VOT, nor that analytic strategies resulted in a higher or lower likelihood of finding differences between bilinguals and monolinguals. Between-study differences accounted for a larger proportion of the variance than pooling methods for the stop categories. The posterior estimate of the pooled effect made it particularly clear that the "compromise" VOT literature is underpowered (median power = 5.80%). That being said, there is no evidence suggesting that this literature suffers from publication bias.

<sup>4</sup> Note that the *x* axis was reversed in panel A of Figure 6 so that both panels are interpreted in the same manner, that is, with lower VOT values on the left and higher VOT values on the right. This reversal is necessary because pre-voiced stops have higher relative VOT values than short-lag stops, i.e., they account for a larger portion of the stop + vowel sequence. The same is true for long-lag stops compared with short-lag stops.
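The pooled estimate reported in the Discussion comes from a Bayesian model. As a simpler frequentist stand-in (not the author's method), the DerSimonian–Laird random-effects estimator below illustrates how per-study effect sizes and their sampling variances are combined into a pooled effect, and how between-study heterogeneity is quantified as τ² and I². The function name and the input values in the usage example are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects meta-analysis (DerSimonian-Laird estimator).

    Returns (pooled effect, its standard error, tau^2, I^2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                 # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)               # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    # I^2: share of total variability attributable to between-study differences
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    return pooled, se, tau2, i2


if __name__ == "__main__":
    # Hypothetical standardized effects, not the studies analyzed here.
    effects = [0.25, -0.10, 0.05, -0.30, 0.15]
    variances = [0.04, 0.05, 0.03, 0.06, 0.05]
    print(dersimonian_laird(effects, variances))
```

A pooled effect whose interval straddles zero, combined with a large I², mirrors the pattern described above: no consistent bilingual–monolingual difference, with much of the variance arising between studies.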

The analysis of the coronal stop data produced two findings: first, bilingual stop production is highly variable, and second, this variability is structured in consistent, predictable ways. Specifically, the analysis showed that the bilingual speech contained target mismatches. That is, when producing English targets, the bilinguals produced pre-voiced /d/ tokens and short-lag /t/ tokens; when producing Spanish targets, they produced short-lag /d/ tokens. Crucially, these mismatches also displayed burst characteristics consistent with the mismatched language, suggesting that, on the whole, the bilinguals were using not only the voice timing of their other language but also the corresponding place of articulation. These performance mismatches are attributed to the experiment being conducted in "bilingual" mode, which induced high activation of both Spanish and English.

Taken together, the results of the present work depict bilingual stop production in a new light. Specifically, it seems unlikely that early bilinguals have "compromise" categories for stops; rather, they produce performance category mismatches that result from dynamic phonetic interactions associated with language activation. This assertion is inconsistent with the evidence put forth by many studies in this literature. Possible explanations for the discrepancy may revolve around methodological issues. For example, the effects of bilingual language modes on speech production and perception are well attested. Many, but not all, of the earliest studies in the "compromise" VOT literature controlled for language mode by conducting experimental sessions on different days, with all materials and interactions with the experimenter in the relevant language. Thus, it seems unlikely that language mode is a confounding factor in this body of literature. A more likely candidate is speech rate, which is negatively correlated with VOT (see Schmidt and Flege 1996; Magloire and Green 1999). Stölten et al. (2015) proposed controlling for speech rate by using relative VOT when making between-group comparisons. Only two of the studies included in the present analysis controlled for speech rate. Magloire and Green (1999) found that Spanish–English bilinguals' English bilabial stop production was no different from that of monolingual English speakers in slow, normal, and fast speech.<sup>5</sup>
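Relative VOT neutralizes part of the speech-rate effect by expressing VOT as a proportion of a longer interval. The sketch below assumes, following footnote 4, that the reference interval is the stop + vowel sequence and that pre-voicing (negative VOT) is taken by magnitude; the exact normalization window used by Stölten et al. (2015) may differ, and `relative_vot` is a hypothetical helper.

```python
def relative_vot(vot_ms: float, sequence_ms: float) -> float:
    """Proportion of the stop + vowel sequence occupied by VOT.

    Assumption: pre-voicing (negative VOT) is taken by magnitude, so longer
    pre-voicing yields a higher relative value, as described in footnote 4.
    """
    if sequence_ms <= 0:
        raise ValueError("sequence duration must be positive")
    magnitude = abs(vot_ms)
    if magnitude > sequence_ms:
        raise ValueError("VOT cannot exceed the stop + vowel duration")
    return magnitude / sequence_ms


if __name__ == "__main__":
    # Hypothetical tokens of the same /t/ at a slow and a fast speech rate.
    slow = relative_vot(60.0, 200.0)   # 60 ms VOT in a 200 ms sequence
    fast = relative_vot(45.0, 150.0)   # 45 ms VOT in a 150 ms sequence
    print(slow, fast)
```

In this example the absolute VOT values differ by 15 ms, which could be misread as a group difference, yet both tokens occupy the same proportion of their sequences once the shorter duration of the fast token is taken into account.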

The SLM posits that if learners can avoid the equivalence classification of two similar phones, they will be able to develop new phonetic categories. When this occurs, one prediction is that the L2 phonetic category will deflect or dissimilate from neighboring categories in order to maintain phonetic contrast. The early bilinguals of the present study clearly established a phonetic category for English /t/. In terms of equivalence classification, it is more difficult to make a determination regarding /d/, which was realized with pre-voicing at a higher rate than in the monolingual controls. Any claim that increased pre-voicing is indicative of a "compromise" category would be weak at best, as American English stops are often pre-voiced (Lisker and Abramson 1964), and pre-voicing may even be the default for some varieties (Walker 2020). Given the acoustics of the category mismatches, there does not appear to be any reason to believe that the bilinguals' underlying phonetic categories differ from the input they receive in their speech community. On the contrary, the results herein support models that do not posit a bidirectional influence on the underlying grammar.

The Second Language Linguistic Perception Model (L2LP; Escudero 2005; Van Leussen and Escudero 2015), for example, proposes that the L2 grammar is separate from, and develops independently of, the L1 grammar. This model conceives of Grosjean's bilingual language modes as a continuum ranging from a unilingual mode in the L1 at one extreme to a unilingual mode in the L2 at the other, with an L1–L2 bilingual mode in the middle. Importantly, the L1 and L2 underlying grammars can be activated selectively or in parallel in real time. Language activation can be triggered by variables that are linguistic or extralinguistic in nature, such as the use of cognates in the experimental items (e.g., Amengual 2012) or the participant's beliefs about the language required for a given task. Thus, this model can account for phenomena such as the double phonemic boundary effect (e.g., Lozano-Argüelles et al. 2020) and language-dependent cue weighting (e.g., Yazawa et al. 2019) in speech perception.

<sup>5</sup> Other studies on bilingual stop production have, to be sure, taken speech rate into account, but they were not included in the present analysis due to missing and/or unavailable data.

To the best of the author's knowledge, the L2LP has not been used to account for the production of consonants, but, on the surface, it appears to provide an elegant explanation for bilingual performance mismatches via parallel activation in bilingual mode. For instance, the bilinguals in the present study would have a monolingual Spanish mode at one end, where they produce /d t/ as dental stops with pre-voicing and short-lag VOT, respectively, and a monolingual English mode at the other, where /d t/ are realized at an alveolar place with short- and long-lag VOT, respectively. When both languages are activated in parallel in bilingual mode, all combinations of place and voice settings are available, though not equally probable.
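The claim that all place/voice combinations are available but not equally probable can be illustrated with a toy model. This is not part of the L2LP or of the analyses reported here: the parameters `w_other` (activation of the non-target language, 0 in a unilingual mode) and `coupling` (probability that voice and place are drawn together from one language, reflecting the observation that mismatched VOT tended to co-occur with the mismatched burst) are hypothetical.

```python
from itertools import product


def realization_probs(w_other: float, coupling: float) -> dict:
    """Toy probabilities of (voice source, place source) for one target.

    w_other:  hypothetical activation of the non-target language (0..1).
    coupling: hypothetical probability that voice and place settings are
              selected jointly from one language rather than independently.
    """
    probs = {}
    for voice, place in product(("target", "other"), repeat=2):
        p_voice = w_other if voice == "other" else 1 - w_other
        p_place = w_other if place == "other" else 1 - w_other
        independent = p_voice * p_place          # settings chosen separately
        joint = p_voice if voice == place else 0.0  # both from one language
        probs[(voice, place)] = coupling * joint + (1 - coupling) * independent
    return probs


if __name__ == "__main__":
    print(realization_probs(0.0, 0.5))  # unilingual mode
    print(realization_probs(0.2, 0.5))  # bilingual mode
```

With `w_other = 0` only the target-language realization occurs; raising it makes every combination possible, while a high `coupling` keeps fully mismatched tokens (voice and place both from the other language) more frequent than partial ones.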

It is worth noting that neither the SLM nor the L2LP was designed to explain simultaneous bilingualism. Moving forward, a complete model of bilingual phonology should be able to account for the behavior of both simultaneous and sequential bilinguals with a wide variety of linguistic experience in order to appropriately model the nature of the dynamic phonetic interactions that occur between robust phonetic sub-systems in diverse communicative contexts.

The present work could be improved by including more of the extant literature on "compromise" VOT. The meta-analysis presented here excluded a non-trivial subset of relevant studies due to missing data. As mentioned in the interim discussion, the "compromise" VOT literature as a whole relies on small sample sizes and is underpowered. The coronal stop data presented herein are subject to the same critique. This analysis was necessarily exploratory in nature, its main purpose being to provide a qualitative assessment of performance mismatches. Future studies should further examine the nature of performance mismatches to shed light on how they might be modulated by proficiency, language dominance, and language modes. The present analysis focused on stops, specifically coronal stops, and recent research proposes that alveolar sounds may enjoy a special status in L2 speech learning due to a universal phonetic bias (Bohn 2020). Thus, future research should examine other speech segments, particularly those where multiple cues are weighted differently between language pairs, in order to better understand performance mismatches in conjunction with bilingual language modes. These avenues of inquiry motivate testable hypotheses that could prove fruitful in the continued development of models of bilingual phonology.
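Why small samples leave a literature underpowered can be made concrete with a standard power calculation for a two-group comparison. The sketch below uses the normal approximation for a two-sided test of a standardized mean difference *d* (a t-based calculation gives slightly larger sample sizes); it is not the posterior power computation used in the meta-analysis, and the function names and example values are hypothetical.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def power_two_sample(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample test of effect size d."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n_per_group / 2)  # d divided by SE of the difference
    return _Z.cdf(shift - z_crit) + _Z.cdf(-shift - z_crit)


def n_for_power(d: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Smallest per-group n reaching the target power (normal approximation)."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    z_power = _Z.inv_cdf(power)
    return math.ceil(2 * ((z_crit + z_power) / d) ** 2)


if __name__ == "__main__":
    # A small effect with a small group size, typical of lab studies.
    print(power_two_sample(0.1, 15))
    # Per-group n needed for 80% power at a medium effect (d = 0.5).
    print(n_for_power(0.5))
```

When the true effect is near zero, power approaches the significance level (5%) no matter how the study is run, so individual significant results in such a literature are weak evidence on their own.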

#### **5. Conclusions**

The present study combined meta-analytic techniques and coronal stop data to assess the extent to which early bilinguals have "compromise" categories for voiceless stops. The results of the analyses (1) suggest that there is little evidence in the extant literature to support this claim, and (2) reinforce the notion that bilingual speech involves dynamic phonetic interactions that can surface as performance mismatches during speech production. The data provide compelling evidence that early bilinguals do not have intermediate, "compromise" phonetic categories; rather, their speech productions are, by and large, no different from the input of the speech community to which they have been exposed. Taken together, the results support models of bilingual phonology that posit separate phonological systems, which are subject to dynamic phonetic interactions via language activation.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/2226-471X/6/1/9/s1. All supplementary analyses, as well as the data, code, and experimental materials necessary to reproduce the analyses reported in this article, are available at: https://osf.io/un45x/.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board of the University of Arizona (Approval 12-0250-02).

**Informed Consent Statement:** All participants gave their informed consent for inclusion before they participated in this study.

**Data Availability Statement:** The complete dataset presented in this article is openly available on Open Science Framework and can be accessed at: https://osf.io/un45x/.

**Acknowledgments:** I express my gratitude to Miquel Simonet for sharing the dataset used in the analysis of coronal stops, as well as to Kyle Jones for sharing data from his dissertation research. I am grateful for insightful comments from three anonymous reviewers that improved the quality of this work. All errors are mine alone.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Languages* Editorial Office E-mail: languages@mdpi.com www.mdpi.com/journal/languages