1. Introduction
Cross-language phonetic phenomena are often examined through the prism of Flege’s trifold taxonomy: “from the standpoint of L1, the phones in an L2 may be taxonomized acoustically as ’identical’, ’similar’ or ’new’” (
Flege 1987, p. 48). When the realisation of a phoneme in an L2 is acoustically ’similar’ to that of an L1 phoneme, e.g., what French /u/ is to English speakers, language learners are less likely to place the two /u/ sounds in different phonetic categories. This phenomenon is known as
equivalence classification (
Flege 1987) and may prevent learners “from making effective use of auditorily accessible acoustic differences between phones in L1 and L2” (
Flege 1987, p. 50). This was part of what Flege called the Speech Learning Model (
Flege 1995). Flege and Bohn’s revised Speech Learning Model (henceforth SLM-r) suggests that the quality of input in an L2 may help learners to overcome such difficulties (
Flege and Bohn 2021). These differences mainly concern acoustic or articulatory characteristics of phonemes. However, differences also lie in the allophonic distribution of such phonemes. In this study, we hypothesise that the equivalence classification also applies to the way in which learners classify allophonic distributions that occur in both their L1 and L2, such as unreleased stops—which are found in French and in English. Unreleased stops, i.e., stops lacking audible release, are frequent in word-medial position in both languages but are reported to be less frequent in word-final position in French (
Abercrombie 1967). This may lead French learners of English to use equivalent allophonic distributions of stops in both languages.
The present study explores the impact of an awareness-raising approach on stop-unrelease in the speech of advanced French learners of English. Awareness-raising in second-language speaking, i.e., the facilitation of “conscious knowledge of the facts learned about a language” through various means (
Crawford 1987, p. 109), has been shown to have a significant and long-term effect on L2 speech production (
Svalberg 2007). Our awareness-raising approach includes a research-based explanation of the above-mentioned allophonic distribution, a presentation of unreleased stops from an articulatory point of view, and a visualisation of the acoustic properties of stops on spectrograms. The approach also includes the use of gestures during the production of stops to maximise the efficiency of the training. Although combining several approaches does not enable us to determine which one works best for the acquisition of stop-unrelease,
Shams and Seitz (
2008) advocate for more multisensory modalities in learning rather than a “single modality in learning” since the former “better approximate natural settings and [are] more effective for learning” (p. 411). We therefore decided to opt for a more “natural” and “multisensory” training environment.
A key tool in our awareness-raising approach is the use of spectrograms.
Quintana-Lara (
2014) pointed to the efficiency of spectrograms in the pronunciation training of /i/ vs. /i:/ amongst advanced Spanish learners of English through the observation of F1/F2 values produced by the learners and by native speakers. In the present study, spectrograms were deemed a good learning tool since utterance-final stops on a spectrogram are easier to interpret than are F1/F2 values for vowels. If stops are released, a burst is visible, and if they are not released, no burst is observable. Such a binary interpretation, albeit simplistic, is accessible to any language learner¹. However, although they may contribute to raising awareness of stop-unrelease, spectrograms may not help learners to master it from an articulatory point of view. Moreover, the tool cannot be used comfortably in natural conversations since observing the spectrogram during a conversation inevitably stops the flow of communication between speakers. We therefore suggest using hand gestures during the speech production phase, as they are often used in natural speech (
Krauss et al. 1995) and should thereby disrupt a conversation to a lesser extent than spectrograms. Hand gestures are also known to be efficient tools in language learning and can foster progress in various learning situations (e.g.,
Kelly et al. 2002;
Goldin-Meadow 2003). More specifically, they may facilitate the acquisition of new L2 sounds. Xi and colleagues tested the efficiency of gestures in the acquisition of Mandarin stops and affricates amongst Catalan learners of Chinese (
Xi et al. 2020). The results indicate that appropriate gestures “mimick[ing] the phonetic properties” of the sounds as closely as possible helped to improve the learners’ productions of these stops and affricates. When dealing with sub-phonemic phenomena like unreleased stops or the aspiration of stops in L2 English, gestures can prove more efficient than mere imitation (
Amand and Touhami 2016) amongst intermediate French learners of English ranging from secondary school French students to French teachers of English. That study did not include university students majoring in English. Hence, the purpose of the present study is to measure the acquisition of stop-unrelease amongst advanced university students of English through awareness-raising via multisensory learning material, i.e., spectrograms and gestures.
Section 2 briefly summarises the articulatory and acoustic properties of stop-unrelease and lists previous studies analysing stop-unrelease in French and English as an L1 and an L2.
Section 3 presents the research questions followed by the hypotheses.
Section 4 covers the material and methods used in the study. In the results given in
Section 5, the rates of stop-unrelease are analysed by language, syntactic context and learning phase. A discussion of the results is provided in
Section 6 and is followed by a conclusion (
Section 7). The stimuli in French and English are provided in
Appendix A and
Appendix B respectively.
4. Materials and Methods
4.1. Participants
The participants were L1 French second-year university students majoring in English. The students were in their early 20s. Unfortunately, some students dropped out of university or were absent when the experiment was being conducted and therefore did not complete it. During the first recording session, 42 students started the study and were distributed as evenly as possible between the control and the test group (control: 14 women and 6 men, test: 20 women and 2 men). During the second recording session, the number of students was 31 (control group: n = 17, 11 women, 6 men, test group: n = 14, 12 women, 2 men). During the third recording, there remained 3 women from the control group and 2 women and a man in the test group.
The participants were recorded in 2016 in a sound-attenuated room at a sampling frequency of 44,100 Hz. The learners were recorded with Praat (
Boersma and Weenink 2022). The microphones used were headset stereo microphones (Plantronics Audio 655 DSP and V7 J151648). The distance between the speaker’s mouth and the microphone was controlled for.
While all students read the stimuli with contextualised voiceless stops in English, 17 of them read stimuli with voiceless stops in French in a similar context (prior to reading the English text). Due to timetable constraints, it was more difficult to train students from the test group and ask them to do an extra recording in French, as it would increase the risk of losing more participants during the study
³. Sixteen students from the control group read the stimuli in both French and English. One student (subject ID: TAI01M) from the test group happened to participate in this recording. His data were nonetheless included. One student from the control group was removed from the study because she accidentally attended the training session, thereby impacting the overall scores of release in the second recording (subject ID: COL28F).
4.2. Recording Phases
Phase 1 (
pre-training,
first recording): Both the control and the test group were asked to read sequences of two words and sentences containing voiceless stop-stop sequences straddled between words (homorganic and heterorganic), and word- and sentence-final voiceless stops. Namely,
that pan,
wait for me at that table over there or
I like that truck (see
Appendix B). Since the availability of the students was limited, stimuli were uttered only once. One week later, during the intervention phase, only the test group watched an explanatory video to raise awareness of stop-unrelease in English. The control group did not watch anything and did not receive any training.
Phase 2 (post-training, second recording): Then, immediately after training, both the control group and the test group were asked to read the same stimuli a second time.
Phase 3 (follow-up test, third recording): The control group and the test group were asked to read the same stimuli a third time a month later with no extra training for the test group and no training given to the control group.
The stimuli in French were composed of a similar structure: verb-noun nonce compounds coined with real French words with a simple CVC structure, sentence-final stops and sentence-medial stop-stop sequences (see
Appendix A). Word-final stops were omitted by mistake. A summary of the experimental phases can be found in
Figure 2.
4.3. Stimuli
The stimuli were compiled by the authors. Due to time constraints, participants were asked to read the stimuli only once.
Straddled between two words: Stimuli start with pairs of words. The first word ends with a voiceless stop and the second one starts with the same or another voiceless stop, e.g., black pan. They contain all 9 possible combinations of voiceless stops in that context.
Word-final stops: Monosyllabic CVC words containing a final stop were also chosen to measure the proportion of release/unrelease in final position (9 words, /p/, /t/ and /k/ appearing in 3 different words, e.g., a trap, a hack, a hit).
Sentence-medial stops: To avoid a potential wordlist effect, simple sentences were also created, e.g., stop talking and listen to me. They included a two-word phrase with a stop at the end of Word 1 and at the beginning of Word 2. To limit the effect of rare or specific words on stop-unrelease, Word 1 was one of 4 highly frequent monosyllabic words ending in a stop: stop, that, can’t and like.
Sentence-final stops: In total, 23 sentences end in a voiceless stop, e.g., I’ve just told Nick. A set of 5 sentences end in the closing diphthong /aɪ/ + /t/ so as to explore the potential effect of a closing diphthong on t-unrelease (e.g., I’m going out tonight). These 5 sentences are not included in the analyses or in the Results section.
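The 9 combinations mentioned above follow from crossing the three voiceless stops /p/, /t/ and /k/ with themselves. As a minimal sketch (the function name and labels are illustrative and not taken from the study), the pairs can be enumerated and split into homorganic and heterorganic sequences. Since the three stops each have a different place of articulation, a pair is treated here as homorganic only when both stops are identical.

```python
from itertools import product

STOPS = ("p", "t", "k")  # English voiceless stops: bilabial, alveolar, velar


def stop_pairs(stops=STOPS):
    """Return every (stop 1 + stop 2) sequence with its place-of-articulation type."""
    pairs = []
    for first, second in product(stops, repeat=2):
        # Each stop has a distinct place, so same place means same stop.
        kind = "homorganic" if first == second else "heterorganic"
        pairs.append((first + second, kind))
    return pairs


pairs = stop_pairs()
homorganic = [seq for seq, kind in pairs if kind == "homorganic"]
heterorganic = [seq for seq, kind in pairs if kind == "heterorganic"]
print(len(pairs), homorganic, heterorganic)
```

This gives 3 homorganic and 6 heterorganic sequences, which is the opposition used later in the results.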
4.4. Video Tutorial: Awareness Approach
The awareness-raising training consists of watching a video in which a trained phonetics instructor demonstrates stop-unrelease with gestures. The instructor provides key elements of stop-unrelease before inviting the students to practise along with her.
In the first part of the video, some research on stop-unrelease in spontaneous speech for American English (
Davidson 2011) is summarised.
In the second part, the articulatory and acoustic characteristics of unrelease are presented with excerpts from an existing video on articulatory phonetics (UBC Visible speech, 2015:
https://youtu.be/dfoRdKuPF9I). The video was then stopped, and spectrograms were shown and commented on during that session by the same phonetics instructor using the freely available program Praat (
Boersma and Weenink 2022).
The third part of the video explains how to inhibit the release of voiceless stops. The main learning aid is a cutoff gesture found in choir-conducting as illustrated in
Figure 3. It consists of a circular movement of the wrist, with the hand curling into a fist once a full turn is completed. It is sometimes called the
pig-tail gesture in French⁴. The closed fist coincides with stop-unrelease. In a stop-stop sequence, the release of the second stop is accompanied by the rapid uncurling of the fingers into an open hand. The fingers are projected forward as if following the direction of the airflow coming out of the speaker’s mouth. This prepares the learners for the aspiration of the second stop at a later stage.
A potential drawback of creating a video with multiple teaching approaches (visual and kinaesthetic) is that it becomes impossible to single out the
one factor with the most impact on learners overall. However, we aimed to reach different learning profiles or individual preferences, and to provide students with multiple ways to acquire unreleased stops since “the way students engage in learning is rarely restricted to one single or dominant approach or learning strategy” (
Rogiers et al. 2019, p. 386).
4.5. Measurements
Release was treated as a binary variable: released (R) vs. unreleased (U). Auditory and spectrographic analyses were carried out. If the burst was visible on the spectrogram, then the stop was coded as having a release as illustrated in
Figure 4. Otherwise, the measured tokens were considered as cases of unrelease (see
Henderson and Repp 1982, for more fine-grained metrics with a five-category classification).
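To make the binary coding concrete, the sketch below turns a list of coded tokens into release rates per context. It is only an illustration: the token list is invented, and the context label "sentfinal" is a hypothetical name (only "sentmed" and "wordmed" appear in the figures discussed in the results).

```python
from collections import Counter


def release_rates(tokens):
    """Map each context to its percentage of released ('R') tokens.

    tokens: iterable of (context, code) pairs, where code is 'R' or 'U'.
    """
    released = Counter()
    total = Counter()
    for context, code in tokens:
        if code not in ("R", "U"):
            raise ValueError(f"unknown code: {code!r}")
        total[context] += 1
        if code == "R":
            released[context] += 1
    return {ctx: 100.0 * released[ctx] / total[ctx] for ctx in total}


# Invented annotations, for illustration only (not data from the study).
tokens = [
    ("sentfinal", "R"), ("sentfinal", "R"), ("sentfinal", "U"),
    ("sentmed", "U"), ("sentmed", "U"), ("sentmed", "R"), ("sentmed", "U"),
]
rates = release_rates(tokens)
print(rates)
```

The unrelease rate for a context is simply 100 minus the release rate, since the variable has only two levels.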
5. Results
5.1. French Stops
Amongst the 17 L1 French speakers, the overall release rate was 50.4% and the unrelease rate, 49.6%. However, there were disparities in scores that were mostly due to stop position. Sentence-final stops were released almost at ceiling (98.1%, see
Figure 5). When read in pairs of words, heterorganic pairs were released more than three times as often as in a sentence (65.7% vs. 18.9%). Homorganic pairs were released more than four times as often in pairs of words as inside a sentence (42.6% vs. 9.7%).
5.2. Release in French vs. English: Phase 1 Only
This section tests whether stop-unrelease differs from one language to the other in the three similar contexts for the 17 students having read both the French and the English stimuli: end of a sentence, pairs of stops inside a sentence and inside a pair of words. Only the first recording in English was retained in this data subset. The results are displayed in
Figure 6 in the form of a conditional inference tree. The tree uses chi-square tests with Bonferroni corrections to partition the tree (
Hothorn et al. 2006). A mixed-effect model and multiple chi-square tests were also run in RStudio (
R Core Team 2023) to confirm the robustness of these results. The higher the variable on a tree, the stronger the effect. Whenever variable levels split, they are deemed to impact scores of release in a significantly different way. The bars below display the aggregate proportions of release/unrelease under each condition. Position (variable wordSent) has a stronger impact on release than language (
χ²(2, N = 2478) = 1665.3, p = 2.2 × 10⁻¹⁶). Pairs of stops inside a sentence (sentmed) are the least released in both languages, with slightly more release in French (14.8%) than in English (9.4%). Although a chi-square test indicated that this difference was statistically significant, χ²(1, N = 1380) = 7.15, p = 0.007, the effect size points to a small effect (ω = 0.07 for language vs. 0.73 for position, see
Cohen 1992, p. 157 on effect size thresholds: small effect: 0.10, medium effect: 0.30, large effect: 0.50). The results are likely to vary from one cohort to the next but based on this sample of students, we can see that before training, students have comparable allophonic variation patterns in stop-unrelease in both English and French despite minor differences in some contexts. This leads us to conclude the following:
Sentence-medial position favours stop-unrelease (>80% in both languages, but more so in English);
Sentence-final position disfavours stop-unrelease (<20% in both languages);
Pairs of two words such as tape-porte or tap pan tend to have a release score approximating 50%.
This highlights the potential impact of the linguistic context and tasks when investigating allophonic variation amongst L1 and L2 speakers in French and English. The next section investigates the effect of training on stop-unrelease across all speakers in English.
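The ω value reported for the language comparison can be approximated from the chi-square statistic and the sample size. The sketch below assumes that ω is Cohen's w computed as the square root of χ²/N, which the text does not state explicitly. The function names and the "below small" label are illustrative choices.

```python
import math


def cohen_w(chi2, n):
    """Effect size w = sqrt(chi-square / N) (assumed formula)."""
    return math.sqrt(chi2 / n)


def effect_label(w):
    """Classify w against Cohen's (1992) conventional thresholds."""
    if w >= 0.50:
        return "large"
    if w >= 0.30:
        return "medium"
    if w >= 0.10:
        return "small"
    return "below small"


# Language comparison in sentence-medial position: chi-square = 7.15, N = 1380.
w_language = cohen_w(7.15, 1380)
print(round(w_language, 2), effect_label(w_language))
```

Under this assumption the language effect rounds to 0.07, i.e., below the 0.10 threshold, which is in line with the small effect described above.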
5.3. Release in English: Phase 1 vs. Phase 2
We first checked for potential differences in release scores in English between Recording 1 vs. Recording 2 in the control group. With the subject COL28F, the p-value for the chi-square test was slightly below the critical value, and without this learner, it rose slightly above it (with COL28F: χ²(1, N = 2244) = 4.44, p = 0.035, ω = 0.045, without COL28F: χ²(1, N = 2112) = 3.64, p = 0.057, ω = 0.042). This means that without this learner, the difference in release scores between Recordings 1 and 2 in the control group is not statistically significant.
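For readers who wish to see how such a test works on a 2 × 2 table (recording × coding category), here is a minimal sketch in plain Python. It assumes Yates' continuity correction, which R's chisq.test applies by default to 2 × 2 tables. The counts are not the authors' raw data: they are back-calculated from the percentages reported for the control group (38.2% and 42.3%) under the assumption of 1,056 tokens per recording (2,112 / 2).

```python
def chi2_yates(a, b, c, d):
    """Chi-square statistic with Yates' correction for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    # The correction shrinks |ad - bc| by n/2, but never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Assumed counts (reconstructed, not taken from the study's data files):
rec1 = (403, 653)  # about 38.2% of 1,056 tokens in one category
rec2 = (447, 609)  # about 42.3% of 1,056 tokens in the same category
stat = chi2_yates(*rec1, *rec2)
print(round(stat, 2))
```

With these reconstructed counts, the statistic comes out close to the value reported without COL28F, which suggests, without proving it, that a continuity-corrected test was used.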
Then, we investigated potential differences in release scores in English between the two cohorts (test vs. control subjects having completed both Phases 1 and 2; the speaker COL28F was excluded from this point on). Due to possible individual differences, the overall release rates differed slightly between the two cohorts in the first recording (χ²(1, N = 1056) = 11.59, p = 0.0007). However, the effect size (ω = 0.07) points to a minor effect. If we add the nine students who completed Phase 1 only (three in the control group, six in the test group), the differences between the cohorts are ironed out (p = 0.38 (ω = 0.02)).
Finally, the effect of training on stop-unrelease amongst the test group was examined (speakers having completed both Phases 1 and 2 only) and was deemed significant (
χ²(1, N = 1842) = 298.52, p < 2.2 × 10⁻¹⁶,
ω = 0.404). The average release rate in the control group was 40.2% (rec1 38.2%, rec2 42.3%), while the scores of unrelease in the test group rose from 45.8% (rec1) to 84.3% after training (rec2). More specifically, sentence-final and word-final stops were overwhelmingly released (90% and above for both cohorts) before training (see
Figure 7). Training had a significant impact on sentence-final stop-unrelease in English, which rose from 10% to 71.6%, and the difference was slightly smaller in word-final position (from 7.9% to 61.1%). After training, unrelease rose by about 40 percentage points in pairs of words (from 49.5% to 91.7%).
Figure 8 complements the preceding figure and indicates which conditions have a stronger influence on stop-unrelease. Stops in final position, be they placed at the end of a sentence or a word, are the most affected by training (Node 13), which leads to a tripling of unrelease scores (Node 15 vs. 14). In medial position, scores depend on whether the stops are straddled between two words in isolation or within a sentence (Node 2). Homorganic pairs have higher scores of unrelease than heterorganic pairs (Nodes 4 and 10 vs. Nodes 6, 7 and 11), yet the difference after training is ironed out in pairs of words (wordmed, Node 12).
Figure 9 shows rates of stop-release by linguistic context (sentences vs. pairs of words) in the test group (before and after training). Overall, homorganic pairs of stops were more often unreleased (rec1: 75.9%, rec2: 98.1%) than heterorganic pairs (75.9% vs. 95.5%) or than stops in final position (11% vs. 72.1%). In final position, /k/ had higher scores of release in both pre-test and post-test. In word-medial position, the homorganic pairs with velar stops were least affected by training despite being released three times less often after training (pre-test: 43% vs. post-test: 14%). The heterorganic pair /kp/ consistently exhibited more release across all conditions in both the pre-test and post-test results. In word-medial position, the bilabial stop /p/ was the least released when followed by the alveolar stop /t/.
A logistic mixed effects regression was run to confirm the effect of the training on the proportion of stop-unrelease along with the environment in which the stop is (final/heterorganic pair or homorganic pair). The subjects and the words used as stimuli were included as random effects. The model specifications were as follows: release ∼ stopType + cohort * recording phase + (1|speaker) + (1|context). The results point to a significant interaction between the group and the recording sessions (p < 2 × 10⁻¹⁶), with the post-test group having significantly lower rates of release than the control and the pre-test groups. There was also a main effect of position with final stops being significantly more released overall compared to homorganic and heterorganic sequences of stops (for both differences: p < 2 × 10⁻¹⁶). The next section investigates whether the training had a lasting effect on the sample of students who re-read the same stimuli in English after a month without further training on stop-unrelease.
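To illustrate how a model of this form turns its terms into a predicted probability of release, the sketch below adds the fixed effects, the cohort × recording interaction and the two random intercepts on the log-odds scale, then applies the inverse logit. All coefficient values and key names are hypothetical and chosen only for illustration. They are not the estimates of the fitted model, and the reference levels (final stops, control group, first recording) are an assumption.

```python
import math


def inv_logit(x):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def p_release(coef, stop_type, cohort, phase, u_speaker=0.0, u_context=0.0):
    """Predicted probability of release for one token.

    coef holds fixed effects on the log-odds scale; u_speaker and u_context
    stand for the random intercepts (1|speaker) and (1|context).
    """
    is_test = 1 if cohort == "test" else 0
    is_rec2 = 1 if phase == "rec2" else 0
    eta = (
        coef["intercept"]
        + coef["stopType"].get(stop_type, 0.0)  # reference level: final stops
        + coef["cohort_test"] * is_test
        + coef["phase_rec2"] * is_rec2
        + coef["cohort_test:phase_rec2"] * is_test * is_rec2
        + u_speaker
        + u_context
    )
    return inv_logit(eta)


# Hypothetical coefficients, not the study's estimates.
coef = {
    "intercept": 2.0,
    "stopType": {"heterorganic": -3.0, "homorganic": -4.0},
    "cohort_test": 0.0,
    "phase_rec2": 0.0,
    "cohort_test:phase_rec2": -3.0,
}
before = p_release(coef, "final", "test", "rec1")
after = p_release(coef, "final", "test", "rec2")
control = p_release(coef, "final", "control", "rec2")
print(round(before, 3), round(after, 3), round(control, 3))
```

The interaction term only applies to the test group in the second recording, which is how the model isolates the effect of training from general differences between cohorts or between recordings.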
5.4. Release in English: All Three Phases
This section analyses the evolution of stop-unrelease amongst the six learners having completed Phases 1, 2 and 3. Stop combinations were classified into a binary variable (within a pair or in final position), since the differences in release scores rely mainly on this opposition (see
Figure 8 and
Figure 9 above). In
Figure 10, the scores of release amongst speakers having completed all three phases depend on the position (Node 1) of the stop (final position or in pairs of stops) and on whether they received the training (remaining nodes). The pair context did not lead to significant differences in release scores amongst the control group (all three phases) and the test group (pre-training). In the test group, the scores were slightly higher in Phases 2 and 3. This means that during the follow-up reading session, the effect of training was still observable a month later. A similar pattern was found for stops in final position with a more striking difference in scores between speakers having received training and those with no training, even though the
p-values are considered equivalent (
p = 0.001). In Phase 3, release decreased in the control group (cont3, Node 5), while within the test group, it increased a month after training (post-test2, Node 8). This means that the effect of training was less durable for stops in final position than it was for pairs, whose scores of unrelease were initially very high (above 80%).
Figure 11 exhibits release scores by phase and speaker. Interestingly, in both cohorts, two learners have similar scores and one has scores that differ from the other two learners in Phase 3 in particular (CME22F and TFE06M). CME22F from the control group seems to inhibit release almost like someone who has received training on stop-unrelease. It is not impossible that she may have talked to her peers from the test group and learnt from them. Some form of awareness is noticeable in her third recording. In addition, after double-checking the recordings for TFE06M, we noticed that the effect of training on stop-unrelease in final position did not last as long as it did for the other two learners (TAL02F and TNA26F). While unrelease stood at 48.5% before training, it almost doubled to 88% immediately after training, before dipping to 60.7%. More specifically, in Phase 3, release was exclusively found in word- and sentence-final positions (resp. 66.7% and 83.3%). However, in pairs of voiceless stops, TFE06M controlled the release of stop 1 and started to add aspiration on the second stop during Phase 2 and maintained it in Phase 3 (e.g.,
tap pan realised as [tæp˺pʰæn]). This suggests that it is probably easier to produce a new allophone of a phoneme that is not present in the learner’s inventory (aspirated stop) than to acquire a position-dependent allophonic variation pattern that differs between an L1 and an L2.
An inspection of spectrograms of the learners’ recordings can also illustrate the progress in more detail.
Figure 12 represents
tap pan read by speaker TNA26F. Stop-release in medial position is visible in Phase 1 (top), then absent in the remaining phases. Word-initial aspiration is visible for
tap but almost absent for
pan in Phase 1 (i.e., below the 30/50 millisecond threshold,
Cho and Ladefoged 1999). In Phases 2 and 3, aspiration is present in both words.
During Phase 1, the speaker TFE06M’s stops are generally released (
Figure 13, top) in word-final position. In Phase 2 (mid), the second /k/ is released but the other stops are not. In Phase 3 (bottom), both occurrences of /k/ are released even though the duration of the burst is visibly shorter than in Phase 1.
6. Discussion
Based on
Flege’s (
1995) model, this study assessed whether cross-linguistic transfers occur at a sub-phonemic level, i.e., stop-unrelease, and whether the allophonic variation patterns observed amongst learners of an L2 may stem from patterns found in the learners’ L1. It also tested whether a diversified training approach (awareness-raising with spectrograms and gestures) could lead the learners to inhibit cross-linguistic transfers from their L1 and to opt for patterns that approximate those of native speakers of English.
H1: L1 French learners of English release utterance-final stops in English at rates that mirror their L1.
Although not all students were assessed on stop-unrelease in French, it is clear that stops in sentence final position are overwhelmingly released (98.1%, based on 17 subjects). Similar results are observed across all participants in English before the training took place (90.9% for all participants having completed Phase 1, based on 39 subjects). In pairs within sentences, the trend is almost the opposite (FR: 14.8%, EN: 9.8%), whereas rates of release approximate chance level in pairs of words in isolation with a tendency to favour release (FR: 58%, EN: 55.3%). These trends point towards an equivalence classification (
Flege 1987) at a sub-phonemic level in the sense that learners seem to tap into the allophonic variation patterns in French to produce stops in English and that these patterns depend mostly on the environment they are in (in pairs or in final position) and on the linguistic structure they are in (pairs of words in isolation or in a sentence).
H2: Homorganic pairs of stops are more likely to exhibit stop-unrelease in the first stop than heterorganic pairs.
More subtle differences are observed in productions before training. In both languages and linguistic structures, homorganic pairs lead to higher scores of unrelease than heterorganic clusters. This aligns with Rojczyk et al.’s study on Polish accented English: “homorganic clusters in Polish [being] optionally unreleased” (
Rojczyk et al. 2013, p. 13), the perception, and subsequently imitation, of such clusters in English (L2) was facilitated despite the absence of any explicit training on stop-unrelease. For stops in word-final position or heterorganic pairs of stops, however, explicit training seems to be needed as intelligibility issues may be at stake (
Cruttenden 2001).
H3: The combination of tools—i.e., spectrograms to raise awareness and gestures to inhibit bursts in stops—significantly helps the learners to control stop-unrelease in final position and in pairs of stops.
A comparison of the productions in the control group versus the test group clearly shows that our multimodal training had a significant impact on stop-unrelease across all positions and more importantly, in utterance-final position. Although this remains to be tested further with more participants, the follow-up test a month later showed the lasting effects of the training, as proportions of stop-unrelease remained high even in utterance-final position. Even though the rates of unrelease went down slightly, we could say that the results are more realistic than scores found immediately after training as they are closer to rates found in large corpora of spontaneous speech in English (
Byrd 1992,
1996 or
Davidson 2011). Our findings align with Bergier’s study on the positive impact of metalinguistic awareness on second-language pronunciation performance even for a sub-phonemic feature like stop-unrelease (
Bergier 2014). During the recording sessions, many students used the
pig-tail gesture to inhibit release. A future study could involve filming students while producing unreleased stops after having received similar training and interviewing them on the strategies they used to inhibit release. Aspiration could also be taken into account while measuring progress, as
Amand and Touhami’s (
2016) study indicates that young learners of English seem to acquire aspiration more easily than stop-unrelease. It is possible that advanced learners of English have the ability to acquire both sub-phonemic details at the same time, i.e., stop-unrelease in Stop 1 and aspiration in Stop 2. Finally, since the alveolar /t/ has an extra allophone [ʔ] (see
Byrd 1992, p. 29), it should be investigated further with articulatory measurements of students’ productions, as advanced French learners of English are more sensitive to glottal stops than to aspiration in perception (
Shoemaker 2014).
7. Conclusions
In France, no explicit training on stop-unrelease is found in English pronunciation textbooks or in university syllabi for English majors, probably because it is considered a non-contrastive phonetic feature in English. However, as
Schwartz et al. (
2014) remark, “success in acquisition is predicted on the basis of sub-phonemic phonetic detail”, including allophonic variation patterns involving stop-unrelease. This paper suggests that Flege’s
equivalence classification of phonemes between an L1 and an L2 may also involve sub-phonemic features like stop-unrelease in pairs of stops and in final position, yet the allophones of these stops can be re-classified after explicit training involving awareness-raising with spectrograms and gestures. French learners of English initially transferred the allophonic variation pattern of stops from their L1 to their L2. After training, the test group managed to inhibit release even in utterance-final position in their L2, where the proportion of release was initially higher than 90%. A month later, students who completed the experiment until the very end produced allophonic variation patterns in stops that were closer to patterns produced by native speakers of English than by French speakers. This awareness-raising approach may also lead to a better perception of stop-unrelease in natural speech in English, thereby leading to a better understanding of seemingly easy segments like “together we can beat cancer” which can be confused with “together we can’t be cancer”⁵. More generally, this cross-language speech production investigation provides a window on French native speakers’ allophonic variation patterns of stops in French and on their ability to adjust the patterns when speaking English as a second language.