1. Introduction and Background
Australian English is reported to have three main varieties: Mainstream Australian English (the institutional standard), Ethnocultural Australian English (which encompasses the Englishes used by speakers with different non-Anglo-Celtic cultural heritage backgrounds), and ‘Aboriginal English’ (
Cox and Fletcher 2017, pp. 12–13). Aboriginal English is spoken by Indigenous Australians, who are First Nations people that experienced colonisation and language loss after the arrival of the British in 1788 (e.g.,
Mailhammer 2021). This variety has been described as “an English-lexified contact-based variety and the first and only language for a sizeable number of Indigenous people in Australia” (
Louro and Collard 2021, p. 2). There are multiple varieties of Aboriginal English spoken in Australia, and Aboriginal Englishes, plural, is often considered a more appropriate term to use to reflect this (
Eades 2013). While L1 and L2 varieties are spoken, L1 varieties are more common in the south of the continent, where our research is carried out; and there are greater numbers of L2 speakers in the northern regions of Australia, which largely reflects colonisation practices (
Mailhammer 2021). Aboriginal Englishes may be phonetically and/or phonologically different from Mainstream Australian English (
Butcher 2008) and can also be structurally different (
Eades 2013;
Malcolm 2013) depending on the variety in question. Communicative practices are also known to be culturally different from the mainstream variety (
Louro and Collard 2021), and this extends to the use of disfluencies such as unfilled pauses (
McDougall et al. forthcoming) and filled pauses (
Blackwell and McDougall forthcoming) in L1 Aboriginal English. There are very few acoustic studies focusing on the vowels produced by Aboriginal English speakers in Australia, despite a clear difference in the sound system(s) between these speakers and Mainstream Australian English speakers, and this will be the focus of the current paper.
In acoustic descriptions of English vowels in Australia, most attention tends to be paid to the mainstream variety and mostly from Sydney (i.e.,
Harrington et al. 1997;
Watson and Harrington 1999;
Cox 1999;
Cox and Palethorpe 2008,
2019;
Elvin et al. 2016;
Grama et al. 2019;
Cox et al. 2024). The limited work on Aboriginal English vowel spaces has been based on static measures of formant steady states. These studies have shown in particular that Aboriginal English vowel spaces are more compressed than the mainstream variety and have more “conservative” features, whether L2 Aboriginal English (
Butcher and Anderson 2008) or L1 Aboriginal English (
Loakes et al. 2016). In these production studies focusing on Aboriginal English vowels, and in the perception studies which consider vowel categorisation behaviour by Aboriginal English listeners (
Loakes et al. 2024a,
2024b), varietal differences are analysed as due to Aboriginal English not having undergone the same rates of change as the mainstream variety (also see
Butcher 2008). In particular, front vowel lowering has been especially rapid for MAE speakers (i.e.,
Cox and Palethorpe 2008). This means that age is often a highly significant factor in how Mainstream Australian English speaker-listeners produce vowel variants (e.g.,
Cox 1999;
Cox and Palethorpe 2008) and respond to vowel categorisation tasks (
Mannell 2004;
Loakes et al. 2024a,
2024b). Perception studies focusing on short front vowels with the same speaker-listeners and communities as the current study (
Loakes et al. 2024a,
2024b) have also shown that Aboriginal Australian English participants respond differently to various vowel contrasts compared to Mainstream Australian English listeners in the same region. This is consistent with research discussed above, which shows a more compressed vowel space for Aboriginal English speakers. One example is that Aboriginal English listeners in Mildura have a significantly earlier crossover between KIT-DRESS compared to Mainstream Australian English listeners (
Loakes et al. 2024a); this is consistent with the idea that Aboriginal English speakers produce vowels which are phonetically less open, and whether this is true for the Mildura speakers will be empirically tested in the current study. What is as yet unknown is the exact acoustic realisation of vowels in Aboriginal English in Victoria, including duration and dynamic features, and their relationship to vowels produced by Mainstream Australian English speakers.
Aims
The aim of the current study is to provide an acoustic analysis of the short vowels KIT, DRESS, TRAP, STRUT, and LOT in L1 Aboriginal English, as well as the long vowel GOOSE. These vowels are chosen because they are among the monophthongs that have experienced rapid change in mainstream Australian English (
Cox et al. 2024 for production;
Mannell 2004;
Loakes et al. 2024a,
2024b for perception). Additionally, because Aboriginal English is said to have changed less rapidly (
Butcher 2008), vowels in general are an important focus for accent differences between varieties. Recent work by
Cox et al. (
2024) summarises the main diachronic changes in Mainstream Australian English and incorporates both static and dynamic analyses. They note that static measures are important for showing the relationship between vowels in the vowel space, while dynamic measures “provide tools for assessing the changes in a vowel’s time-varying spectral detail” (2024, p. 17). As such, we use a mix of tools to analyse vowel acoustics (as described in
Section 2.3).
With a focus on data from two regions (Warrnambool and Mildura), the paper will also explore potential regional variation within L1 Aboriginal English. Regional variation within Aboriginal English in Australia has not been widely analysed, but such variation is acknowledged. As mentioned above,
Eades (
2013) advocates for the term
“Aboriginal Englishes”, and regional variation has also been observed in the phonetics of L1 Aboriginal English for the communities analysed in this study, in both voice quality (
Loakes and Gregory 2022, for male speakers) and in the realisation of /t/ (
Loakes et al. 2022). We also look at age as a variable, because as described above there has been rapid diachronic change reported in Australian English vowels for MAE speakers (i.e.,
Cox et al. 2024). While Aboriginal English speakers have been reported to have more conservative vowel spaces (
Butcher 2008;
Butcher and Anderson 2008;
Loakes et al. 2016), there are nevertheless differences between older and younger groups in how they process vowels, yet the differences are less marked (e.g.,
Loakes et al. 2024a,
2024b). Some age differences in AAE are thus likely and should be considered. Finally, gender is also analysed, because depending on the vowel, gender is a known factor driving vowel realisation in Australian English (see e.g.,
Cox and Palethorpe 2019).
As mentioned, KIT, DRESS, and TRAP have been the focus of recent perceptual studies showing various differences in the ways Aboriginal and Mainstream Australian English listeners respond to the contrasts (
Loakes et al. 2024a,
2024b), and STRUT and LOT are also of interest having been included in a study by
Mannell (
2004) focusing on change in the Australian English perceptual space. As such, there is a relatively comprehensive understanding of how these vowels have changed in Mainstream Australian English communities, as well as in the communities in the current study (
Loakes et al. 2024a,
2024b). In the current study, we also decided to include the long vowel GOOSE (phonetically /ʉː/ in Australian English) because this vowel has undergone a substantial fronting change in the mainstream variety (
Cox et al. 2024;
Cox 1999; also see
Elvin et al. 2016), with less fronting observed for both L1 and L2 Aboriginal English speakers (
Butcher and Anderson 2008;
Loakes et al. 2016), which also makes for another interesting comparator.
Variables analysed in the current paper aim to give an overview of the production of vowels for L1 Aboriginal English speakers in Victoria. This includes static and dynamic measures for formants (including the F1/F2 vowel space), and vowel duration. Previous studies on Aboriginal English vowels have only focused on static measurements for F1/F2 (
Butcher and Anderson 2008;
Loakes et al. 2016) and have not included dynamic analyses or duration analyses, so this study will provide a more comprehensive and nuanced understanding of the acoustics of vowels in this variety. Monophthongal vowels, which are the primary focus of our study, are said to be sufficiently distinguished by duration and F1/F2 measurements at the target rather than by dynamic features (e.g.,
Watson and Harrington 1999), and trajectory movement in short vowels has also been shown to be dependent on coarticulatory factors in Australian English (
Elvin et al. 2016). More recently, however, research on Australian English has also shown trajectory movement can have a bearing on sociophonetic differences in Australian English in diphthongs (
Penney et al. 2023) and even in short monophthongs (e.g.,
Docherty et al. 2015;
Cox and Palethorpe 2019), so we include dynamic F1/F2 measures in our analysis bearing this in mind.
2. Materials and Methods
Analysing the speech of 33 Aboriginal English speakers, we focus on acoustic measures to describe the short vowel system of Aboriginal Englishes spoken as an L1 in two locations in the southeast of Australia. We measure duration, static F1/F2, and dynamic F1/F2 trajectories of KIT, DRESS, TRAP, STRUT, LOT, and GOOSE using current methodology as will be described. Along with region, the sociolinguistic variables of age and gender are also considered in the analysis. Speakers can be grouped into two distinct age groups, >40 and <40, enabling investigation into diachronic changes. This will enable a more complete picture of the acoustic qualities of the vowels of AAE. We compare the AAE data with a “baseline” sample of 28 Mainstream Australian English speakers from the same regions. This MAE sample will not be the primary focus of the paper but gives a sense of the variation occurring within Mainstream Australian English, and highlights the differences and/or similarities, as the case may be, between AAE and MAE.
2.1. Speakers
The speech sample was collected by the first author as part of a more extensive study on sociophonetic variation in AAE (see more detail in, e.g.,
Loakes et al. 2024b). Participants are monolingual speakers of Australian English from rural Victoria who live in Warrnambool in the southwest of the state of Victoria, and Mildura in the northwest of the state. Warrnambool and Mildura are small towns, both with approximately 35,000 people, located 159 miles (256 km) and 335 miles (540 km), respectively, from the capital Melbourne, and the towns are located 328 miles from one another. These locations are shown in the map below in
Figure 1.
The AAE speakers self-identified as members of this speech community, specifically signing up for a study on “Aboriginal English”. They described their variety of English as “Aboriginal”, “Aboriginal English”, or “Koori English” (Koori is a term used by Aboriginal people in Victoria). Mainstream Australian English speakers all spoke the institutional standard (
Cox and Fletcher 2017). All speakers self-identified as male or female. Participants also fell into two distinct age ranges: a younger group under 40, and an older group over 40. A breakdown of the location, gender, and age of AAE speakers is presented in
Table 1, along with the breakdown of the MAE group.
2.2. Materials and Recording Procedures
Recordings took place in fieldwork conditions in Warrnambool and Mildura, so the recording setting differed depending on the speakers’ preferences. In Warrnambool, AAE speakers were recorded both in their homes and in two different Aboriginal co-operatives (culturally appropriate community service centres), which had meeting rooms: one in Warrnambool, one in a location close to Warrnambool. MAE speakers were recorded in their homes. In Mildura, almost no participants were recorded in their own homes. AAE speakers were recorded in the Aboriginal co-operative there, while MAE speakers were recorded in a public space—often the foyer in the town library, which is a central community space. Speech data were recorded using a portable Zoom (Tokyo, Japan) Handy Recorder H4n at a sampling rate of 44,100 Hz. Participants were presented with the target vowels in isolated /hVt/ contexts as a control and because of a specific interest in sociophonetic variation in /t/ (i.e.,
Loakes et al. 2022). They were instructed to read the words aloud at a regular conversational rate, but rates differed depending on the speaker. Each vowel was produced six times, although some iterations were later discarded, primarily due to background noise or intelligibility issues. In total, 2168 vowel tokens were available for analysis with the total number of tokens per vowel shown in
Table 2.
Participants also took part in a forced-choice perception (vowel categorisation) task and a sociolinguistic interview, but those results are not analysed here (see
Loakes et al. 2024a,
2024b for results of the vowel-categorisation task, and
McDougall et al. forthcoming and
Blackwell and McDougall forthcoming for disfluency analysis in the spontaneous speech). We note that an advantage of focusing on controlled speech in the first instance is that it produces a baseline sample for later comparison with spontaneously produced speech. While controlled speech can arguably be seen as less natural, the focus on citation forms means we can produce vowel spaces comparable with other research (especially the L2 spaces in
Butcher and Anderson 2008). We note that there is also evidence that controlled speech may be sufficient for analysing speakers’ usual production behaviour in some non-mainstream speaker groups. For example,
King et al. (
2020) found in a study of Māori speakers that viewing the speakers through a typical sociolinguistic lens was actually not appropriate for that group, and that the idea of prestige norms in citation speech did not apply. This may also be the case for the Aboriginal English speakers in the current study, although this is not the main focus here.
2.3. Analysis
The start and end boundaries of each vowel were first determined automatically in WebMaus (
Kisler et al. 2017) and then hand corrected as needed. Duration measures were then calculated automatically from these boundaries using emuR (
Jochim et al. 2023). Formants were automatically extracted using the ‘on the fly’ extract trackdata function in emuR. These were then visualised and hand corrected when necessary. Monophthongs were subject to a static analysis with the peak F1/F2 taken depending on the vowel (e.g.,
Cox 1999). Vowel targets for monophthongs were calculated from the point at which F1 or F2 was highest or lowest, then matched to the other formant’s data at that point. The process was limited to the first half of each segment in line with previous work (
Harrington et al. 1997;
Cox 2006) to reduce effects of the following consonant. The following list shows at what point the target was calculated for each vowel.
KIT—minimum F1
DRESS—maximum F2
TRAP—maximum F1
STRUT—maximum F1
LOT—minimum F2
GOOSE—minimum F1
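The target rules listed above can be illustrated with a minimal Python sketch. This is not the authors' emuR workflow: the function name `vowel_target`, the list-based formant tracks, and the example values are all hypothetical, and the sketch simply assumes that the first frame holding the extreme value within the first half of the token is taken as the target.

```python
# Which formant extreme defines the target for each vowel (from the list above).
TARGET_RULES = {
    "KIT": ("F1", min),
    "DRESS": ("F2", max),
    "TRAP": ("F1", max),
    "STRUT": ("F1", max),
    "LOT": ("F2", min),
    "GOOSE": ("F1", min),
}


def vowel_target(vowel, f1_track, f2_track):
    """Return (F1, F2) in Hz at the target frame of one vowel token.

    f1_track and f2_track are hypothetical equal-length lists of formant
    values (Hz) sampled across the token, in time order.
    """
    if len(f1_track) != len(f2_track) or not f1_track:
        raise ValueError("formant tracks must be non-empty and equal in length")
    formant, pick = TARGET_RULES[vowel]
    # Restrict the search to the first half of the segment to limit
    # effects of the following consonant.
    half = max(1, len(f1_track) // 2)
    search = (f1_track if formant == "F1" else f2_track)[:half]
    # Index of the first frame holding the extreme value.
    idx = search.index(pick(search))
    # Read both formants at that same frame.
    return f1_track[idx], f2_track[idx]


# Invented TRAP token: F1 peaks at the third frame, inside the first half.
f1 = [650, 720, 790, 780, 740, 690, 600, 520]
f2 = [1750, 1720, 1690, 1680, 1700, 1740, 1790, 1820]
print(vowel_target("TRAP", f1, f2))  # (790, 1690)
```

Keeping the rules in a single lookup table makes it easy to check each vowel's criterion against the list.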
Linear mixed effects models were chosen to statistically investigate the sociolinguistic elements of region, gender, and age for static measures (duration, F1(Hz), F2(Hz)). These enable the random effect of speaker to be included in the analysis, and all fixed factors (region, gender, age, and vowel) and their interactions to be included in the modelling. Post-hoc pairwise comparisons are calculated using the emmeans package (
Lenth 2023).
Dynamic formant analyses were conducted with measurements taken at five points across the vowel: 20%, 35%, 50%, 65%, and 80%. Trajectories were calculated across the normalised length of the vowel (21 points) for F1 and F2. Generalised additive mixed models (GAMMs) were fitted to each formant (F1 and F2) of each vowel using the mgcv (
Wood 2006, version 1.8–31) and itsadug (
van Rij et al. 2020) packages in R (
R Core Team 2020). As the inclusion of interactions of multiple predictors (such as dialect and vowel) is not straightforward in GAMMs, separate models were fitted for F1 and F2 of each of the vowels to enable interpretation of potential changes in each vowel for each sociolinguistic variable. Separate models were fitted using the factors of dialect, location, gender, and age as ordered parametric terms. All models included a smooth over normalised vowel duration, a smooth over normalised vowel duration by parametric factor (dialect, location, age, or gender), and a (random) factor smooth over normalised vowel duration by speaker.
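As a rough illustration of what a 21-point time-normalised trajectory involves, the following Python sketch resamples a formant track onto an equally spaced proportional time grid. Linear interpolation is an assumption made for illustration only, since the text does not state how the normalised trajectories were derived, and the function name `time_normalise` and the example values are hypothetical.

```python
def time_normalise(track, n_points=21):
    """Resample a formant track to n_points equally spaced proportional times.

    Linear interpolation is an assumption here; the text does not state
    how the 21-point trajectories were derived.
    """
    if len(track) < 2 or n_points < 2:
        raise ValueError("need at least two frames and two output points")
    last = len(track) - 1
    out = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)  # fractional frame index, 0..last
        lo = min(int(pos), last - 1)     # left neighbour frame
        frac = pos - lo                  # distance past the left neighbour
        out.append(track[lo] + frac * (track[lo + 1] - track[lo]))
    return out


# Invented F2 track (Hz) for one token, resampled to 21 points (5% steps).
f2_track = [1500, 1580, 1650, 1700, 1720, 1690, 1640]
trajectory = time_normalise(f2_track)
print(len(trajectory))  # 21
```

On a 21-point grid the samples fall at 0%, 5%, ..., 100% of the vowel, so the five measurement points named in the text (20%, 35%, 50%, 65%, and 80%) coincide with grid positions 4, 7, 10, 13, and 16 (counting from zero).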
3. Results
Results are presented below and include both inferential and descriptive statistics.
Appendix A contains mean static measures broken down by dialect, location, gender, and age for reference in
Table A1.
3.1. Duration in AAE
Duration was measured for each AAE vowel, and the mean duration was calculated.
Figure 2 shows AAE vowel durations broken down according to location and gender.
Table A1 in
Appendix A provides mean duration values broken down by vowel, dialect, location, gender, and age. A linear mixed effects model was built by using the fixed factors of vowel, location, gender, and age, and a random intercept for participant for the AAE speakers. Interactions between the fixed factors were included in the model. It was fit by REML, and the
t-tests use Satterthwaite’s method as implemented in the packages lme4 and lmerTest (
Bates et al. 2015). Post-hoc pairwise comparisons were calculated using the emmeans package (
Lenth 2023). A full statistical summary can be found in
Table A2.
Similar to what we expect from Australian English vowels in the Mainstream variety (see e.g.,
Elvin et al. 2016), TRAP, LOT, and GOOSE have significantly longer duration than the reference vowel KIT for AAE speakers. There was a significant interaction of Location and LOT (t(1233) = −3.73,
p < 0.001) and also for Gender and LOT (t(1233) = −3.49,
p < 0.001). These three factors (Location * Gender * LOT) then also show a significant interaction (t(1233) = 3.11,
p < 0.01). The effect of Location * Gender * GOOSE was statistically significant and negative (beta = −16.38, t(1233) = −2.10,
p < 0.05) and the effect of Gender * Age * TRAP was also statistically significant and negative (beta = −17.40, t(1233) = −2.16,
p < 0.05). Post-hoc tests did not show any significant differences in duration between locations, age groups, or genders except as seen in
Figure 2, where female MI speakers have a significantly longer LOT vowel than the male speakers in the same location (t(29.6) = 2.3,
p < 0.05). While it seems in
Figure 2 that there is a trend for female speakers in Mildura to have longer vowels overall, this is only significant for LOT.
DRESS and TRAP were examined in more detail due to duration being used by some speakers of Australian English to distinguish these vowels in prenasal contexts (
Cox and Palethorpe 2014), as well as the fact these vowels are involved in a prelateral merger and are variably produced (
Cox et al. 2024) and perceived (
Loakes et al. 2024a,
2024b). In this citation form data, post-hoc results showed that DRESS and TRAP differed significantly in length for AAE speakers in the following contexts:
Females, WN, <40: t(1210) = −3.57, p < 0.01
Females, WN, >40: t(1209) = −4.2, p < 0.01
Males, MI, <40: t(1209) = −3.1, p < 0.05
Males, WN, <40: t(1209) = −4.1, p < 0.01
Female speakers in WN regardless of age produce TRAP with a significantly longer vowel duration than DRESS, whilst in MI these vowels are produced with a similar duration. In contrast, young male speakers, irrespective of location, produce TRAP with a significantly longer duration than DRESS. Taken with the above results, this description of duration in AAE vowels shows that there is some sociophonetic variation employed by speakers, dependent on location, gender, and age.
3.2. Duration Comparison of AAE and MAE
Whilst the acoustic description of AAE is the main focus of this paper, it is helpful to compare with a MAE sample and thus highlight the differences and/or similarities, as the case may be, between AAE and MAE. When we focus on the comparison of AAE speakers with MAE speakers, we see that there are some differences in duration depending on the vowel and the dialect (see
Figure 3).
A linear mixed effects model was built by using the fixed factors of vowel, gender, and dialect and a random intercept for participant for all speakers. Interactions between the fixed factors were included in the model. All vowels showed significant (p < 0.01) differences in duration from the reference vowel (KIT). There was also significant interaction between DRESS and dialect (p < 0.01) and LOT and dialect (p < 0.05). Gender showed significant interaction with the vowel GOOSE (p < 0.001), whilst the interaction between dialect, LOT, and gender was also statistically significant (p < 0.05). A Type III Analysis of Variance was conducted to examine the effects of vowel, gender, and dialect on duration. The analysis used Satterthwaite’s method for estimating degrees of freedom. As expected, the factor vowel was found to have a highly significant effect on duration, F(5, 2088.06) = 200.99, p < 0.001. The interaction between dialect and vowel was also highly significant, F(5, 2088.06) = 7.09, p < 0.001, indicating that the effect of vowel varied depending on dialect. The interaction between vowel and gender was significant, F(5, 2088.06) = 8.60, p < 0.001. No significant effects were observed for dialect or gender alone, nor for the three-way interaction between dialect, vowel, and gender.
3.3. AAE Static F1/F2
Static measures of the short vowels were taken at the vowel target as described in the methodology. A full list of mean F1 and F2 broken down by vowel, dialect, location, gender, and age may be found in
Table A1. AAE vowel means were plotted in
Figure 4 by sociophonetic variable (location, gender, and age) after being Bark normalised for ease of comparison in the plots. As noted, statistical comparisons have been made on the raw values.
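For readers unfamiliar with the Bark scale, a Hz-to-Bark conversion can be sketched as follows. The text does not say which conversion formula was applied for the plots, so this Python example assumes the widely used Traunmüller (1990) approximation; the function name `hz_to_bark` and the example token are hypothetical.

```python
def hz_to_bark(f_hz):
    """Convert a frequency in Hz to Bark.

    Uses the Traunmueller (1990) approximation, which is an assumption:
    the text does not name the formula used. The small corrections for
    the extreme low and high ends of the scale are omitted.
    """
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


# Invented (F1, F2) token in Hz, converted for plotting only.
f1_hz, f2_hz = 450.0, 2100.0
f1_bark, f2_bark = hz_to_bark(f1_hz), hz_to_bark(f2_hz)
print(round(f1_bark, 2), round(f2_bark, 2))
```

Because the conversion is monotonic, it rescales the axes of the vowel plot without changing the relative ordering of the vowels, which is consistent with running the statistical models on the raw Hz values.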
In terms of location, AAE speakers from both locations have very similar mean F1/F2 values. WN speakers in general have a slightly more expansive vowel space; however, LOT for MI speakers is phonetically more back. Even after Bark normalisation, female AAE speakers have more fronted mean values for KIT and GOOSE and higher production of DRESS. Younger speakers have a more retracted vowel space as well as an expanded F1 space in comparison to older speakers.
A linear mixed effects model was built separately for both F1 and F2 using non-normalised Hz values. Fixed factors of vowel, location, gender, and age, and a random intercept for participant, were included along with interactions for each of the fixed factors. It was fit by REML, and the
t-tests use Satterthwaite’s method as implemented in the packages lmerTest and lme4 (
Bates et al. 2015). A full summary is found in
Appendix A,
Table A4. As would be expected, there were significant differences in F1 for all vowels (except GOOSE) in comparison to the reference vowel KIT. There were a number of significant interactions:
Location * DRESS p < 0.05
Location * Gender * DRESS p < 0.01
Location * Gender * TRAP p < 0.05
Location * Age * DRESS p < 0.05
Gender * Age * DRESS p < 0.001
These interactions all involve the F1 of one of the non-high front vowels and involve all the sociolinguistic variables (Location, Age, and Gender), though in different combinations.
In terms of F2, there were significant differences for all vowels in comparison to the reference vowel KIT. In addition, Gender was also a significant factor (t(1233) = −3.06, p < 0.01). There were a number of significant interactions:
Gender * Age p < 0.01
Gender * STRUT p < 0.001
Age * TRAP p < 0.05
Location * Gender * TRAP p < 0.05
Location * Gender * LOT p < 0.001
Location * Age * STRUT p < 0.05
Location * Age * LOT p < 0.05
Gender * Age * DRESS p < 0.001
Gender * Age * TRAP p < 0.001
Gender * Age * LOT p < 0.01
Gender * Age * GOOSE p < 0.05
F2 interactions also involve all the sociolinguistic variables (in various configurations), and incorporate more of the vowels (including DRESS, TRAP, STRUT, LOT, and GOOSE).
In order to better understand the interactions, post-hoc tests were carried out. Results are presented in
Table 3; significant post-hoc results (
p < 0.05) for each of the major fixed factors (location, gender, and age) are presented by vowel. Arrows are used to show the direction of the difference, with ↑ showing the second value is higher, or ↓, lower than the first for each pair of sociolinguistic variables; Location (MI/WN), Gender (F/M), and Age (<40/>40).
When all the variables were controlled for, we see that location differences were minimal in F1 with the only significant differences between MI and WN occurring for Male < 40 speakers in DRESS and STRUT; these speakers in WN have higher F1 values and therefore a phonetically lower position in the vowel space. Female > 40 speakers in WN show significant differences from their Mildura counterparts with lower F2 values for KIT, DRESS, and TRAP, indicating these vowels are significantly retracted for this group.
Gender differences (F/M) are concentrated in F2, where across locations <40 male speakers exhibit significant retraction for KIT and GOOSE. Age (<40/>40) is an important factor for female Mildura speakers, with F1 in TRAP being significantly lower for >40 and F2 significantly higher. These speakers also have a significantly higher F2 in DRESS than their younger counterparts. Older male WN speakers show a similar pattern but in different vowels: DRESS (F1) and KIT (F2).
Post-hoc results for significant interactions of vowels are not shown due to there being significant differences in F1 and F2 for most vowels (as expected). However, of note are the female >40 speakers in MI and the male <40 speakers in WN, who do not have a significant difference in vowel height for DRESS and TRAP (p = 0.9, p < 0.9, respectively).
3.4. Static F1/F2 Comparison of AAE and MAE
When the two dialects are plotted together (
Figure 5), we can see that many of the vowels are in general overlapping between MAE and AAE. MAE speakers have more variability in the F1 dimension for STRUT and LOT, whilst AAE speakers vary more along the F2 axis for these vowels. MAE speakers also have even larger ellipses for the vowels DRESS and TRAP than the AAE speakers, which likely reflects the phonetically more open productions by younger MAE speakers.
Linear mixed effects models were calculated individually for F1 and F2 with the fixed interacting factors of vowel, gender, and dialect and a random intercept for participant for all speakers. A full summary is provided in
Appendix A,
Table A5. For F1 there were significant differences in vowel height in comparison to the reference vowel KIT for all vowels except GOOSE. Significant interactions were Dialect * STRUT (
p < 0.05) and Gender * DRESS (
p < 0.001). A Type III Analysis of Variance was used to examine the effects of vowel, gender, and dialect on F1 and found that the interactions between dialect and vowel and between gender and vowel were statistically significant (F(5, 2104.75) = 5.30,
p < 0.0001, F(5, 2104.75) = 12.95,
p < 0.001, respectively). Post-hoc tests confirmed that MAE in comparison to AAE has significantly lower STRUT vowels in the vowel space.
Results from the LMER for F2 showed a significant effect for Dialect, t(99.0) = 2.5, p < 0.05. Gender was also statistically significant, t(−287.3) = −6.7, p < 0.001. All vowels differed significantly in F2. Dialect and vowel showed a significant interaction for STRUT (p < 0.01), whilst Gender and Vowel showed a number of significant interactions for the vowels DRESS (p < 0.05), TRAP (p < 0.001), STRUT (p < 0.001), and LOT (p < 0.001). A Type III Analysis of Variance was used to examine the effects of vowel, gender, and dialect on F2, and significant main effects were observed for Dialect (F(1, 56.45) = 10.14, p < 0.01), Gender (F(1, 56.45) = 32.18, p < 0.001), and Vowel (F(5, 2101.58) = 1260.06, p < 0.001), indicating that these factors independently influence F2. There are significant two-way interactions between Dialect and Vowel (F(5, 2101.58) = 4.98, p < 0.001) and between Gender and Vowel (F(5, 2101.58) = 14.20, p < 0.001), showing that the effects of dialect and of gender vary significantly across different vowels. However, the interaction between Dialect and Gender and the three-way interaction between Dialect, Gender, and Vowel were not significant, indicating that the combined influence of these factors does not significantly affect the outcome.
Post-hoc pairwise comparisons were calculated using the emmeans package (
Lenth 2023). This showed that for female speakers KIT (
p < 0.05), DRESS (
p < 0.001), and TRAP (
p < 0.05) were significantly retracted when produced by AAE speakers compared to MAE speakers. KIT (
p < 0.05) and DRESS (
p < 0.001) were also significantly retracted for male AAE speakers in comparison to male MAE speakers.
3.5. AAE Dynamic F1/F2
Whilst static formant measurements form an important part of the description of the vowel system of a language (especially for comparison with existing descriptions), it is also important to investigate what is occurring across the entire duration of a vowel, including for the short vowels (also see
Elvin et al. 2016;
Penney et al. 2018). As shown in
Figure 6, when it comes to the AAE speakers, we see relatively substantial movement in the trajectories of F1 and F2.
AAE speakers show some differences for location, with more trajectory movement in MI than in WN. The AAE female speakers in general show more movement across the vowels than the male speakers, but especially between 65% and 80% of the vowel length. The < 40 AAE group shows extensive movement throughout the vowel, particularly in GOOSE, DRESS, and TRAP. Again, most of this movement occurs during the latter part of the vowel.
To look more at the vowel trajectory, a GAMM analysis was conducted separately for each formant of each vowel and for each sociophonetic factor: region, age, and gender. Separate models were fitted using the factors of location, gender, and age as ordered parametric terms. All models included a smooth over normalised vowel duration, a smooth over normalised vowel duration by parametric factor (location, age, or gender), and a (random) factor smooth over normalised vowel duration by speaker. Formant values were included as non-normalised Hz values (
Cox et al. 2024). A summary of the parametric and non-linear analyses for the comparison between location (MI vs. WN), age (<40 vs. >40), and gender (F vs. M) is presented in
Table 4. Full results appear in
Appendix A,
Table A6,
Table A7 and
Table A8.
Location shows significant differences between MI and WN for the KIT and DRESS vowels, with these differences occurring in both F1 and F2 in the non-linear effects. Age showed significant effects, both parametric and non-linear, across F1 and F2 for DRESS. Significant parametric effects were also present for TRAP and LOT. Gender showed the greatest number of significant effects, with parametric or non-linear effects being significant for each vowel. In KIT and DRESS, this is true across both F1 and F2.
3.6. Dynamic F1/F2 Comparison of AAE and MAE
Turning to compare AAE with MAE (
Figure 7), the vowel trajectories particularly of STRUT and LOT show relatively more movement for AAE speakers. The directionality of this movement is also different in most cases, with AAE speakers starting further towards the back of the vowel space and then moving forward.
To examine the vowel trajectories further, a GAMM analysis was conducted separately for F1 (Hz) and F2 (Hz) of each vowel. Each model included a smooth over normalised vowel duration, a smooth over normalised vowel duration by parametric factor (dialect), and a (random) factor smooth over normalised vowel duration by speaker. A summary of significant effects is shown in
Table 5 with a full statistical summary shown in
Appendix A,
Table A9.
There are significant differences for Dialect between AAE and MAE across all vowels except GOOSE. These differences are concentrated in the parametric effects, and primarily F1 parametric effects. The DRESS vowel also has significant F2 parametric effects.
4. Discussion
The overall purpose of this paper was to give a detailed acoustic description of vowels in L1 Aboriginal Australian English. By focusing on vowel duration, as well as static and dynamic measures for formants (F1/F2), the aim was to comprehensively describe acoustic features of vowels known to be undergoing change in Australia. Sociophonetic variables were considered (age, gender, and region), and by comparing Aboriginal Australian English with a sample of Mainstream Australian English, varietal differences in Australia were also considered, though they were not the primary aim of the paper.
For duration, we did not see large differences across the data set, and regardless, these results should be treated with caution because duration can interact with speech rate. Nevertheless, we saw some limited sociophonetic variability in the data with respect to this variable. Among the short vowels, TRAP and LOT are phonetically longest, and in our data, GOOSE was also significantly longer, which can be expected given that this vowel is phonemically long. We saw that female AAE speakers in Mildura had somewhat longer vowels than their male counterparts, but this was only significant for LOT. This vowel also patterned differently according to location and gender. Additionally, the duration analysis focused on DRESS and TRAP due to their being involved in various sound changes in Australian English. For AAE, we saw that speakers in WN had a longer TRAP vowel compared to DRESS (this was not observed for MI), and we also saw that male speakers tended to produce a phonetically longer TRAP vowel as well. When comparing the AAE and MAE speakers in this study, we saw that all vowels showed significant differences in duration from the reference vowel (KIT), and we also saw a significant interaction between DRESS and dialect and TRAP and dialect. Post-hoc pairwise comparisons confirmed AAE speakers have a significantly shorter DRESS vowel compared to the MAE speakers, but this was not significant for TRAP. Given sound change in DRESS and TRAP, it is perhaps not surprising that differences are observed for this vowel pair, although length is not typically the overarching feature mentioned with respect to change (see e.g.,
Loakes et al. 2024a,
2024b;
Cox et al. 2024). It is also worth noting that the actual duration measurements in this study are consistent with other work reporting the length of short vowels in Australian English in wordlist contexts, for example,
Elvin et al. (
2016) and
Penney et al. (
2018), who show similar ranges for duration of these short vowels. While some variability is observed, in this study differences between L1 AAE and MAE vowels are not particularly evident durationally; rather, we find differences in other acoustic dimensions.
As far as static F1/F2 is concerned, we described the vowel spaces of the L1 Aboriginal English speakers and found they were very similar to the Mainstream Australian English speakers’ vowel spaces in their general shape. However, the Aboriginal English speakers had a less expanded space than the Mainstream Australian English speakers. In terms of regional difference, AAE speakers from both locations had very similar vowel acoustics (i.e., there was limited regional variation between MI and WN), but overall, the WN speakers had somewhat more peripheral vowels than the MI speakers. We also saw that for AAE speakers, gender differences occurred and were primarily concentrated in F2. For the dynamic analyses, we focused on movement across the duration of the vowel and saw a relatively large amount of trajectory movement for AAE speakers, and especially in the area around 65–80% of the vowel’s length. Some small age, location, and gender differences were also described for AAE. Significant varietal differences between AAE and MAE were observed and were largely concentrated in the F1 parametric effects.
When considering the findings overall, we can speak to a number of matters referred to in the introduction. The first is whether L1 Aboriginal English is potentially more conservative than the Mainstream variety, having undergone less change. This was noted by
Butcher and Anderson (
2008) for L2 speakers, and was also observed in perception (i.e., earlier category crossovers) for AAE listeners who form the speaker groups for this study (i.e.,
Loakes et al. 2024a,
2024b). While we found that the vowel spaces of AAE and MAE in this study were similar in shape, which is not surprising given that all participants are speakers of English who were born and are still living in the same regions, the findings nevertheless align with previous research showing that the L1 AAE speakers produce phonetically less open and more retracted vowels than the MAE speakers (i.e., the AAE vowel space is less peripheral). This is especially evident in
Figure 5 and
Figure 6 and was borne out in the statistical results for the KIT, DRESS, and TRAP vowels, which are the vowels often implicated in sound changes in Australian English (
cf. Mannell 2004;
Cox and Palethorpe 2008;
Loakes et al. 2024a,
2024b). MAE speakers also had larger ellipses, which we hypothesise is due to greater differences between older and younger speakers, reflecting the rapid change occurring in the MAE vowel system. While we did not specifically test for age in the MAE group, this would align with other research on age-graded variation in the mainstream variety (
e.g., Cox and Palethorpe 2008 for production;
Loakes et al. 2024b for perception).
Looking at the results more closely, while we did not see great amounts of difference between the two regions within AAE, we can also say that WN speakers have slightly less conservative vowels than the MI speakers, and this may be due to the proximity of WN to Melbourne. The finding that the Mildura speakers are the more conservative group is also consistent with findings in a perception study (
Loakes et al. 2024a) that showed more limited perceptual variability among the older and younger Mildura speakers, compared to findings for WN speakers (
Loakes et al. 2024b). In that study, the issue of geographic isolation for Mildura was noted as a likely reason for less rapid change there. Additionally, in the current study we found younger speakers in AAE were exhibiting more open vowels than older speakers, moving towards the changes observed in MAE, but at a less rapid pace.
In terms of the variables that were important for driving speaker behaviour in the AAE data, age, gender, and region all played some role in the distribution of patterns. As mentioned, regional variability was relatively limited. Gender differences were evident, age differences were more noticeable in Mildura than in Warrnambool, and there were also various interactions as described. Previous research has referred to Aboriginal Englishes, plural (e.g.,
Eades 2013), and in the communities studied here that plurality has been seen previously in the realisation of voice quality (
Loakes and Gregory 2022) and in more fine-grained differences in production of /t/ (
Loakes et al. 2022). Variation within AAE is known to be generally greater than for MAE (e.g.,
Mailhammer 2021;
Butcher 2008;
Loakes et al. 2022), although that was not specifically observed for the vowels in this study. In the current study, regional differences are very limited and do not support the notion of different varieties of AAE based on the vowel system alone. This highlights the importance of focusing on various acoustic (and other linguistic) measures in the description of a language variety. Given what we know from our previous work on voice quality (
Loakes and Gregory 2022), an interesting area for further research would be to triangulate formant, duration, and voice quality measurements in descriptive and sound change research, to consider overall perceptual effects for listeners.
5. Conclusions
Australian Aboriginal English is known to be a distinct variety of Australian English, with structurally different linguistic features compared to Mainstream Australian English (
Malcolm 2013;
Eades 2013;
Mailhammer 2021) as well as socially conditioned differences (
Louro and Collard 2021;
Loakes and Gregory 2022;
Loakes et al. 2022,
2024a,
2024b). As far as phonetic analyses of vowels are concerned, previous research has included an analysis of static measures in F1/F2 (
Butcher and Anderson 2008;
Loakes et al. 2016) and static measures of voice quality (
Loakes and Gregory 2022), and we build on that knowledge in the current research by bringing in more data points (duration and dynamic F1/F2 analyses). The results in the current paper thus more comprehensively describe how the acoustics of vowels in Aboriginal English align with or are different from the mainstream variety along various acoustic dimensions.
The study has highlighted varietal differences between Mainstream and Aboriginal Australian Englishes spoken in Victoria where the acoustics of vowels are concerned, in particular for formant behaviour, as well as showing some sociophonetic variability in terms of gender, region, and age across the samples. This study therefore gives a more precise understanding of how AAE differs from MAE, even in this clearly acrolectal L1 variety. Having pointed out differences, it is also important to note that another of our findings was that similarities exist as well in vowels produced by AAE and MAE speakers, especially in duration and in the shapes of the overall vowel space and vowel trajectories. This speaks to the fact that both groups are using L1 versions of Australian English that have the same phonologies, but which have experienced different rates of change and internal phonetic variability.
Now that we have an analysis of vowel quality in controlled speech, future work will focus on spontaneous conversational speech by AAE speakers to determine whether and how the patterns observed here are used by speakers in more interactive contexts. Work has begun on the spontaneous speech with respect to how AAE and MAE speakers use disfluencies, with some small but significant differences between the varieties in the way that speakers use unfilled pauses (
McDougall et al. forthcoming) and filled pauses (
Blackwell and McDougall forthcoming). Given previous phonetic work on these communities, and impressionistic observations, we predict further small but significant differences in vowel production in spontaneous speech in the AAE and MAE groups, but this is still to be empirically tested.