Article
Peer-Review Record

The Targetedness of English Schwa: Evidence from Schwa-Initial Minimal Pairs

Languages 2024, 9(4), 130; https://doi.org/10.3390/languages9040130
by Emily R. Napoli * and Cynthia G. Clopper *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 29 November 2023 / Revised: 22 February 2024 / Accepted: 22 March 2024 / Published: 2 April 2024
(This article belongs to the Special Issue An Acoustic Analysis of Vowels)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The study examined variation in the acoustic realization of schwa in English, which is sometimes believed to lack a distinct target. Speakers read aloud sentences in which schwa occurred in the determiner a followed by another word or word-initially. To elicit samples of hyperarticulated schwas, sentence materials were varied: items could be a schwa-initial word which happens to have an a + word counterpart (e.g., acute and a cute) or a schwa-initial word without such a counterpart (e.g., effect but *a fect), and items were either predictable or not from the semantic context. While earlier results were replicated with ‘conventional’ analyses on duration and midpoint first (F1) and second (F2) formant frequency data, growth curve analyses (GCAs) showed the F1 and F2 trajectories of schwa were steeper and more curved word-initially than in the determiner a. Given that schwa’s F1 trajectory was found to be curved both word-initially and in a, it is argued that schwa must be targeted. Furthermore, schwa’s F2 trajectory was found to be curved word-initially but not in a, so it is proposed that there may be degrees of targetedness rather than simply targeted/targetless.

All in all, I found this an interesting study. Although there are many descriptive studies on the acoustic properties of English vowels, schwa is not usually included despite it being probably the most frequent English vowel, so this study is welcome. The authors did a nice job of explaining why GCAs were a suitable tool for their research question about targets. The findings reveal that schwa exhibits distinct patterns of spectral change across various phonetic contexts, which more conventional approaches to analyzing vowels would miss. There are a number of issues with the paper as it currently stands, but I believe these are all rectifiable.

 

The notion of ‘targets’ and theories of vowels

In the introduction, nowhere is it stated what is meant by ‘target’ – its meaning may be obvious to the authors, but that may not be the case for readers given that different ideas exist for theorizing vowels. As targetedness is such a fundamental notion in the paper, the authors must provide a definition very early on and provide some background to the theory behind it, so that interested readers can look into it. While the discussion outlines some implications of the findings on ways of theorizing vowels, I would have thought that the theory of vowel inherent spectral change (VISC) would be particularly applicable here – please look this up and consider its relevance. Additionally, do the findings involving acoustic speech patterns provide any implications or expectations about speech/vowel perception?

 

Sentence materials

It wasn’t entirely clear to me why the sentence materials needed to be manipulated for competitor words and semantic predictability – this seems like a separate question from the one about variation due to the position of schwa. Maybe around lines 141-148 the authors could provide some information.

 

Function and content words

It seems the authors are following the parlance of others by using the terms ‘function words’ and ‘content words’. However, function words turn out to be just a single word, namely, the determiner a – why not just refer to a instead of ‘function words’? In this connection, the authors switch to ‘phrase-initial’ and ‘word-initial schwa’ around lines 96-97. Please be consistent throughout when referring to the two contexts.

 

Acoustic measures and statistical analyses

As the authors point out, the GCAs were not able to include item effects. It would be helpful for readers if the authors spelled out whether this was – in the authors' view – a minor or major limitation. How important was it for conclusions about schwa to generalize across words? Or was it more important to demonstrate that the effects were not a quirk of a few speakers saying these particular words? Furthermore, are there alternative ways to conduct GCAs which allow more than one grouping term, such as Bayesian multilevel models? Why were linear scales used for the acoustic response variables? As acoustic information from speech is ultimately most meaningful to human ears, frequency and duration are often (but not always) modeled on non-linear scales approximating human hearing, e.g., Bark, ERB or Mel, and log-transformed time for duration.
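As an illustration of the non-linear scales mentioned above, the following is a minimal sketch of commonly used Hz-to-Bark, Hz-to-ERB-rate and Hz-to-Mel conversions, plus a natural-log transform for duration. The specific formulas (Traunmüller's Bark approximation, the Glasberg and Moore ERB-rate formula, and the 2595/700 Mel formula) are assumptions chosen for illustration; the manuscript does not say which variant, if any, the authors would adopt, and the function names are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Hz to Bark, using Traunmueller's approximation (assumed variant)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def hz_to_erb_rate(f_hz):
    """Hz to ERB-rate, using the Glasberg and Moore formula (assumed variant)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def hz_to_mel(f_hz):
    """Hz to Mel, using the common 2595 * log10(1 + f / 700) formula."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def log_duration(duration_ms):
    """Natural log of a duration in milliseconds (must be positive)."""
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    return math.log(duration_ms)
```

Any of these transforms could be applied to the formant and duration values before model fitting; whether doing so changes the pattern of results is an empirical question for the authors.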

 

More specific points

·       Lines 107-108 repeat the sentence which follows – maybe delete the first of the two sentences?

·       Don’t assume readers know what COCA frequencies are (line 281) – please describe.

·       For the duration and midpoint analyses (385-404), provide model summaries as an appendix or in the supplementary materials.

·       The paragraphs on lines 409-415 and lines 476-483 just describe which results were significant – are these really needed?

·       Discussing the overall linear and quadratic effects in lines 416-422 is misleading because these interacted with other predictors; thus, these effects can only be understood with reference to the interactions. For that reason, it does not make much sense to discuss the overall effects on their own, especially before discussing the interactions.

·       When walking through the quadratic results (lines 427-434), I believe the authors should first mention the significant quadratic x schwa interaction, as this indicates a difference in the size of the deviation from a straight line between word-initial and phrase-initial schwas. The two additional analyses summarized on lines 427-429 should be presented as analyses following up on the interaction. Helpfully, these two analyses show the estimate of the quadratic term was larger in magnitude for phrase-initial schwa (-81) than for word-initial schwa (-76). Unless I’m misunderstanding something here, I couldn’t follow why on lines 433-434 the authors concluded word-initial schwa displayed a greater degree of deviation from a straight line.
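The point about the quadratic term can be made concrete with a small sketch. It builds orthogonal linear and quadratic time terms (in the spirit of orthogonal polynomials as typically used in growth curve analysis) and fits them to two synthetic, inverted-U formant tracks. Everything here is an assumption for illustration: the trajectories are invented, the function names are hypothetical, and this is an ordinary least-squares fit to single tracks, not the authors' mixed-effects model. The coefficient scale depends on how the time terms are normalized, so the values are not comparable to the -81 and -76 reported in the manuscript.

```python
import numpy as np


def orthogonal_time_terms(n_points, degree=2):
    """Return orthonormal polynomial time terms (linear, quadratic, ...)."""
    t = np.arange(n_points, dtype=float)
    t -= t.mean()
    # Columns of the raw design: 1, t, t^2, ...
    raw = np.vander(t, degree + 1, increasing=True)
    q, r = np.linalg.qr(raw)
    # Flip signs so each term correlates positively with its raw power of t.
    q = q * np.sign(np.diag(r))
    return q[:, 1:]  # drop the constant column


def fit_trajectory(formant_hz):
    """Least-squares fit of intercept, linear and quadratic time terms."""
    y = np.asarray(formant_hz, dtype=float)
    terms = orthogonal_time_terms(len(y))
    design = np.column_stack([np.ones(len(y)), terms])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return {"intercept": coefs[0], "linear": coefs[1], "quadratic": coefs[2]}


# Synthetic inverted-U tracks (hypothetical values, in Hz).
time = np.linspace(-1.0, 1.0, 11)
shallow_track = 500.0 - 20.0 * time**2
curved_track = 500.0 - 60.0 * time**2

shallow_fit = fit_trajectory(shallow_track)
curved_fit = fit_trajectory(curved_track)
```

Under this sign convention both quadratic estimates are negative, and the more strongly curved track has the estimate with the larger absolute value. That is the sense in which a quadratic estimate of greater magnitude indicates greater deviation from a straight line.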

Author Response

Please see the attachment. 

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This study investigated the nature of the target of schwa, specifically aiming to determine whether the specificity (or existence) of this target is related to the lexical identity of the schwa in question. The question itself is interesting and if analyzed thoroughly might be an interesting addition to the literature on the nature of speech targets.

There are a few major issues with this paper. My primary issue with the paper is that the entire experimental analysis rests on an assumption about linearity of formant trajectories. The relationship between articulation and acoustics is complicated and nonlinear. While the idea that schwa may be a "pass-through" state (i.e., targetless) is a reasonable null hypothesis, I don't see the logic that suggests that a lack of target necessarily implies linear movement in formant space. I think you need a combination of articulatory and acoustic data to compare linearity of movement. While there is evidence that there is greater change in articulation and acoustics (mostly concomitantly) between phones than during target midpoints, I'm not sure it follows that a targetless schwa will necessarily show linear formant trajectories. The preceding phonetic context, especially if labial, can complicate this relationship further.

My second major issue with the paper is the lack of contextualization of the effects, also known as interpretation. The text references effects on F1 value or trajectory, but it does not relate these effects back to the context. For example, "F1 decreased significantly over the course of the duration of schwa" (l. 416). Does this have any bearing on the hypothesis? Did you expect F1 to fall or rise given the phonetic context?

The lack of interpretation is worsened by a confusion between statistical significance and scientific significance. The first paragraph of the results section calls a 3 Hz difference in F1 significant. This sounds more like an algorithmic or other kind of difference, not behavioral. Can you explain why a 3 Hz difference matters? What does it mean for F1 to be 3 Hz lower in biased sentences? Is the human auditory system sensitive to 3 Hz differences in F1? In line 427, what does it mean for the estimate of the quadratic term to be -75? Can you contextualize the units? In the paragraph in lines 446-460, is it the difference itself, or the SIZE of the difference, that is consistent with previous literature? It is not enough to claim that a difference exists.
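One way to contextualize a 3 Hz difference is to express it on an auditory scale. The sketch below converts a 3 Hz step to Bark at a reference F1 of 500 Hz. The reference value is an assumption picked purely for illustration (it is not taken from the manuscript), as is the choice of Traunmüller's Bark approximation, and the function names are hypothetical. The sketch only rescales the difference and makes no claim about perceptual thresholds.

```python
def hz_to_bark(f_hz):
    """Hz to Bark, using Traunmueller's approximation (assumed variant)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_difference(reference_hz, delta_hz):
    """Size in Bark of a step of delta_hz above reference_hz."""
    return hz_to_bark(reference_hz + delta_hz) - hz_to_bark(reference_hz)


# Assumed reference F1 of 500 Hz; a 3 Hz step is a few hundredths of a Bark.
step_in_bark = bark_difference(500.0, 3.0)
```

Reporting effects on such a scale, alongside the raw Hz values, would make it easier for readers to judge whether a statistically reliable difference is also a meaningful one.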

Other issues:
Procedures: How many times did each speaker utter each sentence? I'm guessing once, given 64 speakers *80 sentences = 5,120 trials; later in the paper you say 237 (5%) were excluded, and 237/.05 = 4,740, which isn't terribly far off from 5,120?

Linear model results are absent! They are partially discussed in-text without effect sizes.

Finally, I wondered whether the length of the schwa could be interacting with the trajectory, especially the slope. Did you consider interactions between duration and slope in a GCA model? Were there differences in duration between a# and #a? Was there a difference in the production of the vowel "to" depending on the type of schwa that followed (thereby affecting coarticulation)?

References: Bakst & Niziolek 2021 is a better reference than their 2019 conference paper as it contains an additional experiment (possibly relevant to you) and a more thorough interpretation.

l. 142: I don't like this "easy/hard" distinction. Easy to--what?
I found this paragraph kind of hard to follow until I read the following paragraph--I now understand why you're making an easy/hard distinction, but I think it will be a little easier for the reader to follow if you rearrange some of this information.
l. 244 I think you need to explain what cloze data are.
l. 282 Many of your readers will not be familiar with COCA.
l. 306 failed to produce target schwa: how did you determine this? Was the schwa too short or was it totally absent?
l. 317 midpoint of vowel sequence suggests a hypothesis that the preceding vowel and schwa were the same length. Do you have any examples to show?
l. 391 What was the value of the covariate of schwa duration?
l. 402 It's not clear that these schwas are hyperarticulated rather than simply less reduced--I don't think these are the same.
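The trial-count question raised under Procedures above can be checked with a few lines of arithmetic. This sketch uses only the figures quoted in that comment (64 speakers, 80 sentences, 237 exclusions, a stated rate of 5%) and assumes one production per speaker per sentence, which is the reviewer's guess rather than a confirmed design detail. The variable names are hypothetical.

```python
speakers = 64
sentences = 80
excluded_trials = 237
stated_exclusion_rate = 0.05

# Expected trial count if each speaker read each sentence once (assumption).
expected_trials = speakers * sentences

# Total implied by treating the 237 exclusions as exactly 5% of all trials.
implied_total = excluded_trials / stated_exclusion_rate

# Exclusion rate implied by the expected trial count instead.
implied_rate = excluded_trials / expected_trials

# Gap between the two totals, in trials.
shortfall = expected_trials - implied_total
```

With one repetition, 237 exclusions out of 5,120 trials is about 4.6%, which rounds to the stated 5%. The apparent gap of roughly 380 trials may therefore be a rounding effect rather than missing data, but the manuscript should state the total explicitly.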

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have addressed all the points from my previous review.

Some VERY minor comments:

On several occasions, the authors refer to possible parabola shapes as “narrow” and “wide”. Does “narrow” mean the shape of the formant track is deeper/more extreme and “wide” mean the formant track is relatively shallow? Maybe clarify these shapes further in the paragraph starting line 398 and ending line 416.

Something else to consider and not a request. It’s possible that some readers may wish to use the authors’ approach in their future work. Would the authors consider including the model code for R in Supplementary Materials?
