**Cross-Modal Priming Effect of Rhythm on Visual Word Recognition and Its Relationships to Music Aptitude and Reading Achievement**

#### **Tess S. Fotidzis <sup>1,\*</sup>, Heechun Moon <sup>2</sup>, Jessica R. Steele <sup>3</sup> and Cyrille L. Magne <sup>3</sup>**


Received: 22 October 2018; Accepted: 28 November 2018; Published: 29 November 2018

**Abstract:** Recent evidence suggests the existence of shared neural resources for rhythm processing in language and music. Such overlaps could be the basis of the facilitating effect of regular musical rhythm on spoken word processing previously reported for typical children and adults, as well as adults with Parkinson's disease and children with developmental language disorders. The present study builds upon these previous findings by examining whether non-linguistic rhythmic priming also influences visual word processing, and the extent to which such a cross-modal priming effect of rhythm is related to individual differences in musical aptitude and reading skills. An electroencephalogram (EEG) was recorded while participants listened to a rhythmic tone prime, followed by a visual target word with a stress pattern that either matched or mismatched the rhythmic structure of the auditory prime. Participants were also administered standardized assessments of musical aptitude and reading achievement. Event-related potentials (ERPs) elicited by target words with a mismatching stress pattern showed an increased fronto-central negativity. Additionally, the size of the negative effect correlated with individual differences in musical rhythm aptitude and reading comprehension skills. Results support the existence of shared neurocognitive resources for linguistic and musical rhythm processing, and have important implications for the use of rhythm-based activities for reading interventions.

**Keywords:** implicit prosody; rhythm sensitivity; event related potentials; reading achievement; musical aptitude

#### **1. Introduction**

Music and language are complex cognitive abilities that are universal across human cultures. Both involve the combination of small sound units (e.g., phonemes for speech, and notes for music), which, in turn, allows us to generate an unlimited number of utterances or melodies, in accordance with specific linguistic or musical grammatical rules (e.g., [1]). Of specific interest for the present study is the notion of rhythm. In music, rhythm is marked by the periodic succession of acoustic elements as they unfold over time, and some of these elements may be perceived as stronger than others. Meter is defined as the abstract hierarchical organization of these recurring strong and weak elements that emerges from rhythm. It is this metrical structure that allows listeners to form predictions and anticipations, and in turn dance or clap their hands to the beat of the music [2].

Similarly, in speech, the pattern of stressed (i.e., strong), and unstressed (i.e., weak) syllables occurring at the lexical level contributes to the metrical structure of an utterance. Lexical stress is usually defined as the relative emphasis that one syllable, or several syllables, receive in a word [3].

Stress is typically realized by a combination of increased duration, loudness, and/or pitch change. In many languages, such as English, the salience of the stressed syllable is further reinforced by the fact that many unstressed syllables contain a reduced vowel [4]. Some languages are described as having fixed stress because the location of the stress is predictable. For instance, in French, the stress is usually on the final full syllable [5]. By contrast, several languages are considered to have variable stress because the position of the stress is not predictable. In such languages, like English, stress may serve as a distinctive feature to distinguish noun-verb stress homographs [6]. For example, the word "permit" is stressed on the first syllable when used as a noun, but stressed on the second syllable when used as a verb.

There is increasing support for the existence of rhythmic regularities in English, despite the apparent lack of physical periodicity of the stressed syllables when compared to the rhythmic structure of music (e.g., [7]). During speech production, rhythmic adjustments, such as stress shifts, may take place to avoid stress on adjacent syllables, and these stress shifts may give rise to a more regular alternating pattern of stressed and unstressed syllables [8]. For example, "thirteen" is normally stressed on the second syllable, but the stress can shift to the first syllable when followed by a word with initial stress (e.g., "thirteen people"). These rhythmic adjustments may play a role in speech perception, as suggested by findings showing that sentences with stress shifts are perceived as more natural than sentences with stress clashes, even though words with shifted stress deviate from their default metrical structure [9].

In music, the Dynamic Attending Theory (DAT) provides a framework in which auditory rhythms are thought to create hierarchical expectancies for the signal as it unfolds over time [10,11]. According to the DAT, distinct neural oscillations entrain to the multiple hierarchical levels of the metrical structure of the auditory signal, and strong metrical positions act as attentional attractors, thus making acoustic events occurring at these strong positions easier to process. Similarly, listeners do not pay equal attention to all parts of the speech stream, and speech rhythm may influence which moments are hierarchically attended to in the speech signal. For instance, detection of a target phoneme was found to be faster if it was embedded in a rhythmically regular sequence of words (i.e., regular time interval between successive stressed syllables), thus suggesting that speech rhythm cues, such as stressed syllables, guide listeners' attention to specific portions of the speech signal [12]. Further evidence suggests that predictions regarding speech rhythm and meter may be crucial for language acquisition [13], speech segmentation [14], word recognition [15], and syntactic parsing [16].

Given the structural similarities between music and language, a large body of literature has documented which neuro-cognitive systems may be shared between language and music (e.g., [7,17,18]), and converging evidence support the idea that musical and linguistic rhythm perception skills partially overlap [19–21]. In line with these findings, several EEG studies revealed a priming effect of musical rhythm on spoken language processing. For instance, listeners showed a more robust neural marker of beat tracking and better comprehension when stressed syllables aligned with strong musical beats in sung sentences [22]. Likewise, EEG findings demonstrated that spoken words were more easily processed when they followed non-linguistic primes with a metrical structure that matched the word metrical structure [23]. A follow-up study using a similar design showed this benefit of rhythm priming on speech processing may be mediated by cross-domain neural phase entrainment [24].

The purpose of the present study was to shed further light on the effect of non-linguistic rhythmic priming on language processing (e.g., [22–24]). We specifically focused on words with a trochaic stress pattern (i.e., a stressed syllable followed by an unstressed syllable) because, in the English lexicon, they constitute more than 85% of content words [25]. This high frequency of the trochaic pattern may play a particularly preponderant role in English language development, as infants seem to adopt a metrical segmentation strategy by treating a stressed syllable as the beginning of a word in the continuous speech stream [26]. Evidence in support of this important role of the trochaic pattern comes from studies conducted with English-speaking infants, who develop a preference for the trochaic pattern as early as the age of 6 months [27]. By contrast, the ability to detect words with an iambic pattern (i.e., an unstressed syllable followed by a stressed syllable) develops later, around 10.5 months, and seems to rely more on additional sets of linguistic knowledge regarding phonotactic constraints (i.e., the sequences of phonemes that are allowed in a given language) and allophonic cues (i.e., the multiple phonetic variants of a phoneme, whose occurrences depend on their position in a word and their phonetic context), rather than stress cues [13].

The first specific aim was to examine whether the cross-domain rhythmic priming effect is also present when target words are visually presented. To this end, participants were presented with rhythmic auditory prime sequences (either a repeating pattern of long-short or short-long tone pairs), followed by a visual target word with a stress pattern that either matched or mismatched the temporal structure of the prime (see Figure 1). Based on previous literature (e.g., [20,23,28]), we predicted that words that do not match the temporal structure of the rhythmic prime would elicit an increased centro-frontal negativity.

**Figure 1.** Rhythmic cross-modal priming experimental paradigm. The auditory prime (long-short or short-long sequence) is followed by a visual target word with a stress pattern that either matches or mismatches the prime. (Note: the stressed syllable is underlined for illustration purposes only.)

A second aim of the study was to determine whether such a rhythmic priming effect would be related to musical aptitude. Musical aptitude has been associated with enhanced perception of speech cues that are important correlates of rhythm. For instance, individuals with formal musical training detect violations of word pitch contours [29,30] and syllabic durations [31] better than non-musicians. In addition, electrophysiological evidence shows that the size of a negative ERP component elicited by spoken words with an unexpected stress pattern correlates with individual differences in musical rhythm abilities [20]. Thus, in the present study, we expected the amplitude of the negativity elicited by the cross-modal priming effect to correlate with individual scores on a musical aptitude test, if the relationship between musical aptitude and speech rhythm sensitivity transfers to the visual domain.

Finally, the third study aim was to test whether the cross-modal priming effect present in the ERPs correlated with individual differences in reading achievement. Mounting evidence suggests a link between sensitivity to auditory rhythm skills (both linguistic and musical) and reading abilities (e.g., [32–35]). As such, we collected individuals' scores on a college readiness reading achievement test to examine whether the cross-modal ERP effect correlated with individual differences in reading comprehension skills. We expected the amplitude of the negativity elicited by the cross-modal priming effect to correlate with individual scores on the American College Testing (ACT) reading test, if rhythm perception skills relate to reading abilities as suggested by the current literature [32–35].

#### **2. Materials and Methods**

#### *2.1. Participants*

Eighteen first-year college students took part in the experiment (14 females and 4 males, mean age = 19.5, age range: 18–22). All were right-handed, native English speakers with less than two years of formal musical training. None of the participants were Music majors. The study was approved by the Institutional Review Board at Middle Tennessee State University, and written consent was obtained from the participants prior to the start of the experiment.

#### *2.2. Standardized Measures*

The Advanced Measures of Music Audiation (AMMA; [36]) was used to assess participants' musical aptitude. The AMMA has been used previously to measure the correlation between musical aptitude and indices of brain activity (e.g., [20,37–39]). This measure was nationally standardized with a normed sample of 5336 U.S. students and offers percentile ranked norms for both music and non-music majors. Participants were presented with 30 pairs of melodies and asked to determine whether the two melodies of each pair were the same, tonally different, or rhythmically different. The AMMA provides separate scores for rhythmic and tonal abilities. For non-Music majors, reliability scores are 0.80 for the tonal score and 0.81 for the rhythm score [36].

The reading scores on the ACT exam were used to examine the relationship between reading comprehension and speech rhythm sensitivity. The ACT reading section is a standardized achievement test that comprises short passages from four categories (prose fiction, social science, humanities, and natural science) and 40 multiple-choice questions that test the reader's comprehension of the passages. Scores range between 1 and 36. The test was administered and scored by the non-profit organization of the same name (ACT, Inc., Iowa City, IA, USA) using a paper and pencil format.

#### *2.3. EEG Cross-Modal Priming Paradigm*

Prime sequences consisted of a rhythmic tone pattern of either a long-short or short-long structure repeated three times. The tones consisted of a 500 Hz sine wave with a 10 ms rise/fall, and a duration of either 200 ms (long) or 100 ms (short). In long-short sequences, the long tone and short tone were separated by a silence of 100 ms, and each of the three successive long-short tone pairs was followed by a silence of 200 ms. In short-long sequences, the short tone and long tone were separated by a silence of 50 ms, and each of the three successive short-long tone pairs was followed by a silence of 250 ms. Because previous research has shown that native speakers of English have a cultural bias toward grouping a sequence of tones differing in duration into short-long patterns [40,41], a series of behavioral pilot experiments were conducted with different iterations of the tone sequences to determine which parameters would provide consistent perception of either long-short or short-long patterns.
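For concreteness, the prime timing described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' stimulus-generation code; the 44.1 kHz sampling rate and the function names are assumptions. One useful property of the reported parameters is that both prime types have the same total duration (3 × 600 ms = 1.8 s):

```python
import numpy as np

FS = 44100  # assumed audio sampling rate (not specified in the text)

def tone(duration_ms, freq=500.0, rise_fall_ms=10, fs=FS):
    """500 Hz sine tone with linear 10 ms rise/fall ramps."""
    n = int(fs * duration_ms / 1000)
    t = np.arange(n) / fs
    sig = np.sin(2 * np.pi * freq * t)
    ramp = int(fs * rise_fall_ms / 1000)
    env = np.ones(n)
    env[:ramp] = np.linspace(0, 1, ramp)   # rise
    env[-ramp:] = np.linspace(1, 0, ramp)  # fall
    return sig * env

def silence(duration_ms, fs=FS):
    return np.zeros(int(fs * duration_ms / 1000))

def prime_sequence(order="long-short", fs=FS):
    """Three repetitions of a tone pair, with the silences from the paper."""
    if order == "long-short":
        # 200 ms tone + 100 ms gap + 100 ms tone + 200 ms pair-final gap
        pair = np.concatenate([tone(200), silence(100), tone(100), silence(200)])
    else:  # short-long
        # 100 ms tone + 50 ms gap + 200 ms tone + 250 ms pair-final gap
        pair = np.concatenate([tone(100), silence(50), tone(200), silence(250)])
    return np.tile(pair, 3)
```

Note that the silence durations equate the two prime types for overall length, so any ERP difference between conditions cannot be attributed to total prime duration.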

Visual targets were composed of 140 English real-word bisyllabic nouns and 140 pseudowords, which were all selected from the database of the English Lexicon Project [42]. The lexical frequency of all the words was controlled using the log HAL frequency [43]. The mean log HAL frequency for each set of stress patterns was 10.28 (SD = 0.98) for trochaic sequences and 10.28 (SD = 0.97) for iambic sequences. Pseudowords were matched to the real words in terms of syllable count and word length and were used only for the purpose of the lexical decision task. Half of the real words (*N* = 70) had a trochaic stress pattern (i.e., stressed on the first syllable, for example, "basket"). The other half consisted of fillers with an iambic stress pattern (i.e., stressed on the second syllable, for example, "guitar").

Short-long and long-short prime sequences were combined with the visual target words to create two experimental conditions in which the stress pattern of the target word either matched or mismatched the rhythm of the auditory prime.

We chose to analyze only the ERPs elicited by trochaic words for several reasons. First, trochaic words comprise the predominant stress pattern in English (85–90% of spoken English words according to [34]), and consequently, participants were likely familiar with their pronunciation. Second, because stressed syllables correspond to word onset in trochaic words, this introduces less temporal jitter than for iambic words when computing ERPs across trials. This issue is particularly problematic for iambic words during silent reading, because there is no direct way to measure when participants read the second syllable. Third, participants were recruited from a university located in the southeastern region of the United States, and either originated from this area or had been living in the area for several years. It is well documented that the Southern American English dialect tends to place stress on the first syllable of many iambic words, even though these words are stressed on the second syllable in standard American English (e.g., [44]). As such, rhythmic expectations are harder to predict for iambic words.

#### *2.4. Procedure*

Participants' musical aptitude was first measured using the AMMA [36]. Following administration of the AMMA test, participants were seated in a soundproofed and electrically shielded room. Auditory prime sequences were presented through headphones, and target stimuli were visually presented on a computer screen placed approximately 3 feet in front of the participant. Words and pseudowords were written in black lowercase characters on a white background. No visual cue was provided to the participant regarding the location of the stressed syllables in the target words. Stimulus presentation was controlled using the software E-prime 2.0 Professional with Network Timing Protocol (Psychology Software Tools, Inc., Pittsburgh, PA, USA). Participants were presented with 5 blocks of 56 stimuli. The trials were randomized within each block, and the order of the blocks was counterbalanced across participants. Each trial began with a fixation cross displayed at the center of the computer screen that remained until 2 s after the onset of the visual target word. Participants were asked to silently read the target word and to press one button if they thought it was a real English word, or another button if they thought it was a nonword. The entire experimental session lasted 1.5 h.

#### *2.5. EEG Acquisition and Preprocessing*

EEG was recorded continuously from 128 Ag/AgCl electrodes embedded in sponges in a Hydrocel Geodesic Sensor Net (EGI, Eugene, OR, USA) placed on the scalp, connected to a NetAmps 300 amplifier, and using a MacPro computer. Electrode impedances were kept below 50 kΩ. Data were referenced online to Cz and re-referenced offline to the averaged mastoids. In order to detect blinks and vertical eye movements, the vertical and horizontal electrooculograms (EOG) were also recorded. The EEG and EOG were digitized at a sampling rate of 500 Hz. EEG preprocessing was carried out with NetStation Viewer and Waveform tools. The EEG was first filtered with a bandpass of 0.1 to 30 Hz. Data time-locked to the onset of trochaic target words were then segmented into epochs of 1100 ms, starting 100 ms prior to word onset and continuing 1000 ms post-word-onset. Trials containing movements, ocular artifacts, or amplifier saturation were discarded. ERPs were computed separately for each participant and each condition by averaging together the artifact-free EEG segments relative to the 100 ms pre-stimulus baseline.
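The segmentation and baseline-correction steps can be illustrated with a short Python sketch. The authors used NetStation for preprocessing; this NumPy version is only a schematic of the same arithmetic, with hypothetical function and variable names:

```python
import numpy as np

FS = 500  # EEG sampling rate (Hz), as reported in the paper

def epoch_and_baseline(eeg, onsets, fs=FS, pre_ms=100, post_ms=1000):
    """Cut epochs around word onsets and subtract the pre-stimulus baseline.

    eeg     : (n_channels, n_samples) continuous recording
    onsets  : sample indices of trochaic word onsets
    returns : (n_trials, n_channels, n_times) baseline-corrected epochs
    """
    pre = int(fs * pre_ms / 1000)    # 50 samples = 100 ms at 500 Hz
    post = int(fs * post_ms / 1000)  # 500 samples = 1000 ms
    epochs = []
    for o in onsets:
        seg = eeg[:, o - pre:o + post]                     # 1100 ms window
        baseline = seg[:, :pre].mean(axis=1, keepdims=True)
        epochs.append(seg - baseline)                      # re-center on baseline
    return np.stack(epochs)
```

Averaging the resulting artifact-free epochs per condition then yields each participant's ERPs.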

#### *2.6. Data Analysis*

Statistical analyses were performed using MATLAB and the FieldTrip open source toolbox [45]. A planned comparison between the ERPs elicited by mismatching trochaic words and matching trochaic words was performed using a cluster-based permutation approach. This non-parametric data-driven approach does not require the specification of any latency range or region of interest a priori, while also offering a solution to the problem of multiple comparisons (see [46]).

To relate the ERP results to the behavioral measures (i.e., musical aptitude and reading comprehension), an index of sensitivity to speech rhythm cues was first calculated from the ERPs using the mean of the significant amplitude differences between ERPs elicited by matching and mismatching trochaic words at each channels, and time points belonging to the resulting clusters (see [20,47] for similar approaches). Pearson correlations were then tested between the ERP cluster mean difference and the participants' scores on the AMMA and ACT reading section, respectively. A multiple regression was also computed with the ERP cluster mean difference as the outcome measure, and the AMMA Rhythm scores and ACT Reading scores as the predictor variables.
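The analyses were run in MATLAB with FieldTrip; as a rough Python analogue, the per-participant ERP index and the subsequent brain-behavior correlation could be computed as follows (array shapes and names are hypothetical):

```python
import numpy as np
from scipy.stats import pearsonr

def cluster_mean_difference(erp_mismatch, erp_match, cluster_mask):
    """Mean amplitude difference over the significant (channel, time) points.

    erp_mismatch, erp_match : (n_subjects, n_channels, n_times) condition ERPs
    cluster_mask            : boolean (n_channels, n_times) mask from the
                              cluster-based permutation test
    returns                 : one index value per participant
    """
    diff = erp_mismatch - erp_match
    # boolean mask over the trailing two axes selects cluster points only
    return diff[:, cluster_mask].mean(axis=1)

# Brain-behavior correlation (hypothetical score arrays):
# index = cluster_mean_difference(erp_mismatch, erp_match, cluster_mask)
# r, p = pearsonr(index, amma_rhythm_scores)
```

The same index then serves as the outcome variable in the multiple regression, with AMMA Rhythm and ACT Reading scores as predictors.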

#### **3. Results**

#### *3.1. Metrical Expectancy*

Overall, participants performed well on the lexical decision task, as suggested by the mean accuracy rate (*M* = 98.82%, SD = 0.85). A paired samples *t*-test was computed to compare accuracy rates for real target words in the matching (*M* = 99.83%, SD = 0.70), and mismatching (*M* = 99.42%, SD = 1.40) rhythm conditions. No statistically significant differences were found between the two conditions, *t* (35) = 1.54, *p* = 0.13, two-tailed.

Analyses of the ERP data revealed that target trochaic words that mismatched the rhythmic prime elicited a significantly larger negativity from 300 to 708 ms over a centro-frontal cluster of electrodes (*p* < 0.001, See Figure 2).

**Figure 2.** Rhythmic priming Event-related potential (ERP) effect. Grand-average event-related potentials (ERPs) recorded for matching (purple), and mismatching (green) trochaic target words, averaged for the significant group of channels in the cluster. The latency range of the significant clusters is indicated in blue. (Note: Negative amplitude values are plotted upward. The topographic map shows the mean differences in scalp amplitudes in the latency range of the significant clusters. Electrodes belonging to the cluster are indicated with a black dot).

#### *3.2. Brain-Behavior Relationships*

The negative ERP cluster mean difference showed a statistically significant positive correlation with the AMMA Rhythm scores (*r* = 0.74, *p* < 0.001; see Figure 3A) and the ACT Reading scores (*r* = 0.60, *p* = 0.009; see Figure 3B). A statistically significant positive correlation was also found between the AMMA Rhythm scores and ACT Reading scores (*r* = 0.55, *p* = 0.016; see Figure 3C). By contrast, no statistically significant correlation was found between the AMMA Tonal scores and the negative ERP cluster mean difference (*r* = 0.30, *p* = 0.23) or the ACT Reading scores (*r* = 0.09, *p* = 0.70). The maximum Cook's distance for the reported correlations indicated no undue influence of any data point on the fitted models (max Cook's *D* < 0.5).

**Figure 3.** Brain-behavior correlations. (**A**) Correlation between speech rhythm sensitivity (as indexed by the negative ERP cluster mean difference) and musical rhythm aptitude; (**B**) correlation between speech rhythm sensitivity and reading comprehension; (**C**) correlation between musical rhythm aptitude and reading comprehension. (Note: The solid line represents a linear fit.)

A multiple regression was conducted to investigate whether AMMA Rhythm scores and ACT Reading scores predicted the size of the negative ERP cluster mean difference. Table 1 summarizes the analysis results. The regression model explained 59.9% of the variance and was a statistically significant predictor of the negative ERP cluster mean difference (*R*<sup>2</sup> = 0.599, *F* (2,15) = 11.2, *p* = 0.001). As can be seen in Table 1, AMMA Rhythm scores statistically significantly contributed to the model (*β* = 0.594, *t* (15) = 3.023, *p* = 0.009), but ACT Reading scores did not (*β* = 0.267, *t* (15) = 1.359, *p* = 0.194). The final predictive model was: Negative ERP Cluster Mean Difference = (0.281 × AMMA Rhythm) + (0.081 × ACT Reading) + 8.000.
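As a worked example, the reported unstandardized coefficients define the following predictive function (a direct transcription of the model above, not code from the study):

```python
def predicted_erp_difference(amma_rhythm, act_reading):
    """Negative ERP cluster mean difference predicted from the two scores,
    using the unstandardized coefficients reported in the regression model."""
    return 0.281 * amma_rhythm + 0.081 * act_reading + 8.000
```

Reading the coefficients directly: a one-point increase in AMMA Rhythm score corresponds to a 0.281-unit increase in the predicted ERP difference, holding ACT Reading constant, and vice versa for the 0.081 ACT Reading coefficient.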

**Table 1.** Multiple regression coefficients. <sup>1</sup>

| Predictor | *B* | *β* | *t* | *p* |
|---|---|---|---|---|
| Constant | 8.000 | | | |
| AMMA Rhythm | 0.281 | 0.594 | 3.023 | 0.009 |
| ACT Reading | 0.081 | 0.267 | 1.359 | 0.194 |

<sup>1</sup> Outcome: Negative ERP cluster mean difference; *B*: unstandardized coefficient; *β*: standardized coefficient; *t*: *t*-value; *p*: *p*-value; ACT: American College Testing; AMMA: Advanced Measures of Music Audiation.

#### **4. Discussion**

The current study aimed to examine the cross-modal priming effect of non-linguistic auditory rhythm on written word processing and investigate whether such an effect would relate to individual differences in musical aptitude and reading comprehension. As hypothesized, trochaic target words that did not match the rhythmic structure of the auditory prime were associated with an increased negativity over the centro-frontal part of the scalp. This finding is in line with previous ERP studies on speech rhythm and meter [6,15,20,28,31,48–50]. It has been generally proposed that this negative effect either reflects an increased N400 [15,49], or a domain-general rule-based error-detection mechanism [6,20,28,31,51,52]. The fact that similar negative effects have been reported in response to metric deviations in tone sequences (e.g., [53,54]) further supports the latter interpretation.

While the aforementioned studies were conducted either in the linguistic or musical domain, the negative effect observed for mismatching target words was generated by non-linguistic prime sequences in the present experiment. Cason and Schön [23] previously reported a cross-domain priming effect of music on speech processing, which was reflected by a similar increased negativity when the metrical structure of the spoken target word did not match the rhythmic structure of the musical prime. Several other findings have since shown that temporal expectancies generated by rhythmically regular non-linguistic primes can facilitate spoken language processing in typical adults (e.g., [24,55]), and children [56,57], as well as adults with Parkinson's disease [58], children with cochlear implants [59], and children with language disorders [60]. This beneficial effect may stem from the regular rhythmic structure of the prime, which provides temporally predictable cues to which internal neural oscillators can anchor [24]. The present findings support and extend this line of research by showing that this negativity is elicited even when the target words are visually presented, thus suggesting that non-linguistic rhythm can induce metrical expectations not only across distinct cognitive domains, but also across different sensory modalities [61]. These findings also provide additional evidence in favor of the view that rhythm/meter processing relies on a domain-general neural system that is not specific to language [19,21,22].

We further investigated whether this cross-modal priming effect was related to individual differences in musical aptitude. Interestingly, our results showed a statistically significant correlation between the size of the brain response elicited by unexpected stress patterns and the AMMA rhythm subscore, but not the tonal subscore. In addition, musical rhythm aptitude was a statistically significant predictor of speech rhythm sensitivity, even after controlling for reading comprehension skills. This is in line with previous ERP studies showing that adult musicians performed better than non-musicians at detecting words pronounced with an incorrect stress pattern [31]. In addition, this enhanced sensitivity to speech meter was associated with larger electrophysiological responses to incorrectly pronounced words, which was interpreted as reflecting more efficient early auditory processing of the temporal properties of speech.

Robust associations have also been found between musical rhythm skills and speech prosody perception, even after controlling for years of music education [19]. Noteworthy for the present experiment, individual differences in brain sensitivity to speech rhythm variations can be explained by variance in musical rhythm aptitude in individuals with less than two years of musical training. For instance, in a recent experiment [20], participants' musical aptitude was assessed using the same standardized measure of musical abilities (i.e., AMMA) as in the present study. Participants listened to sequences consisting of four bisyllabic words for which the stress pattern of the final word either matched or mismatched the stress pattern of the preceding words. Words with a mismatching stress pattern elicited an increased negative ERP component with the same scalp distribution and latency as the one found in the current data. More importantly, participants' musical rhythm aptitude statistically significantly correlated with the size of the negative effect. Thus, in light of the aforementioned literature, the present results confirm and extend previous data suggesting a possible transfer of learning between the musical and linguistic domains (See [62] for a review).

Adding to the growing literature showing a relationship between sensitivity to speech rhythm and reading skills, our results revealed a statistically significant positive correlation between the scores on the ACT reading subtest and the size of the negative ERP effect elicited by mismatching stress patterns. Previous studies have mainly focused on typically developing young readers using several novel speech rhythm tasks in conjunction with standardized measures of reading abilities, and results consistently showed a correlation between performances on the speech rhythm tasks and individual differences in word reading skills [63–66]. It has been proposed that early sensitivity to speech rhythm cues may contribute to the development of phonological representations [32]. However, sensitivity to speech rhythm cues still explains unique variance in word reading skills after controlling for phonological processing skills [67], thus suggesting that it also makes a significant contribution to reading development independently of phonological awareness.

More directly related to the present study, research with older readers and adults suggests that knowledge of the prosodic structure of words continues to play a role in skilled reading. For instance, visual word recognition is facilitated when primed by word fragments with a matching stress pattern [68,69]. Two other studies conducted on typical adults focused on lexical stress perception in isolated multisyllabic words [70,71], and found a significant relationship with reading comprehension. Likewise, adult struggling readers usually show lower performance than their typical peers on tasks measuring perception of word stress patterns or auditory rhythms [72–75] (but see [74,76]).

Interestingly, the finding that reading comprehension was not a statistically significant contributor to speech rhythm sensitivity after controlling for musical rhythm aptitude supports the Temporal Sampling Framework (TSF) proposed by Goswami [32]. According to the TSF, the link between speech rhythm sensitivity and reading skills is mediated by domain-general neurocognitive mechanisms for processing acoustic information carrying rhythmic cues. In line with this interpretation, we found a statistically significant correlation between the AMMA rhythm scores and reading achievement scores.

The OPERA (overlap, precision, emotion, repetition, attention) hypothesis formulated by Patel [77,78] further provides a potential explanation of music-training driven plasticity in brain networks involved in language. OPERA offers a set of five optimal conditions that must be met for music training to drive plasticity: (1) music and language have overlapping anatomical substrates; (2) music activities require a greater level of precision compared to language; (3) music activities evoke strong emotions; (4) music training involves repeated practice; (5) music activities require sustained attention. In line with this framework, the Precise Auditory Timing Hypothesis (PATH) proposed by Tierney and Kraus [79] predicts that music programs that focus on rhythm activities, with an emphasis on entrainment and timing, will be more effective in improving reading-related skills, such as phonological processing skills, because there are overlaps between language and music networks processing rhythmic information, and music requires a higher level of auditory-motor timing precision than language. OPERA and PATH thus provide compelling explanations for the significant relationships we report here between musical rhythm aptitude, speech rhythm sensitivity, and reading achievement. While our present study was correlational (and conducted with non-musicians), data from recent longitudinal studies using randomized controlled trials indeed show promising results of rhythm-based intervention for the development of language skills in children with reading disorders [80], and typical peers [81].

Finally, the fact that we found a "metrical" negativity to visual targets, even though participants were not allowed to sound out the words, further supports theories proposing that information about the metrical structure of a word is part of its lexical representation and is automatically retrieved during silent reading [82,83]. This idea is in line with the Implicit Prosody Hypothesis (IPH) originally proposed by Fodor [84]. The IPH is closely related to the concept of verbal imagery, or inner voice, found in the literature throughout the 20th century [82]. According to the IPH, readers create a mental representation of the prosodic structure of the text while silently reading. Several studies have provided compelling evidence in support of the IPH, especially regarding lexical stress. For instance, eye-tracking studies showed that readers had longer reading times and more eye fixations for four-syllable words with two stressed syllables than for those with one stressed syllable [85], and that expectations generated by the stress patterns of successive words may affect early stages of syntactic analysis of upcoming words in written sentences [82,86]. Taken together, these results and the present data provide compelling evidence for a role of prosodic representations of a word's stress pattern during silent reading.

One potential limitation of the current research is the use of ACT reading scores, which may not be fully representative of the participants' reading skills. In particular, phonemic awareness, decoding, and fluency, components known to contribute greatly to reading comprehension [87], cannot be teased apart in the ACT reading subtest. Future research using a more comprehensive battery of language and reading assessments would allow a more complete understanding of which reading components are most closely related to speech rhythm perception skills.

#### **5. Conclusions**

The present data confirm and extend previous studies showing facilitating effects of a regular non-linguistic rhythm on spoken language processing (e.g., [23,55,59]) by demonstrating that this is also the case for written language processing. We propose that this cross-modal effect of rhythm is mediated by the automatic retrieval of a word's metrical structure during silent reading (i.e., implicit prosody generated through verbal imagery). Finally, the finding that the negativity associated with this cross-modal priming effect correlated with individual differences in musical aptitude and reading achievement further supports the potential clinical and educational implications of using rhythm-based interventions for populations with language or learning disabilities.

**Author Contributions:** T.S.F. collected the data and wrote the paper. H.M. collected and analyzed the data. J.R.S. wrote and edited the paper. C.L.M. conceived the idea, designed the experiments, and wrote the paper.

**Funding:** This study was funded by NSF Grant # BCS-1261460 awarded to Cyrille Magne and by the MTSU Foundation.

**Conflicts of Interest:** The authors declare no conflict of interest. The funding sources had no role in study design; in the collection, analysis and interpretation of data; in the writing of the report; nor in the decision to submit the article for publication.

#### **References**


© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Impaired Recognition of Metrical and Syntactic Boundaries in Children with Developmental Language Disorders**

#### **Susan Richards \* and Usha Goswami**

Centre for Neuroscience in Education, University of Cambridge, Cambridge CB2 3EB, UK; ucg10@cam.ac.uk (U.G.)

**\*** Correspondence: susan.richards@cantab.net; Tel.: +44-1223-333550

Received: 19 December 2018; Accepted: 31 January 2019; Published: 5 February 2019

**Abstract:** In oral language, syntactic structure is cued in part by phrasal metrical hierarchies of acoustic stress patterns. For example, many children's texts use prosodic phrasing comprising tightly integrated hierarchies of metre and syntax to highlight the phonological and syntactic structure of language. Children with developmental language disorders (DLDs) are relatively insensitive to acoustic stress. Here, we disrupted the coincidence of metrical and syntactic boundaries as cued by stress patterns in children's texts so that metrical and/or syntactic phrasing conflicted. We tested three groups of children: children with DLD, age-matched typically developing controls (AMC) and younger language-matched controls (YLC). Children with DLDs and younger, language-matched controls were poor at spotting both metrical and syntactic disruptions. The data are interpreted within a prosodic phrasing hypothesis of DLD based on impaired acoustic processing of speech rhythm.

**Keywords:** language disorder; rhythm; prosody

#### **1. Introduction**

Children with Developmental Language Disorder (DLD) have persistent difficulties with learning language that are not associated with a known condition, such as sensori-neural hearing loss or Autism Spectrum Disorder [1]. Prevalence of the disorder is estimated at approximately 7% in primary school populations [2–4], and children with DLD can face a variety of challenges in accessing education and employment. Children with DLD typically have difficulty with the accurate processing and production of grammatical structures in speech [5–7].

Although the implications of having DLD are well-documented, and DLD is found across languages, the underlying causes are as yet unclear. A range of perceptual and cognitive hypotheses have been proposed, including impaired rapid auditory processing [8], impaired phonological memory [9] and genetically determined grammatical deficits [7]. One aspect of language processing that has not attracted significant research attention is the processing of language rhythm. The concept of language rhythm is not consistently defined in the literature and has often been regarded as a purely temporal phenomenon [10]. Others, however, have conceptualised linguistic rhythm in terms of the patterning of syllable prominence, with some syllables being acoustically more prominent than others [11]. Regarding the rhythm of spoken English, syllable prominence can be thought of in terms of strong or stressed syllables (the more prominent) and weak or unstressed syllables (the less prominent). For example, in the word baNAna, the second syllable 'NA' is more prominent than the first syllable 'ba' and the third syllable 'na'. Accordingly, this word has a weak-strong-weak rhythmic structure, with the second syllable 'NA' carrying the primary stress. The patterning of strong and weak syllables across words, phrases and sentences thus contributes to the perception of language rhythm in addition to temporal factors. These patterns, made up of strong and weak syllables, can be grouped hierarchically into larger units via prosodic feet, in which one or more weak syllables is grouped with a strong syllable to form a temporal unit. This concept is a familiar one in certain kinds of poetry, in which patterns of recurring syllable rhythms are grouped to fit a higher-order temporal structure, for example via trochees (strong-weak syllable groupings repeating) or dactyls (strong-weak-weak groupings repeating). The pattern of groupings of strong and weak syllables into temporal units is commonly referred to as 'metre'. For the English nursery rhyme 'Jack and Jill went up the hill', a perfect metrical poem, a trochaic rhythm is used, whereas for the nursery rhyme 'Pussycat pussycat where have you been?' a dactyl structure is repeated.

There are sound theoretical reasons for regarding efficient rhythmic processing as a key foundation skill for language development, making the potential impact of an early rhythmic processing difficulty of particular interest for understanding developmental language disorders. Infants are exposed to the rhythmic aspects of language before birth and, as newborns, are able to use rhythmic properties to differentiate between languages [12,13]. Rhythmic sensitivity is accordingly considered a precursor of language acquisition, with the earliest representations of the speech signal encoding its rhythmic structure. Subsequent aspects of language, such as semantics and syntax, may be scaffolded onto these rhythmic representations [14]. Infants have been shown to use rhythmic aspects of language to establish structured linguistic representations at the level of word boundaries [15], lexical representations [16,17] and larger-grained grammatical units, such as phrases and clauses [18,19]. If rhythm is able to act in a bootstrapping role for subsequent language, then a difficulty in processing rhythm at the earliest stages of development could have a significant impact on the child's subsequent trajectory of language development. In this study, we investigate the potential impact that a rhythmic processing difficulty might have at the interface of rhythm and syntactic structure. This is of particular interest since children with DLD typically present with difficulties in the accurate processing and production of linguistic syntax [20], and are known to have difficulties in processing linguistic stress patterns [21]. Accordingly, it is possible that difficulties in processing prosodic phrasing and prosodic hierarchies dependent on stress patterning may underlie their syntactic difficulties [22].

Impaired auditory sensory processing skills in children with DLD appear to contribute to their impaired processing of syllable stress patterns [21]. Four key acoustic parameters contribute to the perception of stress: frequency, intensity, duration and amplitude envelope rise time (AERT) [23]. Stressed syllables tend to be of a higher frequency than unstressed syllables, have longer durations and are of a higher intensity than unstressed syllables [23]. The fourth parameter, AERT, refers to the length of time between the onset of a sound and the point at which its amplitude reaches peak intensity. In speech, this is typically measured as the rise in amplitude from the beginning of a syllable until the speaker reaches the peak of the syllable nucleus (vowel). Stressed syllables have larger rise times, with a greater change in amplitude until the amplitude peaks at the syllable nucleus, whilst unstressed syllables have smaller changes in amplitude before the peak of the nucleus is reached. In order to speak deliberately to a rhythm, the speaker times their production of the rise times of the vowels in each stressed syllable. Children's sensory processing of frequency, duration, intensity and AERT is therefore likely to be central to their ability to differentiate syllable stress patterns and prosodic hierarchies.

Research into the frequency sensitivity of children with DLD has produced mixed results, with some cohorts of language-impaired children being found to have reduced frequency discrimination skills [22,24], whilst other groups have not differed from age-matched controls [21,25]. Duration discrimination has reliably been found to be poorer in children with DLD [21,22,26], whilst tests of intensity discrimination have found no difference between children with DLD and age-matched controls [24,26]. Several studies have shown impaired discrimination of AERT in children with DLD [21,22,26,27], leading to our first investigations into impaired speech rhythm in DLD [26]. Children with DLD also have difficulties with non-linguistic aspects of rhythmic processing. For example, Corriveau and Goswami [28] asked children with DLD to tap to a metronome beat and found that they were considerably poorer at synchronising their taps with the metronome than either age-matched or language-matched control children at rates of 2 Hz and 1.5 Hz (rates that broadly correspond to typical inter-stress intervals found in speech, [29]). A widely-studied family, known as the KE family, some members of whom display a hereditary form of DLD characterised by articulation difficulties, have also been tested with tasks measuring sensitivity to non-speech pitch and rhythm. Affected members performed more poorly on tests of rhythmic perception and production, indicating a level of rhythmic difficulty for those who also displayed language difficulties [30]. Indeed, tapping to a beat is also impaired in children who stutter [31].

Regarding relations with linguistic processing, in their study of children with DLD, Cumming et al. [32] reported that individual differences in a speech rhythm matching task and a musical beat perception task were significant predictors of children's scores in standardised measures of receptive and expressive language development. Those children with DLD who had better rhythm matching or better musical beat perception had better language scores than those with poorer rhythmic skills. Both Corriveau and Goswami [28] and Cumming et al. [32] reported that individual differences in beat synchronisation contributed unique variance to measures of language and literacy. Weinert [33] also linked rhythmic processing with language ability, finding that children with DLD who did more poorly in a rhythm discrimination task were also poorer at learning an artificial language. Finally, links between rhythmic processing and language skills have also been reported for typically developing children. Gordon et al. [34] found that performance in a test of rhythm discrimination correlated significantly with scores in expressive morpho-syntax in 6-year-old children with no language impairments, accounting for 48% of variance in scores. Accordingly, proficiency in rhythmic processing may be linked to better syntactic skills across the ability range.

One plausible reason for a relationship between rhythm and syntax could be that children with better rhythmic skills may be better at exploiting prosodic phrasing in order to bootstrap language learning [21,22,32]. There is evidence that prosodic phrasing contains cues to syntactic structure, and that both adult language-listeners and infant language-learners make use of these cues in order to parse the speech stream and comprehend language. For example, Price, Ostendorf, Shattuck-Hufnagel and Fong [35] found that adult listeners were able to disambiguate between two possible syntactic parsings of phonologically identical sentences by using prosodic features, such as intonational phrase boundaries and size of prosodic breaks (duration of pauses). Infant experiments have employed preference paradigms, in which infants aged between 7 and 10 months are played recordings with pauses inserted either clause-finally (i.e., coinciding with a syntactic boundary) or mid-clause [18]. The infants demonstrated a preference for stimuli where the pauses were clause-final. A similar preference was also demonstrated by 9-month-old infants for phrase-final pauses [19]. This indicates that, before the end of their first year, infants are already sensitive to the typical coincidence of prosodic and syntactic cues found in the language environment. Jusczyk et al. describe this use of prosodic cues as a 'perceptual precategorisation' [19] (p. 287), thought to enable a more detailed analysis of each resulting perceptual grouping. By aligning the segmentation of perceptual groups with meaningful grammatical units, this precategorisation process would serve perceptually to delimit alternatives, effectively chunking the continuous incoming speech stream and consequently enabling a more nuanced grammatical analysis to take place. By this means, efficient processing of the prosodic structure of speech could pave the way for efficient learning of syntactic organisation.

Whilst much research has been conducted on the nature of the grammatical deficit in DLD, little attention has been paid to the role that prosodic factors may play in the development of grammatical competence, and hence to the role that a difficulty with processing prosodic phrasing might have in the trajectory of the disorder. However, the infant work outlined above indicates that prosodic processing of rhythm patterns may lay the foundations for the discovery of grammatical units at an early stage of language development. In line with this perspective, Demuth has demonstrated that young typically developing children will vary their production of grammatical morphemes depending on the prosodic context. Accordingly, she has argued for a 'Prosodic Licensing' approach to syntactic development, in which the prosodic structure of a given language and the location of a particular grammatical morpheme in the prosodic contexts afforded by that language will interact to 'license' the use of particular morphemes by the young child [36,37]. Demuth and Tomas [37] argued that an understanding of how prosodic phonology operated to support morphological development in typical development could help to illuminate morpho-syntactic errors by children with DLD. Given our perceptual studies showing that children with DLD have difficulties in processing both speech and non-speech rhythm [21,22,32], children with DLD may also have difficulties in processing the rhythmic aspects of speech that can facilitate the overall acquisition of grammatical structure. If so, this could provide an acoustic, stimulus-driven account of the grammatical difficulties that typify the receptive and expressive language of children with DLD.

The current investigation explores children's sensitivity to prosodic phrasing as a cue to the parsing of the speech stream into smaller, more manageable, grammatical units. Whilst prosodic and syntactic structures do not always coincide in natural speech, there is nonetheless a core area of children's typical language exposure in which the two levels are tightly integrated, namely the realm of children's oral and textual culture. Children's stories and nursery routines draw heavily on rhythmic devices to structure language, as aspects of children's linguistic life, such as nursery rhymes and clapping games, depend on the integration of repetitive language and repetitive rhythm. A further aspect of a typical child's linguistic environment is children's literature, which frequently relies heavily upon rhythm and rhyme. Many successful children's authors build upon the playfulness of oral rhymes, with writing characterised by strong, repetitive rhyme and rhythm frameworks. We hypothesised that the predominance of rhythm and rhyme in these texts may serve a scaffolding function in developing children's awareness of prosodic-syntactic units. Accordingly, we selected a representative story by former UK children's laureate Julia Donaldson called '*Room on the Broom*': a story with a strong rhythmic format [38].

The rhythmic format of *Room on the Broom* creates a tight integration of prosody and syntax and hence contains rich structural cues to grammar. The child is exposed to cues at multiple hierarchical layers, drawing their attention simultaneously to the phonological, prosodic and syntactic structure of the language. The property of rhyme emphasises the phonological structure of words by drawing attention to the onset-rime division, whilst also providing a guide to linguistic structure since each rhyme occurs at the end of a syntactic unit (be that clause or phrase). The overarching metrical structure also draws attention to the rhyme boundary point, since it occurs at regular intervals every four metrical feet. Within that metrical structure, there are further subdivisions into pairs of metrical feet, each of which also generally represents a complete syntactic unit. The metrical structure is therefore not an arbitrary form superimposed on the syntax of the text, but the two structures form a rich and highly integrated input which serves to highlight and reinforce the rhythmic and syntactic properties of language.

An illustration is provided as Figure 1, which decomposes the structural embedding in the opening sentence of this popular children's book. The figure marks out the major syntactic structures (shown above the text in green) and the major prosodic structures (shown below the text in blue). The figure shows that the major groupings in the syntactic structure are mirrored by major prosodic boundaries (the dashed red lines) in the prosodic structure. The prosodic boundaries are hierarchically nested such that the larger the prosodic-syntactic unit, the greater the overlap of boundary cues. Accordingly, the end of each rhyme line represents the combined boundary of four different levels of metrical analysis, as well as the boundary of a major syntactic unit. The prosodic structure is built around the stressed syllables, which serve to demarcate the end of a metrical foot (predominantly anapaest; i.e., weak-weak-Strong (wwS)). The symmetry is not faultless, as can be seen from 'a very tall hat', in which the lexical word 'very' crosses the boundary of the metrical foot; however, for the majority of the couplet, there is a strong coincidence of prosodic and syntactic boundaries. Given this level of dovetailing between the prosodic and syntactic structures, our study aimed to measure to what extent the children with DLD were able to integrate these two systems of representation.

**Figure 1.** A diagram to illustrate the syntactic and prosodic structure of a line from *Room on the Broom*. Abbreviations: d-determiner; h:n-noun, head of noun phrase; v-verb; m:int-modifier:intensifier; m:adj-modifier:adjective; h:pron-pronoun, head of noun phrase; h:prep-preposition, head of prepositional phrase; c-conjunction, q-qualifier; Cl-clause; S-subject; O-object; A-adverb; NP-noun phrase; VP-verb phrase; PP-prepositional phrase.

#### **2. Materials and Methods**

#### *2.1. Participants*

Fifty-nine children took part in the study and were divided into three groups: 13 had developmental language disorder (DLD group; mean (*M*) age 102 months, range 77–140); 24 were age-matched typically developing controls (AMC group; *M* age 107 months, range 77–132); and 22 were younger, language-matched controls (YLC group; *M* age 66 months, range 57–74). All of the children attended mainstream schools across the state and private sectors in the East of England.

Children with DLD were recruited via their schools by asking teachers to nominate pupils whom they considered displayed difficulties with language. Those children identified by their teachers then completed four standardised language tests: the British Picture Vocabulary Scales-2nd Edition (BPVS II) [39], and three subtests of the Clinical Evaluation of Language Fundamentals UK-3rd Edition (CELF3UK) [40], namely the Recalling Sentences, Concepts & Directions and Formulated Sentences subtests. Children who scored at or below −1.33 SD on at least two of the four tests were included in the DLD group. Age-matched children (AMC group) were largely recruited from the same schools as the children with DLD and also completed the four standardised language tests. Only children scoring higher than −1 SD on all four tests were included in the study as part of the AMC group. The younger children (YLC group) all attended a single school that agreed to take part for this purpose. Children in the YLC group completed the BPVS II and the CELF3UK Recalling Sentences subtest only. All children also completed the Block Design, Picture Completion and Digit Span subtests of the Wechsler Intelligence Scale for Children 3rd Edition (WISC III) [41] as measures of phonological memory and non-verbal intelligence quotient (IQ). Results of the standardised tests are displayed in Table 1.

As different groups completed different tasks, one-way ANOVAs by group or independent samples *t*-tests were used to assess group differences. The matching was confirmed: the DLD group did not differ significantly from the AMC group on age in months (*p* = 0.675), whilst both the DLD and AMC groups were significantly older than the YLC group (*p* < 0.001). The DLD and YLC groups did not differ significantly from each other on measures of language (Recalling Sentences *p* = 0.434; BPVS II *p* = 0.641), whilst both groups differed significantly from the AMC group (*p* < 0.001). The DLD group also differed significantly from the AMC group on the additional language measures of Formulated Sentences and Concepts & Directions (*p* < 0.001).

For the IQ measures, the DLD group scored within one standard deviation of the standardised mean for both tasks, indicating that their non-verbal IQ was within typical norms; however, their scores as a group were nonetheless significantly lower than those of the AMC group (Picture Completion *p* = 0.017; Block Design *p* = 0.014). The DLD group also had a significantly lower Digit Span score than the AMC group (*p* < 0.001).


**Table 1.** Results of standardized tests by group (Language: raw scores; intelligence quotient (IQ): scaled scores): one-way ANOVAs and independent samples *t*-tests.

<sup>a</sup> Age-matched children (AMC) > Developmental Language Disorder (DLD); <sup>b</sup> AMC > younger, language-matched control (YLC); <sup>c</sup> DLD > YLC; <sup>d</sup> adjusted *F* and *df* used due to significant Levene's test; <sup>e</sup> IQ subtests are scaled scores: *M* = 10, SD = 3. BPVSII, British Picture Vocabulary Scales-2nd Edition.

#### *2.2. Materials*

The aim of the experimental task was to investigate whether children were sensitive to the coincident boundaries of prosodic and syntactic units as exemplified in the rhythmic texts that form a central part of children's literature. The rhyming couplets in the chosen text consisted of two lines, each of which contained four stressed syllables (in capitals):

#### *the WITCH had a CAT and a VEry tall HAT*

Each rhyme line was also composed of two syntactic units, each of which contained two stressed syllables (i.e., two metrical feet):

#### *[the WITCH had a CAT] [and a VEry tall HAT]*

This clear and regular correspondence between metrical and syntactic units continues throughout the text. From an analysis of the whole book, 10 couplets were chosen to form the stimulus set. Five couplets had the regular pattern:


The other five couplets had the regular pattern:


To investigate whether metrical groupings influence detection of syntactic-prosodic units, three conditions were created: Metrical-Coincident; Metrical-NonCoincident; and NonMetrical-NonCoincident. A pause was created in the spoken recordings of the couplets to create the three different conditions, as detailed in Table 2.



It should be noted that the syntax in each version remains identical; only the prosodic grouping is altered. Accurate judgements would therefore reflect not syntactic knowledge per se, but rather intuitive knowledge of how prosody and syntax typically interact.

#### *2.3. Recording*

All stimuli were recorded in a soundproof booth by a female speaker of British English using a TASCAM DR-100 recorder via a SHURE SM58 condenser microphone. A regular beat was induced in the speaker using a priming metronome beat in one ear (not audible on the recording) with an inter-beat interval of 750 ms. The stimulus was then spoken so as to align the stressed syllables of the recording with the beats at 750 ms intervals. The precision of this timing was then verified and adjusted as necessary with Audacity software. The inserted pause was equivalent to the insertion of one silent stressed syllable interval, such that there was 1500 ms between the preceding and the following stressed syllable.
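The timing scheme just described can be sketched as a small helper (a hypothetical illustration under the stated values, not the authors' actual editing scripts): stressed syllables fall on a 750 ms metronome grid, and the inserted pause silently skips exactly one beat, yielding the 1500 ms inter-stress gap.

```python
# Hypothetical helper illustrating the stimulus timing scheme: stressed
# syllables are aligned to a 750 ms metronome grid, and the inserted
# pause is one silent "stressed syllable" interval.

IBI_MS = 750  # inter-beat interval used to pace the speaker

def stress_onsets(n_stresses, pause_after=None):
    """Onset times (ms) of stressed syllables; if `pause_after` is given,
    one silent beat is inserted after that stress index."""
    onsets, t = [], 0
    for i in range(n_stresses):
        onsets.append(t)
        t += IBI_MS
        if i == pause_after:
            t += IBI_MS  # the silent stressed-syllable interval
    return onsets

# A four-stress rhyme line with the pause after the second stress:
onsets = stress_onsets(4, pause_after=1)
assert onsets == [0, 750, 2250, 3000]
assert onsets[2] - onsets[1] == 1500  # the 1500 ms gap noted in the text
```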

Each couplet was recorded in three different versions: Met-Co, Met-NonCo and NonMet-NonCo. The couplets were then arranged in three blocks of 10 couplets, with each block containing a counterbalanced mix of all three conditions (e.g., four Met-Co, three Met-NonCo and three NonMet-NonCo). Each couplet occurred only once in each block, and the order of couplets was fixed across blocks. Each block was listened to in a separate session, with the order of presentation of blocks across the three sessions randomised across participants. Each child ultimately listened to every block and therefore recorded scores for all three versions of each couplet.

#### *2.4. Procedure*

Each child completed the task individually in a quiet area at school. In the first testing session, the experimenter read the entire storybook to the child so that each child was familiar with the text as a whole. Each task block was then presented as part of a wider set of experimental tasks in subsequent sessions.

The task was contextualised by talking about how when reading out loud it was important to take a breath in a 'sensible place, where it fits with the words' because otherwise 'it ... sounds interrupted ... like ... this.' It was then explained that they were going to hear someone reading the words from 'Room on the Broom' but that sometimes the reader would take a breath in a 'funny place; where it sounds wrong; like it doesn't fit'. The task was presented using a laptop computer running Presentation software, with the children listening through Sennheiser HD650 headphones via a UGM96 soundcard. The corresponding picture from the book was displayed during the playback of each stimulus. Responses and Response Times were recorded using key presses on the laptop keyboard. Children were asked to press the key with the green 'tick' sticker if they thought the breath sounded like it was in a sensible place which fitted with the words, or the key with the pink 'cross' sticker if they thought it sounded wrong or interrupted. These buttons corresponded to the 'L' and 'A' buttons on the keyboard.

Each presentation of a block of 10 trials was preceded by three practice trials, during which children were given feedback to ensure they understood the task. This was followed by presentation of the 10 experimental stimuli, during which children were given only generic encouragement.

#### *2.5. Auditory Threshold Estimation Tasks*

Children in the AMC and DLD groups also completed four auditory threshold (AT) estimation tasks designed to probe sensitivity to four key acoustic indicators of stress in speech: Amplitude Envelope Rise Time (AERT); Frequency; Duration and Intensity. These were presented via the laptop computer using the Dino software program.

The AT tasks all followed a similar format in which, for each trial, the child heard three tones and was asked to choose which tone was different from the other two. Presentation was always in an AXB format, where the middle tone (X) was always the reference tone; one of A and B was also the reference tone, whilst the other differed from the reference by a stipulated amount (see below). Children were shown a picture of three cartoon animals and were told that each animal would make a noise and jump at the same time. Their job was to choose the animal that made the different sound. Responses were made by mouse click or by pointing. The program provided continuous feedback, with correct answers rewarded with a colourful icon and incorrect answers indicated by an auditory sigh. Each block was preceded by five practice trials during which children received live feedback and further explanation of the task. Tasks were presented in a fixed order: Frequency, Intensity, AERT, Duration.

Frequency: Stimuli consisted of 200 ms tones played at 80.95 dB. The minimum frequency was 250 Hz (reference tone) and the maximum was 279.92 Hz. Increments between tones were of 0.0513 semitones. Children were asked to choose the tone with the different, higher sound.

Intensity: Stimuli consisted of 200 ms tones at a frequency of 250 Hz. The minimum intensity was 61.472 dB and the maximum was 80.95 dB (reference tone). Intensity intervals between levels were of 0.5128 dB. Children were asked to choose the tone with the different, quieter sound.

AERT: Stimuli consisted of 800 ms tones played at 80.95 dB at a frequency of 531.25 Hz. The minimum rise time was a 15 ms slope (reference tone) and the maximum was a 300 ms slope. Fall-off was consistent at 50 ms. Increments to the slope between levels were 7.0377 ms. Children were asked to choose the tone with the different, gentler beginning.

Duration: Stimuli consisted of tones played at 80.95 dB at a frequency of 250 Hz. The minimum duration was 400 ms (reference tone) and the maximum duration was 595 ms. Increments in duration between levels were 5.1282 ms. Children were asked to choose the tone with the different, longer sound.
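For readers interested in replication, the four stimulus continua above can be sketched in code. This is an illustration only, not the Dino program: the assumption of 40 equally spaced levels per task is taken from the staircase procedure's level 1–40 scale, and the stated maxima do not correspond exactly to 39 equal increments from the reference, so the spacing here is approximate.

```python
# Hypothetical reconstruction of the four auditory-threshold continua.
# Level 1 is always the reference tone; higher levels differ more from it.

N_LEVELS = 40  # assumption, based on the staircase's level 1..40 scale

def frequency_levels(ref_hz=250.0, step_semitones=0.0513, n=N_LEVELS):
    """Each successive level is step_semitones higher in pitch."""
    return [ref_hz * 2 ** (step_semitones * i / 12) for i in range(n)]

def intensity_levels(ref_db=80.95, step_db=0.5128, n=N_LEVELS):
    """Each successive level is step_db quieter than the reference."""
    return [ref_db - step_db * i for i in range(n)]

def duration_levels(ref_ms=400.0, step_ms=5.1282, n=N_LEVELS):
    """Each successive level is step_ms longer than the reference."""
    return [ref_ms + step_ms * i for i in range(n)]

def rise_time_levels(ref_ms=15.0, step_ms=7.0377, n=N_LEVELS):
    """Each successive level has a gentler (longer) amplitude rise."""
    return [ref_ms + step_ms * i for i in range(n)]
```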

The Dino program uses a staircasing procedure to estimate the auditory threshold. Trials begin with the maximum difference between stimuli (i.e., levels 1 and 40) and initially use a two-up, one-down procedure. This means that two correct answers result in a narrowing of the difference between stimuli, whilst one incorrect answer results in a widening of the difference between stimuli. After four reversals, the procedure is three-up, one-down. Initially, stimuli pairings change by eight levels in each step change (e.g., moving from levels 1:40 to levels 1:32); after four reversals, this becomes progressively four-, two- and one-level step changes. The final threshold figure is taken as the mean level from the fourth reversal onwards.
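The adaptive procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Dino implementation: the stopping rule (eight reversals), the trial cap, and the fallback when too few reversals occur are all assumptions, and `respond` is a hypothetical stand-in for the child's answer on each trial.

```python
def run_staircase(respond, n_reversals_stop=8, max_trials=500):
    """respond(level) -> True if the listener answers correctly.

    Levels run 1..40; level 1 is the reference, level 40 the maximum
    difference. The comparison starts at level 40; the difference narrows
    after a run of correct answers and widens after an error.
    """
    level, streak, direction = 40, 0, -1  # -1 = narrowing the difference
    reversals = []                        # levels at which direction flipped
    for _ in range(max_trials):
        if len(reversals) >= n_reversals_stop:
            break
        # two-up one-down for the first four reversals, then three-up one-down
        needed = 2 if len(reversals) < 4 else 3
        # 8-level steps initially; 4-, 2-, then 1-level steps after 4 reversals
        step = 8 if len(reversals) < 4 else [4, 2, 1][min(len(reversals) - 4, 2)]
        if respond(level):
            streak += 1
            if streak >= needed:
                streak = 0
                if direction == +1:
                    reversals.append(level)
                direction = -1
                level = max(2, level - step)
        else:
            streak = 0
            if direction == -1:
                reversals.append(level)
            direction = +1
            level = min(40, level + step)
    # threshold: mean level from the fourth reversal onwards
    # (fallback to the current level if too few reversals occurred)
    tail = reversals[3:] or [level]
    return sum(tail) / len(tail)
```

For instance, passing `respond=lambda level: level >= 10` simulates a listener who is reliable whenever the comparison is at least ten levels from the reference, and the estimated threshold settles near level 10.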

Ethical approval for the study was obtained from the Cambridge Psychology Research Ethics Committee reference PRE.2009.02.

#### **3. Results**

#### *3.1. Accuracy*

Children's scores were summed across blocks and calculated as the number of correct responses (i.e., identifying stimuli in condition Met-Co as correct with a 'tick' press and those in conditions Met-NonCo/NonMet-NonCo as incorrect with a 'cross' press). The maximum score was therefore 30, with a maximum score of 10 for each condition.

Due to software errors, two children from each group unintentionally listened to the same block presentation twice. These children's scores were removed from the summary analysis. From a boxplot of scores by group, one AMC child appeared as an outlier. This was confirmed by calculating the child's z-score, and so this child's scores were also removed. Scores for each of the conditions by group are given in Table 3.


**Table 3.** Accuracy scores by condition and group.

As noted above, the different conditions were mixed together during task presentation to the child; however, in order to judge whether the groups differed in sensitivity to the task, d' was calculated for each group. Hits were defined as selecting the 'tick' response for target-tick trials and the 'cross' response for target-cross trials. The resulting mean group values were AMC d' = 2.442, DLD d' = 1.395, YLC d' = 1.045. A one-way ANOVA (DV: d') revealed that the AMC group was significantly more sensitive than the DLD group (*p* = 0.033) and the YLC group (*p* = 0.001) (Games–Howell corrections). The sensitivity of the DLD and YLC groups did not differ from each other. Accordingly, the DLD children were less sensitive to prosodic-syntactic groupings than would be expected for their age, but were not less sensitive to these groupings than would be expected for their language attainment levels.
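The sensitivity index here is d′ = z(hit rate) − z(false-alarm rate). A minimal sketch using Python's standard library follows, assuming the standard log-linear correction for extreme rates (the authors' exact correction, if any, is not stated):

```python
from statistics import NormalDist

def d_prime(hits, n_signal, false_alarms, n_noise):
    """d' = z(hit rate) - z(false-alarm rate), with a log-linear
    correction (add 0.5 to each count, 1 to each total) so that
    perfect or zero rates do not yield infinite z-scores."""
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (n_signal + 1)
    fa_rate = (false_alarms + 0.5) / (n_noise + 1)
    return z(hit_rate) - z(fa_rate)
```

For example, taking hits as 'tick' responses to the 10 Met-Co trials and false alarms as 'tick' responses to the 20 non-coincident trials, `d_prime(9, 10, 2, 20)` gives a d′ of roughly 2.3, in the region of the AMC group mean.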

In order to compare the groups in terms of accuracy of performance, a 3 × 3 repeated-measures ANOVA (Group: AMC, DLD, YLC; Condition: Met-Co, Met-NonCo, NonMet-NonCo) was conducted. The ANOVA showed no significant main effect of Condition, *F*(1.193,58.473) = 2.004, *p* = 0.160 (Greenhouse–Geisser correction); however, there was a significant effect of Group, *F*(2,49) = 12.077, *p* < 0.001, and the Condition\*Group interaction was also significant, *F*(4,98) = 4.465, *p* = 0.002. Pairwise comparisons (Bonferroni) indicated that the AMC group scored more highly than both the DLD group (*p* = 0.011) and the YLC group (*p* < 0.001), whilst there was no significant difference in score between the DLD and YLC groups. This is consistent with the d' analysis.

The significant Group\*Condition interaction was explored by running a series of one-way ANOVAs for each condition. The ANOVAs revealed no main effect of group for the Met-Co condition, but a significant group effect for the Met-NonCo, *F*(2,20.810) = 14.243, *p* < 0.001, and for the NonMet-NonCo, *F*(2,20.556) = 14.435, *p* < 0.001 (Welch's *F*) conditions. Post-hoc tests (Games–Howell) showed that the AMC group were more accurate than the DLD and YLC groups in both of these conditions (Met-NonCo: DLD *p* = 0.009, YLC *p* = 0.001; NonMet-NonCo: DLD *p* = 0.044, YLC *p* < 0.001). The DLD and YLC groups did not differ significantly from each other in either condition.

Inspection of the graphed results (Figure 2) helps to illustrate the differing effect of condition for the three groups. As the graph shows, the AMC group scored more highly for the non-coincident Met-NonCo and NonMet-NonCo conditions than for the Met-Co condition, whilst the YLC group's scores were lower for the non-coincident stimuli than the coincident Met-Co type. An unexpected pattern in the results was the relatively poor performance of the AMC group in the Met-Co condition. When compared to their performance in the non-coincident conditions, this suggests that the AMC children were slightly more likely to reject a correct rendition than to accept an incorrect one. The graph also indicates that the children with DLD were as accurate at identifying when the coincidence of prosodic-syntactic cues was correct (Met-Co) as were the AMC and YLC children. However, once these structures were disrupted, their performance fell markedly, suggesting poor sensitivity both to regular metrical groupings that did not coincide with a syntactic boundary (the Met-NonCo condition) and to irregular groupings that did not coincide with a syntactic boundary (the NonMet-NonCo condition). If the children with DLD were sensitive to metrical structure but did not relate this to syntactic structures, then we would expect lower accuracy for Met-NonCo than NonMet-NonCo. However, the performance of the DLD children in both disrupted conditions was statistically equivalent, suggesting that they were insensitive to both speech rhythm and its relationship to syntax.

**Figure 2.** A graph showing the mean score for each condition by group. AMC: Aged-matched children; DLD: Developmental Language Disorder; YLC: younger, language-matched control.

#### *3.2. Reaction Times*

In order to explore group performance in more detail, reaction time data (RT) were also analysed. The mean RT was calculated for each child for each Condition (regardless of correctness of response). Data are shown in Figure 3 and Table 4.

**Figure 3.** A graph of mean reaction times to each condition by group.

A repeated-measures 3 × 3 ANOVA (Group: (AMC, DLD, YLC) × Condition: (Met-Co, Met-NonCo, NonMet-NonCo)) showed no significant effect of condition on Reaction Times, *F*(2,98) = 0.796, *p* = 0.454, nor of group, *F*(2,49) = 2.968, *p* = 0.061. However, there was a significant Group\*Condition interaction, *F*(4,98) = 2.877, *p* = 0.027.


**Table 4.** Reaction Times (s) by Condition and Group.

In order to explore the Group\*Condition interaction, a series of one-way ANOVAs was run for each Condition. There was no significant effect of group for the Met-Co and the Met-NonCo conditions. There was, however, a significant effect of group for the NonMet-NonCo task, *F*(2,49) = 4.667, *p* = 0.014. Pairwise comparison (Games–Howell) of the means for each group showed a significant difference between the YLC and AMC groups in the NonMet-NonCo condition: the younger children were significantly slower (*p* = 0.012). Furthermore, inspection of the graph in Figure 3 shows that both the YLC and DLD groups tended to be slower in response than the AMC group. Accordingly, for the two non-coincident tasks, the DLD children appeared to respond within a similar timeframe to the younger YLC children.

For completeness, the group\*condition interaction was also explored using one-way repeated-measures ANOVAs by group. There were significant effects of condition for the AMC children, *F*(1.339,26.774) = 4.181, *p* = 0.04 (Greenhouse–Geisser correction) but no significant effect of condition for the DLD children, *F*(2,20) = 1.286, *p* = 0.298 nor for the YLC group, *F*(2,38) = 1.687, *p* = 0.199. For the AMC group, despite the significant overall effect of condition, post-hoc pairwise comparisons (Bonferroni) showed no significant differences between conditions, although there was a trend for the responses in NonMet-NonCo to be quicker than those of the Met-Co (*p* = 0.077). In other words, there was a tendency for the AMC group to take longer to decide that the coincident stimulus was correct than to decide that the non-coincident stimulus was incorrect, even though they performed well in both conditions. This result tallies with observations during testing: AMC children often pressed the [x] button as soon as they heard the first non-coincident boundary, immediately confident that this presentation was 'wrong'. In order to be sure that the coincident stimulus was correct, however, the stimulus had to be listened to in its entirety. This may explain this difference in response time trends for the AMC group.

A different effect was observed for the DLD group, who rarely pressed the response buttons before the full stimulus was played. This is reflected in the lack of variation in their response times. It seems that, for the DLD children, in marked contrast to the AMC children, there was no confident decision-making about aberrant prosodic-syntactic groupings reflected in quicker response times. The DLD children puzzled for an equally long time over the coincident stimuli as they did over the non-coincident stimuli. In doing so, they presented a response profile that was statistically comparable to the younger language-matched children.

#### *3.3. Acoustic Threshold Estimation Tasks*

The AMC and DLD groups both completed the four AT tasks in AERT, Frequency, Duration and Intensity. Two scores were not recorded by the software: one Frequency score (one DLD child) and one Intensity score (one AMC child). A series of independent-samples *t*-tests was conducted to examine any differences in acoustic sensitivity between the two groups (see Table 5).


**Table 5.** Results of *t*-tests by group for auditory threshold (AT) tasks.

<sup>a</sup> AMC < DLD. AERT, Amplitude Envelope Rise Time.

The AMC group had significantly lower thresholds (i.e., were able to discriminate more fine-grained differences between stimuli) than the DLD group for the conditions of AERT, Frequency and Duration, whilst there was no significant difference between groups for Intensity. The finding that the AMC and DLD groups performed the Intensity threshold task at equivalent levels suggests that the attentional load of the task alone cannot explain the group differences in performance.

A correlation analysis between acoustic threshold and accuracy score on the experimental task revealed significant correlations between task performance and sensitivity to the acoustic features of AERT (*p* = 0.027), Duration (*p* = 0.004) and Frequency (*p* < 0.001): the greater the sensitivity to acoustic differences, the more accurate the performance on the task (see Table 6).

**Table 6.** Correlation coefficients (Pearson one-tailed) for AT tasks and accuracy score.


\* *p* < 0.05; \*\* *p* < 0.01; \*\*\* *p* < 0.001.

The children with DLD therefore had larger thresholds (i.e., required a greater difference between stimuli in order to discriminate) for the acoustic measures of Frequency, Duration and AERT and all three of these measures were significantly correlated with success on the experimental task.

A series of three-step fixed-order multiple regressions was carried out to explore the unique contributions of each of the acoustic parameters of Frequency, Duration and AERT to success on the task once age and non-verbal IQ (NVIQ) were controlled. NVIQ was taken as the mean score across the two subtests of Picture Completion and Block Design. Table 7 shows the results of the equations with accuracy as the dependent variable (i.e., the child's overall score summed across all three conditions).


**Table 7.** Stepwise regressions showing the unique variance in children's accuracy in judging metrical and syntactic boundaries contributed by the different auditory processing measures.

Note: b = unstandardised beta; SEb = standard error of b; β = standardised beta; ΔR² = change in R².

Age was not a significant predictor of performance in this task (*p* range = 0.165–0.303); however, NVIQ contributed significant amounts of unique variance for all three equations (range 29.3–35.1%, all *p*s ≤ 0.001). The greatest unique variance accounted for by the AT tasks was for Frequency (7.1%) followed by Duration (5.8%), although neither was significant once NVIQ was controlled (*p* = 0.071, 0.105, respectively). AERT also failed to make a significant contribution to overall accuracy once NVIQ was controlled (*p* = 0.374), contributing 1.8% of unique variance, the smallest amount. The significant correlations between sensitivity to AERT, Duration and Frequency and overall score may therefore have been partly mediated by NVIQ, with the acoustic cues of Frequency and Duration providing smaller (non-significant) additional contributions to the variance in score.
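The fixed-order regression logic reported above, in which each step's unique variance is the change in R² between nested models, can be sketched in plain Python. The small least-squares solver and the synthetic data in the usage example are illustrative assumptions; this does not reproduce the authors' SPSS analysis.

```python
def ols_r2(X, y):
    """R^2 of y regressed on the columns of X (with an intercept),
    computed via the normal equations and Gaussian elimination."""
    n = len(y)
    rows = [[1.0] + list(x) for x in X]   # prepend intercept column
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    for i in range(k):                    # forward elimination with pivoting
        p = max(range(i, k), key=lambda r_: abs(A[r_][i]))
        A[i], A[p] = A[p], A[i]
        c[i], c[p] = c[p], c[i]
        for r_ in range(i + 1, k):
            m = A[r_][i] / A[i][i]
            for j in range(i, k):
                A[r_][j] -= m * A[i][j]
            c[r_] -= m * c[i]
    b = [0.0] * k                         # back substitution
    for i in reversed(range(k)):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    yhat = [sum(bj * rj for bj, rj in zip(b, r)) for r in rows]
    ybar = sum(y) / n
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def delta_r2(predictor_steps, y):
    """predictor_steps: list of lists of predictor columns, one list per
    step; returns the change in R^2 contributed by each step."""
    r2s, cols = [], []
    for step in predictor_steps:
        cols += step
        X = list(zip(*cols))
        r2s.append(ols_r2(X, y))
    return [r2s[0]] + [r2s[i] - r2s[i - 1] for i in range(1, len(r2s))]
```

With fabricated data where the outcome is an exact linear function of two predictors, `delta_r2([[x1], [x2]], y)` returns the R² of the first step followed by the additional (unique) variance explained by the second, summing to the full model's R².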

A further set of three-step fixed-order regressions was calculated to explore the relationship between the different acoustic parameters and the overall mean response time (calculated by summing all RT values for each child and dividing by 30). These are shown in Table 8.


**Table 8.** Stepwise regressions showing the unique variance in children's response time in judging metrical and syntactic boundaries contributed by the different auditory processing measures.

In these regressions, Age was a significant predictor of RT, explaining between 18.8% and 19.3% of unique variance. NVIQ did not contribute significantly to the model, and nor did any of the three acoustic parameters. This suggests that the speed of responding was primarily mediated by the age of the participants, even though the younger YLC children were not included in these analyses. The negative b values indicate that the older the participants were, the faster their responses.

#### **4. Discussion**

This study set out to investigate the influence of phrasal metrical hierarchies of acoustic stress patterns on children's capacity to recognise appropriate prosodic-syntactic groupings. In children's literature, the metrical regularities of the texts typically serve to emphasise the syntactic structures of language through the coincidence of prosodic and syntactic boundaries. Here, children listened to different readings of phrases from a children's story, and were asked to indicate when the reader took a breath in a 'funny place, where it sounds wrong, like it doesn't fit'. Two types of disruption were tested, breaths that created metrical groupings that conflicted with syntactic groupings (Met-NonCo) and breaths that violated both metre and syntax (NonMet-NonCo). As children with DLD have known acoustic difficulties with stress, indexed by their insensitivity to amplitude envelope rise time (AERT), and acoustic difficulties with grouping, indexed by their insensitivity to duration [21,22], this may affect their ability to use the prosodic-syntactic grouping typical of representative texts in children's literature to aid grammatical learning. If children have robust knowledge of how prosody and syntax interact, then any violation of these coinciding units should be readily identified. Alternatively, if children are able to detect metrical patterns but are unable to relate these to the overall prosodic-syntactic structure, then phrases in the condition in which there is an acoustic metrical rhythmic structure that does not coincide with the syntax (condition Met-NonCo) should prove more difficult to reject than phrases in the condition in which there is no consistent metrical acoustic pattern (condition NonMet-NonCo). If the children with DLD are insensitive to both acoustic prosody and its relationship with syntax, then there should be no difference in performance between the three different conditions.

Overall, the DLD children were indeed less sensitive to the prosodic-syntactic groupings that we used, as their d' scores were significantly lower than those of the age-matched control (AMC) children. Nevertheless, the DLD children were as sensitive to the prosodic-syntactic groupings as the younger language-matched control (YLC) children, as their d' scores were statistically equivalent. However, their reduced sensitivity did not reflect a lack of effort in the task. The response times of the DLD and YLC children showed that they were analysing the non-coincident phrases, as they were slower to respond in the two non-coincident conditions. The DLD group were impaired compared to the AMC group at detecting violations of prosodic-syntactic units, and this was shown by both their response time data and their accuracy data. The accuracy of the DLD group was comparable to that of the younger children. This is the pattern that we would expect if DLD children have difficulty in processing language metre and its relationship with syntactic structures. From the accuracy scores, the most significant impact of metrical grouping appears to be the attention that it draws to the prosodic-syntactic unit, rather than its temporal regularity per se. If metrical regularity alone (regardless of syntax) were influencing responses, we would expect a higher error rate in the Met-NonCo condition than NonMet-NonCo, with DLD children responding to the regular metre in the former condition and judging the stimulus phrases as acceptable. The data did not support this explanation, as the slight difference visible in Figure 2 was not significant.

Indeed, the response time data showed that the DLD and YLC children did not differ in their response times across conditions. If DLD children were confident in using prosody to detect syntactic boundaries regardless of metre, we would expect swift responses to these violations (fast responding for both Met-NonCo and NonMet-NonCo conditions, as found for AMC children). Alternatively, if the DLD children were able to detect the metrical grouping but could not readily relate that perceptual structure to the syntactic groupings, then we would expect Met-NonCo stimuli to produce slower responses due to the conflicting information. This was not the case, suggesting that the DLD children could neither detect the metrical grouping nor integrate it with their expectations of prosodic-syntactic groupings. Inspection of the response time data showed that the children with DLD did not make early detections of errors, almost always choosing to listen to the whole recording. Accordingly, the children with DLD were unable to systematically determine whether structural boundaries had been violated. Their performance in both accuracy and speed of response resembled that of the younger children (YLC group), suggesting a developmental delay in their ability to integrate prosodic and syntactic structures.

Overall, the data suggest that younger children and DLD children have less well-developed schema for how prosody and syntax interact compared to older typically developing children, and therefore that this is an aspect of language processing that continues to develop throughout childhood. For their age, the DLD children appear to have underdeveloped schema for the interaction of prosody and syntax. This suggests that they may not be processing all of the cues available to them in segmenting the speech stream into prosodic units and grammatical clauses. Instead, they responded similarly to the younger children. Note that the syntax itself was identical in all three conditions, so the test is not one of grammatical structures per se, but of how these structures interact with prosodic units in typical speech.

The range of responses here sits interestingly between experiments with infants, which have shown that infants are sensitive discriminators of pauses inserted within clauses or at clause boundaries [42], and those with adults, who also judge sentences where pauses coincide with phrasal boundaries to be more natural [19]. The older AMC children were more adult-like in their responses, being able to judge both the Met-Co sentences as being natural and the Met-NonCo and the NonMet-NonCo sentences as 'sounding funny'. The question, however, is why the DLD and, particularly, the YLC children have relatively poor accuracy for the non-coincident stimuli if 9-month-old infants are sensitive to these boundaries. One explanation could lie in task demands. In our experiment, the children were asked to decide explicitly which was the 'correct' version, and so this required a greater degree of metalinguistic awareness than the infants in Jusczyk et al.'s passive listening study [19]. This raises an important conceptual difference between 'sensitivity' in the sense of discriminating between prosodic structures and 'awareness' in the sense of consciously noticing the significance of any discriminated difference. It is possible that the children with DLD were sensitive to the differences between the prosodic structures in the stimuli but were unable to determine whether this observed difference resulted in a pragmatic difference: that the breaks were in a 'funny place'. As we also collected acoustic data, however, and found reduced sensitivity for DLD children in three of the four acoustic parameters that contribute to stress perception, this seems unlikely. It seems more likely that the children's lack of pragmatic awareness stemmed from their poorer discrimination of the acoustic parameters that enabled reliable identification of the prosodic structures. 
This perceptual difficulty then reduced the ability of the DLD children to integrate information about perceptual prominence with syntactic expectations.

On the basis of our previous acoustic work [21,22,26,28], we proposed that children with DLD may be delayed in developing schema for prosodic-syntactic hierarchies because of impaired sensory processing. In English, the four acoustic parameters that we measured of AERT, duration, frequency and intensity combine to give the percept of stress and thereby linguistic rhythm [23]. Accordingly, acoustic sensitivity to these parameters is likely to influence linguistic rhythm development. The DLD group had significantly higher thresholds for AERT, duration and frequency than did the AMC group, whilst they did not differ in intensity thresholds. This pattern of responses accords with previous studies that have found that children with DLD have impairments in discriminating AERT [21,26,27,43] and duration [21,22,26]. Some studies have also found a frequency impairment [22,24], but see also [21,25]. Poorer sensitivity in these acoustic tasks has previously been associated with poorer performance on tasks probing linguistic stress [21]. Here, we also found a significant association between the acoustic thresholds for AERT, duration and frequency and the capacity to detect violations of prosodic-syntactic boundaries. This suggests that the less sensitive auditory systems of children with DLD may be impacting upon their perception and subsequent integration of prosodic cues with larger-grained syntactic units. However, once non-verbal IQ was controlled in a series of regression equations, none of the acoustic parameters measured accounted for unique variance in task performance. This contrasts with some of our earlier DLD studies, where both AERT and duration measures have explained unique variance in language tasks even after NVIQ has been controlled [22,26]. Processing prosody across larger phrasal units requires the tracking of relative acoustic hierarchies across time and then integrating this lower-level phrasing into the overall acoustic hierarchy. 
If children with DLD need greater acoustic differentiation between phrases in order to discriminate the overall acoustic hierarchy, then the less salient stress cues available in natural language may result in demarcations in the signal being missed. This may in turn lead to a failure to establish schema (relative stress templates) for the hierarchical relationships necessary for interpreting prosodic structure at a phrasal and clausal level.

The coincidence of the acoustic cues that create the prosodic-syntactic structure is particularly salient in repetitive and rhythmic children's literature, which was used to generate the stimuli used in this experiment. In stories such as *Room on the Broom*, there is reciprocal cuing of prosodic and syntactic elements such that sensitivity to one facet of the structure should facilitate processing of the other facet. However, our results suggest that this was not the case for the DLD children. Unlike typically developing children of the same age, they were unable to detect the prosodic-syntactic mismatches, suggesting that they are not yet proficient in integrating the two structural systems. Instead, they performed like the younger children in the experiment. This suggests a developmental trajectory for prosodic-syntactic schemata along which the children with DLD are delayed. If children with DLD are in general slower to establish prosodic hierarchies, possibly due to poorer acoustic sensitivity, then it could be that they require greater exposure to structured linguistic input than do typically developing children to attain a similar developmental level.

It has previously been found that infant-directed speech (IDS) is much richer in acoustic cues to linguistic features than adult-directed speech (ADS) and that IDS can facilitate syntactic boundary detection in infants when compared with ADS [19,42]. Studies of the acoustic characteristics of IDS have found that the rhythmic focus rapidly shifts as the infant ages [44]. It could therefore be that, for children with DLD, a longer period of structured prosodic input is required if sensitivity to the meaning of the units is to develop. If children with DLD are less efficient at discovering these acoustic cues to syntactic boundaries, and as IDS changes rapidly with the age of the child, it could be that children with DLD end up 'missing out' on this crucial early aspect of language acquisition: the incoming signal 'moves on' before their system is ready to cope with less structured and less salient input. Such a scenario would have significant implications for language development. Morgan and Saffran [45] argued that prosody should be regarded as a kind of parameter-setting device, providing a rough categorisation of the input into smaller units and thereby constraining the amount of input that is then subject to further analysis, for example by statistical learning. If this is the case, then sensitivity to prosodic units would be a powerful tool in the process of discovering grammatical units. In this view of language acquisition, poorer sensitivity (in terms of acoustic, stimulus-driven sensitivity) to metrical hierarchies of stress patterns would mean that constraining parameters fail to be set in chunking the input stream. Accordingly, a subsequent analysis would be carried out across much greater chunks of input, resulting in a far more unwieldy task.
This in turn would lead to difficulty in segmenting language into grammatical units, such as clauses and phrases, with knock-on implications for acquiring smaller-grained aspects of morphology: exactly the kinds of linguistic difficulties that characterise children with DLD.

#### **5. Conclusions**

In conclusion, our results suggest that children with DLD have poorer sensitivity to the acoustic cues to linguistic rhythm that enable the creation of prosodic-syntactic schemata. This difficulty in recovering metrical hierarchies of acoustic stress patterns impairs their ability to capitalise on the prosodic cues to syntax present in speech, cues which bootstrap grammatical competence. If prosodic cues enable more efficient parsing of the speech stream, then explicitly teaching children to listen for these acoustic stress cues may increase their ability to integrate prosody and syntax. Via such instruction, children's capacity to derive grammatical structure from prosodically driven input could be increased. Accordingly, interventions using rhythmic children's texts to highlight this congruence of prosody and syntax could theoretically be of great value in scaffolding grammatical development in children with DLD.

**Author Contributions:** Conceptualization, S.R. & U.G.; Investigation, S.R.; Supervision, U.G.; Writing (original draft), S.R.; Writing (review & editing), S.R. & U.G.

**Funding:** This research was funded by an Economic and Social Research Council PhD studentship awarded to S.R. and supervised by U.G.

**Acknowledgments:** We would like to thank all the teachers, parents and children who enabled this research to take place.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Event-Related Potential Evidence of Implicit Metric Structure during Silent Reading**

**Mara Breen 1,\*, Ahren B. Fitzroy 1,2 and Michelle Oraa Ali 1,3**


Received: 17 July 2019; Accepted: 5 August 2019; Published: 8 August 2019

**Abstract:** Under the Implicit Prosody Hypothesis, readers generate prosodic structures during silent reading that can direct their real-time interpretations of the text. In the current study, we investigated the processing of implicit meter by recording event-related potentials (ERPs) while participants read a series of 160 rhyming couplets, where the rhyme target was always a stress-alternating noun–verb homograph (e.g., permit, which is pronounced PERmit as a noun and perMIT as a verb). The target had a strong–weak or weak–strong stress pattern, which was either consistent or inconsistent with the stress expectation generated by the couplet. Inconsistent strong–weak targets elicited negativities at 80–155 ms and 325–375 ms relative to consistent strong–weak targets; inconsistent weak–strong targets elicited a positivity at 365–435 ms relative to consistent weak–strong targets. These results are largely consistent with effects of metric violations during listening, demonstrating that implicit prosodic representations are similar to explicit prosodic representations.

**Keywords:** implicit prosody; reading; meter; rhythm; lexical stress; event-related potentials; poetry

#### **1. Introduction**

According to the Implicit Prosody Hypothesis [1–3], readers generate imagined representations of prosodic structure during silent reading that are similar to the explicit prosodic representations that readers produce when reading aloud. This hypothesis has been supported by behavioral evidence demonstrating similarity between real and imagined representations of a variety of prosodic phenomena, including intonation, phrasing, stress, and meter [4,5]. For example, evidence for implicit intonational structure is provided by the fact that readers are faster to recognize target words that are produced aloud with a previously imagined intonation contour [6,7]. Readers impose implicit phrase boundaries in sentences that are long enough to have a phrase break [8] and tend to balance the size of adjacent phrases even during silent reading [9,10], providing evidence for implicit prosodic phrasing. Readers take longer to silently read words with two stressed syllables than words with one stressed syllable [11], and take longer to read sentences in which a local lexical stress pattern mismatches the predicted metric structure as determined by prior sentence material [12–16], providing evidence for an implicit metric structure. Although these behavioral similarities between patterns associated with explicit and implicit prosody provide indirect support for implicit prosodic representations, they cannot tell us to what extent implicit prosodic representations are processed similarly to explicit prosodic representations. In the current study, we used event-related potentials (ERPs) to investigate the processing of implicit prosodic representations, and how it compares to that of explicit prosody.

#### *1.1. Behavioral Studies of Explicit and Implicit Linguistic Metric Representation*

The specific focus of the current study is the similarity between implicit and explicit metric processing. For this investigation, we exploit metrical regularity in English; English is a stress-timed language, meaning that speakers produce temporally regularized sequences of strong (stressed) and weak (unstressed) syllables. The metric structure in stress-timed languages is conveyed by the timing of strong syllables [17,18]. There are constraints on the ordering of strong and weak beats in stress-timed languages, as strong beats tend to occur at regular intervals [19], speakers avoid clashes of strong beats and lapses of weak beats [20], and under some circumstances, speakers shift the location of stress on words to maintain metric regularity (e.g., thirTEEN MEN → THIRteen MEN) [18]. Speakers signal strong syllables in speech with a variety of acoustic cues, including longer duration and higher intensity [21–24]. Strong syllables also hold a privileged position in auditory language comprehension; listeners are faster to detect phonemes in stressed syllables [25], lexical access is more disrupted by the mispronunciation of stressed syllables than unstressed syllables [26], and listeners tend to interpret stressed syllables as word onsets [27,28]. Moreover, listeners use the pattern of strong and weak syllables to predict what words will come next [29], and to resolve lexical ambiguity [30–33].

There is evidence that, like speakers and listeners, readers are also sensitive to metric structure. For example, readers spend more time fixating four-syllable words with two stressed syllables (e.g., RAdiAtion) than four-syllable words with one stressed syllable (e.g., geOmetry) [11]. In silent reading, syntactically ambiguous sentences are more likely to be resolved in ways that maintain alternating strong and weak syllables [14,15]. In the study that serves as the inspiration for the current study, Breen and Clifton tracked participants' eye movements as they read limericks designed to induce readers to generate strong expectations about the stress pattern of upcoming words [13]. The target word in the critical items, which was always the final word of the second line of the limerick, was a stress-alternating noun–verb homograph; these words are realized with strong–weak (SW) stress as a noun (e.g., PERmit), but weak–strong (WS) stress as a verb (e.g., perMIT) [29]. In this way, the target was either SW or WS, and this lexical stress pattern was either consistent or inconsistent with the metric structure of the limerick (see Table 1). Throughout this paper, we will refer to the occurrence of an inconsistent SW word when a WS word is predicted as a strong–weak (SW) violation, and to the occurrence of an inconsistent WS word when a SW word is predicted as a weak–strong (WS) violation.


**Table 1.** Metric structure of experimental couplets in each of the four conditions for the target word 'permit'.

Italics and underlines indicate metrically strong syllables, bold indicates target words, and capital letters indicate lexical stressed syllables within target words. Asterisks (\*) indicate metrically inconsistent targets. Screen breaks are indicated with solid vertical lines. Text emphasis is for descriptive purposes only; in the experiment, all words were presented in plain text (see Figure 1).

**Figure 1.** Presentation times in milliseconds of each region of the limerick couplets.

Breen and Clifton predicted that readers would encounter difficulty whenever the stress pattern of the target word mismatched the pattern of the limerick. However, they observed an effect of metric mismatch only for WS violations (e.g., Table 1D); reading times for SW violations (e.g., Table 1B) did not differ from those of consistent SW words. Breen and Clifton argued that these results reflect the uneven distribution of SW and WS words in the English lexicon; 85–90% of content words in English have an initial stressed syllable [34]. Specifically, there is minimal cost to encountering a SW word in a context where a WS word is predicted because SW is the default stress pattern. Identifying a WS word in a context that predicts SW, on the other hand, is costly because of both the conflict with the context and the lower base frequency of the WS pattern. This interpretation is supported by previous work showing that auditory word identification is more disrupted when a canonically SW word is pronounced as WS than when a canonically WS word is pronounced as SW [35]. Moreover, the observed effect emerged not in initial reading times, but only in the combined duration of fixations on the target word and the time spent rereading earlier sentence material. The timing of this effect therefore suggests that WS violations did not disrupt first-pass reading but instead required later reanalysis.

#### *1.2. Event-Related Potential Studies of Explicit Linguistic Metric Processing*

In ERP investigations of explicit metric processing during speech perception, multiple methods have been used to investigate metric violations. One major source of variation among these studies is whether the metric violation is determined by the lexical stress pattern of the word in isolation or only by the context in which the word occurs. In studies of the first variety, researchers presented multisyllabic words auditorily with the correct or incorrect stress pattern either in isolation [36–38] or in a sentence context [39,40]. In studies of the second variety, researchers established a context that created an expectation of a specific metric pattern, then presented a target that had the correct metric pattern in isolation but was consistent or inconsistent with the expected pattern created by the context. One such paradigm used word strings to create metric context: listeners heard a string of three or four prime words with the same lexical stress pattern (all SW or all WS, e.g., BANKer, HELPful, PARty or moRALE, emBRACE, deLIGHT) followed by a target word with the same stress pattern as the primes or the opposite pattern [41,42]. In another such paradigm, participants heard sentences with a consistent metric structure including a target which was either consistent or inconsistent with the established pattern [43–48] (e.g., stress clash in "The chamPAGNE COCKtails are very delicious"). A final method used cross-modal information to inform prosodic interpretation, as in [49], where participants viewed pictures that disambiguated semantically ambiguous two-syllable strings like greenhouse, whose meaning depends on the stress pattern (GREENhouse vs. green HOUSE).

Regardless of the type of manipulation, these ERP studies demonstrate that encountering metric violations while listening generally gives rise to an early negativity between 250 and 500 ms [36–48]. However, the timing and polarity of this early effect are not consistent across studies. Some of the variance can be explained by the different responses to SW violations and WS violations in two-syllable words: SW violations, where a SW word appears when a WS word is predicted, typically elicit an early negativity [41–47,49]. The results are more mixed for WS violations, where a WS word appears when a SW word is predicted; these have elicited an early negativity in some cases [41,42], but have also been shown to elicit an early positivity relative to predicted metric patterns [36,37,40,42,48]. In two studies, both SW and WS violations elicited an early negativity, but the negativity to SW violations peaked earlier [41,49].

Additionally, explicit metric violations have often been shown to elicit a late positivity between 500 and 1000 ms [36–40,42–44,46,48]. In contrast to the early time window, this later effect does not seem to differ in polarity or timing as a function of target lexical stress pattern. However, its presence is dependent on the experimental task; in cases where the participants' task is to make an explicit assessment of the accuracy of the metric structure of the target, that target usually elicits a late positivity [36–38,46,48,49], though this is not always the case [45,47]. In contrast, if the participants' task does not include a specific assessment of the metric structure, a late positivity is absent [41]. Indeed, in cases where the explicitness of a metric judgment is varied within the experiment, a late positivity is generally evident only when the task requires this judgment [39,40,42–44].

Despite some variation across studies, these neural effects of metric inconsistency appear to be distinct from the neural effects of either syntactic or semantic violations. Syntactic violations typically elicit a biphasic response consisting of a left-lateralized anterior negativity peaking around 300 ms (LAN) and a posterior positivity peaking around 600 ms after stimulus onset (P600/LPC) [50]. However, a simultaneous test of metric and syntactic violations revealed distinct negativities for each violation type, with the negativity to metric violations occurring earlier than the negativity evoked by syntactic violations (which was interpreted as a LAN) [43]. Semantic violations typically elicit a parietally-maximal negativity around 400 ms (N400) [51]. Although some authors have interpreted the early negativity elicited by metric violations as an N400 [39,40], this metric negativity has been observed in response to illegal stress shifts in pseudowords, which have no lexico-semantic content and should not result in an N400 [45]. Further, semantic incongruity and metric incongruity have been shown to modulate the amplitude of an early negativity differently when considered in the same design, even by authors who categorize deviations from a predicted metric structure as N400 effects [39,40]. Finally, [44] observed that simultaneous metric and semantic violations led to a larger negativity than that observed for a semantic violation alone, and [52] used neuroimaging to demonstrate that the responses to semantic and metric violations have different neural generators, providing evidence that metric violations are not simply processed as semantic violations.

#### *1.3. Event-Related Potential Studies of Implicit Linguistic Metric Processing*

ERPs have also been used to explore implicit metric representations during silent reading. In one study, readers were presented with strings of four two-syllable English prime words with consistent lexical stress patterns, followed by a target word that was consistent or inconsistent with the stress pattern of the previous words [53]. Both SW and WS violations resulted in a larger fronto-central negativity from 250–400 ms after word onset, relative to words with a predicted stress pattern. In addition, all SW targets, whether consistent or inconsistent with the context, elicited a larger negativity (350–450 ms after word onset) than WS targets. In another study exploring silent metric processing in word lists, readers were presented with strings of three two-syllable German prime words followed by a SW or WS target. In this case, there were no observable ERP differences for SW violations, but WS violations were more positive than correct WS targets in three time windows: between 250–400, 400–600, and 600–800 ms after target onset [54]. A final study presented participants with an auditory tone sequence with a SW or WS pattern followed by a visually presented two-syllable English word which was consistent or inconsistent with the tone sequence stress pattern [55]. The results demonstrated a larger negativity from 300–700 ms after target presentation for SW violations compared to correct SW targets, but no significant ERP effect for WS violations. In general, these studies demonstrate that, similar to explicit metric violations, implicit metric violations often evoke an early negativity that is more reliably observed for SW than WS violations. Moreover, two of these studies are consistent with results from explicit meter studies in that when the task does not require an explicit metric judgment (and none of these did; rather, participants' task was to make an old/new judgment of the target [53], a lexical decision judgment [55], or answer a semantic question about the word strings [54]), there is no late positivity.

Multiple factors could be contributing to the variability in results observed across previous investigations of ERP responses to implicit metric violations. First, these studies have used different target words in the SW and WS conditions, meaning that the observed results may reflect differences beyond prosody, including phonetic, orthographic, or lexical differences between conditions. Second, these studies used single words or word lists to create metric expectations, but in these contexts, readers are not required to fully process the syntactic and semantic structure of the targets; this lack of constraint could lead to heterogeneous depth of processing across conditions. Therefore, in the current study, we implemented metric expectations using metrically regular rhyming couplets, which encourage readers to make strong predictions about when strong and weak syllables will occur but also require deep linguistic processing. Moreover, our target words are stress-alternating noun–verb homographs, which can have SW or WS stress depending on the syntactic category. In this way, readers are exposed to the same visual, orthographic, and segmental input across all conditions.

If readers are generating implicit metric predictions during silent reading, we predict that targets which are inconsistent with the metric context will result in early differences in the ERP waveform compared to metrically consistent targets. However, based on prior work, we predict that this early effect may differ depending on the type of violation. Specifically, we predict SW violations will elicit an early negativity relative to consistent SW words. Conversely, WS violations may result in either a reduced negativity, or a positivity, relative to WS consistent targets. Moreover, we predict the absence of a late positivity in response to metric violations, as participants are not making explicit judgments about the metric structure.

#### **2. Materials and Methods**

#### *2.1. Participants*

Eighteen participants from Mount Holyoke College with an average age of 20 years (SD = 1.57 years) contributed data to the analyses. Seventeen participants identified as female and one identified as nonbinary/genderqueer. All participants were right-handed native speakers of American English, meaning they had been speaking English in the US since at least the age of three. One participant was born outside the US to English-speaking parents and moved to the US at age three. Five participants identified as bilingual as they had acquired high proficiency in another language starting before the age of three. All participants reported having normal or corrected-to-normal vision and had not taken psychoactive medications in the 24 h prior to the experiment. For the two-hour experiment, participants received compensation in the form of Psychology course research credit or \$20. Data were collected but discarded from an additional four female participants due to voluntary withdrawal from the experiment (*n* = 1), recording equipment malfunction (*n* = 1), or excessive noise in the EEG (exclusion of more than 50% of trials from one or more conditions due to artifact; *n* = 2).

#### *2.2. Materials*

Experimental materials consisted of 160 limerick couplets (i.e., the first two lines of the limerick) adapted from the stimuli in [13], in which the final word of the second line was one of 40 stress-alternating noun–verb homograph targets (see Table 1). The stress pattern of these targets varies depending on the syntactic category: the noun form has a strong–weak pattern (e.g., PERmit), while the verb (or adjective) form has a weak–strong pattern (e.g., perMIT). The homographs were selected from [29] and the Kucera–Francis corpus [56]. The frequency of occurrence of each target as a noun or verb/adjective in the Kucera–Francis corpus did not differ, as measured by a paired *t*-test, *t*(39) = 0.28, *p* = 0.78.

For each target homograph, four couplets were constructed, crossing the factors stress pattern (SW vs. WS) and metric consistency (consistent vs. inconsistent). All experimental couplets can be found in the Appendix A. The first line of each couplet established the metric and rhyming context for the target word. The stress pattern manipulation was implemented such that for half of the couplets, the target (e.g., permit in Table 1) was the noun form with a SW pattern (Table 1A,B). For the other half, the target was the verb/adjective form (Table 1C,D). The metric consistency manipulation meant that for half of the couplets, the stress pattern of the target homograph was consistent with the stress pattern predicted by the couplet (Table 1A,C). For the other half, the stress pattern of the target was inconsistent with the established pattern (Table 1B,D). The occurrence of an inconsistent SW word when a WS word is predicted is a strong–weak (SW) violation (Table 1B)*,* and the occurrence of an inconsistent WS word when a SW word is predicted is a weak–strong (WS) violation (Table 1D).
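The 2 × 2 crossing of stress pattern and metric consistency can be sketched as follows. This is a minimal illustration, not the actual stimulus-construction code; the target list here is a hypothetical subset of the 40 homographs:

```python
from itertools import product

# Hypothetical subset of the 40 stress-alternating noun-verb homographs
targets = ["permit", "rebel", "record"]
stress_patterns = ["SW", "WS"]                # noun form vs. verb/adjective form
consistency = ["consistent", "inconsistent"]  # match with the couplet's meter

# Fully crossing the two factors yields four couplets per target homograph,
# i.e., 160 couplets for the full set of 40 targets
design = [
    {"target": t, "stress": s, "consistency": c}
    for t, s, c in product(targets, stress_patterns, consistency)
]
```

Under this crossing, the SW violation condition corresponds to `stress == "SW"` with `consistency == "inconsistent"`, and the WS violation condition to `stress == "WS"` with `consistency == "inconsistent"`.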

In addition to the 160 experimental couplets, participants read 160 filler couplets which were always metrically consistent but varied in the stress pattern of the target regions (see Table S1 and examples (1), (2)). In this way, participants read a total of 80 rhythmically inconsistent items in a pool of 320 (25% of the total).

Examples:


#### *2.3. Procedure*

After providing informed consent, participants were seated comfortably in a sound-isolated room where they viewed the couplets on a computer screen located approximately 90 cm away. The 320 experimental and filler couplets were presented in a different randomized order for each participant. Each trial began with the presentation of the word "Ready?" which stayed on the screen until the participant responded with a keypress. The word was then replaced by a fixation cross, which remained on the screen for 1000 ms (Figure 1). Following the fixation cross, couplets were presented in six one-to-four-word (one-to-five-syllable) segments in the center of the screen (see Table S1). The 1st, 2nd, 4th and 5th segments were presented for 1000 ms each; the 3rd and 6th segments, corresponding to the end of the first and second lines, respectively, were presented for 2000 ms each. In the experimental couplets, the 3rd and 6th segments always contained two-to-three syllables, one of which was strong. The 1st, 2nd, 4th and 5th segments were more variable, but constrained so that each contained one strong syllable, and one-to-four weak syllables. The number of words and syllables varied across these segments because the couplets varied widely in terms of the number of words and stress patterns of the words that made them up. However, segments were consistently defined based on the first author's intuition of natural syntactic and prosodic breaks in limerick structure.
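The fixed per-trial timing described above can be summarized in a short sketch (segment contents omitted; only the durations are shown, and the self-paced "Ready?" screen is excluded):

```python
# Per-trial presentation schedule (ms): fixation cross, then six couplet
# segments. Segments 3 and 6 close the first and second lines of the
# couplet and are shown for twice as long as the other segments.
FIXATION_MS = 1000
SEGMENT_MS = [1000, 1000, 2000, 1000, 1000, 2000]

def trial_duration_ms():
    """Total stimulus time for one couplet, excluding the self-paced
    'Ready?' screen and participant-controlled breaks."""
    return FIXATION_MS + sum(SEGMENT_MS)
```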

To ensure that participants were reading for meaning, 25% of the filler trials (12.5% of all trials) were followed by a yes/no comprehension question about the semantic content. Participants held a response box in their lap for the duration of the experiment which they used to answer comprehension questions, and to advance the presentation of trials. Participants were given breaks between trials to allow time for blinking, as well as a longer break after every 40 trials; the length of these breaks was determined by the participant. The entire experimental session lasted approximately 2 h. All experimental procedures were approved in advance by the Institutional Review Board of Mount Holyoke College.

Reference-free electroencephalogram (EEG) data were collected using 64 active Ag/AgCl electrodes placed in an elastic cap and connected to a BioSemi Active-Two system, which digitized the EEG at a sampling rate of 2048 Hz and employed a hardware lowpass filter reaching −3 dB at 409.6 Hz. Reference-free EEG was also collected from two active electrodes attached bilaterally to the participant's mastoids, and from four active electrodes placed above and below the left eye and bilaterally outside the outer canthi. All electrode offsets were brought below 20 mV at the start of the recording and kept below 50 mV throughout the recording. Continuous EEG data were referenced offline to the averaged mastoid recording, downsampled to 512 Hz, and filtered at 60 Hz using a Parks–McClellan notch filter. Bipolar vertical and horizontal electrooculogram (VEOG, HEOG) signals were derived by subtracting the above eye signal from the below eye signal, and the left from the right eye signal, respectively. Continuous EEG was segmented into epochs from 100 ms prior to target word onset to 800 ms following target word onset, and baseline-corrected to the 100 ms prestimulus period. Electrodes Oz and Iz were each identified as unusable for at least one participant and were excluded from further processing and analysis. Epochs containing eyeblinks or eye movements were identified algorithmically using moving window peak-to-peak voltage deflection detection on the VEOG channel (threshold = 150 μV, window size = 200 ms, window step = 25 ms) and step-like artifact detection on the HEOG channel (threshold = 100 μV, window size = 400 ms, window step = 25 ms), respectively. Additionally, epochs exceeding ±170 μV in any EEG channel were marked as artifact. The results of automatic artifact detection were then manually inspected and if needed, adjusted, and trials found to contain artifacts were excised. 
Artifact-free trials were then averaged by participant and condition; participants included in the analysis contributed data from at least 20 out of 40 trials (M = 31; SD = 6) in every condition. EEG data processing was performed in MATLAB using the EEGLAB [57] and ERPLAB [58] analysis packages.
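The moving-window peak-to-peak criterion used for blink detection can be sketched as follows. This is a simplified single-channel version written for illustration, not the ERPLAB implementation; the function name and the assumption that channel data are in μV are ours:

```python
import numpy as np

def peak_to_peak_artifact(epoch, sfreq, threshold_uv=150.0,
                          win_ms=200, step_ms=25):
    """Flag an epoch if the peak-to-peak amplitude within any moving
    window exceeds the threshold (the VEOG blink criterion above:
    150 uV, 200 ms window, 25 ms step)."""
    win = int(win_ms * sfreq / 1000)   # window length in samples
    step = int(step_ms * sfreq / 1000)  # step size in samples
    for start in range(0, len(epoch) - win + 1, step):
        seg = epoch[start:start + win]
        if seg.max() - seg.min() > threshold_uv:
            return True
    return False
```

A flat epoch passes, while an epoch containing a large blink-like deflection is flagged; the step-like HEOG criterion would compare mean amplitudes of the two window halves instead of the raw peak-to-peak range.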

#### *2.4. Analysis*

Previous ERP investigations of implicit linguistic metric processing have revealed effects across multiple time windows, with inconsistent time windows observed across studies [36–49]. We therefore opted to define our temporal regions of interest using a data-driven approach. To minimize implicit multiple comparisons when selecting time windows [59], we performed a series of cluster-based permutation tests over a moving 50 ms window (5 ms step) using the Mass Univariate ERP Toolbox [60]. These tests were performed separately for SW and WS violations. Within each moving window, electrodes at which the inconsistent vs. consistent *t*-test of mean window amplitude resulted in *p* ≤ 0.01 were identified, then clustered if they were within 5.44 cm of one another. Cluster magnitudes were then calculated as the sum of all *t*-scores for electrodes contained within a cluster. Lower-tailed *t*-tests were used for the SW comparisons based on prior findings that SW violations consistently elicit relative negativities [36,41,44,45,49,53,55], whereas two-tailed *t*-tests were used for the WS comparisons based on prior findings that WS violations elicit both relative negativities and positivities [40–42,54]. This process was replicated over 5000 shuffled iterations, and a cluster magnitude threshold was defined as the magnitude that clusters met or exceeded on only 5% of the shuffled (i.e., chance) iterations. Moving windows within which any clusters identified in the experimental data met or exceeded the cluster magnitude threshold were defined as temporal regions of interest (see Figure S1). This approach revealed three regions of interest, which were further investigated using conventional, ANOVA-based ERP analyses: 80–155 ms (SW), 325–375 ms (SW), and 365–435 ms (WS).
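The logic of the cluster-based permutation procedure can be sketched as follows. This is a simplified sketch in the spirit of the Mass Univariate ERP Toolbox approach, not its implementation: it uses one fixed time window rather than moving 50 ms windows, takes electrode adjacency as an explicit neighbor list rather than a 5.44 cm distance criterion, and all names are ours:

```python
import numpy as np
from scipy import stats

def cluster_masses(t_scores, adjacency, t_crit):
    """Sum |t| within clusters of adjacent electrodes whose |t| exceeds
    t_crit, using a depth-first search over the adjacency lists."""
    sig = np.abs(t_scores) > t_crit
    seen, masses = set(), []
    for seed in np.flatnonzero(sig):
        if seed in seen:
            continue
        stack, cluster = [seed], []
        while stack:
            e = stack.pop()
            if e in seen or not sig[e]:
                continue
            seen.add(e)
            cluster.append(e)
            stack.extend(adjacency[e])
        masses.append(np.abs(t_scores[cluster]).sum())
    return masses

def permutation_threshold(diffs, adjacency, n_perm=1000, alpha=0.05, seed=0):
    """Chance-level cluster-mass threshold, built by randomly flipping
    the sign of each subject's difference amplitudes (rows of diffs,
    one column per electrode) and recording the largest cluster mass."""
    rng = np.random.default_rng(seed)
    n_sub = diffs.shape[0]
    t_crit = stats.t.ppf(1 - 0.005, df=n_sub - 1)  # p <= .01, two-tailed
    null_max = []
    for _ in range(n_perm):
        flips = rng.choice([-1.0, 1.0], size=(n_sub, 1))
        t = stats.ttest_1samp(diffs * flips, 0.0).statistic
        masses = cluster_masses(t, adjacency, t_crit)
        null_max.append(max(masses) if masses else 0.0)
    return np.quantile(null_max, 1 - alpha)
```

Windows containing an observed cluster whose mass meets or exceeds `permutation_threshold(...)` would be retained as temporal regions of interest.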

We selected 49 electrodes for conventional, ANOVA-based ERP analysis (Figure 2). Scalp position was treated as two factors in the statistical model: electrode anteriority had seven levels ranging from most anterior to most posterior electrodes, and electrode laterality had seven levels, ranging from left to right. Based on our cluster-based permutation tests, we assessed SW ERP amplitudes in two time windows (80–155 ms and 325–375 ms), and WS ERP amplitudes in one time window (365–435 ms). Mean amplitudes from each participant in each time window were entered into a 2 (metric consistency) × 7 (anteriority) × 7 (laterality) repeated-measures ANOVA. Significant and marginal interactions of metric consistency with electrode position in the absence of a main effect of metric consistency were further investigated with follow-up ANOVAs over constrained scalp regions. Only main effects and interactions which involve metric consistency will be discussed. Whenever Mauchly's Test indicated that the assumption of sphericity had been violated for comparisons with more than two levels, Huynh–Feldt-corrected degrees of freedom were used to compute statistical significance. All statistical analyses were implemented in the R statistical framework [61] with the ez package [62].
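For the two-level metric-consistency factor itself, the repeated-measures main effect reduces to a paired comparison, which a short sketch can make concrete. This is illustrative only (the full 2 × 7 × 7 model with Huynh–Feldt correction was fit in R with ez), and the function name is ours:

```python
import numpy as np
from scipy import stats

def consistency_main_effect(consistent, inconsistent):
    """Main effect of metric consistency on per-participant mean
    amplitudes. With only two levels, the repeated-measures F(1, n-1)
    equals the squared paired t statistic, so sphericity correction
    does not apply to this factor."""
    t, p = stats.ttest_rel(inconsistent, consistent)
    n = len(consistent)
    return float(t) ** 2, (1, n - 1), p
```

With 18 participants, as in the current study, this yields an F statistic on (1, 17) degrees of freedom, matching the reported tests.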

**Figure 2.** Event-related potential (ERP) results. (**A**) Effects of metric predictability on a grand average (*n* = 18) waveform amplitude for strong–weak (SW) targets (left) and weak–strong (WS) targets (right). Temporal regions of interest identified in the cluster-based permutation analyses are highlighted in grey. Temporal regions of interest that revealed a significant (*p* < 0.05) main effect of metric consistency in conventional ANOVA analyses are indicated with an asterisk. Waveforms are averaged over the 49 electrodes included in the ANOVA analyses; the 7 (anteriority) × 7 (laterality) grid arrangement used to model electrode position in all ANOVAs is shown in the inset. (**B**) Scalp maps showing the topography of mean amplitude differences between the inconsistent and consistent conditions within the two temporal regions of interest identified for SW targets (left), and the one temporal region of interest identified for WS targets (right). The scalp region over which a significant (*p* < 0.05) main effect of metric consistency was observed within the specified time window is outlined in black for each scalp map.

#### **3. Results**

#### *3.1. Behavioral*

Participants answered comprehension questions with an average accuracy rate of 96.25% (SD = 4.2%), demonstrating that they were attending to the couplets and engaged with the task.

#### *3.2. Event-Related Potentials*

#### 3.2.1. SW Violations

SW violations elicited a negativity from 80–155 ms relative to predicted SW targets over left and medial scalp positions (Figure 2). An overall 2 × 7 × 7 ANOVA looking only at SW targets revealed a marginal interaction of metric consistency and electrode laterality, *F*(2.74,46.55) = 2.38, *p* = 0.087, η<sup>2</sup> = 0.007. A follow-up 2 × 7 ANOVA looking only at SW targets over left and medial scalp positions revealed a negativity elicited by SW violations relative to predicted SW targets, *F*(1,17) = 4.75, *p* = 0.044, η<sup>2</sup> = 0.14.

SW violations also elicited a negativity from 325–375 ms relative to predicted SW targets over the entire scalp that was largest over left and medial scalp positions (Figure 2). An overall 2 × 7 × 7 ANOVA looking only at SW targets revealed a main effect of metric consistency, *F*(1,17) = 5.32, *p* = 0.034, η<sup>2</sup> = 0.09, and a marginal interaction of metric consistency and electrode laterality indicated that this effect was largest over left and medial scalp positions, *F*(3.38,57.45) = 2.51, *p* = 0.06, η<sup>2</sup> = 0.004.

#### 3.2.2. WS Violations

WS violations elicited a positivity from 365–435 ms relative to predicted WS targets over the entire scalp that was largest over central and posterior scalp positions (Figure 2). An overall 2 × 7 × 7 ANOVA looking only at WS targets revealed a main effect of metric consistency, *F*(1,17) = 4.99, *p* = 0.039, η<sup>2</sup> = 0.06, and a marginal interaction of metric consistency and electrode anteriority indicated that this effect was largest over central and posterior scalp positions, *F*(2.68,45.54) = 2.78, *p* = 0.06, η<sup>2</sup> = 0.01.

#### **4. Discussion**

The goal of the current study was to investigate the realization of metric representations during silent reading using ERPs. Participants silently read metrically regular rhyming couplets in which the final target word had a strong–weak (SW) or weak–strong (WS) lexical stress pattern that was either consistent or inconsistent with the metric stress pattern predicted by the couplet. The results demonstrated that SW targets which were inconsistent with the stress pattern of the couplet (i.e., SW violations) elicited two separate negativities (80–155 ms and 325–375 ms after word onset) relative to SW targets which were consistent with the predicted stress pattern. Conversely, WS targets inconsistent with the stress pattern of the couplet (i.e., WS violations) elicited an early positivity (365–435 ms after word onset) relative to WS targets which were consistent with the predicted stress pattern. Neither SW nor WS violations elicited a late positivity. Together with prior results, the current results support the Implicit Prosody Hypothesis, which maintains that readers are generating implicit versions of prosodic structure even when reading silently, and that these representations are similar to explicit ones.

The observation of a significant negative left-lateralized deflection from 80–155 ms in response to SW violations is an unexpected result based on prior work on explicit and implicit linguistic metric processing. Few studies of linguistic meter have reported consistent differences in components this early, though one study demonstrated a significant negativity between 100–320 ms in response to an inappropriate stressed syllable [46]. However, negativities in the 100–200 ms time window have been widely observed in response to metric violations in musical studies. This effect, termed the metric mismatch negativity (MMN), has been observed when a strong tone occurs at an unpredicted temporal location (i.e., when a weak tone is predicted) [63–65]. This situation is analogous to the circumstance under which we observed the early negativity in the current study, such that a strong beat at a predicted weak time elicits the early negativity (SW violation), whereas a weak beat at a predicted strong time does not (WS violation). Importantly, as this effect was detected based on a marginal interaction of metric consistency with electrode position and this is the first study we are aware of to report this early negativity in response to an implicit strong beat occurring at a predicted weak time, additional experiments will be required to determine the reliability and meaning of this component.

The negativity between 325–375 ms observed for SW violations is consistent with results from previous investigations of both explicit and implicit violations of metric structure. Specifically, previous studies have demonstrated that SW metric violations result in a negative deflection in the 250–500 ms range relative to metrically consistent targets [36,41,44,45,49]. Moreover, a similar effect has also been shown in a small set of studies investigating metric structure in silent reading of single words [53,55]. The current study extends this finding to silent reading of metric violations in sentence contexts using orthographically identical items across all conditions. The observation of a positivity for WS violations from 365–435 ms after word onset is also consistent with both prior listening and reading studies. Two prior studies of explicit metric violations [40,42] and one prior study of implicit metric processing [54] have observed positive deflections for inconsistent WS targets relative to consistent WS targets. Our results therefore suggest that prior findings of different responses to SW and WS violations are not simply due to idiosyncratic differences between the SW and WS target items chosen for these prior experiments, but do indeed reflect the activation of abstract metric representations during silent reading.

The different results observed across multiple studies for SW vs. WS violations may be due to differences in the underlying phonological structure of the target words. According to [17], the trochaic foot (SW) is the default phonological structure in Germanic languages, including English. This phonological constraint is realized in the lexical stress patterns of words, such that most two-syllable words begin with a stressed syllable (85–90% of the time in English [34]; 73% of the time in German [66]). This asymmetry means that accessing a SW (trochaic) representation of a target is globally easier than accessing a WS (iambic) representation, irrespective of the context in which the target occurs. Therefore, the lexical representation of a SW target is harder to access when its stress pattern conflicts with the local metric context than when it is consistent with that context. Conversely, resolving WS violations is more challenging for readers because the conflicting cues come from both the local and the global environment.

Under this view of phonological asymmetry, the negativity observed between 325–375 ms for SW violations in the current study, and in a similar time window in other studies, could be related to the N400, which reflects the ease with which lexical access is achieved. The negativity for SW violations could reflect either additional lexical processing due to the added difficulty of accessing the appropriate lexical content in the presence of lexical stress mismatch, or lexical repair processes due to automatic activation of the metrically consistent, but semantically inconsistent, alternate form of the noun/verb homograph. However, it is important to note that this interpretation of the negativity as indexing lexical processing is challenged by previous work exploring simultaneous violations of metric and semantic structure, in which the latency and distribution of the negativity differs across violation types [39,40], as well as evidence that metrically inconsistent pseudowords also elicit such negativities, even though they lack semantic content [45]. Alternatively, it could be that the negativity we observed in the current experiment indicates the violation of a consistent, rule-based sequence, in this case realized as the metric structure [45].

In contrast, the positivity observed between 365–435 ms for WS violations in the current study, and in a similar time window in other studies, could be related to conflict processing. When a WS violation occurs, the reader must resolve the conflict between a metric context which leads them to predict a SW target and a semantic context which leads them to predict a WS target. In addition, there is the added conflict that WS two-syllable words are phonologically marked in the language. These factors together may lead to the observed positivity, which may signal a processing error that is harder for readers to recover from. This interpretation is consistent with previous ERP research on metric structure in German, where metric violations in three-syllable words that did not violate metric foot structure led to an early negativity, whereas violations that also conflicted with foot structure resulted in an early positivity [36], similar to the results in the current study.

Consistent with other studies of explicit and implicit metric processing that did not involve an explicit metric task, we did not observe evidence of a late positivity for metric violations relative to consistent metric conditions. Previous studies of both explicit and implicit metric processing demonstrate that late positivities in response to metric violations are most likely observed when the participant's task is to assess the metric structure. Indeed, only one previous study of implicit metric processing observed a late positivity in response to metric violations [54], while two others did not [53,55], and none of these studies required an explicit metric judgment. This interpretation is in line with previous work showing a dissociation between early and late ERP effects of syntactic violations, where early negativities are thought to reflect automatic processing and late positivities are thought to reflect controlled processes of repair [67,68] and the difficulty of the required repair process [69]. The current results suggest that although both implicit and explicit metric violations are automatically detected, as evidenced by early (<500 ms) waveform differences, only violations that rise to the level of awareness give rise to a late positivity.

It is also possible that the lack of a late positivity in the current study reflects a lack of power; our choice to present the same orthographic information across conditions meant that the number of items in the experiment was limited by the number of two-syllable stress-alternating noun–verb homographs in English that were known to our participants and could be embedded in rhyming couplets. Moreover, compared to previous studies using word lists, the stress pattern of the target in the current study was locally ambiguous, and only disambiguated by the implicit metric structure provided by the context. Although this manipulation is a better test of the abstract metric structure compared to other studies that used different items across SW and WS conditions, it produces a less clearly defined metric violation than paradigms employing single target words with unambiguous stress patterns.

Although the current results are generally consistent with prior ERP work on explicit and implicit linguistic metric structures, they are inconsistent with results observed in a previous eye-tracking experiment using the same materials. Recall that Breen and Clifton observed inflated reading times only for WS violations, and not for SW violations [13]; moreover, these effects were observed only in relatively later reading time measures. Conversely, our results demonstrate significant early ERP differences for both SW and WS violations, though they differ in polarity, timing, and topography. These differential effects are likely due to differences in the temporal control of stimulus presentation between the studies. In Breen and Clifton's experiment, participants read normally at their own pace, meaning they could take as much time as needed to process material in advance of the critical word, and could look back to prior sentence material to resolve difficulty generated at the target word. In contrast, materials in the current study were presented in a region-by-region segmented fashion, giving participants less time to generate predictions about upcoming material and disallowing regressions. Moreover, because the current materials were presented in a time-controlled manner, the metric structure of the sentences was likely more salient to readers, making the metric inconsistency more explicit and resulting in significant ERP effects for both types of metric violations.

Future work could directly investigate the role of temporal stimulus control on implicit metric violation processing by replicating the current paradigm using simultaneous collection of eye-tracking data and ERPs, a method which has already been used successfully to adjudicate debates about linguistic processing in eye movements [70,71]. In this way, the role of metric inconsistency in silent reading could be assessed without explicitly controlling the timing of materials. Additionally, while the current results demonstrate that readers engage in implicit prosody during silent reading of poetry, it is an open question to what extent these findings generalize to normal reading. The couplets used in the current study were designed to have strict metric and rhyming structure, which is rare in non-poetic language. However, our study does provide insight into the role of meter in implicit prosody. To determine whether our result can be replicated in non-poetic contexts, which do not have concomitantly high metrical expectancies, future work will explore differences in brain activity in response to metric violations in silently-read prose sentences.

#### **5. Conclusions**

The current results provide further evidence of an intimate link between metric processing during listening and metric processing during silent reading, which may help inform our understanding of previously described relationships between children's sensitivity to auditory metric structure and silent reading comprehension. For example, the ability of older children to track a perceived metric structure predicts phonological awareness and reading outcomes [72,73], and children's ability to detect a mis-stressed word predicts phonological awareness and word knowledge [74]. It may be the case that these reading abilities are facilitated by implicit metric structure representations. This claim is further bolstered by a relationship between prosodic fluency and reading comprehension in high school students: those who demonstrated higher prosodic fluency also showed better comprehension [75,76]. Research on implicit prosody and the underlying neurocognitive processes occurring during silent reading may, therefore, inform future work designing prosodic interventions to improve children's reading comprehension abilities.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2076-3425/9/8/192/s1, Table S1: Distribution of stress patterns across segments for experimental and filler couplets, Figure S1: Moving window cluster-based permutation test results.

**Author Contributions:** Conceptualization, M.B.; methodology, M.B., A.B.F.; software, M.B., A.B.F.; validation, M.B., A.B.F.; formal analysis, M.B., A.B.F., M.O.A.; investigation, M.O.A.; resources, M.B.; data curation, M.B., A.B.F., M.O.A.; writing—original draft preparation, M.B.; writing—review and editing, A.B.F., M.O.A.; visualization, M.B., A.B.F., M.O.A.; supervision, M.B.; project administration, M.B., A.B.F.; funding acquisition, M.B., M.O.A.

**Funding:** This research was funded in part by a James S. McDonnell Foundation Scholar Award in Understanding Human Cognition to author M.B. and a Harap Scholarship Fund Award to author M.O.A.

**Acknowledgments:** The authors would like to acknowledge the assistance of Charles Clifton with materials development, Elizabeth Brija with experimental coding, and Xuefei Chen, Hannah Galloway, Margaret Golder, Johanna Kneifel, Priscilla Lopez, and Corrin Moss with data collection for the paper.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **Appendix A**

1a. SW/consistent

You must hear/my story,/your highness. I have the/young princess's/address

1b. SW/inconsistent

My workspace/is such a/big mess. I lost an/important/address

1c. WS/consistent

My workspace/is such a/big mess. My clutter/I have to/address

1d. WS/inconsistent

You must hear/my story,/your highness. Your habits I/find I must/address

2a. SW/consistent

The guy who/got lost on/a flyby Dropped all of/his bombs on an/ally

2b. SW/inconsistent

I know a/young woman/from Rye, Who'd make such/a lovely/ally

2c. WS/consistent

I know a/young woman/from Rye, With whom I/would like to/ally

2d. WS/inconsistent

The guy who/got lost on/a flyby Killed folks with whom/we want to/ally

3a. SW/consistent

I just saw/a dog and/a tomcat, Engaged in/some furious/combat

3b. SW/inconsistent

I witnessed/a dog and/a cat, Engaged in/some angry/combat

3c. WS/consistent

I witnessed/a dog and/a cat, Who seemingly/tried to/combat

3d. WS/inconsistent

I just saw/a dog and/a tomcat, That we must/be ready to/combat

4a. SW/consistent

I heard someone/say through/the grapevine: The farmer is/driving his/combine

4b. SW/inconsistent

The farmer/got caught/drinking wine, Then harvesting/in his/combine

4c. WS/consistent

The farmer/got caught/drinking wine, And shotguns/and booze don't/combine

4d. WS/inconsistent

I heard someone/say through/the grapevine: That farmer is/hoping to/combine

5a. SW/consistent

I processed/some prints in/the darkroom Of people I'd/met on a/commune

5b. SW/inconsistent

I know some/who worship/the moon, And live/in a hippie/commune

5c. WS/consistent

I know some/who worship/the moon, With nature/they like to/commune

5d. WS/inconsistent

I processed/some prints in/the darkroom Of folks who/just wanted to/commune

6a. SW/consistent

If out in/the mountains/you backpack, Your team must/agree to this/compact

6b. SW/inconsistent

Before you/head out with/that pack, Your team has/to sign this/compact

6c. WS/consistent

Before you/head out with/that pack, Be sure that/your gear is/compact

6d. WS/inconsistent

If out in/the mountains/you backpack, Your gear must/be basic and/compact

7a. SW/consistent

The crew worked/so hard for/their paychecks They thought they'd/develop a/complex

7b. SW/inconsistent

There once was/a young man/named Rex Who owned/an apartment/complex

7c. WS/consistent

There once was/a young man/named Rex Whose theories/were big and/complex

7d. WS/inconsistent

The crew worked/so hard for/their paychecks Their work was/so terribly/complex

8a. SW/consistent

We stayed in/the woods at/a campground, Which wasn't too/far from a/compound

8b. SW/inconsistent

We got that/old dog at/the pound He came from/a private/compound

8c. WS/consistent

We stayed in/the woods at/a campground, Our pleasure in/nature to/compound

8d. WS/inconsistent

We got that/old dog at/the pound Our sadness/will surely/compound

9a. SW/consistent

There was a/young heroin/addict, Who ended up/causing a/conflict

9b. SW/inconsistent

My parents/are being/quite strict. Our views are/in open/conflict

9c. WS/consistent

My parents/are being/quite strict. Their wishes/and mine do/conflict

9d. WS/inconsistent

There was a/young heroin/addict, Whose habits and/others did/conflict

10a. SW/consistent

The athlete/who just failed/a drugtest, Will soon face/a challenging/contest

10b. SW/inconsistent

The athlete/who thinks he's/the best Just lost an/important/contest

10c. WS/consistent

The athlete/who thinks he's/the best Holds titles/that others/contest

10d. WS/inconsistent

The athlete/who just failed/a drugtest, Is planning the/charges to/contest

11a. SW/consistent

Although that/young man is/an addict, He really should/not be a/convict

11b. SW/inconsistent

I think that/the judge was/too strict In sentencing/that young/convict

11c. WS/consistent

I think that/the judge was/too strict, The jury too/quick to/convict

11d. WS/inconsistent

Although that/young man is/an addict, I think that the/judge shouldn't/convict

12a. SW/consistent

That man applies/way too much/hair grease. A friend should/suggest a big/decrease

12b. SW/inconsistent

Forgive me/for stating/my peace, But you must/commence a/decrease

12c. WS/consistent

Forgive me/for stating/my peace, Your appetite/you must/decrease

12d. WS/inconsistent

That man applies/way too much/hair grease. I think the/amount he should/decrease

13a. SW/consistent

The Soviet/spy is a/suspect. The case has/but one major/defect

13b. SW/inconsistent

The Soviet/spy they/suspect, Has plans with/a major/defect

13c. WS/consistent

The Soviet/spy they/suspect, Is planning/quite soon to/defect

13d. WS/inconsistent

The Soviet/spy is/a suspect. I heard that/he's planning to/defect

14a. SW/consistent

In nothing/but jeans and/a t-shirt, That man took/a trip 'cross the/desert

14b. SW/inconsistent

The fighting/he tried/to avert, By running off/through the/desert

14c. WS/consistent

The fighting/he tried to/avert, By choosing/his squad to/desert

14d. WS/inconsistent

In nothing but/jeans and a/t-shirt, A soldier his/squad chose to/desert

15a. SW/consistent

I know of/an elegant/female Her outfits lack/no fashion/detail

15b. SW/inconsistent

There once was/a woman/named Gail Whose fashion/had every/detail

15c. WS/consistent

I know of/an elegant/female Who wanted/her auto to/detail

15d. WS/inconsistent

There once was/a woman/named Gail Who wanted/her car to/detail

16a. SW/consistent

We once had/a tiresome/house guest, Who loved to/read Birdwatcher's/Digest

16b. SW/inconsistent

We once had/a friend as/a guest, Who loved to/skim Reader's/Digest

16c. WS/consistent

We once had/a friend as/a guest, Whose cooking/we could not/digest

16d. WS/inconsistent

We once had/a tiresome/house guest, Whose humor/was painful to/digest

17a. SW/consistent

The gymnast/requested/a recount Her score, she/thought, rated no/discount

17b. SW/inconsistent

He could not/afford the/amount, And asked for/a modest/discount

17c. WS/consistent

He could not/afford the/amount. The invoice they/would not/discount

17d. WS/inconsistent

The gymnast/requested/a recount She thought it/was wrongful to/discount

18a. SW/consistent

In order to/prove your/attendance You'll have to/check in at the/entrance

18b. SW/inconsistent

This gorgeous/young woman/from France Made everyone/jam the/entrance

18c. WS/consistent

This gorgeous/young woman/from France Would often/the young men/entrance

18d. WS/inconsistent

There was a/young woman/whose nude dance Would always/the gentlemen/entrance

19a. SW/consistent

He tried not/to get badly/sidetracked. He needed/some raspberry/extract

19b. SW/inconsistent

The recipe/seemed quite/exact. It called for/some almond/extract

19c. WS/consistent

The recipe/seemed quite/exact. Some essence/you had to/extract

19d. WS/inconsistent

He tried not/to get badly/sidetracked Some essence/he wanted to/extract

20a. SW/consistent

The city/must safeguard/the seaports, To save us/from dangerous/imports

20b. SW/inconsistent

The panel/is set to/report On how much/we pay for/imports

20c. WS/consistent

The panel/is set to/report On how much/the city/imports

20d. WS/inconsistent

The city/must safeguard/the seaports, Because of how/much it now/imports

21a. SW/consistent

The man who/asked you for/a consult Was given/a horrible/insult

21b. SW/inconsistent

That woman/who likes/the occult, Will tolerate/no more/insults

21c. WS/consistent

That woman/who likes the/occult, Is very/unsafe to/insult

21d. WS/inconsistent

The man who/asked you for/a consult Is no-one/you wanted to/insult

22a. SW/consistent

The teacher/assigned them/a project To find an/unusual/object

22b. SW/inconsistent

The winners/will get to/select A shiny/expensive/object

22c. WS/consistent

The mayor/that we might/elect Has views to/which others/object

22d. WS/inconsistent

The teacher/assigned them/a project That forced many/parents to/object

23a. SW/consistent

There once was/an old man/named Kermit, Who hunted/without any/permit

23b. SW/inconsistent

There once was/an old man/named Britt Who hunted/without a/permit

23c. WS/consistent

There once was/an old man/named Britt Whose vices/no wife could/permit

23d. WS/inconsistent

There once was/an old man/named Kermit Whose gambling his/wife would not/permit

24a. SW/consistent

I know of/an old man/named Herbert, Who's known around/town as a/pervert

24b. SW/inconsistent

The nun/did her best/to convert A man whom/they call a/pervert

24c. WS/consistent

That nun/did her best/to convert Young kids who/the truth do/pervert

24d. WS/inconsistent

I know of/an old man/named Herbert Who always the/truth tries to/pervert

25a. SW/consistent

There once was/a penniless/peasant, Who couldn't/afford a nice/present

25b. SW/inconsistent

There once was/a clever/young gent, Who bought for/his girl a/present

25c. WS/consistent

There once was/a clever/young gent, Who had a/nice talk to/present

25d. WS/inconsistent

There once was/a penniless/peasant, Who went to/his master to/present

26a. SW/consistent

He couldn't/hide all of/his misdeeds, But made off/with all of the/proceeds

26b. SW/inconsistent

In light/of the man's/dirty deeds, He won't/receive any/proceeds

26c. WS/consistent

In light/of the man's/dirty deeds, On Monday/his trial/proceeds

26d. WS/inconsistent

He couldn't/hide all of/his misdeeds On Monday/his retrial/proceeds

27a. SW/consistent

There once was/a crusty old/recluse, Who grew the/most wonderful/produce

27b. SW/inconsistent

There simply/is no good/excuse For failing to/eat your/produce

27c. WS/consistent

There simply/is no good/excuse For failing/to work and/produce

27d. WS/inconsistent

There once was/a crusty/old recluse, Whose garden great/harvests would/produce

28a. SW/consistent

With all of/their time spent/at recess, The children/make no forward/progress

28b. SW/inconsistent

The efforts/at peace,/I confess, Are making/no forward/progress

28c. WS/consistent

The efforts/at peace,/I confess, Will simply/no longer/progress

28d. WS/inconsistent

With all of/their time spent/at recess, The children will/soon fail to/progress

29a. SW/consistent

I noticed/a ruinous/defect In part of/the candidate's/project

29b. SW/inconsistent

The man we/will likely/elect Endorses/this wacky/project

29c. WS/consistent

The mayor that/folks will/elect According/to what polls/project

29d. WS/inconsistent

I noticed/a ruinous/defect In what that/new candidate/projects

30a. SW/consistent

There once was/a young man/named Ernest, Who sponsored/a violent/protest

30b. SW/inconsistent

They put the/man under/arrest For leading/an angry/protest

30c. WS/consistent

They put the/man under/arrest, And gave him/no time to/protest

30d. WS/inconsistent

There once was/a young man/named Ernest, Who rounded up/people to/protest

31a. SW/consistent

In a voice/that was piercing/and treble, The serfs were/inspired by a/rebel

31b. SW/inconsistent

The infantry/failed to/repel The followers/of the/rebel

31c. WS/consistent

The infantry/failed to/repel The fighters/who want to/rebel

31d. WS/inconsistent

In a voice/that was piercing/and treble, The leader urged/peasants to/rebel

32a. SW/consistent

That basketball/star's like a/bloodhound. He seeks out/and catches each/rebound

32b. SW/inconsistent

The basketball/star turned/around, and caught an/amazing/rebound

32c. WS/consistent

The basketball/star turned/around, And watched for/the shot to/rebound

32d. WS/inconsistent

That basketball/star's like a/bloodhound. He waits for/each jumpshot to/rebound

33a. SW/consistent

I met an/old friend who/played baseball, Who warned of/a new safety/recall

33b. SW/inconsistent

I met an/old friend at/the mall, Who warned of/a safety/recall

33c. WS/consistent

I met an/old friend at/the mall, Whose name I/just could not/recall

33d. WS/inconsistent

I met an/old friend who/played baseball, But what his/name was I can't/recall

34a. SW/consistent

There once was/a young man/named Eckerd Who broke an old/pole-vaulting/record

34b. SW/inconsistent

The athlete/won quite an/award For breaking/the scoring/record

34c. WS/consistent

The athlete/won quite an/award The cameras/were there to/record

34d. WS/inconsistent

There once was/a young man/named Eckerd Whose pole-vaulting/feats they did/record

35a. SW/consistent

Last year I/created a/stock fund. And managed to/get a big/refund

35b. SW/inconsistent

I have to/admit I/am stunned, You didn't/give me my/refund

35c. WS/consistent

I have to/admit I/am stunned, My payments/you will not/refund

35d. WS/inconsistent

Last year I/created a/stock fund. The fees they/would happily/refund

36a. SW/consistent

The judges must/all watch/the replay To find out which/team won the/relay

36b. SW/inconsistent

A messenger/came by/today To find out/who won the/relay

36c. WS/consistent

A messenger/came by/today; A message/he had to/relay

36d. WS/inconsistent

The judges must/all watch/the replay. Results to the/coach they will/relay

37a. SW/consistent

I read an/unusual/essay 'Bout how they/conducted a/survey

37b. SW/inconsistent

A lovely/young woman/named Fay Was asked to/complete a/survey

37c. WS/consistent

A lovely/young woman/named Fay The future/she liked to/survey

37d. WS/inconsistent

I read an/unusual/essay Describing how/folks tried to/survey

38a. SW/consistent

The cops are/an interesting/subject They bullied their/most recent/suspect

38b. SW/inconsistent

The cops/didn't try/to protect A recently/collared/suspect

38c. WS/consistent

The cops/didn't try/to protect The people/they chose to/suspect

38d. WS/inconsistent

The cops are/an interesting/subject They bully/the people they/suspect

39a. SW/consistent

A striking young/woman named/Rembrandt, From Portugal,/she was a/transplant

39b. SW/inconsistent

A striking young/dame named/van Zandt, From Spain was/a recent/transplant

39c. WS/consistent

A striking young/dame named/van Zandt Had roses/she hoped to/transplant

39d. WS/inconsistent

A striking young/woman named/Rembrandt, Had roses she/wanted to/transplant

40a. SW/consistent

To get to/the local gym's/squash court, You must take/municipal/transport

40b. SW/inconsistent

The mafia/tried to/extort The captain/of public/transport

40c. WS/consistent

The mafia/tried to/extort A man who/had tried to/transport

40d. WS/inconsistent

To get to/the local/gym's squash court, Your gear should/be ready to/transport

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
