#### **4. Methodology**

As we reviewed the literature on both of these approaches, we followed the methodology described here. We first identified leading search engines used for academic literature reviews: Google Scholar (scholar.google.com), ResearchGate (researchgate.net), and VirtualLRC (cse.google.com). We then used the following search terms to locate relevant articles: assessing expressive oral reading; NAEP fluency 2002; Multidimensional Fluency Scale; and Comprehensive Oral Reading Fluency Scale (CORFS).

We focused on articles that appeared in the results of searches in at least two of these search engines. We then focused on studies that appeared consistently in the reference lists of these articles. We also paid close attention to studies conducted in international settings, because they indicate how widely these assessment measures are used.

#### **5. Automated Assessment**

One method of gauging an individual's expressive oral reading is to record the reading and examine specific features of the speech [11]. Once the recordings have been made, the contours of the reader's speech are analyzed with software that displays them visually. The software most commonly used for this purpose is Praat, developed in 2001 by the Dutch researchers Paul Boersma and David Weenink [19,20]. It has undergone several revisions and remains one of the most widely used tools for analyzing speech because it is freely available and user friendly.

Linguists in the field of phonetics have used Praat to study specific features of speech and to understand the sound patterns of typical English [21]. Praat has also been used to teach learners of English as a foreign language (EFL) about the prosodic features of English [22]. As learners become more aware of the sounds of English, they can practice in more focused ways and produce speech that sounds more like that of native speakers. In addition, Praat has been used to assist people affected by vocal cord paralysis. By examining their speech patterns as they work to improve their speech production, patients can learn to match the pitch and pause patterns of English more closely [23].

The most common automated assessment in educational settings provides spectrographic measurements of speech. Analyses of these graphic displays of oral reading can highlight specific prosodic elements. The two aspects of oral reading most commonly examined with this software are pauses and pitch [24]. Researchers have measured various elements of both with Praat to better understand readers' prosodic practices.

When analyzing pauses in recorded oral reading, examiners consider the ratio of actual pauses to grammatically expected pauses within sentences. This analysis can determine whether a reader's pauses are expected and appropriate or whether they tend to be ungrammatical, indicating unjustified pausing [24]. The more closely a reader's actual pauses match those expected from the grammar of the text, the more confidently researchers can conclude that the child is reading with appropriate phrasing.
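One way to operationalize this pause comparison is sketched below. It assumes that the reader's pause locations have already been coded as word-boundary indices (for example, from an annotated Praat recording) and that the locations where the grammar calls for a pause have been marked separately. The function and field names are hypothetical, and the measures are a simplification rather than the exact procedure used in [24].

```python
from dataclasses import dataclass


@dataclass
class PauseComparison:
    matched: int        # pauses at grammatically expected boundaries
    missed: int         # expected boundaries where the reader did not pause
    ungrammatical: int  # pauses at boundaries the grammar does not call for
    count_ratio: float  # actual pauses / expected pauses
    match_ratio: float  # matched pauses / expected pauses


def compare_pauses(actual, expected):
    """Compare a reader's pause locations with grammatically expected ones.

    Both arguments are iterables of word-boundary indices, where boundary i
    is the gap after word i (0-based). Hypothetical helper for illustration.
    """
    actual_set, expected_set = set(actual), set(expected)
    if not expected_set:
        raise ValueError("at least one expected pause location is required")
    matched = len(actual_set & expected_set)
    return PauseComparison(
        matched=matched,
        missed=len(expected_set - actual_set),
        ungrammatical=len(actual_set - expected_set),
        count_ratio=len(actual_set) / len(expected_set),
        match_ratio=matched / len(expected_set),
    )


# "When the rain stopped, the children ran outside to play."
# Expected pauses: after "stopped," (boundary 3) and after "play." (boundary 9).
# This reader also pauses after the first "the" (boundary 1).
result = compare_pauses(actual=[1, 3, 9], expected=[3, 9])
print(result)
# PauseComparison(matched=2, missed=0, ungrammatical=1, count_ratio=1.5, match_ratio=1.0)
```

Reporting matched and ungrammatical pauses separately matters because a raw count ratio alone cannot distinguish a reader who pauses at the right places from one who pauses the same number of times at the wrong places.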

When evaluating pitch in oral reading, examiners consider how readers raise and lower their voices. Effective readers vary their pitch more emphatically than struggling readers do [25]. Examiners measure the size of the drop in pitch across a sentence to determine whether this declination is appropriate. They also examine the general upward and downward pitch swings in a reader's voice. Such variation in pitch is generally associated with appropriate expressive reading; when it is absent from a child's oral reading, the reading usually sounds flat and monotone.
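Both pitch measures can be approximated from a fundamental frequency (F0) track. The sketch below assumes the track has been exported from Praat as paired time (seconds) and frequency (Hz) values for voiced frames only. It estimates declination as the least-squares slope of F0 over time and summarizes pitch swings as the range and standard deviation of F0. These are common, simple stand-ins chosen for illustration, not necessarily the measures used in the studies cited here; the function name is hypothetical.

```python
import statistics


def pitch_summary(times, f0):
    """Summarize declination and pitch variation in one sentence's F0 track.

    times: frame times in seconds; f0: fundamental frequency in Hz.
    Only voiced frames should be passed (unvoiced frames dropped beforehand).
    """
    if len(times) != len(f0) or len(f0) < 2:
        raise ValueError("need at least two voiced frames of matching length")
    mean_t = statistics.fmean(times)
    mean_f = statistics.fmean(f0)
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("frame times must not all be identical")
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, f0))
    return {
        # Least-squares slope in Hz per second; negative means falling pitch.
        "declination_hz_per_s": sxy / sxx,
        # Overall size of the pitch swings.
        "f0_range_hz": max(f0) - min(f0),
        # Typical deviation from the reader's mean pitch; near 0 sounds monotone.
        "f0_sd_hz": statistics.stdev(f0),
    }


# Hypothetical F0 values for a sentence whose pitch falls steadily.
print(pitch_summary([0.0, 1.0, 2.0, 3.0], [220.0, 210.0, 200.0, 190.0]))
```

A small standard deviation and range correspond to the flat, monotone reading described above. In practice, many analyses convert Hz to semitones before comparing readers, because speakers differ widely in their baseline pitch.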

Along with pausing and pitch, stress is another property of prosody. However, stress is difficult to isolate and measure because it combines several properties, including pitch, duration, and intensity [26–28]. When teachers focus students' attention on pitch, problems with stress generally improve as well [29].

Researchers have used automated assessment of prosody both nationally and internationally. Researchers in Spain [30] asked 103 third- through sixth-grade students to read four expository texts aloud and answer comprehension questions. Using Praat, they measured typical aspects of prosody and found that, compared with more able readers, children with lower levels of reading comprehension made more inappropriate pauses and showed unacceptable pitch levels and durations.

Ardoin et al. examined the role of repeated readings and wide reading in improving multiple dimensions of reading, including fluency. They asked 168 second graders to practice reading four times each week over a nine-week period. Using the Praat software, they found that both repeated reading and wide reading were effective in improving reading fluency, which in turn affected other reading behaviors, including expressive oral reading. The second graders in this study improved in both pitch and pause scores [31].

Researchers have also examined pitch and pause durations, and changes in them during oral reading, to measure the prosodic reading of adult readers. They related these aspects of expressive reading to the adults' scores on tests of decoding, word identification, and comprehension. For adults with limited reading skills, pausing patterns accounted for a significant amount of the variance in comprehension scores [32].

#### **6. Human Assessment**

As an alternative to automated, spectrographic measurements of expressive oral reading, educators can use rating scales to judge the quality of prosody. Rubrics establish criteria for human judgment of acceptable performance on specific tasks. They are commonly used in classroom settings to systematically evaluate students' abilities and behaviors, especially processes that are not easily measured in other ways. A performance task can be designed to measure a student's ability, knowledge, and skills. For example, a student may be asked to demonstrate a physical or artistic achievement, play a musical instrument, create or critique a work of art, or improvise a dance or a scene. These kinds of performances, tasks, projects, and portfolios can be scored using rubrics. Rubrics allow researchers and teachers to clarify the components of a skill and to make judgments about what students know and can do in relation to specific objectives. Observers can use rubrics to judge the degree, frequency, or range of student behaviors and to understand how fully a student has mastered a skill [33].

Some rubrics provide for holistic evaluation. A holistic rubric takes a global approach: it identifies a set of interrelated tasks that together contribute to the whole performance. With this type of rubric, a teacher or researcher can evaluate quickly and efficiently to form an overall impression of ability. However, holistic evaluations do not provide the detail available from analytic approaches.

Analytic rubrics break a final product down into component parts, and each part is scored separately. The total score for a student's performance is the sum of the scores for all parts. Because each component is evaluated separately, teachers receive specific information about strengths and weaknesses that can guide instructional choices to help students improve.
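Analytic scoring amounts to checking each component rating against its scale, summing the ratings, and keeping the component profile for instructional use. The sketch below uses hypothetical component names and a hypothetical 1-to-4 scale; it does not reproduce any particular published rubric, and the function name is illustrative.

```python
def score_analytic(ratings, scale):
    """Sum component ratings and report the weakest component(s).

    ratings: dict of component name -> integer rating
    scale:   dict of component name -> (lowest, highest) allowed rating
    """
    missing = sorted(scale.keys() - ratings.keys())
    extra = sorted(ratings.keys() - scale.keys())
    if missing or extra:
        raise ValueError(f"missing: {missing}; not on rubric: {extra}")
    for name, value in ratings.items():
        low, high = scale[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low} to {high}")
    total = sum(ratings.values())
    # The lowest-rated components point to where instruction could focus.
    lowest = min(ratings.values())
    weakest = sorted(name for name, value in ratings.items() if value == lowest)
    return total, weakest


# Hypothetical three-component rubric, each rated 1 (low) to 4 (high).
scale = {"expression": (1, 4), "phrasing": (1, 4), "pace": (1, 4)}
total, weakest = score_analytic({"expression": 2, "phrasing": 3, "pace": 4}, scale)
print(total, weakest)  # 9 ['expression']
```

A holistic rubric, by contrast, would record only a single overall rating, so the list of weakest components has no counterpart there.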

Whether holistic or analytic rubrics are used, several significant issues need to be addressed. Raters' understanding of the scoring task and their ability to score observed behaviors consistently are essential when making judgments about student performance. Consistency within an individual rater is also important; that is, does the rater score the same performance in a similar manner on more than one occasion? Other questions follow. How many raters are required for confidence in scores? How similar are different raters' scores on the same performances? Are their ratings similar on different occasions? Differences in scores may also be related to the task at hand. For example, the passages students read aloud may influence their ability to perform well.
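Questions about rater consistency are usually answered with agreement statistics. The sketch below computes three that are common for ordinal rubric scores: exact agreement, adjacent agreement (scores within one point), and Cohen's kappa, which corrects exact agreement for chance. It assumes integer scores from two raters on the same performances, or from one rater on two occasions when checking intra-rater consistency. The function name and the example scores are hypothetical.

```python
from collections import Counter


def rater_agreement(scores_a, scores_b):
    """Agreement between two sets of integer rubric scores on the same readings."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(scores_a)
    pairs = list(zip(scores_a, scores_b))
    exact = sum(a == b for a, b in pairs) / n
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / n
    # Chance agreement: probability both raters independently pick the same score.
    count_a, count_b = Counter(scores_a), Counter(scores_b)
    chance = sum(count_a[s] * count_b[s] for s in count_a) / (n * n)
    # Kappa is undefined when both raters used one identical score throughout.
    kappa = (exact - chance) / (1 - chance) if chance < 1 else float("nan")
    return {"exact": exact, "adjacent": adjacent, "kappa": kappa}


# Hypothetical scores from two raters on six recorded readings (1 to 4 scale).
rater_1 = [1, 2, 3, 4, 2, 3]
rater_2 = [1, 2, 3, 3, 2, 4]
print(rater_agreement(rater_1, rater_2))
```

Here the raters agree exactly on four of six readings and are never more than one point apart, yet kappa is only about 0.54 once chance agreement is removed. For ordinal scales, a weighted kappa, which gives partial credit for near misses, is often preferred; the unweighted form is shown for brevity.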

Currently, the two most commonly used rating rubrics are the one created by the National Assessment of Educational Progress (NAEP) and the Multidimensional Fluency Scale (MDFS). A more recent instrument, the Comprehensive Oral Reading Fluency Scale (CORFS), has also been developed. The NAEP rating method is a holistic measure [14,34]. The MDFS is an analytic approach that measures four dimensions of expressive oral reading [35]. The CORFS uses two factors to measure prosodic reading analytically [24].
