EMC-PK2: An Experimental Observation Tool for Capturing the Instructional Coherence and Quality in Early Math Classrooms

Rainey, Luke; Farran, Dale Clark; Durkin, Kelley

doi:10.3390/educsci14101039


by Luke Rainey *, Dale Clark Farran and Kelley Durkin

Peabody College, Vanderbilt University, Nashville, TN 37203, USA

* Author to whom correspondence should be addressed.

Educ. Sci. 2024, 14(10), 1039; https://doi.org/10.3390/educsci14101039

Submission received: 18 June 2024 / Revised: 29 August 2024 / Accepted: 18 September 2024 / Published: 24 September 2024

(This article belongs to the Special Issue Teaching Quality, Teaching Effectiveness, and Teacher Assessment)


Abstract

This article explores the development of a new observation research tool called the EMC-PK2, designed to capture coherent mathematics teaching and learning practices in preschool through second-grade classrooms. There is widespread interest in improving early math instruction and moving from traditional didactic instructional methods to a more problem-solving approach. However, few observational tools suitable for research on high-quality mathematics teaching and learning practices can describe what is happening during math lessons and are appropriate across preschool and elementary school environments. This tool was developed to try to meet that need. It was piloted and first used in a longitudinal study in two large U.S. public school districts, across Pre-K through second grade. Analysis of the observational data offers insights into the psychometrics of the tool, showing reliable use and capturing several key dimensions of practice: at the activity level, teacher facilitation and student engagement; and at the observation level, differentiation and classroom environment. Although costly in both time and resources to implement at a large scale, the EMC-PK2 can offer much-needed understanding for researchers concerned with early math teaching and learning.

Keywords: early math; classroom observation; teaching; coherence; instrument validation

1. Introduction

Early childhood is a pivotal time for mathematics learning. Mathematics skills develop cumulatively, more quickly than in later grades [1], and are consistent predictors of later mathematics achievement [2,3,4]. However, there are persistent inequalities in demonstrated math knowledge between students from historically advantaged and disadvantaged groups in the United States [5,6]. Overall, students in the U.S. have consistently performed significantly below the international average in mathematics achievement tests [7].

There is a growing body of research that suggests more attention should be paid to the lack of coherent instructional policies and practices in the early grades that support children’s math learning [8,9,10,11]. Implementing high-quality math instruction has been difficult for a variety of reasons: different curricula, different instructional strategies, and repetition of material that students already know [12,13]. In most states in the U.S., elementary teachers can be licensed without passing the math subsections of their licensing exams [14]. This disconnect can lead to inconsistent instructional practices that negatively impact student learning [15].

Many school leaders have considered coherent high-quality math instruction from a policy alignment perspective, resulting in what are often known as Pre-K-3 initiatives. This approach often focuses on ways to align standards, curriculum, assessment, and professional development from Pre-K to grade 3, hoping it will result in better learning experiences for children [9,16,17,18,19]. A common assumption is that these policy alignments will automatically improve instruction in the classrooms as children move through the early elementary grades, but there is limited evidence to support that supposition.

Another common policy approach to creating higher-quality math instruction involves within- and across-grade alignment using a particular curriculum or broad learning goals. In some cases, this involves adopting the same math curriculum across multiple grades. It may also involve a shared set of instructional strategies or a broad shared goal across grades [8,20,21]. However, curricular coherence alone may not result in changes in instructional practices, and children may still experience incoherent learning experiences between different classrooms, grades, and schools. Teachers may still need professional support to adjust their well-established ways of teaching and time to understand and incorporate the new instructional strategies into their practice. Rarely do efforts toward curricular coherence connect across Pre-K and elementary environments, as preschools are often managed by early learning departments rather than elementary schools and are held to different expectations. The systems that underpin instruction can be deeply rooted in local norms and regulations and held in place by interdependent structural mechanisms, making instructional change difficult [22].

It is understandable why policy makers might be drawn to adopting broad institutional strategies rather than instructional ones. It is easier to mandate those types of top-down changes within a school or district compared to the challenges of defining, measuring, and improving math instruction in the classroom. For this reason, measurement of instructional quality is usually considered through proxies of teacher quality based on qualifications, such as degrees earned or years of experience. In some states and districts, teacher quality is directly tied to student performance on standardized measures. However, these proxies do not provide a nuanced picture of instructional quality as enacted and as taken up by students. Furthermore, they do not suggest which aspects and practices of instruction are important for student learning and therefore are worth investing resources in.

In this paper, we are guided by the following broad questions: What classroom practices define high-quality early math instruction in the early grades? How can these practices be captured through a valid observation tool? Using this observation tool, how does the quality of math instruction change across the early grades?

We will share our review of extant measures of early mathematics teaching and learning which led us to develop a new framework for observing early math across Pre-K through second grade. We will then describe and offer preliminary analyses of this new measure using an argument-based approach to validity. Finally, we will discuss the implications of this work and the future questions that it suggests for the field.

2. Defining High-Quality Math Instruction

For a deeper understanding of high-quality mathematics instruction in and across the early grades, it is important to understand the degree to which classrooms comprise mathematical situations and interactions that are connected in terms of mathematics [23], and in terms of the development of children’s mathematical understanding [6,24]. Students need coherent instruction to experience math content in ways that help them identify patterns in what they are learning and make connections between their prior knowledge and new information [25].

Effective math instruction also requires instructional practices that support conceptual understanding, not just fluency [26]. Direct instruction, the traditional form of math instruction in the U.S., often resembles a teacher leading the class through the lesson material by asking questions that can be answered with one-word responses. The teacher responds to whether the answers were correct or incorrect and moves on to the next question. New material is introduced through demonstration followed by independent work practicing the procedure presented. This is unlike the instructional form in many other countries, where, instead of teachers simply stating or defining mathematical concepts, the concepts are developed and elaborated through examples [7]. In the last few decades, there has been a strong shift in the U.S. supported by researchers, policy makers, and educators to promote a perspective of mathematics teaching that engages students in more problem-solving and less didactic instruction [27,28,29].

Effective mathematics teaching focuses on problem-solving and reasoning rather than heavily focusing on computational skills [24,25,30]. Students are encouraged to think of themselves as mathematical thinkers with tasks promoting high-level thinking and discussions about students’ reasoning and strategies [24,31,32]. Teachers ask purposeful, open-ended questions to extend students’ thinking, explore their strategies, and help them make sense of important mathematical concepts and procedures [24,33]. Teachers help facilitate mathematical discourse between students [24]. Students have shown improved mathematics achievement when they have the opportunity to engage in detailed ways with another student’s ideas or have another student engage with their ideas [34,35].

In high-quality mathematics classrooms across grade levels, students engage in problem-solving, have opportunities to discuss their mathematical ideas, and productively use tools and manipulatives. Analyzing the teaching practices of 55 fourth- and fifth-grade teachers, Blazar and Pollard concluded the following:

These analyses point to benefits of teaching practices in two key areas. The first is active mathematics, in which teachers provide opportunities for hands-on participation, physical movement, or peer interaction. These activities overlap with ambitious teaching techniques that often make use of manipulatives and tactile activities in the service of building conceptual understanding [36] (p. 3).

Researchers in the Development and Research in Early Math Education (DREME) Network organized the COHERE project to study policies and practices related to high-quality, coherent math instruction from Pre-K through second grade. The COHERE team, comprising experts in teaching mathematics, childhood development, and psychology, identified three parameters, hereafter referred to as the “Coherence-3”, as key components of early math instruction:

Subject matter (or domain) coherence—the degree to which presentations of subject matter content accurately embody the discipline; for example, concepts, facts, relationships, and processes are in line with those of domain experts, clearly represented and/or explicated, and interconnected [23,28,37,38]. Notably, we focused on whether the content was appropriate for the grade level, using the Common Core State Standards and the California Pre-K Foundations [28,39]. In other words, were children consistently being moved along through the content levels that would prepare them for the demands of the next grade?
Psychological connections between the classroom environment, curriculum, and teaching and patterns of student thinking and learning in ways that support students’ meaningful engagement in the subject matter and continued development in understanding and skill [40,41,42].
Instructional quality and the degree to which teaching activities and strategies are consistent with research on effective instruction and tasks, for example, the extent to which teachers use instructional tasks and strategies that help students connect and relate different experiences, concepts, and representations of concepts [11,20,43,44,45].

3. Developing a Research Observation Tool for Early Math Instruction

To study the quality of instruction in early math classrooms, COHERE required a sensitive and reliable observation protocol to collect data about math instruction systematically across the early elementary grades.

There are elements of the DREME perspective that can be found in existing observational systems for measuring teacher effectiveness in upper elementary and middle-grade mathematics classrooms [46]. However, the field lacks a single widely used, reliable, and valid framework for studying instructional quality [47,48,49]. Bostic and colleagues found that many measures do not attend to levels of quality [47], focusing instead on the amount of some practice or the presence of an initiative, or they are based on self-report. Of the formal measures that attend to levels of quality, many simply provide an overall holistic score and do not provide insights into multiple aspects of quality.

However, Bell and Gitomer [49] suggest that there is room for context-specific observation systems and that no unified framework can exist that will be able to fully provide relevant insights about teaching for every research study. And in fact, they argue that the field of teaching research benefits from multiple observation measures which each expand our understanding of the complex nature of classroom instruction and how it might be improved. However, many agree that the field lacks an observational tool that can capture mathematics instruction in the very early grades, like kindergarten [26,46], and only one tool focused on mathematics instruction has been used in the increasingly prevalent pre-kindergarten classrooms as well as elementary classrooms [50].

In assessing high-quality mathematics instruction, COHERE was interested in both the ways in which teachers create learning opportunities for children and how students participate in those learning opportunities. This version of instructional quality is about measuring classroom norms and “taken-as-shared” practices and concepts for a given classroom [51]. In addition, we needed to address global and domain-specific aspects of the mathematics classroom. For example, global features included overall quality of the classroom environment, including structural elements and process features. Domain-specific features included the quality of interactions or practices specifically related to mathematics instruction.

To determine whether there existed an observation system that we could adopt which was math-specific, captured nuanced dimensions of instruction, and was appropriate for both preschool and elementary grades, or whether we needed to synthesize a new system, we conducted a review of formal observational measures of classroom quality.

Review of Common Classroom Observation Measures

This review identified 55 classroom observation measures and obtained complete samples of 38. The other measures were either not publicly available or required purchase. However, it was still possible to ascertain some information about them by reviewing publications. Information was collected on the following dimensions for each tool: the age or grade level for which it was designed or in which it was used; whether it was math-specific or had a math-specific section; the measure’s required length of observation; whether it could be used “live” or relied on videotapes; the number of items; a description of the sections or subscales; whether it was rating- or narrative-focused; whether it was environment-, practice-, or interaction-focused; and whether it was student- or teacher-focused. To determine the affordances of these measures, they were cross-walked with the three elements of Coherence-3 (described earlier).

Of the observation measures reviewed, the Classroom Observation of Early Mathematics-Environment and Teaching (COEMET) and the Advanced Narrative were the two instruments that included the most applicable items to studying early math instructional coherence across preschool into elementary school [52,53]. Each system was designed for live observations (rather than video coding) and with Pre-K in mind, included both domain-specific math components and global components, and focused on the environment, instructional practices, and interactions, as well as both teacher and student behaviors. They supported a detailed look at the activity level but also included items about the overall lesson.

However, neither instrument on its own provided all the dimensions we believed were important to capturing coherence and high-quality mathematics instruction. For instance, we determined that to measure “subject matter coherence”, it was important to capture the precise math content being observed so that it could be tied to grade level. We also found that the items related to “psychological connections” could be expanded to better capture child-level math engagement. In addition, we believed that a measure would benefit from more “moderators”, that is, factors that could prevent or facilitate the Coherence-3 domains. We tried to avoid the Likert-like scaling so typical of many observation measures (e.g., “None, Some, A Lot” or “Disagree, Somewhat Agree, Agree”). Instead, we developed criterion-referenced items with clear behavioral guidelines for each score point. Similar to the work of Agodini and colleagues [54], we developed several post-observation ratings of how smoothly the classroom ran and of teacher tone.

From our review of many classroom observation measures, we decided to incorporate the framework of the COEMET and Narrative into a new measure that provided additional aspects of Coherence-3. The new observation measure was named the Early Mathematics Classroom—Pre-K through 2nd grade (EMC-PK2) Instrument [55]. Throughout its development, members of the DREME Network provided expert feedback for guidance about developing and refining the new measure.

4. EMC-PK2 Dimensions

The measure was divided into three main sections: the Intentional Math Activities (IMAs), Cover, and POST (Table 1).

4.1. Intentional Math Activities (IMAs)

An affordance of both the COEMET and the Narrative frameworks is their method for dividing lessons into activity segments: in COEMET, called “Specific Math Activities”, or SMAs, and in the Narrative, called “Episodes”. The COEMET’s activity segment, the SMA, is defined as “each observed substantive activity. That is, the activity must be set up and/or conducted intentionally by the teacher involving several interactions with one or more children or set up or conducted intentionally to develop mathematics knowledge” [52]. COEMET also categorizes SMAs as “Full” or “Mini” to differentiate between tasks with extended teacher involvement and those that are student-directed, respectively. In the Narrative measure, an Episode involves “coding the current episode until 75% of the students have become engaged in the new activity or content of instruction” [53].

While the breakout rules for activity segments are slightly different between the COEMET and Narrative, they each focus on the enacted behaviors of groups of students engaged in mathematical tasks lasting longer than a minute. This allows for analyzing the range of opportunities across an observation period for student involvement as determined by the task at hand and mediated by the instructional practices of the teacher. By excluding short math moments and focusing on coherent periods of time-on-task, observers have enough time to note and record both teacher and student behaviors and other nuanced details about the task as enacted.

The EMC-PK2 incorporated the affordances of both measures into a new activity record called “Intentional Math Activities”, or IMAs. IMAs used the continuity rules of the Narrative, beginning new activities when students transitioned to a new task. IMAs also incorporated the COEMET’s Full and Mini classifications, and for Full IMAs, raters coded whether the activity involved the lead teacher, teaching assistant, or other staff.

Since IMAs were created for each activity, they were versatile for many different lesson structures. This was important for capturing early childhood instructional environments, which can be quite active and involve many moving bodies around the classroom. This also made the tool appropriate for observations of different lengths. This aspect of the EMC-PK2 also differentiated it from other observational tools used in the early grades. For example, the MQI, working from videotapes, arbitrarily segments the lesson into 7.5 min sections to code [26].

IMAs were mostly consecutive but were sometimes concurrent, as in the case of multiple simultaneous center or rotation activities. Observers created a new IMA record for each concurrent activity with a separate content objective. For instance, an activity block with three rotating centers involving flashcard addition, geometry puzzles, and counting collections would result in three IMAs. The tool allowed observers to quickly create blank records with duplicate start times. Observers visited each concurrent center to record notes and rotated among them, spending the most time in the activities with teacher involvement. Activities with significant teacher involvement became “Full” and required additional codes, so it was important that observers visited long enough to capture sufficient notes about the types of interactions occurring between the teacher and students. This aspect of our system also made it distinct in that the focus was enlarged from what the teacher was doing with students to what students might be doing in a math lesson independently or working together, both important characteristics of early-grade instruction.
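The handling of concurrent activities described above can be pictured as a small data structure. The following is only an illustrative sketch in Python, not the EMC-PK2 software: the names IMARecord and open_concurrent_imas, the field names, and the example note are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import List


@dataclass
class IMARecord:
    """Hypothetical stand-in for one Intentional Math Activity record."""
    start: time
    objective: str
    full: bool = False  # set to True once significant teacher involvement is seen
    notes: List[str] = field(default_factory=list)


def open_concurrent_imas(start: time, objectives: List[str]) -> List[IMARecord]:
    """Create one blank record per concurrent activity, all sharing a start time."""
    return [IMARecord(start=start, objective=obj) for obj in objectives]


# The three-center example from the text yields three records.
centers = open_concurrent_imas(
    time(9, 15),
    ["flashcard addition", "geometry puzzles", "counting collections"],
)

# Hypothetical field note: the teacher joins one center, so that IMA becomes Full.
centers[0].full = True
centers[0].notes.append("Teacher works through addition flashcards with four students.")
```

Each record keeps its own notes list, so notes taken at one center do not leak into the records for the other centers.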

For each IMA record, whether Full or Mini, the observer entered notes in real time about the unfolding of the activity. They coded the type of activity (e.g., whole group, small group with and without a teacher leading, students working in pairs or working independently), math content involved, and observed student practices (described more below), and for Full IMAs, they rated the instructional quality. Rating scores were assigned on a 5-point behaviorally anchored, criterion-referenced scale ranging from 1 (describing the least rigorous practice) to 5 (describing the most rigorous practice). Five-point scales have produced satisfactory reliability results in past studies [56,57].

In the EMC-PK2, these instructional quality ratings measured what are sometimes referred to as “high-leverage practices” most likely to affect student mathematics learning [58]. These included teacher responsiveness, how teachers took up student errors, how teachers asked questions, whether teachers maintained cognitive demands, and how teachers adapted the task. There were also ratings of student participation and student engagement. While some of the IMA rating concepts were derived from ideas within the COEMET and Narrative, they were greatly adapted for this measure.

One important difference between the structure of the IMA ratings and COEMET’s SMA ratings was the IMA’s use of behaviorally anchored 5-point interval rating scales rather than simple Likert scales. For example, the IMA’s teacher questioning rating was anchored along the following dimensions:

1—The teacher asks factual recall questions or questions that require only a yes/no response OR the teacher does not ask students questions.
3—The teacher’s questions focus on student thinking but require only short responses that are either correct or incorrect.
5—The teacher’s questions are open-ended and afford opportunities for students to explain and expand upon their thinking.

Ratings of 2 and 4 would reflect practices in between those anchors. Behaviorally anchored rating scales helped observers use the tool reliably and helped prevent common issues with Likert scales such as “shifting standards” [59], “halo errors” [60], or central tendency bias [61].

Another major IMA innovation was its standards-aligned math content checklist. Existing math observation measures use broad domains to describe content being covered (e.g., “Counting and Cardinality” or “Shapes”). Capturing details about the math content progression across multiple grades requires more precise content items. For example, Counting and Cardinality content means quite different mathematical understandings across the early grade levels. Tying each item to a grade-level content standard provides an understanding of whether classrooms are attending to grade-level content or are focused on review.

To develop items for the math content progression, the EMC-PK2 used a decision tree. Observers first selected the broad math domain (e.g., geometry). Within the domain, there was a developmental progression of content items, each associated with a specific grade level aligned with the Common Core State Standards and the California Pre-K Foundations [28,39]. The content strand observed was selected (e.g., partition rectangles and circles into three equal shares). Every content item used during the IMA was selected (i.e., IMAs could cover multiple content strands). By categorizing content items by math domain rather than grade, there was the option to code whatever content was observed whether it was on, below, or above grade level. The grade-level appropriateness decisions were made later.
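As a rough illustration of this two-step logic (code the content by domain first, judge grade-level appropriateness later), consider the Python sketch below. It is a simplified assumption rather than the actual 94-item checklist: the dictionary contents, the grade tags, and the function names are hypothetical, and transitional kindergarten is left out for brevity.

```python
# Grade order used to compare content level with classroom grade.
GRADE_ORDER = ["PK", "K", "1", "2"]

# Hypothetical excerpt: domain -> progression of (content item, grade tag).
# The grade tags here are illustrative assumptions, not the instrument's codes.
CONTENT_PROGRESSION = {
    "geometry": [
        ("name basic shapes", "PK"),
        ("partition rectangles and circles into three equal shares", "2"),
    ],
    "counting and cardinality": [
        ("count to 20", "PK"),
        ("count to 100 by ones and tens", "K"),
    ],
}


def grade_of(domain: str, item: str) -> str:
    """Step 1 (during observation): look up the item within its domain."""
    for name, grade in CONTENT_PROGRESSION[domain]:
        if name == item:
            return grade
    raise KeyError(f"{item!r} is not listed under {domain!r}")


def appropriateness(item_grade: str, classroom_grade: str) -> str:
    """Step 2 (after observation): compare item grade with classroom grade."""
    diff = GRADE_ORDER.index(item_grade) - GRADE_ORDER.index(classroom_grade)
    if diff < 0:
        return "below"
    if diff > 0:
        return "above"
    return "on"


item = "partition rectangles and circles into three equal shares"
first_grade_result = appropriateness(grade_of("geometry", item), "1")
second_grade_result = appropriateness(grade_of("geometry", item), "2")
```

Keeping the lookup separate from the comparison mirrors the point in the text that observers could code any content they saw, with the grade-level judgment deferred.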

This section was carefully double-coded, and consensus was reached by research staff with expertise in early math standards, based on the observer’s IMA notes. The decision to review each IMA and reach a consensus was made because of the high number of possible content code items (94 possible items) and the variability of early math content knowledge among observers.

Another novel feature added to the EMC-PK2 was a student mathematical practice checklist, based on the NCTM Effective Mathematics Teaching Practices and the Common Core Standards for Mathematical Practice [24,28]. We found that existing measures included limited items capturing the mathematical nature of student engagement. We developed this checklist in an attempt to describe whether students were involved in rigorous mathematical practices. The checklist included five practices: student-to-student math talk, student verbal math reflection, students sharing steps to solve a math problem, students sharing math reasoning, and using math tools. Observers checked whether each practice occurred at least once during each IMA.

4.2. Cover and Post-Observation Rating Scale (POST)

In addition to IMA records, the EMC-PK2 included a cover sheet and post-observation rating scale (the POST) to capture global aspects of the lesson and summarize the entire observation. For each observation, a cover sheet recorded descriptive information about the classroom, such as teacher name and ID, grade level, date, school, number of adults, and number of children present.

The POST incorporated several relevant elements from the Narrative, COEMET’s Classroom Culture section, prior research evaluating the Tools of the Mind curriculum [62], and new items created for the current study. The POST included items about the teacher’s promotion of problem-solving; accommodating a range of abilities; communicating in multiple ways; the classroom environment being respectful and productive; classroom management; teacher tone; bringing math ideas into non-math situations; and having an overall, big idea to connect the lesson’s math activities. These items were also rated on behaviorally anchored 1 to 5 scales.

5. Conducting Observations

5.1. Training Observers

The EMC-PK2 required significant quality monitoring to ensure ongoing reliability and validity during data collection, especially as it occurred across a full school year. Data collectors were recruited from a pool of graduate students and former classroom teachers, all with significant experience in elementary classrooms, and some with specialized expertise in early math. Each observer undertook several hours of virtual training, introducing them to the functionality of the tool and its key concepts. They were provided practice written and recorded scenarios to code and given feedback based on a set of master codes. New observers then participated in field-based, in-classroom training with an experienced researcher, where live math lessons were coded by each and compared. These low-stakes training experiences provided valuable and authentic practice opportunities prior to data collection.

Following practice, experienced observers joined each observer-in-training as “anchors” as they co-observed at least two math lessons. Observers’ codes were compared with those of the anchor, and observers received a reliability score based on agreement with the anchor codes. Ratings were counted as agreeing when they fell within 1 point of the anchor’s rating. New observers with overall scores of 80% or above were considered ready to collect data. The final percent agreement reliability estimate across all grades in the COHERE study was 90.3%. Fifteen observers collected data across 4 years of the study.
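The within-1-point agreement rule for the 1 to 5 ratings can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name and the example ratings are hypothetical, and the sketch covers only the rating items, since the text does not specify how categorical codes entered the overall score.

```python
from typing import Sequence

READY_THRESHOLD = 80.0  # percent agreement needed before collecting data


def within_one_agreement(observer: Sequence[int], anchor: Sequence[int],
                         tolerance: int = 1) -> float:
    """Percent of paired ratings that differ by at most `tolerance` points."""
    if len(observer) != len(anchor):
        raise ValueError("observer and anchor must rate the same items")
    if not observer:
        raise ValueError("no ratings to compare")
    agreed = sum(abs(o - a) <= tolerance for o, a in zip(observer, anchor))
    return 100.0 * agreed / len(observer)


# Hypothetical ratings from one co-observed lesson (not study data).
observer_ratings = [3, 4, 2, 5, 1, 3, 4, 2, 3, 5]
anchor_ratings = [3, 3, 4, 5, 2, 3, 1, 2, 4, 5]

score = within_one_agreement(observer_ratings, anchor_ratings)
ready = score >= READY_THRESHOLD
```

In this made-up example, 8 of the 10 paired ratings fall within 1 point of each other, so the observer would just meet the 80% threshold.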

During data collection, observations were subject to a rigorous data-checking process in real time. Observers self-checked and submitted their observations electronically the same day the observation was completed. A data checker would review the items and use the observation notes to identify any inconsistencies or omissions. The checker would follow up with questions for the observer, who would address them before their next observation. Although highly demanding of both observers and data checkers, the timeliness and quick turnaround of this review process were intended to take advantage of the observer’s ability to recall and clarify details from that day’s lesson. The data checker would use the observer’s responses to their questions to make any necessary corrections or adjustments to the data record. Corrections would be made by the research team if the observer made an accidental mistake or if they could not justify their rating through their exchange with the data checker. The notes from those exchanges were added to the official data record, as were any decisions to correct codes.

5.2. Data Collection for the COHERE Project

It is common in Pre-K for teachers to embed math content across the day, instead of consolidating it into lesson blocks, as is typical in elementary grades. The EMC-PK2 is adaptable for both types of math instruction. EMC-PK2 observations for Pre-K were scheduled for 3-hour blocks, and observations in the elementary grades were scheduled for the length of a scheduled lesson, usually about an hour. Observers were trained to position themselves around the room as needed to see the math occurring and to listen to conversations. To support observer mobility, the EMC-PK2 was used on a tablet rather than a laptop.

We collected data with the EMC-PK2 system during the COHERE project. Two cohorts of students consented to participate across multiple study years, Pre-K through 2nd grade, resulting in observations of 106 classrooms across all study years. Observations were conducted each year across six schools in two public school districts in California. These districts were chosen because of their existing efforts to create Pre-K through elementary coherence in mathematics and willingness to have the coherence of the math instruction in their districts studied. The districts comprised between 50,000 and 70,000 students from Pre-K-12 and served majority non-white students. In both districts, most students qualified for free and reduced-price lunch. There were no significant differences overall between the districts on the dimensions captured by the EMC-PK2, so in the current paper, we report findings across districts. The classrooms were representative of their respective schools. For more details about the districts in this study, refer to Coburn et al. [63].

There were 14 Pre-K teachers, 6 transitional kindergarten teachers, 32 kindergarten teachers, 36 first-grade teachers, and 18 second-grade teachers. Transitional kindergarten, or TK, is a publicly funded program for young learners before kindergarten in California. Since our data collection, California has adopted a universal TK approach that effectively replaces Pre-K, but during our data collection, TK covered a smaller age range and existed alongside Pre-K.

Most classrooms were observed three times each year: once in the fall, once in the winter, and once in the spring for a total of 279 observations (see Table 2 for observation details). Having three observational sessions in a year is similar to the MET project and others [36,64]. Due to school closures during the COVID-19 pandemic, there were no spring observations in 2020, when students were in either 1st or 2nd grade.

At the end of each observation, data were sent electronically to the research team, who cleaned and structured the data into IMA-level, observation-level, and teacher-level databases. For the observation-level files, binary IMA codes (e.g., student practices) were aggregated as the proportion of IMAs in which the code was marked, and IMA rating-scale items were averaged across the IMAs seen during the observation. For the teacher-level databases, the three (or in some cases, two) observations were averaged together across the year to create final teacher scores.
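The aggregation from IMA-level records to observation-level and teacher-level scores can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual data pipeline: the field names (`teacher`, `obs`, `facilitation`, `discussion`) and the toy records are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical IMA-level records; field names are illustrative only
# and are not the EMC-PK2 database schema.
ima_records = [
    {"teacher": "T01", "obs": "fall",   "facilitation": 2, "discussion": 0},
    {"teacher": "T01", "obs": "fall",   "facilitation": 4, "discussion": 1},
    {"teacher": "T01", "obs": "winter", "facilitation": 3, "discussion": 0},
    {"teacher": "T01", "obs": "winter", "facilitation": 3, "discussion": 0},
    {"teacher": "T02", "obs": "fall",   "facilitation": 1, "discussion": 1},
    {"teacher": "T02", "obs": "fall",   "facilitation": 2, "discussion": 1},
]

def aggregate_to_observations(records):
    """Collapse IMA-level rows into one row per (teacher, observation)."""
    grouped = defaultdict(list)
    for row in records:
        grouped[(row["teacher"], row["obs"])].append(row)
    observations = {}
    for key, rows in grouped.items():
        observations[key] = {
            # Rating-scale item: mean across the IMAs seen in the observation.
            "facilitation": mean(r["facilitation"] for r in rows),
            # Binary code: share of IMAs in which the practice was marked.
            "discussion": sum(r["discussion"] for r in rows) / len(rows),
        }
    return observations

def aggregate_to_teachers(observations):
    """Average each teacher's observation-level scores across the year."""
    by_teacher = defaultdict(list)
    for (teacher, _obs), scores in observations.items():
        by_teacher[teacher].append(scores)
    return {
        teacher: {item: mean(s[item] for s in score_list)
                  for item in score_list[0]}
        for teacher, score_list in by_teacher.items()
    }

obs_scores = aggregate_to_observations(ima_records)
teacher_scores = aggregate_to_teachers(obs_scores)
```

Averaging observation-level scores, rather than pooling all IMAs for a teacher, gives each observation equal weight regardless of how many IMAs it contained.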

The findings from the larger COHERE study will be shared in separate papers, which are in progress. The focus of this paper is to provide information about the affordances of the EMC-PK2 for both data collection and analysis and share some preliminary findings related to the measure’s validity and reliability.

6. Argument-Based Validity

To validate the interpretation and use of the EMC-PK2, we used Kane’s Argument-Based Validity Framework and guidance from Bell et al. [65,66]. This work emphasizes that the claims made based on an instrument should be framed as an argument specifying the inferences and assumptions needed to get from item responses to interpretations of those data. There are four main stages of argument: scoring, generalization, extrapolation, and implications.

6.1. Scoring

Scoring refers to the construction of the individual items and ratings used in the tool and whether they generate fair, accurate, and reproducible scores [67].

6.1.1. Accurate and Consistent Application of Scoring

As mentioned previously, the data collection with the EMC-PK2 involved a rigorous training and data-checking process for observers, which led to high reliability of observers on the measure and high confidence in the quality and accuracy of the data collected and their scoring. Recall that the EMC-PK2 involves extensive field notes taken by the observer for each IMA to back up their ratings and provide additional context about the activity.

6.1.2. Bias-Free Scoring

To investigate whether any of the observation items that use a rating scale were affected by rater bias, intraclass correlations (ICCs) were calculated for the IMA and POST ratings using covariance estimates from HLM models with observer as a random effect. Low ICCs indicate a low likelihood of rater bias, as they suggest that a rater was not consistently rating that item in the same way across all their observations and was instead attending to the variation expected in classrooms [68]. Overall, the ICCs were very low (see Table 3), indicating a low likelihood of rater bias. The item with the highest ICC was the IMA rating of student engagement (ICC = 0.234), but even this ICC was relatively low.
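To make the logic of this check concrete, the sketch below shows how an observer-level ICC can be read as the share of rating variance attributable to the observer. The first function takes variance components such as those a random-effects model reports. The second is a one-way ANOVA estimator for balanced data, offered as a simpler stand-in and not as the HLM estimation used in the study. All names and values are hypothetical.

```python
from statistics import mean

def icc_from_variance_components(var_observer, var_residual):
    """Share of total rating variance attributable to the observer."""
    return var_observer / (var_observer + var_residual)

def icc_oneway_anova(ratings_by_observer):
    """One-way ANOVA estimate of ICC(1) for a balanced design.

    ratings_by_observer maps each observer to a list of ratings; every
    observer must contribute the same number of ratings (k).
    """
    groups = list(ratings_by_observer.values())
    k = len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes a balanced design")
    n = len(groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-observer and within-observer mean squares.
    ms_between = k * sum((mean(g) - grand_mean) ** 2 for g in groups) / (n - 1)
    ms_within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical variance components: observer accounts for 5% of variance.
example_icc = icc_from_variance_components(0.05, 0.95)
```

An ICC near zero means that knowing which observer produced a rating says little about its value, which is the pattern expected when raters respond to classroom variation rather than to personal tendencies. The ANOVA estimator can return small negative values in samples, and these are conventionally read as zero.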

6.1.3. Appropriateness of Scoring

Across all four years of the COHERE study, correlations between items were calculated to determine which groupings of items might be candidates for subscales and how strongly they related to one another.

The strongest correlations between IMA items were (1) the ratings focused on teachers’ responding to students’ mathematical thinking, taking up students’ errors, asking questions, and maintaining cognitive demands; and (2) student participation and student engagement (see Table 4). An IMA item about adapting the task did not correlate strongly with any of the other items.

Correlations between POST items (rated at the end of each observation) were also calculated (Table 5). The strongest correlations were between (1) reinforcing problem-solving, accommodations, and communicating mathematics concepts in multiple ways, and (2) classroom environment, behavior management, and teacher tone. The items that related to the structure of the lesson did not correlate strongly with any of the other items.

6.2. Generalization

In Kane’s Argument-Based Validity Framework [65,67], generalization refers to using the data from the items in the EMC-PK2 to generate overall scores accurately representing the quality of the mathematics instruction observed. To assess its content validity, the EMC-PK2 measure was vetted by the larger DREME Network team of experts. This network included experts in early mathematics, classroom instruction, and developmental psychology. Multiple members of the network had previous experience developing and utilizing observation instruments at scale. These experts agreed that the items on the EMC-PK2 were important to high-quality mathematics instruction and that the quality of mathematics instruction could be represented by these items.

6.2.1. Factor Analyses

Exploratory factor analyses were conducted to explore the composition of IMA subscales following procedures similar to those of Agodini et al. by conducting these factor analyses at the IMA level without nesting [54]. Principal Axis Factoring with Varimax rotation was implemented with pairwise deletion of missing cases, which was required because the IMA rating related to utilizing students’ incorrect responses could be marked N/A if no student errors were observed. Rotation converged in three iterations. Two subscales emerged from the factor analyses: (1) Teacher Facilitation: this included the ratings focused on teachers’ responding to students’ mathematical thinking, taking up students’ errors, asking questions, and maintaining cognitive demands; and (2) Student Engagement: this included the ratings on student participation and student engagement (see Table 6). The IMA item about adapting the task did not load well onto any factor. A sensitivity check using listwise deletion for missing data was also performed with very similar results. Results from the rotated factor matrix for the pairwise deletion analysis are included here, and full results from both analyses are available upon request.
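The difference between pairwise and listwise deletion affects the correlation matrix that a factor analysis starts from. The sketch below illustrates only that input step, with `None` standing in for an N/A rating. It does not implement Principal Axis Factoring or Varimax rotation, and the item names and ratings are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(rows, items, deletion="pairwise"):
    """Correlate rating items, treating None as N/A."""
    if deletion == "listwise":
        # Drop any IMA with at least one missing rating.
        rows = [r for r in rows if all(r[i] is not None for i in items)]
    matrix = {}
    for a in items:
        for b in items:
            # Pairwise: keep every IMA where both items in this pair were rated.
            pairs = [(r[a], r[b]) for r in rows
                     if r[a] is not None and r[b] is not None]
            xs, ys = zip(*pairs)
            matrix[(a, b)] = pearson(xs, ys)
    return matrix

# Hypothetical IMA ratings; "errors" is N/A when no student errors occurred.
imas = [
    {"responding": 1, "questions": 2, "errors": None},
    {"responding": 2, "questions": 2, "errors": 1},
    {"responding": 3, "questions": 4, "errors": 3},
    {"responding": 4, "questions": 3, "errors": None},
    {"responding": 5, "questions": 5, "errors": 4},
    {"responding": 2, "questions": 1, "errors": 2},
]
items = ["responding", "questions", "errors"]
pairwise = correlation_matrix(imas, items, deletion="pairwise")
listwise = correlation_matrix(imas, items, deletion="listwise")
```

Pairwise deletion keeps all six IMAs for the correlation between `responding` and `questions`, whereas listwise deletion uses only the four IMAs with a rated error item, so the two matrices differ for that pair but agree for any pair involving `errors`.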

After confirming that the candidate items were conceptually related, Cronbach’s alpha was calculated for each grouping of ratings for the proposed subscales. Reliability on both subscales was good (Teacher Facilitation Cronbach’s α = 0.823, Student Engagement Cronbach’s α = 0.743).
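For readers less familiar with the reliability coefficient reported here, a minimal sketch of Cronbach's alpha follows. It uses the standard formula based on item variances and the variance of the summed score. The function name and input layout are assumptions made for illustration.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items, each a list of scores
    given to the same cases in the same order."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per case is the sum across items.
    totals = [sum(case) for case in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))
```

Alpha approaches 1 when items rise and fall together across cases and approaches 0 when they vary independently, which is why a composite of rarely observed yes/no codes can yield a low value even when each code is recorded accurately.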

Exploratory factor analyses were further used to inform the composition of POST subscales. The Principal Axis Factoring extraction method was initially attempted; however, extraction was unsuccessful, likely due to multicollinearity issues between the items on classroom environment and teacher tone. As a result, the Generalized Least Squares method was used instead. Varimax rotation was used to create the rotated factor matrix, which is included here. Rotation converged in three iterations. Full results are available upon request. Two subscales emerged from the factor analysis: (1) Differentiation: this included the ratings on reinforcing problem-solving, accommodations, and communicating mathematics concepts in multiple ways; and (2) Classroom Atmosphere: this included the ratings on classroom environment, behavior management, and teacher tone (see Table 7). The items that related to the structure of the lesson did not load well onto any factor.

After confirming that the candidate items were conceptually related, Cronbach’s alpha was calculated for each grouping of ratings for the proposed subscales. Reliability on both subscales was good (Differentiation Cronbach’s α = 0.713, Classroom Atmosphere Cronbach’s α = 0.799).

Observers also tracked whether students engaged in certain practices related to math instruction during each IMA. As described earlier, these practices are not observed on a rating scale but are instead simply coded yes/no depending on whether the observer witnessed the practice in each IMA. There are five practices in total, but four that related to student discussion practices were combined into one item called Student Discussion Practices. Reliability on the Student Discussion Practices composite variable was low (Cronbach’s α = 0.437), but this was likely because these practices occurred so infrequently in most classrooms, and they could occur once during one IMA during an observation and then not occur again for the rest of the observation.

We believe that these analyses demonstrate the validity of the EMC-PK2. The items that correlated were linked in predictable ways. For instance, IMA-level ratings focused on teacher facilitation involved complementary aspects of teacher facilitation. A teacher who listens to student thinking may also have the opportunity to ask relevant questions that prompt exploration of misconceptions. The factor loadings also indicated that there are important distinctions between the general quality of the classroom environment and the math-specific ways that teachers engage students.

6.2.2. Stability

To determine the stability of the EMC-PK2 across time points, we calculated the correlations of teachers’ scores on the identified subscales between time points. Note that there were fewer teachers observed during the third time point due to the COVID-19 pandemic preventing the third observation for some first- and second-grade teachers. The subscales generally correlated moderately and significantly across time points (Table 8), with correlations similar to those of the COSTI-M [69]. Some variation between teachers’ practices across specific lessons is to be expected, and these moderate correlations suggest that the EMC-PK2 was detecting some of that variation while also capturing the practices that teachers were engaging in consistently across lessons.

6.3. Extrapolation

In Kane’s Argument-Based Validity Framework [65,67], extrapolation refers to how well a measure can be used to draw inferences about what scores imply for real-world performance. A strength of the EMC-PK2 is that the resulting scores can provide detailed information about what happens in mathematics instruction across the early grades, Pre-K through elementary school, in ways that previous measures have not.

Summarizing Indicators of High-Quality Instruction across Grades

While the results of the full COHERE project will be reported in future publications, here, we report the descriptive differences in observed practices across grade levels to show how the EMC-PK2 gives a uniquely nuanced and detailed picture of the quality of math instruction across the early grade levels, including Pre-K. For these descriptive statistics, we averaged teachers’ ratings across the year and then looked at the means and standard deviations for teachers on these variables by grade level (see Table 9). As a reminder, Teacher Facilitation, Student Engagement, Differentiation, and Classroom Atmosphere were based on 1 to 5 rating scales, and Student Discussion Practices was reported as the average number of discussion practices observed per activity (out of four possible discussion practices). In Table 9, we present the subscale scores across the grade levels. The subscales are the columns and the grade levels are the rows.

As shown, Teacher Facilitation was generally low across all grade levels, although it did increase as grade level increased. Student Engagement, on the other hand, was generally high across all grade levels, but decreased after the earliest grades. Differentiation was consistently low across grade levels, as aspects of Differentiation were not often observed in many classrooms. In contrast, Classroom Atmosphere was consistently relatively high, with the highest ratings observed in preschool. Student Discussion Practices increased across grade levels from an average of almost no discussion practices observed during math activities in preschool to 0.77 discussion practices observed per math activity in second grade, or roughly three discussion practices for every four IMAs observed. Generally, we saw very few instances of these discussion practices at any grade level. There was little evidence of using non-math storylines across the grade levels, but older grades were much more likely to have a big math idea overarching the lesson. Overall, these were warm and positive classrooms for children, particularly in early grades. However, the levels of Teacher Facilitation, Differentiation, and Student Discussion Practices were consistently low, even more so at the early grade levels. These data are consistent with the findings from a small study of kindergarten teachers with the MQI, where scores were also characterized as uniformly low [26].

Our analysis also revealed specific ways in which our study’s preschool and early elementary classrooms differed from older elementary classrooms in terms of structure and opportunity (see Table 10). Preschool classrooms involved more adults leading math activities in more small group and one-on-one settings and more use of math tools and visuals. The older grades involved more time in whole groups, but also revealed more student discussion and less tool use. The early grades showed a heavier focus on counting, gradually shifting into operations becoming the dominant focus.

6.4. Implications

In Kane’s Argument-Based Validity Framework [65,67], Implications refers to how results from the measure are used to make practice and policy decisions. The data that informed this initial analysis of the EMC-PK2 were limited by involving only six schools in two districts within the same state. The districts we observed were also involved in some restructuring efforts around improving early math coherence, which could have contributed to variance across years of the study. While we are cautious about suggesting broad implications across different student populations, schools, and districts based on these findings, the predictable nature of the relationships between concepts in the measure suggests that the measure can detect important differences in mathematics instructional quality in the early grades and could be used in broader samples in future studies to guide district and school reform efforts. Ongoing analysis of the EMC-PK2 from a subsequent study by the authors may provide further evidence about whether or not these data also describe the state of routine mathematics instruction in other contexts.

6.5. Summary of Results

The analyses conducted using data collected with the EMC-PK2 resulted in the creation of four valid and reliable subscales: Teacher Facilitation, Student Engagement, Differentiation, and Classroom Atmosphere. Although the names are somewhat different, our four subscales closely resemble those identified by Agodini and colleagues in a large study of first- and second-grade classrooms using a researcher-developed live observation system [54,70]. An important aspect of instruction in Pre-K and early elementary grades, revealed in our factor analyses and verified by Agodini et al. [54], is the independence of mathematics instruction from the overall quality of the classroom environment (e.g., classroom organization and teacher tone). Classrooms can be calm, well-organized places for children to be, but the instructional coherence of the math children are exposed to can vary greatly. Prior observational systems may assume that if classrooms are warm and welcoming, instructional rigor will follow as a matter of course. Our data, like others [36], suggest that these are two somewhat unrelated aspects of classroom functioning. It is important to distinguish them and capture both.

7. Discussion

The mathematics learning young children experience in the early elementary grades is foundational for their later mathematics competencies. Yet, this is an area of instruction that has been tremendously overlooked—even in major efforts such as the MET, funded to determine effective mathematics teaching.

7.1. Challenges

The development and piloting of the EMC-PK2 measure highlights what a demanding challenge it is to measure classroom practices in authentic early-grade settings. For example, both children and teachers often move during lessons, and children are re-grouped, sometimes working individually, sometimes with others, before coming back to a single point of instruction. Once we began visiting these early elementary and Pre-K classrooms, it was clear why so many observation efforts have focused on upper elementary and middle-grade classrooms. There, despite suggestions for a different, more student-centered approach, the teacher is often the focal point of instruction. Having a clear single focus means that videotaping the lesson is more possible so that later coding can take place at a slower pace. We, however, set out to capture what is happening when instruction is more active and involves both teacher and student behaviors, behaviors that must be captured in real time with a live observer. Understanding the teacher and student interactions occurring in math classrooms was crucial to understanding the instructional quality of a lesson.

Another issue we faced is that when we began adapting the standards and practices commonly held by educators and researchers to be indicators of strong math instruction, it was evident many were written in ways that required a high degree of interpretation and could be subject to misinterpretation. For example, one of the Common Core Standards for Mathematical Practice is “Construct viable arguments and critique the reasoning of others” [28]. This standard includes a number of discrete behaviors that could be independently observable: justifying their conclusions; listening to the arguments of others; asking clarifying questions; and using concrete referents such as objects, drawings, diagrams, and actions. Although these discrete behaviors all relate to a coherent and valuable concept, the logistics involved in interpreting them during a live observation become quickly and overly complex.

Other aspects of the standards are written in a way that lacks clear associated behaviors such that a visiting observer could record them. For example, one of the student practice standards requires that students “attend to precision” [28]: a difficult behavior for a visitor to capture during live observations without further clarification. Thus, we had to choose from the standards those practices for students and teachers that could be reliably observed during a live visit.

7.2. Benefits

An important innovation of the EMC-PK2 is its ability to tie teacher instructional practices to specific behavioral anchors. In our review of the most common observation measures, most ratings involved either Likert scales or binary options (observed/did not observe). We believe the exact behavioral descriptions of what should be observed at each rating point supported greater reliability and contributed to the very low rater bias found across 15 observers and four years of data collection. This approach also results in a more nuanced picture of the instructional practices happening in mathematics classrooms across grade levels.

As previously mentioned, the EMC-PK2 provides detailed data about classrooms related to Teacher Facilitation, Student Engagement, Differentiation, and Classroom Atmosphere. It is important to capture these different aspects of students’ classroom experiences as a classroom can be warm and well organized but lack rigorous mathematics-specific instruction. Likewise, a teacher employing rigorous instructional practices may not be able to hold student engagement. All these aspects of instruction are important to measure, and some observation systems appear to assume that if one aspect is strong, the others are likely strong as well. In addition, detailed evidence about the differentiation occurring during a math lesson provided by the EMC-PK2 is lacking in most other observation measures. Differentiation can be difficult to capture in an observation, and the EMC-PK2 provides a way to measure differentiation in a reliable and valid way. Another novel contribution of the data provided by the EMC-PK2 is that they capture what is happening both at the math activity level and the overall lesson level, providing a rich dataset that measures these different grain sizes of what is occurring during math instruction.

One unique aspect of the EMC-PK2 measure is its ability to capture the type and level of math content being covered in every math activity occurring in the classroom. With these data, it is possible to understand how much time is spent on content that is above, on, or below grade-level standards. Past work examining curricula suggests that students often experience repetitive content [13,71], particularly in the early grades, and these data could help capture the coherence (or lack thereof) of math content within and across grades in a school or district.

One of the important findings of our pilot data was discovering how few student practices of any kind were going on in any classroom. After combining many of those practices into a single one called “students talked to each other about math”, our pilot observations revealed that even this baseline practice was relatively rare and led to this variable having lower reliability than the other subscales created from the EMC-PK2 measure.

A component of the tool that makes it unique and rigorous is our data-checking process, made possible through the incorporation of extensive observer field notes. Raters are supported by a detailed review of their observation records on the same day they completed each observation, which ensures their coding holds up to an outside audience. The notetaking not only serves as a support for thorough data checking but also provides the opportunity for deep analysis of classroom practices in the context they were observed. The notes provided a window into how the items in the tool were woven together in practice around a particular strand of math content. This level of rigorous checking is only possible if there are available staff with deep math knowledge to check each observation as it is submitted after a visit.

7.3. Limitations

The rigorous checking described above is actually a limitation to scaling up the use of EMC-PK2. The system was developed as a research tool. As such, we believe it reliably and validly captured the mathematics instruction occurring in Pre-K and early elementary-grade classrooms. We believe that one of the reasons so many previously developed observation tools have had limited predictability may well be that not enough care was taken to ensure the systems are capturing what is actually occurring in enough detail and to maintain a high level of precision in the field. This tool is valuable because it provides detailed and accurate descriptions of mathematics teaching and learning.

We are aware that we have not so far connected the EMC-PK2 data to student outcomes. For a variety of reasons, including the interruption of COVID-19, we have only limited information about student learning in the classrooms we observed. Our preliminary data are promising but much too limited to draw conclusions. The utility of the approach we developed needs to be tested in a larger-scale study that includes measures of student mathematics learning. We face the same problems others do in the early grades, however—students are not assessed with state tests, nor should they be. Some reliable, foundational measures of math knowledge that can be easily implemented in these early grades are strongly needed.

7.4. Future Directions

The EMC-PK2 tool is clearly a research measure, one that might be helpful in achieving a better understanding of mathematics instruction in the early grades, but not one that is likely to have much effect on practice. In order to have a broader impact on more classrooms based on what we learned from the EMC-PK2 data, we developed a practitioner-friendly version of the tool. It is called the DREME Math Observer and is meant to be used by elementary school coaches, learning specialists, principals, and co-teachers for formative development [72]. It captures many of the same instructional practices as the EMC-PK2, and it is meant to identify actionable next steps for teachers to continue improving their math instruction. As others have asserted [14], improvements to early-grade instruction require intensive coaching; incorporating a valid measure of high-quality math instruction that can be used by coaches in an ongoing manner is a possible step forward.

The DREME Math Observer is now a downloadable app suitable for use on a mobile device (iPhone and Android) or a tablet. The app is still in its beta version. The next steps will be to determine its usefulness to those in the practice of improving mathematics instruction in the early grades and, most importantly, to see if improvements in practice when using the tool result in deeper foundational mathematics learning for children.

8. Conclusions

In this paper, we have presented details about the development and validity of a rigorous observational system intended to provide much-needed information about the quality of mathematics instruction across the early grades. We believe this to be a critical endeavor; much more information is needed about the neglected early elementary grades, where a literacy focus has overtaken attention to math learning for young children. That focus may be changing, as the DREME Network funded by the Heising-Simons Foundation exemplifies. As initiatives focused on alignment and coherence within and across grade levels continue to form across the country, it is also important to have nuanced, reliable, and valid measures of math instructional practices in the early grades. We hope that the information provided by the tool we developed can assist the research community in further explorations of these issues. We also hope that our adaptation of the tool for practitioners will have a broader, more general effect on the field as a whole.

Author Contributions

Conceptualization, L.R., D.C.F. and K.D.; methodology, L.R., D.C.F. and K.D.; software, L.R.; validation, L.R., D.C.F. and K.D.; formal analysis, K.D.; investigation, L.R. and D.C.F.; resources, D.C.F. and K.D.; data curation, L.R.; writing—original draft preparation, L.R., D.C.F. and K.D.; writing—review and editing, D.C.F. and K.D.; visualization, L.R.; supervision, D.C.F. and K.D.; project administration, L.R.; funding acquisition, D.C.F. and K.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Heising-Simons Foundation, grant number 2020-1777.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of Vanderbilt University (#182068, 20 October 2023).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The participants of this study did not give written consent for their data to be shared publicly, nor did the participating school districts agree to have data shared publicly, so supporting data are not publicly available.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

Shanley, L. Evaluating longitudinal mathematics achievement growth: Modeling and measurement considerations for assessing academic progress. Educ. Res. 2016, 45, 347–357. [Google Scholar] [CrossRef]
Aunio, P.; Niemivirta, M. Predicting children’s mathematical performance in grade one by early numeracy. Learn. Individ. Differ. 2010, 20, 427–435. [Google Scholar] [CrossRef]
Bodovski, K.; Farkas, G. Mathematics growth in early elementary school: The roles of beginning knowledge, student engagement, and instruction. Elem. Sch. J. 2007, 108, 115–130. [Google Scholar] [CrossRef]
Watts, T.W.; Duncan, G.J.; Chen, M.; Claessens, A.; Davis-Kean, P.E.; Duckworth, K.; Engel, M.; Siegler, R.; Susperreguy, M.I. The role of mediators in the development of longitudinal mathematics achievement associations. Child Dev. 2015, 86, 1892–1906. [Google Scholar] [CrossRef] [PubMed]
Denton, K.; West, J. Children’s Reading and Mathematics Achievement in Kindergarten and First Grade (NCES 2002-125); U.S. Department of Education, NCES, U.S. Government Printing Office: Washington, DC, USA, 2002.
National Research Council. Early Childhood Assessment: Why, What, and How; Committee on Developmental Outcomes and Assessments for Young Children, C.E.: Washington, DC, USA, 2008. [Google Scholar]
National Research Council. The state of school mathematics in the United States. In Adding It Up: Helping Children Learn Mathematics; Kilpatrick, J., Swafford, J., Eds.; The National Academies Press: Washington, DC, USA, 2001; pp. 31–70. [Google Scholar] [CrossRef]
Newmann, F.M.; Smith, B.; Allensworth, E.; Bryk, A.S. Instructional program coherence: What it is and why it should guide school improvement policy. Educ. Eval. Policy Anal. 2001, 23, 297–321. [Google Scholar] [CrossRef]
Bogard, K.; Takanishi, R. PK-3: An aligned and coordinated approach to education for children 3 to 8 years old. Soc. Policy Rep. 2005, 19, 3–23. [Google Scholar] [CrossRef]
Kauerz, K. Making the Case for P-3; Education Commission of the States: Washington, DC, USA, 2007. [Google Scholar]
Clements, D.H.; Sarama, J.; Wolfe, C.B.; Spitler, M.E. Longitudinal evaluation of a scale-up model for teaching mathematics with trajectories and technologies: Persistence of effects in the third year. Am. Educ. Res. J. 2013, 50, 812–850. [Google Scholar] [CrossRef]
Claessens, A.; Engel, M.; Curran, C. Opportunities sustained: Kindergarten content and the maintenance of preschool effects. Am. Educ. Res. J. 2014, 51, 403–434. [Google Scholar] [CrossRef]
Farran, D. Mathematics Instruction in Preschool and Kindergarten: Six of One, Half a Dozen of the Other! [Blog Post]. 20 May 2019. Available online: https://dreme.stanford.edu/news/mathematics-instruction-preschool-and-kindergarten-six-one-half-dozen-other (accessed on 1 May 2024).
Blazar, D. Effective teaching in elementary mathematics: Identifying classroom practices that support student achievement. Econ. Educ. Rev. 2015, 48, 16–29. [Google Scholar] [CrossRef]
Polikoff, M.S. Instructional alignment under no child left behind. Am. J. Educ. 2012, 118, 341–368. [Google Scholar] [CrossRef]
Kauerz, K. Ladders of Learning: Fighting Fade-Out by Advancing PK-3 Alignment; New America Foundation Early Education Initiative; New America Foundation: Washington, DC, USA, 2006; Issue Brief No. 3. [Google Scholar]
Reynolds, A.; Magnuson, K.; Ou, S. PK-3 Education: Programs and Practices That Work in Children’s First Decade; Foundation for Child Development Working Paper 6: Advancing PK-3; Foundation for Child Development: New York, NY, USA, 2006. [Google Scholar]
Graves, B. PK-3: What Is It and How Do We Know It Works? Foundation for Child Development Working Paper 4: Advancing PK-3; Foundation for Child Development: New York, NY, USA, 2006. [Google Scholar]
Stipek, D.; Franke, M.; Clements, D.; Farran, D.; Coburn, C. PK-3: What Does It Mean for Instruction? SRCD Soc. Policy Rep. 2017, 30. Available online: https://files.eric.ed.gov/fulltext/ED581657.pdf (accessed on 17 June 2024). [CrossRef]
Cai, J.; Ding, M.; Wang, T. How do exemplary Chinese and U.S. mathematics teachers view instructional coherence? Educ. Stud. Math. 2014, 85, 265–280. [Google Scholar] [CrossRef]
Wonder-McDowell, C.; Reutzel, D.R.; Smith, J.A. Does instructional alignment matter? effects on struggling second graders’ reading achievement. Elem. Sch. J. 2011, 112, 259–279. [Google Scholar] [CrossRef]
Stein, A.; Coburn, C.E. Instructional policy from pre-k to third grade: The challenges of fostering alignment and continuity in two school districts. Educ. Policy 2023, 37, 840–872. [Google Scholar] [CrossRef]
Ferrini-Mundy, J.; Burrill, G.; Schmidt, W.H. Building teacher capacity for implementing curricular coherence: Mathematics teacher professional development tasks. J. Math. Teach. Educ. 2007, 10, 311–324. [Google Scholar] [CrossRef]
National Council of Teachers of Mathematics. Principles to Actions: Ensuring Mathematical Success for All; NCTM: Reston, VA, USA, 2014. [Google Scholar]
Karp, K.S.; Dougherty, B.J.; Bush, S.B. Jumping on board: What is the mathematics whole school agreement? In The Math Pact, Elementary: Achieving Instructional Coherence within and across Grades; Corwin Press: Thousand Oaks, CA, USA, 2020; pp. 1–18. [Google Scholar]
Mantzicopoulos, P.; French, B.; Patrick, H. The Mathematical Quality of Instruction (MQI) in kindergarten: An evaluation of the stability of the MQI using generalizability theory. Early Educ. Dev. 2018, 29, 893–908. [Google Scholar] [CrossRef]
National Council of Teachers of Mathematics. Focus in High School Mathematics: Reasoning and Sense Making; National Council of Teachers of Mathematics: Reston, VA, USA, 2009. [Google Scholar]
National Governors Association Center for Best Practices [NGA]; Council of Chief State School Officers. Common Core State Standards for Mathematics [CCSS]; Council of Chief State School Officers: Washington, DC, USA, 2010. [Google Scholar]
National Research Council. Mathematics Learning in Early Childhood: Paths toward Excellence and Equity; The National Academies Press: Washington, DC, USA, 2009. [Google Scholar] [CrossRef]
Carpenter, T.P.; Franke, M.L.; Jacobs, V.R.; Fennema, E.; Empson, S.B. A longitudinal study of invention and understanding in children’s multidigit addition and subtraction. J. Res. Math. Educ. 1998, 29, 3–20. [Google Scholar] [CrossRef]
Cobb, P.; Gresalfi, M.; Hodge, L.L. An interpretive scheme for analyzing the identities that students develop in mathematics classrooms. J. Res. Math. Educ. 2009, 40, 40–68. [Google Scholar] [CrossRef]
Hiebert, J.; Wearne, D. Instructional tasks, classroom discourse, and students’ learning in second-grade arithmetic. Am. Educ. Res. J. 1993, 30, 393–425. [Google Scholar] [CrossRef]
Hiebert, J.; Carpenter, T.P.; Fennema, E.; Fuson, K.; Wearne, D.; Murray, H.; Olivier, A.; Human, P. Making Sense: Teaching and Learning Mathematics with Understanding; Heinemann: Portsmouth, NH, USA, 1997. [Google Scholar]
Webb, N.; Franke, M.; Turrou, A.; Ing, M. Self-Regulation and Learning in Peer-Directed Small Groups; The British Psychological Society: London, UK, 2013. [Google Scholar] [CrossRef]
Webb, N.M.; Franke, M.L.; Ing, M.; Wong, J.; Fernandez, C.H.; Shin, N.; Turrou, A.C. Engaging with others’ mathematical ideas: Inter-relationships among student participation, teachers’ instructional practices, and learning. Int. J. Educ. Res. 2014, 63, 79–93. [Google Scholar] [CrossRef]
Blazar, D.; Pollard, C. Challenges and Tradeoffs of “Good” Teaching: The Pursuit of Multiple Educational Outcomes; Ed Working Paper: 22-591; Annenberg Institute at Brown University: Providence, RI, USA, 2022. [Google Scholar] [CrossRef]
Roseman, J.E.; Linn, M.C.; Koppal, M. Characterizing curriculum coherence. In Designing Coherent Science Education: Implications for Curriculum, Instruction, and Policy; Kali, Y., Linn, M.C., Roseman, J.E., Eds.; Teachers College Press: New York, NY, USA, 2008; pp. 13–36. [Google Scholar]
Schmidt, W.H.; Wang, H.C.; McKnight, C.C. Curriculum coherence: An examination of US mathematics and science content standards from an international perspective. J. Curric. Stud. 2005, 37, 525–559. [Google Scholar] [CrossRef]
California Department of Education. Preschool Learning Foundations Volume 1. 2008. Available online: https://www.cde.ca.gov/sp/cd/re/documents/preschoollf.pdf (accessed on 17 June 2024).
Sarama, J.; Clements, D.H. Early Childhood Mathematics Education Research: Learning Trajectories for Young Children; Routledge: Abingdon, UK, 2009. [Google Scholar] [CrossRef]
Sarnecka, B.; Lee, M. Levels of number knowledge in early childhood. J. Exp. Child Psychol. 2009, 103, 325–337. [Google Scholar] [CrossRef]
Siegler, R.S.; Svetina, M. What leads children to adopt new strategies? A microgenetic/cross-sectional study of class inclusion. Child Dev. 2006, 77, 997–1015. [Google Scholar] [CrossRef] [PubMed]
Clements, D.H.; Sarama, J. Learning trajectories: Foundations for effective, research-based education. In Learning over Time: Learning Trajectories in Mathematics Education; Maloney, A.P., Confrey, J., Nguyen, K.H., Eds.; Information Age Publishing: Charlotte, NC, USA, 2014; pp. 1–30. [Google Scholar]
Silverman, R.D.; Crandell, J.D. Vocabulary practices in prekindergarten and kindergarten classrooms. Read. Res. Q. 2010, 45, 318–340. [Google Scholar] [CrossRef]
Wilson, P.H.; Sztajn, P.; Edgington, C.; Confrey, J. Teachers’ use of their mathematical knowledge for teaching in learning a mathematics learning trajectory. J. Math. Teach. Educ. 2014, 17, 149–175. [Google Scholar] [CrossRef]
Gill, B.; Shoji, M.; Coen, T.; Place, K. The Content, Predictive Power, and Potential Bias in Five Widely Used Teacher Observation Instruments (REL 2017–191); U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Mid-Atlantic: Washington, DC, USA, 2016. Available online: https://ies.ed.gov/ncee/edlabs/regions/midatlantic/pdf/REL_2017191.pdf (accessed on 17 June 2024).
Bostic, J.; Lesseig, K.; Sherman, M.; Boston, M. Classroom observation and mathematics education research. J. Math. Teach. Educ. 2021, 24, 5–31. [Google Scholar] [CrossRef]
Praetorius, A.K.; Charalambous, C.Y. Classroom observation frameworks for studying instructional quality: Looking back and looking forward. ZDM-Math. Educ. 2018, 50, 535–553. [Google Scholar] [CrossRef]
Bell, C.A.; Gitomer, D.H. Building the field’s knowledge of teaching and learning: Centering the socio-cultural contexts of observation systems to ensure valid score interpretation. Stud. Educ. Eval. 2023, 78, 101278. [Google Scholar] [CrossRef]
Cerezci, B. Measuring the quality of early mathematics instruction: A review of six measures. Early Child. Educ. J. 2020, 48, 507–520. [Google Scholar]
Cobb, P.; Stephan, M.; McClain, K.; Gravemeijer, K. Participating in classroom mathematical practices. J. Learn. Sci. 2001, 10, 113–164. [Google Scholar] [CrossRef]
Sarama, J.; Clements, D.H. Manual for Classroom Observation (COEMET)—Version 3; Manual of An Early Math Classroom Observation System; Unpublished Version; University of Denver: Denver, CO, USA, 2007. [Google Scholar]
Farran, D.C.; Meador, D.M.; Keene, A.; Bilbrey, C.; Vorhaus, E. Advanced Narrative Record Manual (2015 Edition); Vanderbilt University, Peabody Research Institute: Nashville, TN, USA, 2015. [Google Scholar]
Agodini, R.; Harris, B.; Thomas, M.; Murphy, R.; Gallagher, L. Achievement Effects of Four Elementary School Math Curricula: Findings for First and Second Graders; NCEE 2011-4001; National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education: Washington, DC, USA, 2010. [Google Scholar]
PreK-3 Alignment—DREME. Available online: https://dreme.stanford.edu/prek-3-coherence/#Math-Observation-Instrument (accessed on 17 June 2024).
Preston, C.C.; Colman, A.M. Optimal number of response categories in rating scales: Reliability, validity, discriminating power, and respondent preferences. Acta Psychol. 2000, 104, 1–15. [Google Scholar] [CrossRef] [PubMed]
Weng, L.-J. Impact of the number of response categories and anchor labels on coefficient alpha and test-retest reliability. Educ. Psychol. Meas. 2004, 64, 956. [Google Scholar] [CrossRef]
Ball, D.; Forzani, F. What does it take to make a teacher? Phi Delta Kappan 2010, 92, 8–12. [Google Scholar] [CrossRef]
Biernat, M.; Manis, M.; Nelson, T.E. Stereotypes and standards of judgment. J. Personal. Soc. Psychol. 1991, 60, 485–499. [Google Scholar] [CrossRef]
Thorndike, E.L. A constant error in psychological ratings. J. Appl. Psychol. 1920, 4, 25–29. [Google Scholar] [CrossRef]
Saal, F.E.; Downey, R.G.; Lahey, M.A. Rating the ratings: Assessing the psychometric quality of rating data. Psychol. Bull. 1980, 88, 413–428. [Google Scholar] [CrossRef]
Nesbitt, K.; Farran, D.C. Effects of prekindergarten curricula: Tools of the Mind as a case study. Monogr. Soc. Res. Child Dev. 2021, 86, 7–119. [Google Scholar] [CrossRef]
Coburn, C.E.; McMahon, K.; Borsato, G.; Stein, A.; Jou, N.; Chong, S.; LeMahieu, R.; Franke, M.; Ibarra, S.; Stipek, D. Fostering Pre-K to Elementary Alignment and Continuity in Mathematics in Urban School Districts: Challenges and Possibilities; Stanford University Policy Analysis for California Education: Stanford, CA, USA, 2018. [Google Scholar]
Hill, H.C.; Charalambous, C.Y.; Kraft, M.A. When rater reliability is not enough. Educ. Res. 2012, 41, 56–64. [Google Scholar] [CrossRef]
Kane, M.T. Validating the interpretations and uses of test scores. J. Educ. Meas. 2013, 50, 1–73. [Google Scholar] [CrossRef]
Bell, C.A.; Gitomer, D.H.; McCaffrey, D.F.; Hamre, B.K.; Pianta, R.C.; Qi, Y. An argument approach to observation protocol validity. Educ. Assess. 2012, 17, 62–87. [Google Scholar] [CrossRef]
Cook, D.A.; Brydges, R.; Ginsburg, S.; Hatala, R. A contemporary approach to validity arguments: A practical guide to Kane’s framework. Med. Educ. 2015, 49, 560–575. [Google Scholar] [CrossRef] [PubMed]
Briesch, A.M.; Swaminathan, H.; Welsh, M.; Chafouleas, S.M. Generalizability theory: A practical guide to study design, implementation, and interpretation. J. Sch. Psychol. 2014, 52, 13–35. [Google Scholar] [CrossRef] [PubMed]
Doabler, C.; Stoolmiller, M.; Kennedy, P.; Nelson, N.; Clarke, B.; Gearin, B.; Fien, H.; Smolkowski, K.; Baker, S. Do components of explicit instruction explain the differential effectiveness of a core mathematics program for kindergarten students with mathematical difficulties? A mediated moderation analysis. Assess. Eff. Interv. 2019, 44, 197–211. [Google Scholar] [CrossRef]
Clements, D.H.; Agodini, R.; Harris, B. Instructional Practices and Student Math Achievement: Correlations from a Study of Math Curricula (NCEE Evaluation Brief No. 2013-4020); National Center for Educational Evaluation and Regional Assistance, Institute of Education Sciences: Washington, DC, USA, 2013; Available online: https://www.researchgate.net/publication/258932990_Instructional_practices_and_student_math_achievement_Correlations_from_a_study_of_math_curricula (accessed on 17 June 2024).
Engel, M.; Claessens, A.; Finch, M. Teaching students what they already know? The misalignment between instructional content in mathematics and student knowledge in kindergarten. Educ. Eval. Policy Anal. 2013, 35, 157–178. [Google Scholar] [CrossRef]
DREME Math Classroom Observer—Vanderbilt University. Available online: https://lab.vanderbilt.edu/dremeobserver/ (accessed on 17 June 2024).

Table 1. EMC-PK2 items by section.

IMA	Cover	POST
IMA start and end time	Teacher	8 POST ratings
Observer notes	Visit ID
Mini/Full designation	Grade
Adult leading	Date of observation
Activity setting	School
Student practices checklist	Number of adults
Math content	Number of children present
7 full IMA ratings	Observation start and end time

Table 2. Observation information.

Grade	Classrooms	Observations	Average Length (min)	Number of Children	Number of Adults
PK	14	42	178.3	15.8	2.9
TK	6	17	52.1	16.2	2.2
K	32	96	51.5	19.9	1.8
1st	36	89	57.3	18.8	1.4
2nd	18	35	61.6	19.5	1.7
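The sample totals implied by Table 2 can be checked with a few lines of arithmetic. The sketch below is illustrative rather than part of the authors' analysis: it sums the classroom and observation counts and computes an observation-weighted mean length, assuming each grade's "Average Length" is a mean over that grade's observations (the table does not say whether it is averaged per observation or per classroom). The names `TABLE_2` and `summarize` are hypothetical.

```python
# Rows transcribed from Table 2:
# (grade, classrooms, observations, average observation length in minutes)
TABLE_2 = [
    ("PK", 14, 42, 178.3),
    ("TK", 6, 17, 52.1),
    ("K", 32, 96, 51.5),
    ("1st", 36, 89, 57.3),
    ("2nd", 18, 35, 61.6),
]


def summarize(rows):
    """Return total classrooms, total observations, and weighted mean length.

    Assumption: each grade's average length is a per-observation mean, so
    weighting by the number of observations recovers the overall mean.
    """
    total_classrooms = sum(row[1] for row in rows)
    total_observations = sum(row[2] for row in rows)
    weighted_length = sum(row[2] * row[3] for row in rows) / total_observations
    return total_classrooms, total_observations, weighted_length


classrooms, observations, mean_length = summarize(TABLE_2)
print(classrooms, observations, round(mean_length, 1))
```

The classroom total of 106 matches the "All (n = 106)" column in Table 10, and the observations sum to 279. Under the stated assumption the overall mean length is roughly 74 minutes, pulled upward by the much longer PK observations.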

Table 3. Intraclass correlations of rating items.

IMA Rating	ICC
1. Teacher listens and responds	0.066
2. Teacher utilizes incorrect responses	0.043
3. Teacher asks questions	0.034
4. Teacher maintains cognitive demand	0.030
5. Student participation	0.068
6. Student engagement	0.234
7. Teacher adapts the task	0.058
POST Rating	ICC
1. Teacher reinforced math learning	0.019
2. Teacher accommodated range of abilities	0.137
3. Teacher communicated in multiple ways	0.148
4. Classroom environment was respectful	0.186
5. Behavior management did not impede instruction	0.181
6. Teacher tone was warm	0.087
7. Non-math theme-connected activities	0.000
8. Big math idea-connected activities	0.195

Table 4. Correlations between IMA ratings.

IMA Rating		2. Teacher Utilizes Incorrect Responses	3. Teacher Asks Questions	4. Teacher Maintains Cognitive Demand	5. Student Participation	6. Student Engagement	7. Teacher Adapts the Task
1. Teacher listens and responds	r	0.500 **	0.670 **	0.554 **	0.088 **	0.124 **	0.213 **
1. Teacher listens and responds	p	<0.0001	<0.0001	<0.0001	0.006	<0.0001	<0.0001
2. Teacher utilizes incorrect responses	r		0.426 **	0.450 **	0.177 **	0.254 **	0.189 **
2. Teacher utilizes incorrect responses	p		<0.0001	<0.0001	<0.0001	<0.0001	<0.0001
3. Teacher asks questions	r			0.628 **	0.093 **	0.102 **	0.115 **
3. Teacher asks questions	p			<0.0001	0.004	0.002	<0.0001
4. Teacher maintains cognitive demand	r				0.052	0.133 **	0.133 **
4. Teacher maintains cognitive demand	p				0.111	<0.0001	<0.0001
5. Student participation	r					0.596 **	0.059
5. Student participation	p					<0.0001	0.068
6. Student engagement	r						0.080 *
6. Student engagement	p						0.013

** p < 0.01, * p < 0.05.

Table 5. Correlations between POST ratings.

POST Rating		2. Teacher Accommodated Range of Abilities	3. Teacher Communicated in Multiple Ways	4. Classroom Environment Was Respectful	5. Behavior Management Did Not Impede Instruction	6. Teacher Tone Was Warm	7. Non-Math Theme-Connected Activities	8. Big Math Idea-Connected Activities
1. Teacher reinforced math learning	r	0.354 **	0.534 **	0.365 **	0.252 **	0.369 **	0.354 **	0.248 **
1. Teacher reinforced math learning	p	<0.001	<0.001	<0.001	<0.001	<0.001	<0.001	<0.001
2. Teacher accommodated range of abilities	r		0.466 **	0.243 **	0.091	0.222 **	0.151 *	0.106
2. Teacher accommodated range of abilities	p		<0.001	<0.001	0.131	<0.001	0.012	0.078
3. Teacher communicated in multiple ways	r			0.283 **	0.123 *	0.290 **	0.261 **	0.243 **
3. Teacher communicated in multiple ways	p			<0.001	0.040	<0.001	<0.001	<0.001
4. Classroom environment was respectful	r				0.657 **	0.675 **	0.132 *	0.089
4. Classroom environment was respectful	p				<0.001	<0.001	0.028	0.137
5. Behavior management did not impede instruction	r					0.409 **	0.081	0.065
5. Behavior management did not impede instruction	p					<0.001	0.175	0.277
6. Teacher tone was warm	r						0.130 *	0.135 *
6. Teacher tone was warm	p						0.030	0.024
7. Non-math theme-connected activities	r							0.086
7. Non-math theme-connected activities	p							0.151

** p < 0.01, * p < 0.05.

Table 6. Final model of IMA subscales from the rotated factor matrix.

IMA Rating	Teacher Facilitation	Student Engagement
1. Teacher listens and responds	0.811	0.041
2. Teacher utilizes incorrect responses	0.582	0.212
3. Teacher asks questions	0.807	0.013
4. Teacher maintains cognitive demand	0.733	0.031
5. Student participation	0.069	0.704
6. Student engagement	0.122	0.832
7. Teacher adapts the task ¹	0.209	0.074

¹ The item “Teacher adapts the task” did not load well onto any factor.
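To make the subscale assignment in Table 6 concrete, the following sketch assigns each item to the factor on which it loads most strongly, and leaves it unassigned when even its largest loading falls below a cutoff. The cutoff of 0.40 is an assumption chosen for illustration (a common rule of thumb); the table does not state the threshold the authors applied. The names `assign_items`, `LOADINGS`, and `CUTOFF` are hypothetical.

```python
# Rotated factor loadings transcribed from Table 6:
# item -> (Teacher Facilitation, Student Engagement)
LOADINGS = {
    "Teacher listens and responds": (0.811, 0.041),
    "Teacher utilizes incorrect responses": (0.582, 0.212),
    "Teacher asks questions": (0.807, 0.013),
    "Teacher maintains cognitive demand": (0.733, 0.031),
    "Student participation": (0.069, 0.704),
    "Student engagement": (0.122, 0.832),
    "Teacher adapts the task": (0.209, 0.074),
}
FACTORS = ("Teacher Facilitation", "Student Engagement")
CUTOFF = 0.40  # assumed threshold, not taken from the article


def assign_items(loadings, factors, cutoff):
    """Map each item to its strongest factor, or None if below the cutoff."""
    assignment = {}
    for item, values in loadings.items():
        # Index of the factor with the largest absolute loading for this item.
        best = max(range(len(values)), key=lambda i: abs(values[i]))
        if abs(values[best]) >= cutoff:
            assignment[item] = factors[best]
        else:
            assignment[item] = None
    return assignment


subscales = assign_items(LOADINGS, FACTORS, CUTOFF)
for item, factor in subscales.items():
    print(f"{item}: {factor or 'no factor'}")
```

With this cutoff, items 1 to 4 fall under Teacher Facilitation, items 5 and 6 under Student Engagement, and "Teacher adapts the task" is left unassigned, which agrees with the table footnote. Applying the same function to the POST loadings in Table 7 would likewise leave the two items flagged in that table's footnote unassigned.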

Table 7. Final model of POST subscales from the rotated factor matrix.

POST Rating	Differentiation	Classroom Atmosphere
1. Teacher reinforced math learning	0.693	0.248
2. Teacher accommodated range of abilities	0.521	0.154
3. Teacher communicated in multiple ways	0.755	0.154
4. Classroom environment was respectful	0.174	0.984
5. Behavior management did not impede instruction	0.061	0.656
6. Teacher tone was warm	0.273	0.638
7. Non-math theme-connected activities ¹	0.380	0.067
8. Big math idea-connected activities ¹	0.308	0.036

¹ The items “Non-math theme” and “Big math idea” did not load well onto any factor.

Table 8. Correlations between visits for indicators of math instruction quality.

Subscale		Visit 1:2	Visit 1:3	Visit 2:3
Teacher Facilitation	r	0.36 **	0.21	0.46 **
	n	103	67	68
Student Engagement	r	0.39 **	0.44 **	0.42 **
	n	103	67	68
Differentiation	r	0.47 **	0.42 **	0.58 **
	n	103	68	69
Classroom Atmosphere	r	0.43 **	0.57 **	0.61 **
	n	103	68	69

** p < 0.01.
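Table 8 reports only point estimates and significance markers. One way to gauge the precision of a between-visit correlation is an approximate 95% confidence interval from the Fisher z-transformation. The sketch below is not part of the authors' analysis; it assumes approximately bivariate normal scores and independent classrooms, and the function name `fisher_ci` is hypothetical.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate confidence interval for a Pearson correlation.

    Uses the Fisher z-transformation: atanh(r) is treated as approximately
    normal with standard error 1 / sqrt(n - 3). The default z_crit is the
    two-sided 95% normal quantile.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


# Values taken from the first subscale row of Table 8.
ci_visit_1_2 = fisher_ci(0.36, 103)  # Visit 1:2, marked ** in the table
ci_visit_1_3 = fisher_ci(0.21, 67)   # Visit 1:3, no significance marker

print("Visit 1:2:", tuple(round(x, 2) for x in ci_visit_1_2))
print("Visit 1:3:", tuple(round(x, 2) for x in ci_visit_1_3))
```

Under these assumptions, the interval for the Visit 1:2 correlation excludes zero while the interval for the Visit 1:3 correlation includes it, in line with the significance markers in the table.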

Table 9. Indicators of math instruction quality by grade level.

Grade Level		Teacher Facilitation	Student Engagement	Differentiation	Classroom Atmosphere	Discussion Practices	Non-Math Storyline	Big Math Idea
PK	M	2.09	4.27	2.08	4.26	0.07	1.45	1.91
n = 14	SD	(0.39)	(0.30)	(0.51)	(0.59)	(0.07)	(0.53)	(0.67)
TK	M	2.14	4.09	1.76	3.55	0.08	1.11	1.94
n = 6	SD	(0.59)	(0.47)	(0.94)	(0.53)	(0.08)	(0.27)	(0.80)
K	M	2.46	3.96	2.24	3.69	0.43	1.45	3.50
n = 32	SD	(0.54)	(0.53)	(0.82)	(0.77)	(0.36)	(0.87)	(0.72)
1	M	2.48	3.91	2.13	3.63	0.56	1.07	4.11
n = 36	SD	(0.45)	(0.62)	(0.56)	(0.75)	(0.22)	(0.35)	(0.74)
2	M	2.52	3.99	2.15	3.82	0.77	1.06	4.11
n = 18	SD	(0.51)	(0.61)	(0.60)	(0.84)	(0.39)	(0.24)	(0.85)

Table 10. Characteristics of IMAs per observation for a given classroom ¹.

	% of IMAs
IMA Code	All (n = 106)	PK (n = 14)	TK (n = 6)	K (n = 32)	1 (n = 36)	2 (n = 18)
Adult leading (Full)
Lead teacher	61	52	46	66	67	54
TA	3	16	5	2	0	1
Other staff	1	4	0	1	1	0
Student-directed (Mini)	35	30	49	31	32	45
Activity type
Whole group with teacher	49	39	35	54	52	48
Small group with teacher	8	25	22	5	4	3
Small group	3	2	2	4	3	2
Pair	9	2	1	8	12	15
Teacher and 1 student	2	10	1	1	1	0
Independent	29	22	39	28	29	32
Student discussion practices
Talked with one another about math	18	4	2	20	19	32
Reflected on math ideas	3	0	1	6	3	0
Explained reasoning	14	3	5	13	15	24
Described steps to solve	14	1	0	7	23	27
Used math tools/visuals	52	82	71	56	44	31
Math content domain
Counting and cardinality	27	49	54	43	9	7
Operations and algebraic thinking	49	15	19	41	63	74
Measurement and data	12	15	6	8	12	19
Geometry	12	20	21	8	16	0

¹ Percentages may not all add up to 100. IMA frequencies are expressed as a percentage of IMAs per observation and then averaged across the fall, winter, and spring observations for a given classroom.
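The two-step aggregation described in the footnote to Table 10 (percentages within each observation, then an average across a classroom's observations) can be sketched as follows. The data are invented for illustration, and the function name `classroom_profile`, the shortened activity labels, and the choice to skip observations with no IMAs are assumptions rather than details from the article.

```python
from collections import Counter
from statistics import mean


def classroom_profile(observations, codes):
    """Average per-observation percentages of IMA codes for one classroom.

    observations: list of visits, each a list with one code per IMA.
    codes: the codes to report.
    Assumption: visits with no IMAs are skipped rather than counted as 0%.
    """
    per_visit = []
    for imas in observations:
        if not imas:
            continue
        counts = Counter(imas)
        # Step 1: share of this visit's IMAs carrying each code, in percent.
        per_visit.append({c: 100.0 * counts[c] / len(imas) for c in codes})
    if not per_visit:
        raise ValueError("no observations with IMAs")
    # Step 2: unweighted mean of the per-visit percentages.
    return {c: mean(visit[c] for visit in per_visit) for c in codes}


# Hypothetical activity-type codes for one classroom's three visits.
fall = ["Whole group", "Whole group", "Independent", "Pair"]
winter = ["Whole group", "Independent"]
spring = ["Independent", "Independent", "Pair", "Whole group"]

profile = classroom_profile(
    [fall, winter, spring], ["Whole group", "Independent", "Pair"]
)
for code, pct in profile.items():
    print(f"{code}: {pct:.1f}%")
```

Averaging percentages per visit gives each observation equal weight regardless of how many IMAs it contained, which differs from pooling all IMAs across visits before computing a percentage. For checklist-style codes where several can apply to one IMA, the resulting percentages need not sum to 100, as the footnote indicates.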

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

