Article

A Quasi-Experimental Study of the Achievement Impacts of a Replicable Summer Reading Program

by Geoffrey D. Borman 1,* and Hyunwoo Yang 2
1 Mary Lou Fulton College, Arizona State University, Tempe, AZ 85281, USA
2 Faculty of Education, The Chinese University of Hong Kong, Hong Kong
* Author to whom correspondence should be addressed.
Educ. Sci. 2025, 15(11), 1422; https://doi.org/10.3390/educsci15111422
Submission received: 27 August 2025 / Revised: 18 October 2025 / Accepted: 20 October 2025 / Published: 23 October 2025
(This article belongs to the Special Issue Advances in Evidence-Based Literacy Instructional Practices)

Abstract

The “summer slide,” the well-documented tendency for students to lose academic skills during the extended summer break, remains a persistent challenge for educational equity and achievement. Although traditional summer school programs can mitigate these losses, an emerging body of research suggests that summer book distribution initiatives, which provide students with free, high-quality books to read at home, represent a cost-effective and scalable alternative. This study presents results from a quasi-experimental evaluation of Kids Read Now (KRN), an at-home reading program designed to sustain elementary students’ literacy engagement over the summer months. The program’s central feature is the delivery of nine free books directly to students, supported by school-based components that foster home–school connections and promote shared reading between parents and children. Across two districts, five schools, four grade levels (1–4), and 110 KRN and 156 comparison students, propensity score matching and doubly robust regression analyses indicated that KRN participants outperformed their non-participating peers, with an average effect size of nearly d = 0.15. Further, two-stage least squares regression analyses revealed that students who received all nine books achieved an effect size of d = 0.21. These impact estimates correspond to approximately two months of additional learning for the average participant and more than three months for full participants. Collectively, the results contribute to a growing evidence base indicating that book distribution programs are an effective and sustainable means of mitigating summer learning loss and promoting continued growth in reading achievement.

1. Introduction

For more than a century, researchers have documented the “summer slide”—the tendency for students to lose ground academically during the months away from school (Cooper et al., 1996). Contemporary estimates vary depending on the dataset analyzed (Workman et al., 2023), yet recent work suggests summer setbacks can erase 30–40% of a school year’s learning in mathematics and 10–20% in reading (Briggs & Wellberg, 2022). Evidence also indicates that achievement growth during summer is far more variable than during the academic year (Atteberry & McEachin, 2021), consistent with the idea that schools provide structured learning opportunities that help equalize achievement, whereas less structured out-of-school environments yield more divergent outcomes (Downey, 2024).
Several research syntheses point to the potential of summer programs to offset summer learning losses. Cooper et al. (2000) synthesized 93 quantitative studies of school-based summer interventions in reading and math, which were typically led by teachers, university students, or researchers. They found that remedial summer programs produced average achievement gains of about one-fifth of a standard deviation (d = 0.19). More recently, Kim and Quinn (2013) reported that summer school participation was associated with effect sizes of 0.09 SD for total reading achievement and 0.25 SD for reading comprehension. Finally, Dujardin et al. (2022) reviewed 16 summer reading programs from 2012 through 2021 and found generally positive outcomes for reading and writing skills and social behaviors. Taken together, these findings indicate that well-designed summer programs can partially counteract the academic declines that often occur during extended school breaks.
An alternative to traditional summer school programs involves providing students with books to read at home during the three-month summer break. Many such initiatives also incorporate school-based supports and parental outreach to encourage reading, and some include incentives—such as prizes or other rewards—to increase the number of books students read. In their meta-analysis, Kim and Quinn (2013) reviewed studies of 41 summer reading interventions published between 2006 and 2010 and found mixed evidence: experimental impacts on overall reading outcomes ranged from −0.143 to 0.185 standard deviation units. The average effect size across all studies was d = 0.10 (95% CI: 0.04 to 0.15), reflecting a statistically significant positive effect on overall reading achievement. More recent randomized trials by Kim et al. (2016) and White et al. (2013) reported modest but positive effects, particularly for students in high-poverty schools.
As a longer-term intervention, Allington et al. (2010) evaluated a three-year spring book fair program in Florida that provided first- through third-grade students with 10 free books annually (30 books total) to encourage summer reading. Students assigned to the program achieved gains equivalent to 0.14 SDs on state reading assessments. Drawing on these results, Allington and McGill-Franzen (2021) argued that annually distributing 10–12 free books to economically disadvantaged students from grade 1 through at least grade 8 could close the reading achievement gap with their more advantaged peers. Collectively, these studies suggest that book distribution programs, particularly when sustained over time, may offer a cost-effective, home-based complement to in-person summer learning interventions.

1.1. Kids Read Now

Most summer learning programs reviewed by Cooper et al. (2000) and Kim and Quinn (2013)—and the majority currently in operation nationwide—are local, district-run initiatives that lack clear implementation criteria or an evidence-based framework (Borman et al., 2016). By contrast, Kids Read Now (KRN), the focus of this study, is a nationally disseminated model designed for broad replication. Since its launch in 2010, KRN—a 501(c)(3) nonprofit—has reached more than 468,000 K–3 students, distributing nearly 3.5 million new books (https://kidsreadnow.org/). The program aligns with the summer learning evidence base and is offered through direct purchase or selective matching grants to schools.
KRN integrates both school-based and home-based elements, a relatively uncommon design in summer reading interventions (Kim & Quinn, 2013). The program’s core home-based feature is the delivery of nine free books directly to students. These distributions are supported by school-based activities intended to strengthen home–school connections, challenging the traditional view of classrooms and homes as separate domains for children’s learning (Cooper et al., 2000; McCombs et al., 2011; Kim & Quinn, 2013).
At the close of each school year, teachers guide students in selecting nine books from an educator-curated “Wish List” of up to 150 titles, spanning fiction, nonfiction, bilingual, and multicultural options. Titles are organized by reading level to ensure an appropriate match to each student’s reading skills. During an end-of-year family reading night, students receive their first three books. Parents, guardians, and caregivers are invited to attend; there, teachers provide direct guidance and a bilingual KRN Parent Guide. Families are further encouraged to commit to supporting summer reading and helping students maintain progress.
The home-based component includes mailing up to six additional books over the summer. In addition to delivering the books, KRN is designed to promote the kinds of parent–child interactions shown to strengthen children’s oral language, vocabulary, and reading comprehension. Each book delivered through KRN includes a “Discovery Sheet” with structured conversation prompts that reflect well-established practices such as dialogic reading (Whitehurst et al., 1988), extended discussion (Dickinson et al., 2010; Snow, 2010), and elaborative reminiscing (Reese et al., 2010). The prompts encourage caregivers to ask open-ended “why” and “how” questions, invite personal connections to story events, and sustain multi-turn conversations over repeated readings. Together, the materials help children develop both code-focused and meaning-focused literacy skills that advance reading comprehension (Dahl-Leonard et al., 2025). Through weekly reminders and the sequence of nine books, KRN fosters sustained home-based engagement that mirrors the interactive, elaborative language environments associated with gains in vocabulary and comprehension. These features align with evidence showing that summer literacy interventions are most effective when they pair access to books with explicit supports for parent–child talk and shared reading (Kim & Quinn, 2013; Wasik & Hindman, 2020).
From a theoretical standpoint, KRN operates on the premise that sustained, interactive parent–child dialogue around texts mediates the relationship between book access and reading growth. The program assumes that simply providing books, while necessary, is insufficient to yield robust literacy gains unless accompanied by conversational scaffolds that promote meaning making and oral language development. By embedding dialogic and elaborative prompts in each book and reinforcing engagement through regular parent communications, KRN cultivates the kinds of cognitively and linguistically rich exchanges that research has identified as proximal causes of literacy development (Whitehurst & Lonigan, 1998; Dickinson et al., 2010). These exchanges foster deeper comprehension monitoring, narrative reasoning, and vocabulary expansion—skills that, in turn, support the maintenance and acceleration of reading achievement over the summer months (Kim & Quinn, 2013; Mol et al., 2008). Thus, KRN’s theory of change positions family talk as an important ingredient linking book access with conversational engagement, language growth, and reading achievement, aligning its design with decades of research on the social-interactional foundations of literacy learning.
This book distribution model resembles the annual book fair approach of Allington et al. (2010). Though the model studied by Allington and colleagues included multi-year book distributions, KRN provides a similar yearly volume of nine books with an arguably greater emphasis on establishing the home–school connection to complement and support increased reading engagement during the summer. One prior study by Borman et al. (2020) reported a mean effect size of d = 0.12 on post-test reading outcomes and a treatment-on-the-treated (TOT) effect of d = 0.18 for those students who received and read all nine books delivered by KRN. The present study offers an opportunity to directly replicate that evaluation with a new and larger cohort of students and schools.

1.2. The Current Study

Drawing on administrative data and reading achievement data provided by two Midwestern school districts for five participating schools, we analyze the literacy impacts of the KRN summer reading program. We apply propensity score matching methods to match participating KRN students with similar comparison students. Specifically, we answer three research questions:
  • What is the non-experimental impact of KRN on participating students’ literacy outcomes relative to their non-participating peers?
  • To what extent does active engagement in the KRN program, as measured by the number of books participating students received, predict students’ literacy outcomes?
  • Are impacts observed in response to questions 1 and 2 moderated by the district context and the students’ grade level?
In the following pages, we describe our methodology, including the student and school samples, information about how KRN was implemented in the two participating districts, and the measures that we used for matching and analysis of program impacts. Next, we present the analytical results, which first demonstrate that we achieved baseline equivalence between the KRN and non-KRN samples on all measures. Because treatment and comparison students were within 0.16 SDs on all three pretest measures, the analytical sample clearly meets the baseline equivalence criterion of 0.25 SDs for quasi-experimental studies established by the What Works Clearinghouse (2022). We then present our estimates of the overall impacts of KRN, followed by estimates of the impacts of full program participation. Finally, in Section 4, we briefly contextualize the practical significance of the results.

2. Method

2.1. Sample

We employed data provided by the Troy City School District in Ohio and by the Battle Creek School District in Michigan to evaluate the effects of the KRN program on students’ reading achievement. The Battle Creek Public School system, with three participating schools, is an urban district; the Troy City School District, with two participating schools, is located in a suburban setting. The three Battle Creek schools include Verona Elementary School, which serves 308 students in grades PreK–6, 59% of whom are minority students (16% Hispanic, 30% Black, and 13% multiracial) and 84% of whom are eligible for free or reduced-price lunch. Valley View Elementary serves 560 students in grades PreK–5, 55% of whom are minority students (22% Black, 13% Asian, and 11% multiracial) and 87% of whom are eligible for free or reduced-price lunch. Finally, Dudley Elementary School serves 391 students in grades 2–4, 17% of whom are minority students (10% Hispanic, 3% Black, and 4% multiracial) and 34% of whom are eligible for free or reduced-price lunch.
From Troy City, Hook Elementary enrolls 248 children in kindergarten through fifth grade. Of these, 16.1% identify as students of color, with the largest groups being multiracial (8.5%), African American (4.4%), Hispanic (2%), and Asian (1.2%). Roughly 40% of the student body qualifies for free or reduced-price lunch. Kyle Elementary, also part of the Troy City district, serves 212 students in the same grade span. Its student population is 20.3% minority, including 11.3% multiracial, 5.7% Hispanic, and 3.3% African American students, and 59% of its students are eligible for free or reduced-price lunch. Together, the two schools provide variation in district and school contexts as well as in student demographics. Table 1 presents the overall sample sizes for schools and students.

2.2. KRN Implementation in Troy City and Battle Creek

In both districts, staff made active efforts to encourage participation by reaching out to students and families, rather than relying solely on those who volunteered on their own. Participation, however, was ultimately based on student self-selection into KRN. Principals in Troy City and Battle Creek adopted different approaches to recruitment. In Troy City, the principal targeted enrollment toward students demonstrating weaker reading performance, prioritizing those with the highest needs. By contrast, Battle Creek opened the program to all students regardless of prior achievement. To address these implementation differences, our quasi-experimental design limited both the student matching process and outcome analyses to within-school comparisons.
Across all five schools, teachers were given a catalog of 120 titles to guide students in choosing nine books for the program. With teacher support and parental consent, students selected their books and were informed they could receive up to nine free, keep-at-home books over the summer, along with a prize for reporting their reading. Each school hosted a Family Reading Event where students collected their first three books and parents received guidance on supporting reading at home. Students were encouraged to complete book-specific comprehension activities, aligned with the Lexile level of each text and printed on stickers inside the covers. To sustain engagement, KRN contacted families weekly by phone, text, email, or app, requesting confirmation when a child had finished a book and completed at least one activity. Once a parent reported completion, KRN mailed the next preselected book directly to the child’s home. Students who completed all nine books were recognized with both a prize and a certificate when school resumed in the fall.

2.3. Transparency, Openness, and Research Ethics

We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study following the American Psychological Association (APA) Journal Article Reporting Standards (JARS) (Kazak, 2018). Specifically, we determined the sample sizes in advance, we did not drop any variables, and we did not drop any conditions. Data were analyzed using Stata version 16 (StataCorp, 2019). All data, analysis code, and research materials are available at https://deposit.icpsr.umich.edu/deposit/workspace?goToPath=/ddf/239082&goToLevel=project (accessed on 1 October 2025). This study’s design and its analysis were not pre-registered. No personally identifiable student information was provided to the researchers, the participating school districts consented to participate, and no institutional review board authorization was required.

2.4. Measures

2.4.1. Dependent Variable

We drew on students’ fall 2019 assessment data as the outcome measure of program impact. In the Ohio sites, literacy skills were gauged using aimswebPlus, which provides a broad assessment of early reading competencies including vocabulary, comprehension, and silent reading fluency (aimswebPlus, n.d.). In Michigan, students completed the Northwest Evaluation Association’s (NWEA) Measures of Academic Progress (MAP) Reading Fluency assessment, which evaluates oral reading fluency, comprehension, and foundational literacy skills (NWEA, n.d.). Because the two districts administered different tests that, nevertheless, captured related aspects of literacy achievement, we standardized scores by grade level within each district to a common mean and standard deviation, following the guidance of May et al. (2009). Specifically, the standardization procedure subtracted the district’s grade-specific mean from each student’s score and divided the result by the district’s grade-specific standard deviation. Though all testing was performed within the first month of the school year, this procedure further accounted for slight differences in timing across grades and districts.
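As a concrete illustration, this within-district, within-grade standardization takes only a few lines in Stata, the package used for all analyses (see Section 2.3). The sketch below uses hypothetical variable names rather than the study’s actual field names.

```stata
* Standardize the fall 2019 outcome within each district-by-grade cell.
* Variable names (score_f19, district, grade) are illustrative only.
egen cell_mean = mean(score_f19), by(district grade)
egen cell_sd   = sd(score_f19),   by(district grade)
generate z_f19 = (score_f19 - cell_mean) / cell_sd
```

Applying the same three lines to each pretest produces the standardized baseline measures described in Section 2.4.2.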

2.4.2. Independent Variables

We used three prior assessment scores as baseline measures of reading achievement: fall 2018, winter 2019, and spring 2019 results from either aimswebPlus (Ohio) or NWEA MAP Reading Fluency (Michigan). Consistent with the outcome variable, each pretest score was standardized within district and grade. These three standardized scores capture students’ reading performance before the KRN summer implementation. District records also supplied demographic and contextual covariates: sex (binary; 1 = female, 0 = male); race/ethnicity, coded as separate indicators for Asian, White, Hispanic, Black, and multiracial students; economically disadvantaged status (EDS; 1 = eligible for free/reduced-price lunch, 0 = not eligible); and each student’s school and grade. For all analyses, we employed a complete case analysis, excluding any cases with missing values on the dependent and independent variables.
For treatment students, KRN provided summer delivery logs indicating how many books each child requested and received. All participants were initially given three books at school and could receive up to six additional titles over the summer; the mean number received was 6.39 (SD = 2.36). We include the number of books received in models to examine possible dosage effects. Finally, following best practice for quasi-experimental matching, we included school and grade indicators in the propensity-score models to improve local comparability between treated and comparison students (Glazerman et al., 2002).

2.5. Analytical Strategy

2.5.1. Propensity Score Matching Methods

Because participation in KRN was voluntary at both the school and student level, simple posttreatment comparisons would risk yielding biased estimates of program impact due to self-selection. To strengthen causal inference, we applied propensity score matching (PSM) to construct analytically comparable treatment and comparison groups. Matching was performed on a rich set of baseline covariates, including three pre-intervention reading achievement scores, demographic indicators, and school- and grade-level identifiers. This approach increases the plausibility of equivalence between groups in expectation (Rubin, 2001). Consistent with prior methodological guidance (Cook & Steiner, 2010), we included prior achievement as a key matching variable. Our method placed particular weight on prior achievement by incorporating three pretest measures. This approach not only improves balance on the key construct of prior achievement but also mitigates the influence of measurement error that arises when relying on a single test.
The literature on propensity score matching suggests that impact estimation is most efficient and effective in situations with more non-treated comparison than treated subjects (Stuart, 2010; Pirracchio et al., 2016). The current study did provide a larger pool of comparison than treatment students. Specifically, we identified a total of 116 KRN participants in grades 1 through 4 who had complete data from the two districts. A total of 440 students who were enrolled at the five schools but did not participate in the KRN program were identified as the comparison group pool.
We used one-to-three matching with replacement with a caliper of 0.3 on the propensity score distance. For instance, a treated student with a propensity score of 0.1 would be matched only with comparison cases whose propensity scores fall within 0.3 of that value (that is, at 0.4 or below). We chose one-to-three matching, a ratio matching method, rather than one-to-one nearest neighbor matching because the comparison group’s propensity scores exhibited a highly right-skewed distribution. Given this skew, conventional one-to-one matching would result in suboptimal matches, with a few comparison students repeatedly matched to the many KRN students whose propensity scores were close to 1. As an example, when using a one-to-one nearest neighbor matching algorithm with replacement, a single comparison student was matched to 19 of the 41 treatment students from a single grade-by-school block.
Because 1:k ratio matching can improve flexibility and reduce variance when comparison cases are concentrated in the tails of the propensity score distribution (Stuart, 2010; Lanza et al., 2013), we employed one-to-three nearest-neighbor matching with replacement. Our initial tests indicated that one-to-one and one-to-two matching were suboptimal given the skewed distribution of propensity scores; these specifications neither alleviated the concentration problem nor achieved satisfactory covariate balance. A one-to-four match yielded results highly similar to the one-to-three specification but offered no additional efficiency gains. Sensitivity analyses confirmed that the intent-to-treat (ITT) effect was highly stable across 1:2, 1:3, and 1:4 ratios. We therefore adopted the 1:3 ratio as our primary specification because it provided the best balance in the bias–variance trade-off—enhancing statistical power relative to 1:1 matching while minimizing potential bias from higher ratios. Each treatment student was matched to the three comparison students with the nearest propensity scores, allowing replacement so that comparison cases could serve as matches for multiple treated students (Caliendo & Kopeinig, 2008). To account for this in subsequent analyses, each comparison student was weighted by 1/3.
To calculate each student’s propensity score, the conditional probability of receiving the treatment given the predetermined covariates, we used logistic regression. The logistic regression model included the covariates mentioned above and the statistical interactions between economic status and gender, between race/ethnicity and gender, and between race/ethnicity and economic status. Our matching model, shown below as Equation (1), was specified as follows:
$$\mathrm{logit}(\mathrm{Treatment}_i) = \alpha + \beta_1 \mathrm{Spring2019}_i + \beta_2 \mathrm{Winter2019}_i + \beta_3 \mathrm{Fall2018}_i + \beta_4 \mathrm{Female}_i + \beta_5 \mathrm{EconDis}_i + \beta_6 (\mathrm{EconDis}_i \times \mathrm{Female}_i) + \gamma_1 \mathrm{Race}_i + \gamma_2 (\mathrm{Race}_i \times \mathrm{Female}_i) + \gamma_3 (\mathrm{Race}_i \times \mathrm{EconDis}_i) + \gamma\,\mathrm{SchoolGrade}_i + \varepsilon_i \quad (1)$$
Specifically, we estimate the log odds of participation in KRN as a function of students’ pre-treatment spring, winter, and fall test scores, gender, economically disadvantaged status, a vector of race/ethnicity indicators, the above-mentioned interaction terms, a vector of school-by-grade level indicators, and a student-specific error term, εi.
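To make the estimation pipeline concrete, the following is a minimal Stata sketch of Equation (1) and the subsequent 1:3 caliper matching. psmatch2 is a user-written command (installable with ssc install psmatch2), all variable names are illustrative placeholders, and the study’s exact specification may differ.

```stata
* Estimate propensity scores from the logit in Equation (1).
logit treat c.spring2019 c.winter2019 c.fall2018 i.female i.econdis ///
    i.econdis#i.female i.race i.race#i.female i.race#i.econdis      ///
    i.schoolgrade
predict pscore, pr

* 1:3 nearest-neighbor matching with replacement and a 0.3 caliper.
* psmatch2 matches with replacement by default and stores match
* frequencies in the _weight variable it creates.
psmatch2 treat, pscore(pscore) neighbor(3) caliper(0.3)
```

The _weight variable produced here feeds directly into the weighted outcome models described in the next section.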

2.5.2. KRN Quasi-Experimental Impact Estimates

After constructing comparable groups through matching, we formulated two main models to estimate the treatment effect of the KRN program on academic achievement. The first, shown below as Equation (2), was a doubly robust regression model that provides the quasi-experimental intent-to-treat (ITT) effect estimate, as follows:
$$E(Y_{isg} \mid PS) = \alpha + \beta_1 \mathrm{Treatment}_{isg} + \beta_2 \mathrm{Spring2019}_{isg} + \beta_3 \mathrm{Winter2019}_{isg} + \beta_4 \mathrm{Fall2018}_{isg} + \beta_5 \mathrm{Female}_{isg} + \beta_6 \mathrm{EconDis}_{isg} + \beta_7 (\mathrm{EconDis}_{isg} \times \mathrm{Female}_{isg}) + \gamma_1 \mathrm{Race}_{isg} + \gamma_2 (\mathrm{Race}_{isg} \times \mathrm{Female}_{isg}) + \gamma_3 (\mathrm{Race}_{isg} \times \mathrm{EconDis}_{isg}) + \gamma\,\mathrm{SchoolGrade}_{isg} + \pi_{sg} + \varepsilon_{isg} \quad (2)$$
Doubly robust estimation combines a matching model with a weighted ordinary least squares (OLS) model to estimate the causal effect of treatment on the expected outcome, $E(Y_{isg} \mid PS)$, producing a consistent estimate if either the propensity score model or the outcome model is correctly specified (Funk et al., 2011; Kang & Schafer, 2007; Tan, 2010). In this application, the doubly robust approach integrates outcome regression with a propensity score model for treatment assignment, allowing us to adjust for residual bias (Funk et al., 2011; Linden, 2017; Robins et al., 2007). To enhance precision, we include all covariates used in the matching procedure: the three pretest scores, indicators for gender and economically disadvantaged status, interaction terms, and vectors for race/ethnicity and school-by-grade combinations. This model, Equation (3), was specified as follows:
$$Y_{ijg} = \beta_0 + \beta_1 \mathrm{Treat}_i + X_i'\gamma + \alpha_{jg} + \epsilon_{im} \quad (3)$$
where $Y_{ijg}$ is the standardized fall 2019 test score for student $i$ in school-grade block $jg$; $\mathrm{Treat}_i$ is an indicator variable equal to 1 if the student participated in the KRN program and 0 otherwise, so that $\beta_1$ provides the non-experimental impact estimate of interest; $X_i'\gamma$ is a vector of control variables, including pre-test scores, student demographics (race, gender), economically disadvantaged status, and all two-way interactions between them; $\alpha_{jg}$ represents the school-by-grade fixed effects, which control for all time-invariant characteristics at the school-grade level; and $\epsilon_{im}$ is the matching-aware error term.
Weights for all analyses were derived from a 1:3 nearest-neighbor propensity score matching procedure conducted with replacement and a caliper of 0.3. In this scheme, all students participating in the KRN treatment received a weight of 1 and each matched comparison unit received a weight based on the frequency of its use as a match. Control units not selected as a match received a weight of 0 and are thus excluded from the weighted regression analysis. All models were estimated at the student level with cluster-robust (CR2) standard errors computed at the matched-set level to account for within-set correlation in residuals arising from the 1:3 nearest-neighbor matching design (Stuart, 2010; Pustejovsky & Tipton, 2018).
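A hedged Stata sketch of this weighted outcome regression follows. It assumes the match frequencies are carried in psmatch2’s _weight variable and that a matched-set identifier (here called matchset) has been constructed from psmatch2’s matching output; Stata’s built-in vce(cluster) option applies a CR1 correction, while the CR2 adjustment cited above requires a user-written small-sample-correction routine.

```stata
* Doubly robust ITT estimate (Equations (2)-(3)): outcome regression on
* the matched, weighted sample with school-by-grade fixed effects.
* matchset is an assumed identifier grouping each treated student with
* their matched comparison cases.
regress z_f19 i.treat c.spring2019 c.winter2019 c.fall2018 i.female ///
    i.econdis i.econdis#i.female i.race i.race#i.female             ///
    i.race#i.econdis i.schoolgrade [pweight = _weight],             ///
    vce(cluster matchset)
```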
In addition to the doubly robust regression, we implement a two-stage least squares (2SLS) regression to estimate the treatment-on-the-treated (TOT) effect (Angrist & Imbens, 1995; Ichimura & Taber, 2001). This approach is particularly useful for two reasons. First, it accounts for variation in treatment uptake, as students differ in the number of books they read over the summer—some completing only the initial three books, others requesting and receiving up to six additional books. Second, the 2SLS model provides a refined estimate of the causal effect associated with each additional book received, allowing us to examine potential dosage effects of KRN participation.
To perform this analysis, the following models were estimated using a weighted 2SLS regression analysis. The first-stage model is labeled Equation (4), and the second-stage model is shown as Equation (5):
$$\#\mathrm{Books}_i = \alpha_0 + \alpha_1 \mathrm{Treatment}_i + \boldsymbol{\alpha} X_i + \delta_i \quad (4)$$
$$\mathrm{Posttest}^{\mathrm{Fall}\,2019}_i = \beta_0 + \beta_1 \widehat{\#\mathrm{Books}}_i + \boldsymbol{\beta} X_i + \varepsilon_i \quad (5)$$
In the two-stage least squares (2SLS) framework, the endogenous explanatory variable—the number of books received—is first regressed on the instrumental variable, which in this case is KRN program participation. Predicted values from this first-stage regression, $\widehat{\#\mathrm{Books}}_i$, are then substituted for the actual number of books received in the second-stage regression, which predicts the posttest fall 2019 reading outcome. The second-stage coefficient, $\beta_1$, represents the average treatment effect for each additional book received, or the “treatment-on-the-treated” (TOT) dosage effect. For students in the treatment group, we use the actual number of books received. Because KRN’s records confirm that no comparison students received the book distributions, the variable is zero for all comparison cases. The covariates in the two stages of the model, noted as $\boldsymbol{\alpha}X_i$ and $\boldsymbol{\beta}X_i$, mirror those used in the matching and doubly robust analyses, including pretest scores, demographic indicators, race/ethnicity, and school-by-grade variables. These covariates account for any residual differences between treatment and comparison students and improve estimation precision.
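In Stata, both stages can be estimated jointly with ivregress 2sls, instrumenting books received with program assignment; estat firststage then reports the first-stage F-statistic and partial R-squared referenced in Section 3.2. As elsewhere, this is a sketch with illustrative variable names, not the study’s exact specification.

```stata
* TOT dosage model (Equations (4)-(5)): n_books instrumented by treat.
ivregress 2sls z_f19 c.spring2019 c.winter2019 c.fall2018 i.female ///
    i.econdis i.race i.schoolgrade                                 ///
    (n_books = i.treat) [pweight = _weight], vce(cluster matchset)

* First-stage diagnostics: F-statistic and partial R-squared.
estat firststage
```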
We treat the number of books requested as endogenous because factors influencing requests—such as motivation, reading interest, or aptitude—may also affect posttest outcomes independently of KRN participation. By using program enrollment as an instrument for books received, we isolate exogenous variation attributable to the program itself. While participation was not randomly assigned, propensity score matching suggests that treatment and comparison students were balanced on observable baseline characteristics. Like other quasi-experimental intent-to-treat analyses, TOT estimates may still be biased if unmeasured factors correlate with both program participation and outcomes.
Nonetheless, this approach provides a more informative estimate than a simple regression of reading outcomes on books received. If the primary mechanism of KRN is the delivery of student-selected books, this TOT analysis offers a quasi-experimental estimate of the causal impact for students who received the intended “full dose” of the intervention (i.e., up to nine books). Because we do not expect KRN to affect fall posttest scores through channels other than the books themselves, the 2SLS framework yields a defensible estimate of the program’s dosage effect.

3. Results

3.1. Descriptive Statistics and Balance Checks

Table 2 summarizes the baseline characteristics of students in the treatment and control groups, both before and after matching. Reported variables include the three pretest scores along with the full set of demographic indicators. To assess treatment–control balance, we conducted t-tests for the continuous pretest measures and chi-square tests for the categorical covariates. Prior to matching, none of the differences on the three pretests reached statistical significance. On average, treatment group students scored slightly higher than control students on these assessments, with differences ranging from 0.09 to 0.14 standard deviations. However, as Table 2 shows, statistically significant differences were found between KRN and comparison students for many of the demographic characteristics and their corresponding interaction terms.
The right-hand panel of Table 2 presents descriptive statistics for the matched sample following the PSM procedure. While minor differences persisted, none of the mean differences—whether for pretest scores or demographic characteristics—was statistically significant. The final analytic sample comprised 110 of the 116 treatment students and 156 of the 440 comparison students. Arriving at this sample required several steps. First, 142 comparison students and 4 treatment students were excluded because within certain grade-by-school blocks, all students with a given characteristic (e.g., all Hispanic students) were KRN participants, leaving no variability for matching. From the remaining pool of 298 comparison and 112 treatment students, two treatment students were dropped due to the absence of suitable propensity score matches. The final matched dataset consisted of 110 treatment cases linked to 156 controls, with some controls matched to more than one treatment student. Specifically, 102 treatment students were matched to three comparison cases each, five treatment students were matched to two comparison cases, and three treatment students were linked to a single comparison case. Analytic weights were then applied so that the 156 controls and 110 treatment students were weighted to yield an effective sample size of 133 students in each group.
In Figure 1, the left-hand panel illustrates the distribution of propensity scores for the comparison group, while the right-hand panel presents the distribution for the treatment group. Within the comparison sample, cases that were not successfully matched to treatment students are shown in blue and labeled “Before Matching,” whereas the 133 weighted comparison cases retained after matching are displayed in red and labeled “After Matching.” The kernel density estimates demonstrate that, following matching, the treatment and control groups exhibit nearly identical distributions of propensity scores.
Table 3 reports the results of several balance diagnostics, including standardized mean differences, variance ratios, eta-squared statistics, and hypothesis tests of mean differences, to evaluate whether treatment and control students were statistically comparable after matching (Lee, 2013; Richardson, 2011; Zhang et al., 2019). These measures provide a more nuanced assessment of the quality of the PSM procedure by showing whether group differences were reduced, variance ratios approached unity, and eta-squared effect sizes diminished. As displayed in Table 3, the matching process substantially improved covariate balance, with no baseline measure or interaction term differing by more than 0.25 standard deviations.
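For reference, balance diagnostics of this kind can be generated with pstest, which ships with the user-written psmatch2 package and reports, for each covariate, the standardized bias before and after matching along with hypothesis tests of mean differences. A minimal sketch under the same illustrative variable names used above:

```stata
* Covariate balance before vs. after matching: standardized mean
* differences (reported as % bias) and t-tests for each covariate.
pstest spring2019 winter2019 fall2018 female econdis, both
```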

3.2. Quasi-Experimental Estimates of Treatment Effects

Our primary analysis contrasts the fall posttest performance of KRN participants with that of the matched comparison group. The left panel of Table 4 reports results from the doubly robust regression model, which estimates the quasi-experimental impact of KRN participation on fall 2019 reading achievement while adjusting for covariates and school-by-grade fixed effects. Findings indicate that students in the treatment group scored significantly higher than their counterparts in the comparison group. The corresponding effect size, calculated by dividing the estimated coefficient by the pooled standard deviation of the outcome measure, was 0.145 (95% CI = [0.004, 0.286]).
The right panel of Table 4 presents estimates from the 2SLS model. Since students could only obtain KRN books by enrolling in the program, and because book receipt is the theorized channel through which KRN influences reading achievement, program participation serves as a strong instrument, with book receipt functioning as the primary mediator. To strengthen this exclusion restriction argument, we report a reduced-form check, the first-stage F-statistic, and instrument strength. The reduced-form (ITT) effect was 0.149 (p = 0.042). Further, the first-stage analysis showed that the coefficient on program assignment was 6.38 (p < 0.001). The cluster-robust F-statistic for the instrument was 254, far exceeding the conventional threshold of 10, and the partial R-squared was 0.807. These results confirm that our instrumental variable is very strong and appropriate for the analysis.
The results of the 2SLS model suggested that, after controlling for covariates and school-by-grade fixed effects, each additional book provided through KRN was associated with a statistically significant gain of 0.023 (95% CI = [0.002, 0.044]) standard deviations in fall 2019 reading scores. Given that students could receive up to six books beyond the initial three, the model projects that a participant who obtained the full set of nine books would experience an improvement of approximately 0.21 standard deviations on the fall reading outcome.
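The full-dose projection is a direct linear extrapolation of the per-book coefficient:

$$\beta_1 \times 9 = 0.023 \times 9 \approx 0.21 \ \text{SD},$$

that is, 0.207, rounded to 0.21.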

3.3. Supplemental Analyses

3.3.1. Impact Estimate Differences by Grade Level

Beyond the overall effects across all schools and grades, Table 5 reports marginal effect estimates by grade level (1–4) derived from formal interaction tests. Though the ITT effect appeared descriptively larger for first graders, formal interaction tests indicated that the differences across grade levels were not statistically significant (1st vs. 2nd: β = −0.131, SE = 0.216, p = 0.546; 1st vs. 3rd: β = −0.202, SE = 0.204, p = 0.326; 1st vs. 4th: β = −0.238, SE = 0.174, p = 0.175). Only one grade-level treatment effect, for grade 1, reached statistical significance—a pattern likely shaped by the smaller sample sizes within each grade—and the ITT and TOT estimates both indicate that the descriptively largest impacts emerged for first graders. These results imply that, while the youngest students may have derived the greatest benefits, both in terms of overall achievement gains and in relation to the number of KRN books they received, we lack sufficient statistical power to conclude that these differences are definitive.

3.3.2. Impact Estimate Differences by District

Due to the differences in student recruitment into KRN across the two districts—with the three Battle Creek schools using open enrollment and the two Troy City schools focusing on lower-performing students—we examined whether program impacts varied by district.
We estimated interaction effects and calculated corresponding marginal effects from the interactions. The results reveal that the district-by-treatment interaction is not statistically significant (β = −0.098, SE = 0.157, p = 0.533). However, we also find that the program’s effect was positive and statistically significant in the Battle Creek (open enrollment) district (β = 0.172, SE = 0.072, p = 0.044), whereas it was smaller and not significant in the Troy City (targeted enrollment) district (β = 0.101, SE = 0.109, p = 0.380). Similarly, the TOT estimates reveal a positive and statistically significant impact in the Battle Creek district (β = 0.029, SE = 0.011, p = 0.005), whereas the effect was smaller and not significant in Troy City (β = 0.013, SE = 0.014, p = 0.353). Again, though, because we restricted the matching of students and the analyses of their outcomes to within-school and within-grade-level comparisons, our analytical and matching methods statistically account for these differences across districts and schools.

4. Discussion

4.1. Connections to Prior Evidence

Our results suggest that KRN participants outperformed comparison group students, with a mean effect size of nearly d = 0.15. Additional model estimates of the impacts for those students who read more of the books provided by KRN revealed that those who received all nine books realized an effect size of d = 0.21 relative to the outcomes for matched comparison students. Supplemental analyses revealed larger impacts in Battle Creek relative to Troy City. Finally, the estimated effects seemed particularly strong for first-grade students.
How should one interpret the impact of KRN with respect to the overall research base on home-based book distribution and reading programs? The average effect size across all studies reviewed by Kim and Quinn (2013) was d = 0.10 (95% CI: 0.04 to 0.15), reflecting a statistically significant positive effect on overall reading achievement. The KRN impact of d = 0.15 places it at the top of the 95% confidence interval of impacts noted in the meta-analysis by these authors.
Beyond the comparison to prior findings, the magnitude of the KRN program’s impact is, more broadly, educationally meaningful. Using commonly cited benchmarks for interpreting standardized achievement gains (Bloom et al., 2008; Hill et al., 2008), the estimated average impact of KRN corresponds to more than two months of additional learning—representing about 23% of the academic progress a typical student makes during a nine-month school year. This translation provides a useful interpretive heuristic, though it should be understood as an approximate indicator of magnitude rather than a literal measure of instructional time. Similarly, for students who completed the full program and received all nine books, the estimated gain was equivalent to more than three months of learning, or approximately 33% of annual progress. Consistent with prior KRN findings (Borman et al., 2020), full participation appears sufficient to offset the typical summer learning loss among low-income students—an effect roughly equivalent to about two months of learning, based on widely used interpretive benchmarks.
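To make this translation concrete, consider a hedged worked example. Assuming an average annual reading gain of roughly 0.63 standard deviations across grades 1–4 (a ballpark consistent with the Hill et al., 2008 benchmarks, which decline from about 0.97 SD in grade 1 to 0.36 SD by grade 4; the precise figure is an assumption for illustration), the months-of-learning heuristic works out to

$$\text{months} \approx \frac{d_{\mathrm{KRN}}}{d_{\mathrm{annual}}} \times 9 = \frac{0.145}{0.63} \times 9 \approx 2.1, \qquad \frac{0.21}{0.63} \times 9 \approx 3.0,$$

in line with the two- and three-month figures reported above.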
The prior Borman et al. (2020) quasi-experimental impact study of KRN reported a mean effect size of d = 0.12 on post-test reading outcomes and a treatment-on-the-treated (TOT) effect of d = 0.18 for those students who received and read all nine books delivered through the program. The current findings appear to replicate those of the prior study in terms of both magnitude and procedure. In light of ongoing concerns about the “replication crisis” in psychology and education research (Makel et al., 2012, 2016; Makel & Plucker, 2014, 2015; Plucker & Makel, 2021; Pridemore et al., 2018), such consistency across two independent evaluations is both rare and noteworthy. Indeed, replication remains uncommon in education research, with estimates suggesting that only 0.13% of published studies are replications, and just 28% of those qualify as direct replications (Makel et al., 2012, 2016; Makel & Plucker, 2014, 2015; Plucker & Makel, 2021; Pridemore et al., 2018).
Even so, prior methodological work underscores that a single replication is insufficient to firmly establish an intervention’s effectiveness (Hedges & Schauer, 2019). Replicating KRN in additional contexts and with larger, more diverse samples would strengthen confidence in its impacts. Such efforts are essential not only for reducing the risks of false positives and inflated effect sizes (Camerer et al., 2018; Slavin & Smith, 2009) but also for building a cumulative evidence base that justifies broader adoption and investment of scarce educational resources (Plucker & Makel, 2021; Tyson, 2014). Further, the quasi-experimental estimates presented here are certainly not without their limitations. Future research with a true randomized design would offer the field greater confidence in the program’s true causal impacts.

4.2. Substantive and Theoretical Connections

From a more substantive and theoretical standpoint, the connection between books in the home and student achievement has been one of the most persistent empirical relationships observed in education research. As early as 1916, Holley concluded:
“If a person wished to forecast, from a single objective measure, the probable educational opportunities which the children of the home have, the best measure would be the number of books in the home.”
The Coleman Report (Coleman et al., 1966) identified the available reading materials in the home as one of the six key objective family background factors linked to student performance—a conclusion reaffirmed in later analyses (Borman & Dowling, 2010) and in subsequent research across economics, sociology, and education (e.g., Duncan & Magnuson, 2005; Evans et al., 2010; Fryer & Levitt, 2004; Hanushek & Woessmann, 2011; Linver et al., 2002; Manu et al., 2019).
Scholars have offered two primary interpretations of this association. From a cultural capital perspective (Bourdieu, 1986), books may act as a proxy for broader social, cultural, and material advantages, signaling a “scholarly culture” within the family that values and encourages reading, learning, and critical thinking (Evans et al., 2014). This alignment with the norms of formal schooling may ease academic transitions and confer advantages in navigating educational systems (Neuman & Moland, 2019). In contrast, skill development theory frames books in the home as a tangible resource that directly fosters academic growth by increasing reading volume—an activity consistently linked to higher reading achievement (Allington & McGill-Franzen, 2021; Mol & Bus, 2011). From this perspective, family investments in books exert independent effects on learning beyond parental education and other forms of cultural capital (Sikora et al., 2019).
Because much of the evidence to date has been purely correlational, the causal mechanisms underlying this relationship remain debatable. The quasi-experimental results presented here contribute to this debate by providing evidence more consistent with the skill-development perspective: increasing access to books contributes to educational success, and that outcome can be achieved through increased reading volume and literacy skill development for all students, regardless of family background. Indeed, literacy skills are the foundation of education, and future investments in relatively low-cost and easily replicable book distribution programs, like KRN, may be extremely helpful to children and the schools they attend.

Author Contributions

Conceptualization, G.D.B.; Methodology, G.D.B. and H.Y.; Formal analysis, H.Y.; Data curation, H.Y.; Writing—original draft, G.D.B.; Writing—review & editing, H.Y.; Project administration, G.D.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Ethical review and approval were waived for this study because the work is considered program evaluation for the Kids Read Now (KRN) program. All data were deidentified and provided to the authors directly by the Kids Read Now organization. As such, the study was a secondary analysis of deidentified data in the context of a program evaluation. Such work does not require IRB approval and is not under the purview of an IRB. When the authors originally conducted the work, they checked with their university’s IRB and received correspondence confirming that the work did not require IRB review or approval. The specific IRB exemption category (45 CFR 46.104) that applies to this educational program evaluation is “Category 1: Educational Practices”.

Informed Consent Statement

Student consent was waived because this work is considered program evaluation for the Kids Read Now (KRN) program. All data were deidentified and provided to the authors directly by the Kids Read Now organization. As such, the study was a secondary analysis of deidentified data in the context of a program evaluation. Such work does not require IRB approval and is not under the purview of an IRB. When the authors originally conducted the work, they checked with their university’s IRB and received correspondence confirming that the work did not require IRB review or approval. The specific IRB exemption category (45 CFR 46.104) that applies to this educational program evaluation is “Category 1: Educational Practices”.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. aimswebPlus. (n.d.). aimswebPlus overview. Available online: https://www.pearsonassessments.com/content/dam/school/global/clinical/us/assets/aimswebPlus-overview.pdf (accessed on 1 March 2020).
  2. Allington, R. L., McGill-Franzen, A., Camilli, G., Williams, L., Graff, J., Zeig, J., & Nowak, R. (2010). Addressing summer reading setback among economically disadvantaged elementary students. Reading Psychology, 31(5), 411–427.
  3. Allington, R. L., & McGill-Franzen, A. M. (2021). Reading volume and reading achievement: A review of recent research. Reading Research Quarterly, 56(S1), S231–S238.
  4. Angrist, J. D., & Imbens, G. W. (1995). Two-stage least squares estimation of average causal effects in models with variable treatment intensity. Journal of the American Statistical Association, 90(430), 431–442.
  5. Atteberry, A., & McEachin, A. (2021). School’s out: The role of summers in understanding achievement disparities. American Educational Research Journal, 58(2), 239–282.
  6. Bloom, H. S., Hill, C. J., Black, A. B., & Lipsey, M. W. (2008). Performance trajectories and performance gaps as achievement effect-size benchmarks for educational interventions. Journal of Research on Educational Effectiveness, 1, 289–328.
  7. Borman, G. D., & Dowling, M. (2010). Schools and inequality: A multilevel analysis of Coleman’s equality of educational opportunity data. Teachers College Record, 112(5), 1201–1246.
  8. Borman, G. D., Schmidt, A., & Hosp, M. (2016). A national review of summer school policies and the evidence supporting them. In The summer slide: What we know and can do about summer learning loss (pp. 90–107). Teachers College Press.
  9. Borman, G. D., Yang, H., & Xie, X. (2020). A quasi-experimental study of the impacts of the Kids Read Now summer reading program. Journal of Education for Students Placed at Risk, 26, 316–336.
  10. Bourdieu, P. (1986). The forms of capital. In J. Richardson (Ed.), Handbook of theory and research for the sociology of education (pp. 241–258). Greenwood. Available online: https://home.iitk.ac.in/~amman/soc748/bourdieu_forms_of_capital.pdf (accessed on 1 October 2025).
  11. Briggs, D. C., & Wellberg, S. (2022). Evidence of “summer learning loss” on the i-Ready Diagnostic Assessment. The Center for Assessment, Design, Research and Evaluation (CADRE), University of Colorado Boulder. Available online: https://www.colorado.edu/cadre/2022/09/27/evidence-summer-learning-loss-i-ready-diagnostic-assessment (accessed on 1 October 2025).
  12. Caliendo, M., & Kopeinig, S. (2008). Some practical guidance for the implementation of propensity score matching. Journal of Economic Surveys, 22(1), 31–72.
  13. Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., Nave, G., Nosek, B. A., Pfeiffer, T., Altmejd, A., Buttrick, N., Chan, T., Chen, Y., Forsell, E., Gampa, A., Heikensten, E., Hummer, L., Imai, T., … Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2, 637–644.
  14. Coleman, J. S., Campbell, E. Q., Hobson, C. J., McPartland, J., Mood, A. M., Weinfeld, F. D., & York, R. L. (1966). Equality of educational opportunity (Report No. OE-38001). U.S. Department of Health, Education, and Welfare, Office of Education. Available online: https://eric.ed.gov/?id=ED012275 (accessed on 1 October 2025).
  15. Cook, T. D., & Steiner, P. M. (2010). Case matching and the reduction of selection bias in quasi-experiments: The relative importance of pretest measures of outcome, of unreliable measurement, and of mode of data analysis. Psychological Methods, 15(1), 56–68.
  16. Cooper, H., Charlton, K., Valentine, J. C., Muhlenbruck, L., & Borman, G. D. (2000). Making the most of summer school: A meta-analytic and narrative review. Monographs of the Society for Research in Child Development, 65(1), i–127.
  17. Cooper, H., Nye, B., Charlton, K., Lindsay, J., & Greathouse, S. (1996). The effects of summer vacation on achievement test scores: A narrative and meta-analytic review. Review of Educational Research, 66(3), 227–268.
  18. Dahl-Leonard, K., Hall, C., Cho, E., Capin, P., Roberts, G. J., Kehoe, K. F., Haring, C., Peacott, D., & Demchak, A. (2025). Examining the effects of family-implemented literacy interventions for school-aged children: A meta-analysis. Educational Psychology Review, 37(1), 10.
  19. Dickinson, D. K., Golinkoff, R. M., & Hirsh-Pasek, K. (2010). Speaking out for language: Why language is central to reading development. Educational Researcher, 39(4), 305–310.
  20. Downey, D. B. (2024). How does schooling affect inequality in cognitive skills? The view from seasonal comparison research. Review of Educational Research, 94(6), 927–957.
  21. Dujardin, E., Ecalle, J., Gomes, C., & Magnan, A. (2022). Summer reading program: A systematic literature review. Social Education Research, 4(1), 108–121.
  22. Duncan, G. J., & Magnuson, K. (2005). Can family socioeconomic resources account for racial and ethnic test score gaps? The Future of Children, 15(1), 35–54.
  23. Evans, M. D. R., Kelley, J., & Sikora, J. (2014). Scholarly culture and academic performance in 42 nations. Social Forces, 92(4), 1573–1605.
  24. Evans, M. D. R., Kelley, J., Sikora, J., & Treiman, D. J. (2010). Family scholarly culture and educational success: Books and schooling in 27 nations. Research in Social Stratification and Mobility, 28(2), 171–197.
  25. Fryer, R. G., & Levitt, S. D. (2004). Understanding the Black–white test score gap in the first two years of school. Review of Economics and Statistics, 86(2), 447–464.
  26. Funk, M. J., Westreich, D., Wiesen, C., Stürmer, T., Brookhart, M. A., & Davidian, M. (2011). Doubly robust estimation of causal effects. American Journal of Epidemiology, 173(7), 761–767.
  27. Glazerman, S., Levy, D. M., & Myers, D. (2002). Nonexperimental replications of social experiments: A systematic review. Mathematica Policy Research, Inc. Available online: https://www.researchgate.net/publication/254430866_Nonexperimental_Replications_of_Social_Experiments_A_Systematic_Review_Interim_ReportDiscussion_Paper (accessed on 1 October 2025).
  28. Hanushek, E. A., & Woessmann, L. (2011). The economics of international differences in educational achievement. In E. A. Hanushek, S. Machin, & L. Woessmann (Eds.), Handbook of the economics of education (Vol. 3, pp. 89–200). Elsevier. Available online: https://econpapers.repec.org/bookchap/eeeeduhes/3.htm (accessed on 1 October 2025).
  29. Hedges, L. V., & Schauer, J. M. (2019). More than one replication study is needed for unambiguous tests of replication. Journal of Educational and Behavioral Statistics, 44(5), 543–570.
  30. Hill, C. J., Bloom, H. S., Black, A. R., & Lipsey, M. W. (2008). Empirical benchmarks for interpreting effect sizes in research. Child Development Perspectives, 2(3), 172–177.
  31. Holley, C. E. (1916). The relationship between persistence in school and home conditions. University of Chicago Press. Available online: https://archive.org/details/relationshipbetw0005holl (accessed on 1 October 2025).
  32. Ichimura, H., & Taber, C. (2001). Propensity-score matching with instrumental variables. American Economic Review, 91(2), 119–124.
  33. Kang, J. D., & Schafer, J. L. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science, 22(4), 523–539.
  34. Kazak, A. E. (2018). Editorial: Journal article reporting standards. American Psychologist, 73(1), 1–2.
  35. Kim, J. S., Guryan, J., White, T. G., Quinn, D. M., Capotosto, L., & Kingston, H. C. (2016). Delayed effects of a low-cost and large-scale summer reading intervention on elementary school children’s reading comprehension. Journal of Research on Educational Effectiveness, 9(Suppl. 1), 1–22.
  36. Kim, J. S., & Quinn, D. M. (2013). The effects of summer reading on low-income children’s literacy achievement from kindergarten to grade 8: A meta-analysis of classroom and home interventions. Review of Educational Research, 83(3), 386–431.
  37. Lanza, S. T., Moore, J. E., & Butera, N. M. (2013). Drawing causal inferences using propensity scores: A practical guide for community psychologists. American Journal of Community Psychology, 52(3–4), 380–392.
  38. Lee, W. S. (2013). Propensity score matching and variations on the balancing test. Empirical Economics, 44(1), 47.
  39. Linden, A. (2017). Improving causal inference with a doubly robust estimator that combines propensity score stratification and weighting. Journal of Evaluation in Clinical Practice, 23(4), 697–702.
  40. Linver, M. R., Brooks-Gunn, J., & Kohen, D. E. (2002). Family processes as pathways from income to young children’s development. Developmental Psychology, 38(5), 719–734.
  41. Makel, M. C., & Plucker, J. A. (2014). Facts are more important than novelty: Replication in the education sciences. Educational Researcher, 43(6), 304–316.
  42. Makel, M. C., & Plucker, J. A. (2015). An introduction to replication research in gifted education: Shiny and new is not the same as useful. Gifted Child Quarterly, 59(3), 157–164.
  43. Makel, M. C., Plucker, J. A., Freeman, J., Lombardi, A., Simonsen, B., & Coyne, M. (2016). Replication of special education research: Necessary but far too rare. Remedial and Special Education, 37(4), 205–212.
  44. Makel, M. C., Plucker, J. A., & Hegarty, C. B. (2012). Replications in psychology research: How often do they really occur? Perspectives on Psychological Science, 7(6), 537–542.
  45. Manu, A., Ewerling, F., Barros, A. J. D., & Victora, C. G. (2019). Association between availability of children’s books and the literacy-numeracy skills of children aged 36 to 59 months: Secondary analysis of the UNICEF multiple-indicator cluster surveys covering 35 countries. Journal of Global Health, 9(1), 010403.
  46. May, H., Perez-Johnson, I., Haimson, J., Sattar, S., & Gleason, P. (2009). Using state tests in education experiments: A discussion of the issues (NCEE 2009-013). National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. Available online: https://files.eric.ed.gov/fulltext/ED511776.pdf (accessed on 10 October 2025).
  47. McCombs, J. S., Augustine, C. H., Schwartz, H. L., Bodilly, S. J., McInnis, B. I., Lichter, D. S., & Cross, A. B. (2011). Making summer count: How summer programs can boost children’s learning. RAND Corporation. [Google Scholar]
  48. Mol, S. E., & Bus, A. G. (2011). To read or not to read: A meta-analysis of print exposure from infancy to early adulthood. Psychological Bulletin, 137(2), 267–296. [Google Scholar] [CrossRef]
  49. Mol, S. E., Bus, A. G., De Jong, M. T., & Smeets, D. J. (2008). Added value of dialogic parent–child book readings: A meta-analysis. Early Education and Development, 19(1), 7–26. [Google Scholar] [CrossRef]
  50. Neuman, S. B., & Moland, N. (2019). Book deserts: The consequences of income segregation on children’s access to print. Urban Education, 54(1), 126–147. [Google Scholar] [CrossRef]
  51. NWEA. (n.d.). The MAP suite. Available online: https://www.nwea.org/the-map-suite/ (accessed on 10 October 2025).
  52. Pirracchio, R., Carone, M., Rigon, M. R., Caruana, E., Mebazaa, A., & Chevret, S. (2016). Propensity score estimators for the average treatment effect and the average treatment effect on the treated may yield very different estimates. Statistical Methods in Medical Research, 25(5), 1938–1954. [Google Scholar] [CrossRef]
  53. Plucker, J. A., & Makel, M. C. (2021). Replication is important for educational psychology: Recent developments and key issues. Educational Psychologist, 56(2), 90–100. [Google Scholar] [CrossRef]
  54. Pridemore, W. A., Makel, M. C., & Plucker, J. A. (2018). Replication in criminology and the social sciences. Annual Review of Criminology, 1(1), 19–38. [Google Scholar] [CrossRef]
  55. Pustejovsky, J. E., & Tipton, E. (2018). Small-sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models. Journal of Business & Economic Statistics, 36(4), 672–683. [Google Scholar] [CrossRef]
  56. Reese, E., Leyva, D., Sparks, A., & Grolnick, W. (2010). Maternal elaborative reminiscing increases low-income children’s narrative skills relative to dialogic reading. Early Education and Development, 21(3), 318–342. [Google Scholar] [CrossRef]
  57. Richardson, J. T. E. (2011). Eta squared and partial eta squared as measures of effect size in educational research. Educational Research Review, 6(2), 135–147. [Google Scholar] [CrossRef]
  58. Robins, J., Sued, M., Lei-Gomez, Q., & Rotnitzky, A. (2007). Comment: Performance of double-robust estimators when “inverse probability” weights are highly variable. Statistical Science, 22(4), 544–559. [Google Scholar] [CrossRef]
  59. Rubin, D. B. (2001). Using propensity scores to help design observational studies: Application to the tobacco litigation. Health Services and Outcomes Research Methodology, 2(3–4), 169–188. [Google Scholar] [CrossRef]
  60. Sikora, J., Evans, M. D. R., & Kelley, J. (2019). Scholarly culture: How books in adolescence enhance adult literacy, numeracy and technology skills in 31 societies. Social Science Research, 77, 1–15. [Google Scholar] [CrossRef] [PubMed]
  61. Slavin, R., & Smith, D. (2009). The relationship between sample sizes and effect sizes in systematic reviews in education. Educational Evaluation and Policy Analysis, 31(4), 500–506. [Google Scholar] [CrossRef]
  62. Snow, C. E. (2010). Academic language and the challenge of reading for learning about science. Science, 328(5977), 450–452. [Google Scholar] [CrossRef]
  63. StataCorp. (2019). Stata statistical software: Release 16. StataCorp LLC. [Google Scholar]
  64. Stuart, E. A. (2010). Matching methods for causal inference: A review and a look forward. Statistical Science: A Review Journal of the Institute of Mathematical Statistics, 25(1), 1–21. [Google Scholar] [CrossRef]
  65. Tan, Z. (2010). Bounded, efficient and doubly robust estimation with inverse weighting. Biometrika, 97(3), 661–682. [Google Scholar] [CrossRef]
  66. Tyson, C. (2014, August 13). Failure to replicate. Inside Higher Education. Available online: https://www.insidehighered.com/news/2014/08/14/almost-no-education-research-replicated-new-article-shows (accessed on 1 October 2025).
  67. Wasik, B. A., & Hindman, A. H. (2020). Increasing preschoolers’ vocabulary development through a streamlined teacher professional development intervention. Early Childhood Research Quarterly, 50, 101–113. [Google Scholar] [CrossRef]
  68. What Works Clearinghouse. (2022). What works clearinghouse procedures and standards handbook, version 5.0. U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance (NCEE). Available online: https://ies.ed.gov/ncee/wwc/Handbooks (accessed on 10 October 2025).
  69. White, T. G., Kim, J. S., & Foster, L. (2013). Replicating the effects of a teacher-scaffolded voluntary summer reading program: The role of poverty. Reading Research Quarterly, 49(1), 5–30. [Google Scholar] [CrossRef]
  70. Whitehurst, G. J., Fischel, J. E., Lonigan, C. J., Valdez-Menchaca, M. C., DeBaryshe, B. D., & Caulfield, M. B. (1988). Verbal interaction in families of normal and expressive-language-delayed children. Developmental Psychology, 24(5), 690. [Google Scholar] [CrossRef]
  71. Whitehurst, G. J., & Lonigan, C. J. (1998). Child development and emergent literacy. Child Development, 69(3), 848–872. [Google Scholar] [CrossRef]
  72. Workman, J., von Hippel, P. T., & Merry, J. (2023). Findings on summer learning loss often fail to replicate, even in recent data. Sociological Science, 10, 251–285. [Google Scholar] [CrossRef] [PubMed]
  73. Zhang, Z., Kim, H. J., Lonjon, G., & Zhu, Y. (2019). Balance diagnostics after propensity score matching. Annals of Translational Medicine, 7(1), 16. [Google Scholar] [CrossRef]
Figure 1. Propensity Score Distributions Before and After Matching for Treatment and Control. [Figure omitted; see published article.]
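For readers who want to reproduce an overlap diagnostic like Figure 1, a minimal sketch follows. The article’s analyses were conducted in Stata 16 (StataCorp, 2019); the Python sketch below uses simulated data, and every column name and the logistic specification are hypothetical stand-ins rather than the authors’ actual variables.

```python
# Minimal sketch of a propensity score overlap check in the spirit of Figure 1.
# All data and column names are simulated/hypothetical.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(seed=42)
n = 600
df = pd.DataFrame({
    "krn": rng.integers(0, 2, n),        # 1 = KRN participant, 0 = comparison
    "pretest_fall": rng.normal(size=n),  # standardized pretest scores
    "pretest_winter": rng.normal(size=n),
    "pretest_spring": rng.normal(size=n),
    "female": rng.integers(0, 2, n),
    "econ_dis": rng.integers(0, 2, n),   # economic disadvantage indicator
})

covariates = ["pretest_fall", "pretest_winter", "pretest_spring", "female", "econ_dis"]
ps_model = LogisticRegression(max_iter=1000).fit(df[covariates], df["krn"])
df["pscore"] = ps_model.predict_proba(df[covariates])[:, 1]  # estimated propensity score

# Overlay the treatment and comparison score distributions, as in Figure 1.
for group, label in [(1, "KRN"), (0, "Non-KRN")]:
    plt.hist(df.loc[df["krn"] == group, "pscore"],
             bins=25, alpha=0.5, density=True, label=label)
plt.xlabel("Estimated propensity score")
plt.ylabel("Density")
plt.legend()
plt.show()
```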
Table 1. Student Sample Sizes by School, Grade Level, and Kids Read Now Participation Status.

| School | Grade 1 Non-KRN | Grade 1 KRN | Grade 2 Non-KRN | Grade 2 KRN | Grade 3 Non-KRN | Grade 3 KRN | Grade 4 Non-KRN | Grade 4 KRN |
|---|---|---|---|---|---|---|---|---|
| Battle Creek District | | | | | | | | |
| Dudley Elementary | 35 | 6 | 33 | 8 | — | — | — | — |
| Valley View Elem. | 71 | 14 | — | — | 67 | 21 | 66 | 24 |
| Verona Elementary | — | — | — | — | — | — | 71 | 11 |
| Troy City District | | | | | | | | |
| Hook Elementary | — | — | 22 | 21 | 28 | 27 | — | — |
| Kyle Elementary | 23 | 11 | 24 | 18 | — | — | — | — |
| Total | 129 | 31 | 79 | 47 | 95 | 48 | 137 | 35 |
Table 2. Comparison of Baseline Student Control and Treatment Characteristics Before and After Matching.

| Variable | Before: Non-KRN Mean (SD) | Before: KRN Mean (SD) | Before: Mean Diff. | After: Non-KRN Mean (SD) | After: KRN Mean (SD) | After: Mean Diff. |
|---|---|---|---|---|---|---|
| 2018 Fall | 0.00 (1.03) | 0.09 (1.02) | −0.09 | 0.30 (1.04) | 0.20 (1.11) | 0.11 |
| 2018 Winter | 0.02 (1.00) | 0.12 (0.97) | −0.10 | 0.38 (0.91) | 0.22 (1.00) | 0.16 |
| 2019 Spring | −0.03 (1.02) | 0.11 (0.97) | −0.14 | 0.34 (0.91) | 0.21 (1.01) | 0.13 |
| Female | 0.41 | 0.50 | −0.09 * | 0.44 | 0.44 | 0.01 |
| Economic disadvantage | 0.79 | 0.50 | 0.29 *** | 0.60 | 0.58 | 0.02 |
| Black | 0.45 | 0.11 | 0.34 *** | 0.18 | 0.15 | 0.03 |
| Asian | 0.07 | 0.02 | 0.04 | 0.02 | 0.02 | 0.00 |
| White | 0.39 | 0.70 | −0.30 *** | 0.68 | 0.70 | −0.02 |
| Hispanic | 0.05 | 0.04 | 0.01 | 0.07 | 0.05 | 0.02 |
| Multiracial | 0.05 | 0.14 | −0.09 *** | 0.06 | 0.08 | −0.03 |
| Minority | 0.55 | 0.25 | 0.31 *** | 0.25 | 0.24 | 0.01 |
| Female × Economic disadvantage | 0.35 | 0.28 | 0.07 | 0.28 | 0.27 | 0.01 |
| Female × Black | 0.17 | 0.06 | 0.11 *** | 0.08 | 0.07 | 0.01 |
| Female × Asian | 0.03 | 0.02 | 0.00 | 0.02 | 0.02 | 0.00 |
| Female × White | 0.16 | 0.35 | −0.19 *** | 0.31 | 0.31 | 0.00 |
| Female × Hispanic | 0.03 | 0.02 | 0.01 | 0.03 | 0.04 | 0.00 |
| Female × Multiracial | 0.02 | 0.04 | −0.02 | 0.00 | 0.00 | 0.00 |
| Economic disadvantage × Black | 0.36 | 0.09 | 0.26 ** | 0.16 | 0.13 | 0.03 |
| Economic disadvantage × Asian | 0.06 | 0.01 | 0.05 ** | 0.02 | 0.02 | 0.00 |
| Economic disadvantage × White | 0.30 | 0.26 | 0.04 | 0.30 | 0.31 | −0.01 |
| Economic disadvantage × Hispanic | 0.04 | 0.04 | 0.00 | 0.07 | 0.05 | 0.02 |
| Economic disadvantage × Multiracial | 0.04 | 0.10 | −0.06 *** | 0.05 | 0.07 | −0.03 |
| n | 440 | 161 | | 133 | 133 | |

Note: Mean pretest differences are tested with a t-test; all other (binary) covariates are tested with a chi-square test; * p < 0.05; ** p < 0.01; *** p < 0.001.
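To illustrate the mechanics behind the before/after columns of Table 2, here is a sketch of 1:1 nearest-neighbor matching on the estimated propensity score, continuing the simulated `df` from the sketch above. The authors’ exact matching specification may differ; in particular, this toy version matches with replacement.

```python
# Sketch of 1:1 nearest-neighbor matching on the propensity score,
# continuing the simulated `df` above. Matches are drawn with replacement,
# so one comparison student can serve as the match for several participants.
import pandas as pd
from sklearn.neighbors import NearestNeighbors

treated = df[df["krn"] == 1]
control = df[df["krn"] == 0]

# For each treated student, find the comparison student with the closest score.
nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
_, idx = nn.kneighbors(treated[["pscore"]])
matched = pd.concat([treated, control.iloc[idx.ravel()]]).reset_index(drop=True)

# Baseline means by group before and after matching, mirroring Table 2.
cols = ["pretest_fall", "pretest_winter", "pretest_spring", "female", "econ_dis"]
print(df.groupby("krn")[cols].mean())       # before matching
print(matched.groupby("krn")[cols].mean())  # after matching
```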
Table 3. Treatment–Control Balance Checks Before and After Matching.

| Variable | Before: Mean Diff. | Before: Std. Mean Diff. | Before: Eta-Squared | Before: Variance Ratio | After: Mean Diff. | After: Std. Mean Diff. | After: Eta-Squared | After: Variance Ratio |
|---|---|---|---|---|---|---|---|---|
| 2018 Fall | 0.09 | 0.092 | 0.002 | 0.983 | −0.11 | −0.102 | 0.002 | 1.143 |
| 2018 Winter | 0.10 | 0.102 | 0.002 | 0.925 | −0.16 | −0.174 | 0.007 | 1.215 |
| 2019 Spring | 0.14 | 0.137 | 0.004 | 0.913 | −0.13 | −0.148 | 0.005 | 1.239 |
| Female | 0.09 * | 0.186 | 0.007 | 1.036 | −0.01 | −0.012 | 0.000 | 1.000 |
| Economic disadvantage | −0.29 *** | −0.715 | 0.081 | 1.530 | −0.02 | −0.034 | 0.000 | 1.015 |
| Black | −0.34 *** | −0.683 | 0.099 | 0.384 | −0.03 | −0.079 | 0.002 | 0.860 |
| Asian | −0.04 | −0.165 | 0.006 | 0.395 | 0.00 | 0.000 | 0.000 | 1.003 |
| White | 0.30 *** | 0.619 | 0.072 | 0.891 | 0.02 | 0.042 | 0.000 | 0.968 |
| Hispanic | −0.01 | −0.049 | 0.001 | 0.793 | −0.02 | −0.059 | 0.001 | 0.798 |
| Multiracial | 0.09 *** | 0.417 | 0.023 | 2.606 | 0.03 | 0.112 | 0.003 | 1.423 |
| Minority | −0.26 *** | −0.524 | 0.054 | 0.814 | −0.02 | −0.043 | 0.000 | 0.964 |
| Female × Economic disadvantage | −0.07 | −0.148 | 0.004 | 0.889 | −0.01 | −0.020 | 0.000 | 0.983 |
| Female × Black | −0.12 *** | −0.304 | 0.021 | 0.375 | −0.01 | −0.043 | 0.001 | 0.871 |
| Female × Asian | −0.00 | −0.015 | 0.000 | 0.917 | 0.00 | 0.000 | 0.000 | 1.003 |
| Female × White | 0.20 *** | 0.532 | 0.045 | 1.716 | 0.00 | 0.007 | 0.000 | 1.008 |
| Female × Hispanic | −0.01 | −0.040 | 0.000 | 0.790 | 0.00 | 0.017 | 0.000 | 1.090 |
| Female × Multiracial | 0.02 | 0.139 | 0.003 | 1.880 | — | — | — | — |
| Economic disadvantage × Black | −0.26 *** | −0.550 | 0.067 | 0.370 | −0.03 | −0.090 | 0.002 | 0.826 |
| Economic disadvantage × Asian | −0.05 ** | −0.198 | 0.010 | 0.222 | 0.00 | 0.000 | 0.000 | 1.003 |
| Economic disadvantage × White | −0.04 | −0.081 | 0.001 | 0.926 | 0.01 | 0.013 | 0.000 | 1.014 |
| Economic disadvantage × Hispanic | −0.00 | −0.018 | 0.000 | 0.918 | −0.02 | −0.059 | 0.001 | 0.798 |
| Economic disadvantage × Multiracial | 0.06 ** | 0.315 | 0.014 | 2.419 | 0.03 | 0.121 | 0.003 | 1.511 |

Note: Statistical tests for the mean differences are conducted with a t-test for pretest scores and a chi-square test for all other binary covariates; dashes indicate cells that could not be estimated after matching; * p < 0.05; ** p < 0.01; *** p < 0.001.
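The balance statistics in Table 3 follow standard definitions (Rubin, 2001; Zhang et al., 2019). Below is a minimal sketch of the two headline diagnostics for a single covariate, continuing the simulated data above; the 0.25 threshold in the comment is a common baseline-equivalence rule of thumb (e.g., What Works Clearinghouse, 2022), not necessarily the authors’ criterion.

```python
# Sketch of the Table 3 balance diagnostics for one covariate,
# continuing the simulated `matched` sample above.
import numpy as np

def standardized_mean_diff(x_t, x_c):
    # (treated mean - control mean) / pooled SD
    pooled_sd = np.sqrt((np.var(x_t, ddof=1) + np.var(x_c, ddof=1)) / 2)
    return (np.mean(x_t) - np.mean(x_c)) / pooled_sd

def variance_ratio(x_t, x_c):
    # Ratio of sample variances; values near 1 indicate similar spread.
    return np.var(x_t, ddof=1) / np.var(x_c, ddof=1)

x_t = matched.loc[matched["krn"] == 1, "pretest_fall"]
x_c = matched.loc[matched["krn"] == 0, "pretest_fall"]
print(f"SMD = {standardized_mean_diff(x_t, x_c):.3f}")  # |SMD| < 0.25 is a common rule of thumb
print(f"Variance ratio = {variance_ratio(x_t, x_c):.3f}")
```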
Table 4. Doubly Robust Regression Outcomes for Intent-to-Treat Estimate and Two-Stage-Least-Squares Outcomes for Treatment-on-the-Treated Estimate.

| Variable | ITT: Coefficient (Clustered SE) | ITT: Effect Size (d) | TOT: Coefficient (Clustered SE) | TOT: Effect Size (d) |
|---|---|---|---|---|
| Treatment | 0.149 * (0.072) | 0.145 | | |
| Number of books | | | 0.023 * (0.011) | 0.023 |
| 2018 Fall | 0.200 ** (0.067) | | 0.197 ** (0.064) | |
| 2018 Winter | 0.367 ** (0.086) | | 0.366 ** (0.082) | |
| 2019 Spring | 0.338 ** (0.079) | | 0.339 ** (0.075) | |
| Black | 0.048 (0.202) | | 0.058 (0.177) | |
| Asian | 0.312 (0.426) | | 0.348 (0.407) | |
| White (Ref.) | | | | |
| Hispanic | 1.035 ** (0.374) | | 1.043 ** (0.356) | |
| Multiracial | −0.127 (0.123) | | −0.078 (0.109) | |
| Female | 0.095 (0.138) | | 0.094 (0.131) | |
| Female × Black | −0.300 (0.324) | | −0.280 (0.304) | |
| Female × White (Ref.) | | | | |
| Female × Hispanic | −1.102 ** (0.405) | | −1.084 ** (0.384) | |
| Economic disadvantage | −0.148 (0.109) | | −0.138 (0.104) | |
| Econ disadv. × Black | 0.163 (0.295) | | 0.154 (0.267) | |
| Econ disadv. × White (Ref.) | | | | |
| Econ disadv. × Multiracial | −0.068 (0.291) | | −0.123 (0.279) | |
| Female × Econ disadv. | 0.264 (0.200) | | 0.252 (0.191) | |
| Constant | −0.222 (0.948) | | −0.239 (0.892) | |
| n | 266 | | 266 | |

Note: The interactions Female × Asian, Female × Multiracial, Economic disadvantage × Asian, and Economic disadvantage × Hispanic are omitted because of collinearity; * p < 0.05; ** p < 0.01. Both models include school/grade fixed effects.
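For readers who want the shape of the two Table 4 estimators, the sketch below continues the simulated data: an OLS regression of the outcome on treatment and covariates within the matched sample stands in for the doubly robust ITT model, and a hand-rolled two-stage least squares, instrumenting books received with program assignment, stands in for the TOT model. The `books` and `outcome` columns are fabricated for illustration, and the sketch omits the school/grade fixed effects, the cluster-robust standard errors, and the weighting component the article reports; the manual second stage yields valid point estimates but not valid standard errors.

```python
# Sketch of the two Table 4 estimators on the simulated matched sample.
# Omitted relative to the article: school/grade fixed effects, cluster-robust
# SEs, and the weighting component of the doubly robust estimator.
import statsmodels.formula.api as smf

matched = matched.copy()
matched["books"] = matched["krn"] * rng.integers(0, 10, len(matched))  # hypothetical 0-9 book dose
matched["outcome"] = (0.023 * matched["books"] + 0.3 * matched["pretest_fall"]
                      + rng.normal(scale=1.0, size=len(matched)))

# ITT: regression adjustment on the matched sample.
itt = smf.ols("outcome ~ krn + pretest_fall + female + econ_dis", data=matched).fit()
print("ITT estimate:", itt.params["krn"])

# TOT: manual 2SLS, instrumenting books received with assignment
# (point estimate only; manually computed second-stage SEs are not valid).
stage1 = smf.ols("books ~ krn + pretest_fall + female + econ_dis", data=matched).fit()
matched["books_hat"] = stage1.fittedvalues
stage2 = smf.ols("outcome ~ books_hat + pretest_fall + female + econ_dis", data=matched).fit()
print("TOT (per-book) estimate:", stage2.params["books_hat"])
```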
Table 5. Kids Read Now Estimated Effects by Grade Level, 1–4.

Panel A: Intent-to-Treat (ITT) Estimates

| Grade | Coefficient | SE | Effect Size (d) | p-Value | N |
|---|---|---|---|---|---|
| 1 | 0.309 | 0.155 | 0.302 | 0.049 | 63 |
| 2 | 0.178 | 0.151 | 0.174 | 0.241 | 32 |
| 3 | 0.108 | 0.141 | 0.106 | 0.446 | 84 |
| 4 | 0.071 | 0.082 | 0.069 | 0.387 | 87 |

Panel B: Treatment-on-the-Treated (TOT) Estimates

| Grade | Coefficient | SE | Effect Size (d) | p-Value | N |
|---|---|---|---|---|---|
| 1 | 0.049 | 0.024 | 0.048 | 0.037 | 63 |
| 2 | 0.032 | 0.027 | 0.031 | 0.227 | 32 |
| 3 | 0.015 | 0.018 | 0.015 | 0.423 | 84 |
| 4 | 0.013 | 0.015 | 0.013 | 0.361 | 87 |
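A reading note on Panel B: like the “Number of books” row in Table 4, these TOT coefficients are per-book estimates. Scaling the pooled per-book estimate to the full nine-book dose gives 9 × 0.023 ≈ 0.21 standard deviations, consistent with the full-participation effect size of d = 0.21 reported for students who received all nine books; the grade-level coefficients can be scaled the same way.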