1. Introduction
Students who do well in school are more likely to secure a stable career, reach financial success, and experience higher degrees of happiness than students with lower academic achievement (e.g.,
National Association of Colleges and Employers 2019;
Quinn and Duckworth 2007;
Rose and Betts 2004). Unsurprisingly, many people are invested in improving students’ academic performance. Parents aim to foster success in their children, educators work to support pupils’ achievement in the classroom, and policymakers pursue funding for interventions that improve student outcomes.
In the effort to foster improved academic performance and promote achievement, one idea in particular has gained massive popularity in schools: mindset theory (i.e., implicit theories;
Dweck 2000). According to mindset theory, “what students believe about their brains—whether they see their intelligence as something that is fixed or something that can grow and change—has profound effects on their motivation, learning, and school achievement” (
Dweck 2008, p. 110). That is, students who see their intelligence or another attribute (e.g., personality) as fixed tend to focus on appearing smart rather than on learning, avoid effort when challenged, and give up when faced with a setback. In contrast, students who see their intelligence or another attribute as something that can grow and change are eager to learn, work hard when challenged, and persevere when facing a setback (
Rattan et al. 2015). Given that these traits and behaviors are assumed to be important for student success, it is unsurprising that 88% of teachers in the U.S. believe the use of growth mindsets with students is important for student academic outcomes (
Yettick et al. 2016).
A small industry offering growth mindset interventions has flourished in recent years. The interventions typically aim to teach a growth mindset by explaining the concept through reading, presentation, or an interactive game (
Sisk et al. 2018). For example, MindsetWorks, LLC, sells a growth mindset intervention computer program, “Brainology”, that teaches students that intelligence can be developed with effort using lessons, online reflections, and activities. Mindset’s popularity has been described as a “revolution that is reshaping education” (
Boaler 2013, p. 143), with growth mindset interventions being implemented in classrooms around the world (
Sisk et al. 2018).
1.1. Are Academic Growth Mindset Interventions Effective? A High-Level Lens
Recently, two meta-analyses were published examining the efficacy of growth mindset interventions. One (
Macnamara and Burgoyne 2022) reported effects on academic performance and how studies’ adherence to best practices in study design, reporting, and avoiding bias might influence the size of those effects. They also examined a large number of theoretical and methodological moderators. No theoretical moderators yielded significant effects, and model effects were null once best practices in study design, reporting, and avoiding bias were taken into account.
Macnamara and Burgoyne (
2022) concluded that the effect of growth mindset interventions on academic performance might be rare if not spurious.
The other meta-analysis (
Burnette et al. 2022) reported effects on multiple outcomes, one of which was academic performance. They focused on two moderators: implementation fidelity and “focal group” status, where focal groups were identified by the original study authors as subgroups expected to benefit most from the growth mindset intervention. These focal groups ranged in their characteristics from students with fixed mindsets, to students with low grades, to students from ethnic minority backgrounds. The academic performance effect size for studies with high implementation fidelity on their targeted focal groups was d = 0.14; for non-focal groups, the effect was considerably smaller, d = 0.04.
Burnette et al. (
2022) emphasized the need to examine the heterogeneity of treatment effects. They concluded that while positive effects should be expected under some circumstances, “null and even negative (in the case of academic achievement) effects are [also] to be expected in growth mindset interventions” (p. 27). We discuss the heterogeneity of treatment effects in further detail below.
1.2. The “Heterogeneity Revolution”
Heterogeneity of effects has recently become an avenue of interest, and it takes many different forms. The one most often discussed in the growth mindset literature is heterogeneity of intervention effects, in which researchers explore whether certain groups respond to the intervention to a greater degree than other groups. Beyond differences in treatment effects corresponding to different student characteristics, researchers from educational and organizational settings (e.g.,
Domitrovich et al. 2008;
Klein and Sorra 1996) have also advocated for considering differences in implementation quality at the intervention and the support levels, which might also account for the heterogeneity of effects.
A new argument among mindset proponents is that researchers should expect heterogeneity in treatment effects because of sample characteristics (i.e., at-risk groups;
Burnette et al. 2022;
Tipton et al. 2023). That is, rather than making claims about large gains in student achievement and performance outcomes overall (e.g.,
Dweck 2008), the focus has shifted to groups and circumstances where growth mindset interventions appear most effective (
Bryan et al. 2021;
Yeager and Dweck 2020;
Yeager et al. 2019). For example, mindset researchers have called for a “heterogeneity revolution” in behavioral science (
Bryan et al. 2021). In their call,
Bryan et al. (
2021) suggest that failures to replicate in psychology are due to failures to recognize heterogeneity in treatment effects. They propose that systematic examinations of heterogeneity will lead to more comprehensive theories, dependable guidance to policymakers, and a generalizable science of human behavior that will change the world.
In support of the need to examine heterogeneity,
Bryan et al. (
2021) described the varying effects of the National Study of Learning Mindsets, a large-scale growth mindset intervention. In the first report of the National Study of Learning Mindsets, the authors focused on students performing below their school’s median performance (
Yeager et al. 2019) and found a significant effect for this subgroup. A follow-up report later included the whole sample from this study but focused on treatment effects depending on students’ math teachers’ mindsets. When students’ math teachers had relatively higher growth mindsets, they found positive treatment effects on students’ academic achievement, but not when their math teachers had relatively lower growth mindsets (
Yeager et al. 2022). The different effects of the National Study of Learning Mindsets, depending on how the researchers selected subsamples and measures from the same study, served as a demonstration of the importance of examining heterogeneity (
Bryan et al. 2021).
In addition to the National Study of Learning Mindsets,
Bryan et al. (
2021) also described a hypothetical growth mindset intervention. They illustrate that, depending on whether one examines the whole sample or particular subsamples, the treatment effect ranges from null to moderately positive. They argue that researchers should capitalize on heterogeneity to deepen our theoretical understanding and to improve interventions.
These recent calls to examine heterogeneity, particularly of treatment effects, provide examples of effects for some groups and no effects for others. However, more examinations of heterogeneity that account for the full range of treatment effects, such as individuals in subgroups who may exhibit detrimental outcomes, would be valuable for understanding the extent of treatment effect variability in academic performance. Here, we focus on individual-level variation in treatment effects in terms of observed numerical benefits, no observed numerical impact, and observed numerical detriments to academic performance.
1.3. Examining the Whole Picture of Heterogeneity
We agree that there is a need to examine heterogeneity, particularly in growth mindset intervention studies. In line with
Burnette et al.’s (
2022) conclusion that positive, null, and negative effects are to be expected, we believe the whole range of effects needs to be considered. It is important to know how many students see positive impacts, how many students see no impact, and how many students see negative impacts from a given treatment. Aggregate results alone, consisting of group-level performance, do not capture the full scope of intervention effects varying from person to person. That is, statistical inferences and effect sizes for a group of individuals may not accurately describe how the individuals within the group responded to the treatment (
Grice et al. 2020;
Lamiell 2013;
Molenaar 2004).
Persons as Effect Sizes
In addition to reporting inferential statistics and group-level effect sizes (e.g., d, η²), individual responses and behaviors can be quantified as effect sizes. How might this work? A recent paper (
Grice et al. 2020) described the process of examining individual-level data to determine how many participants in a study behaved or responded consistently with theoretical expectation. The answer can be computed as a percentage. For example, if a researcher reports that 85% of participants responded according to expectations, this evidence is stronger than if the researcher had reported that only 45% of participants responded according to expectations. This straightforward percentage can be easily understood as an effect size by scientists, policymakers, and laypeople alike.
Persons as effect sizes (
Grice et al. 2020) is an approach that better describes individual responses than aggregate analyses.
Grice et al. (
2020) provide several case studies as demonstrations. In one example,
Siegel et al. (
2018) conducted a study in which they hypothesized that presenting a face with positive, neutral, or negative affect would influence participants’ ratings of a paired neutral face. In accordance with their hypothesis, Siegel et al. found a significant main effect of affect condition on mean rating, p < 0.001, a Bayes factor of 62.9, and an η² of 0.32. Pairwise differences between the three conditions were also significant in the hypothesized directions: positive > neutral > negative. However, when examining the individual-level data, Grice et al. found that only 11 of 45 participants, or 24.44%, matched the hypothesized pattern. The majority of participants rated faces in a pattern other than the expected positive > neutral > negative ordering. The picture painted by the individual-level patterns ran counter to the conclusions previously drawn from the aggregate analyses.
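To make the calculation concrete, the sketch below (in Python, using a small hypothetical data set rather than Siegel et al.’s data) counts the participants whose within-person ratings follow the expected positive > neutral > negative ordering and expresses that count as a percentage, in the spirit of Grice et al.’s percent correct classification.

```python
# Illustrative sketch of percent correct classification (PCC); the data frame is
# hypothetical and only mimics the structure of a study like Siegel et al. (2018),
# with one rating per participant per affect condition.
import pandas as pd

ratings = pd.DataFrame({
    "participant": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "condition":   ["positive", "neutral", "negative"] * 3,
    "rating":      [6.2, 5.1, 3.8, 4.0, 4.5, 4.9, 5.5, 5.5, 5.0],
})

def matches_expected_order(person: pd.DataFrame) -> bool:
    """True if this participant's ratings follow positive > neutral > negative."""
    r = person.set_index("condition")["rating"]
    return r["positive"] > r["neutral"] > r["negative"]

matches = ratings.groupby("participant").apply(matches_expected_order)
pcc = 100 * matches.mean()  # percent of participants whose pattern matches theory
print(f"Percent correct classification: {pcc:.1f}%")
```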
The persons as effect sizes approach better represents the performance of individuals within a group than aggregate effect sizes, which may poorly represent the individuals within the groups (
Grice et al. 2020;
Lamiell 2013;
Molenaar 2004). Persons as effect sizes is not an inferential statistic. That is, the method has no bearing on statistical significance. Rather, this approach is descriptive and provides an additional effect size that captures how many participants behaved or responded in line with theory. Person-centered effect sizes, in conjunction with aggregate effect sizes, can present a more complete picture of the results.
1.4. The Present Study
Here, we evaluate growth mindset intervention studies that allow for further understanding of individual-level heterogeneity of a treatment effect. As
Grice et al. (
2020) note, traditional measures of effect size allow researchers to answer questions about intervention-based differences and changes at the group level, but they do not indicate how many individuals behaved consistently with theoretical expectation. By examining individual-level responses, researchers can uncover previously overlooked patterns in the data that can inform the fine-tuning of a theory. Using individual-level descriptive analyses, we examine three papers that had previously tested and presented claims, based on aggregate group-level analyses, about the benefits of growth mindsets for academic performance.
In the present study, we evaluate individual mindset and academic performance outcomes associated with growth mindset interventions by extending the persons as effect sizes method developed by
Grice et al. (
2020). The higher the percentages of individuals within samples who behave or perform in line with theoretical expectations, the stronger the evidence for the original papers’ claims that growth mindset interventions change mindsets and/or are beneficial for students’ academic performance. According to
Grice et al. (
2020), 50% of participants responding in line with theory is expected due to chance. To illustrate, imagine a very large group of students whom we randomly divide into two groups. Neither group receives a treatment. In this scenario, on average, we should not observe any differences between the two groups. If we select a student from Group 1 and another student from Group 2, there should be a 50/50 chance that the student in Group 1 has a higher grade than the student in Group 2, and vice versa. Thus, if the percentage of participants responding according to theoretical expectation hovers around 50%, the results are not very impressive, as the probability of participants responding in line with theory equals the probability of participants responding inconsistently with expectation.
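A minimal simulation illustrates this chance baseline. The two groups below are drawn from the same distribution (all values are simulated, not taken from any study we review), so one group’s students have higher grades than the other’s in roughly half of all possible pairings.

```python
# Simulated 50% chance baseline: two untreated groups from the same distribution,
# compared pair by pair. All numbers are simulated placeholders.
import numpy as np

rng = np.random.default_rng(1)
group1 = rng.normal(loc=3.0, scale=0.5, size=2000)  # e.g., GPAs, no treatment
group2 = rng.normal(loc=3.0, scale=0.5, size=2000)  # same distribution, no treatment

# Compare every Group 1 student with every Group 2 student.
higher = (group1[:, None] > group2[None, :]).mean() * 100
print(f"Group 1 student has the higher grade in {higher:.1f}% of pairs")  # ~50%
```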
The persons as effect sizes approach can be expanded to describe the full extent of heterogeneity. Rather than only determining the number of participants who behaved or responded consistently with theory (e.g., treatment students who numerically outperformed their control-group counterparts, treatment students whose grades numerically improved), as in Grice et al.’s work (
Grice et al. 2020), we can also calculate the number of participants who experienced no change in their behavior or response (e.g., treatment students with identical grades to their control-group counterparts, treatment students with identical pre- to post-treatment grades), and the number of participants who responded or behaved opposite to the theory (e.g., treatment students who performed numerically worse than their control-group counterparts, treatment students whose grades numerically decreased). This approach provides insights for developing a complete theory by examining variation in individual responses that might otherwise be lost in an aggregate approach. We therefore extend
Grice et al.’s (
2020) approach. Rather than presenting a single percentage (the percentage of those who responded according to expectations) and treating ties and those responding counter to expectations as the remainder, we separately report ties and the percentage of those responding opposite to the study claims. Thus, we present three percentages for each research question: the percent who behaved or responded according to study claims (i.e.,
Grice et al.’s 2020 percent correct classification), the percent who exhibited no numerical difference in their response, and the percent who behaved or responded counter to study claims.
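As a sketch of how these three percentages can be computed for a within-subjects comparison, the code below classifies each treated student’s pre-to-post change as improved, no change, or declined. The variable names and grade values are hypothetical, not drawn from the studies we examine.

```python
# Extended persons-as-effect-sizes summary: percent improved, no change, declined.
import numpy as np

def change_percentages(pre: np.ndarray, post: np.ndarray) -> dict:
    """Return the percent of students who improved, showed no change, or declined."""
    keep = ~np.isnan(pre) & ~np.isnan(post)      # pairwise deletion of missing grades
    diff = post[keep] - pre[keep]
    n = diff.size
    return {
        "improved":  100 * np.sum(diff > 0) / n,
        "no change": 100 * np.sum(diff == 0) / n,
        "declined":  100 * np.sum(diff < 0) / n,
    }

pre  = np.array([3.2, 2.8, 3.5, np.nan, 2.1])   # hypothetical pre-intervention grades
post = np.array([3.0, 2.8, 3.7, 3.1,    1.9])   # hypothetical post-intervention grades
print(change_percentages(pre, post))
```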
Examining the percentage of students who behave in line with expectations, who demonstrate no difference, and who behave counter to expectations is necessary for understanding how well the aggregate effects represent the individuals in the sample. A significant effect with a given aggregate effect size can arise from multiple patterns of individual results, from only a few individuals behaving in line with expectations to the majority of individuals exhibiting expected treatment effects. Analyzing individual and aggregate results jointly may offer novel insights about cost–benefit tradeoffs and about potential risks. Only by understanding the full range of outcomes, from benefits to detriments, can we develop complete theories and provide better guidance to parents, educators, and policymakers.
3.1. Introduction
Yeager et al. (
2016) conducted two growth mindset (incremental theory) interventions with the aim of improving adolescent stress and coping during evaluative social situations. The intervention, a reading and writing exercise, lasted approximately 25 min and taught students that people do not have fixed personalities but instead have the potential to change their beliefs and motivations. Students in the active control condition received a reading and writing exercise of the same duration but focused on a topic unrelated to personality and adjustment (i.e., how areas of the brain help individuals adjust to new physical environments). In their first study,
Yeager et al. (
2016) evaluated the effect of a growth mindset intervention on Trier social stress test performance, threat appraisals, and cardiovascular and neuroendocrine responses. No measures of academic performance were included in this study; therefore, we do not consider study 1 here.
In study 2,
Yeager et al. (
2016) administered the same approximately 25-min-long growth mindset intervention as in study 1 and measured grade point averages of 319 ninth grade students. The intervention was administered to ninth-graders in their first semester and grades were assessed at three time points: (1) pre-intervention, a composite of
z-scored values for prior grades and test scores in core subjects; (2) post-intervention semester 1, a composite of grades in core subjects at the end of the semester that included the intervention; and (3) post-intervention semester 2, a composite of grades in core subjects at the end of the following semester. According to
Yeager et al. (
2016), “the incremental-theory manipulation improved grades up to 7 months after intervention” (p. 1089) and that “[s]tudents who received the intervention also had better grades over freshman year than those who did not” (p. 1078).
Yeager et al. made no other claims about intervention effectiveness on academic achievement in this study. Further, they did not make any claims about intervention effectiveness for specific subgroups. Readers should not confuse this study with another study authored by Yeager et al. in the same year, where they made specific claims about subgroups. That study does not have open student-level data associated with it and, therefore, cannot be included here.
3.2. Methods
Prior to analysis, we first needed to make several assumptions due to a lack of details in
Yeager et al.’s (
2016) methods. First, Yeager et al. do not state when the pre-intervention grades were assessed. We assume that prior grades are from the semester immediately prior to the semester in which the intervention was administered. Second, we are not sure why a
z-scored composite of grades and test scores was used for prior grades, whereas a composite of only grades (no test scores and no mention of
z-scoring) was used for post-intervention. Nonetheless, grades at each time point appeared to be on the same scale. Thus, we assume that prior grades are comparable to post-intervention grades. Third,
Yeager et al. (
2016) do not provide information as to when within the semester the intervention took place. Based on their claim that the intervention improved grades up to seven months later, we infer that the post-intervention semester 2 grades were assessed seven months after the intervention, as the first post-intervention time point was at the end of the same semester in which the intervention was administered, and semesters last approximately 4 months.
We first sought to assess the strength of the evidence for the claim that “the incremental-theory manipulation improved grades up to 7 months after intervention” (p. 1089). We examined this within the context of both a within-subjects analysis, i.e., treatment students improved their grades, and a between-subjects analysis, i.e., relative to the control students, treatment students improved their grades more.
There are three possible time point comparisons where change up to seven months post-intervention can be assessed. We first calculated comparisons from pre-intervention to post-intervention semester 1. This timeframe captures the change in grades from pre-intervention baseline, presumably the spring before the intervention, to the end of the semester following the intervention. This period is within the seven-month timeframe during which Yeager et al. claim to see improvements. That said, though Yeager et al. claim that the intervention improved grades up to seven months after the intervention, it may be that they meant the intervention improved grades only after seven months following the intervention. In this case, we would not expect to see impressive results for this period of pre-intervention to post-intervention semester 1.
Next, we compared pre-intervention to post-intervention semester 2 grades, the longest timeframe available. This timeframe may be the best assessment of Yeager et al.’s claim because it captures pre-intervention grades as well as grades seven months (we assume) after the intervention. We then compared post-intervention semester 1 to post-intervention semester 2 grades. If pre-intervention grades are not comparable (recall that pre-intervention achievement scores included test scores, whereas post-intervention achievement scores did not), then this latter timeframe would offer the best test of Yeager et al.’s claim that grades improved up to 7 months after the intervention. However, including pre-intervention (i.e., baseline) grades offers a better assessment of change due to the intervention.
We also assessed the strength of the evidence for the claim that “[s]tudents who received the intervention also had better grades over freshman year than those who did not” (p. 1078), which is a between-subjects claim. We examined all possible treatment–control pairs at post-intervention semester 1 and semester 2.
We used pairwise deletion for missing values (e.g., a missing pre-intervention or semester grade) to retain as many data points as possible for change and comparison analyses.
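The between-subjects comparisons can be sketched in the same way. The function below, written under the assumption that each group’s grades are stored in a simple array (the arrays shown are hypothetical placeholders, not Yeager et al.’s data), forms every possible treatment–control pair after dropping missing values and reports the three percentages. The same function applies to post-intervention grade levels or to per-student change scores computed beforehand.

```python
# All-possible-pairs comparison of treatment vs. control students on one measure.
import numpy as np

def pairwise_percentages(treatment: np.ndarray, control: np.ndarray) -> dict:
    """Compare every treatment student with every control student."""
    t = treatment[~np.isnan(treatment)]          # drop missing values per group
    c = control[~np.isnan(control)]
    diff = t[:, None] - c[None, :]               # all treatment-control pairs
    n_pairs = diff.size
    return {
        "treatment higher": 100 * np.sum(diff > 0) / n_pairs,
        "tie":              100 * np.sum(diff == 0) / n_pairs,
        "treatment lower":  100 * np.sum(diff < 0) / n_pairs,
    }

# Hypothetical GPAs; change scores (e.g., post minus pre) can be passed instead.
treat_gpa   = np.array([3.1, 2.9, 3.6, np.nan])
control_gpa = np.array([3.0, 3.2, 2.7])
print(pairwise_percentages(treat_gpa, control_gpa))
```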
3.3. Descriptive Results
Pre-intervention grades were a composite of course and test scores, which appeared to be on the same scale as post-intervention course grades; post-intervention grades were a composite of course grades. Pre-intervention grades ranged from 1.68 to 3.98; post-intervention semester 1 grades ranged from 1.25 to 4.00; and post-intervention semester 2 grades ranged from 1.00 to 4.00.
Yeager et al. (
2016) reported that a total of 303 participants consented to complete the intervention and questionnaires, and to have their school records analyzed. However,
Yeager et al. (
2016,
https://osf.io/fm5c2/ accessed on 15 July 2022) provide the pre-intervention grades for 316 students (intervention
n = 160, control
n = 156), the post-intervention semester 1 grades for 305 students (intervention
n = 152, control
n = 153), and the post-intervention semester 2 grades for 306 students (intervention
n = 152, control
n = 154).
3.4. Treatment Students’ Changes in Grades
We first examined students’ grades prior to and after the intervention. To assess the claim that the mindset manipulation improved grades up to seven months after the intervention, we first examined change in grades from pre-intervention to post-intervention semester 1 for the students who received the growth mindset intervention and had grades available at both time points.
Table 1 provides the results of the percentage of treatment students who experienced improved grades (a numeric increase from pre-intervention to post-intervention), the percentage of students who experienced no change in grades (identical grades at both time points), and the percentage of students who experienced a decline in grades (a numeric decrease from pre-intervention to post-intervention). We repeated these calculations for pre-intervention to post-intervention semester 2, and for post-intervention semester 1 to post-intervention semester 2.
As can be seen in
Table 1, the results from the analyses on treatment students’ changes in grades over time largely run opposite to the claim that the manipulation changed grades for the better up to 7 months after the intervention. The majority of students who received the growth mindset intervention demonstrated numeric declines in grade point averages from before to after the intervention. When examining post-intervention grades only, the majority of intervention students experienced a numeric increase in grades from semester 1 to semester 2, though a large proportion (roughly a quarter) of participants who received the intervention experienced a numeric decline in grades following the intervention.
3.5. Treatment–Control Pair Comparisons of Grade Changes
We next compared changes in grades between treatment and control students. We first assessed treatment students’ change in grades from pre-intervention to post-intervention semester 1 relative to control students’ change in grades during the same time period. We calculated change scores for students, examining all possible treatment–control pairs. This assessment can provide a better comparison than within-subjects comparisons alone, as there could be a general trend of decreasing grades that the growth mindset intervention helps ameliorate. We then compared the change in grades from pre-intervention to post-intervention semester 2 and post-intervention semester 1 to post-intervention semester 2, examining all possible treatment–control pairs. As can be seen in
Table 1, the results of the individual-level comparisons were near chance (i.e., 50%) as to whether the treatment student demonstrated greater numerical improvement relative to their control student counterpart.
3.6. Treatment–Control Pair Comparisons of Post-Intervention Grades
We next assessed the evidence for the claim that students who received the intervention had better grades over freshman year than students in the control group. We examined all possible treatment–control student pairs with grades at post-intervention semester 1 and post-intervention semester 2 (the two time points in students’ freshman year), respectively. As can be seen in
Table 1, the results were close to chance (i.e., 50%) as to whether the treatment student had numerically higher grades than their control student counterpart or whether the treatment student had numerically lower grades than their control student counterpart.
3.7. Discussion
Though
Yeager et al. (
2016) implied that treatment students experienced a positive change in grades, they did not conduct any analyses examining grade changes. We therefore cannot compare the persons as effect sizes of grade change with effect sizes from aggregate analyses. Yeager et al. did, however, report that the intervention condition demonstrated higher GPAs at post-intervention semester 1 than the control group,
p = 0.016,
d = 0.279, and that the same intervention effect on GPA outcomes was observed in the following semester (post-intervention semester 2),
p = 0.020,
d = 0.269. These aggregate-level results support Yeager et al.’s claim that students who received the intervention had better grades over freshman year than students in the control group. However, by examining persons as effect sizes, we can observe that it was a nearly 50/50 chance that a given treatment student had numerically higher grades than a given control student.
The near-chance results suggest either that (a) the growth mindset intervention had little bearing on student grades (i.e., variability was mostly random), or (b) that the intervention was nearly equally likely to increase some students’ grades as it was to decrease other students’ grades (or some combination of the two). If variability was random and the intervention had little to no effect on student grades, this raises the question as to whether mindset interventions akin to that of
Yeager et al. (
2016) are worthwhile, considering the time and money spent. If the intervention is likely to decrease the grades of many students, teachers and policymakers must decide if raising the grades of some students is worth lowering the grades of others.
4.1. Introduction
Ehrlinger et al. (
2016) conducted a series of three studies examining the role of mindset on overconfidence and preferential attention.
Ehrlinger et al. (
2016) argued that both overconfidence and attention allocation are important avenues for academic research, as they can help elucidate ways to improve students’ learning trajectories. Students were assessed on overconfidence and attention allocation during the completion of practice problems on the GRE, a common standardized exam assessing aptitude and achievement outcomes for prospective graduate students.
Ehrlinger et al. (
2016) argued that overconfidence is most prevalent among students with fixed mindsets, because their mindset “leads them to forego learning opportunities in order to maintain positive beliefs regarding their competence” (p. 95). That is, they reported that students who view their intelligence as fixed (i.e., hold a fixed mindset) are likely to overestimate their performance because they focus on easy problems, which in turn may lead to reductions in learning. In contrast, they argued that students who view their intelligence as malleable (i.e., hold a growth mindset) have better self-insight because they are more willing to focus on difficult problems. Here, we focus on the claims associated with mindsets and attention allocation, as this is the presumed antecedent to learning and academic performance.
In their first and third studies, Ehrlinger et al. examined the relationship between pre-existing student mindsets and overconfidence. These studies do not include a growth mindset manipulation and are not included here. In their second study, Ehrlinger et al. attempted to manipulate mindsets and measure group differences on attention allocation and overconfidence. We focus on
Ehrlinger et al.’s (
2016) study 2, where the study authors made claims about the effect of a mindset manipulation on attention allocation. Results on overconfidence can be found in the
Supplemental Materials File S1.
In study 2,
Ehrlinger et al. (
2016) sought to experimentally manipulate mindset to test its effect on overconfidence, and whether differences in attention mediated the effect. Ninety-four university students were assigned to read either an article designed to teach students that intelligence is stable (fixed-mindset condition) or an article designed to teach students that intelligence is malleable (growth mindset condition). Attention allocation was determined by the number of seconds allocated to the easy problems and to the difficult problems, each relative to the overall time spent on the GRE practice problems.
Ehrlinger et al. (
2016) reported that “[t]eaching a growth mindset makes students open to difficulty” (p. 94). Ehrlinger et al. further explained that “[p]articipants who were randomly assigned to a condition in which they were taught an entity [i.e., fixed] (vs. incremental [i.e., growth]) view of intelligence subsequently allocated less time to difficult problems” (p. 98).
4.2. Methods
To assess the strength of the evidence that “[t]eaching a growth mindset makes students open to difficulty” (p. 94), we examined within-subjects comparisons by calculating the percentage of students taught a growth mindset who spent numerically more of their total time on difficult problems than on easy problems, the percentage who spent identical amounts of time on easy and difficult problems, and the percentage who spent numerically more of their total time on easy problems than on difficult problems.
We do not include a comparison of difference scores on difficult relative to easy attention allocation between students taught a growth mindset and students taught a fixed mindset, for two reasons. First, unlike growth mindset interventions where claims about improvements to student outcomes are often contextualized relative to a control group, this study did not include a control group. Thus, claims about the effect of a growth mindset manipulation on students should not be compared with a fixed-mindset condition unless specifically contextualized as such. Any results from a growth–fixed comparison cannot disentangle effects of the growth mindset treatment from effects of the fixed-mindset treatment. Second, Ehrlinger et al. provide their own between-subjects claim, which we assessed next.
To assess the strength of the evidence that “[p]articipants who were randomly assigned to a condition in which they were taught an entity [i.e., fixed] (vs. incremental [i.e., growth]) view of intelligence subsequently allocated less time to difficult problems” (p. 98), we examined each possible fixed-mindset condition–growth mindset condition pair and assessed the percentage of pairs in which the student in the fixed-mindset condition allocated a numerically smaller proportion of the total time spent to difficult problems compared with their growth mindset condition counterpart, the percentage of pairs who allocated an identical proportion of time to difficult problems, and the percentage of pairs in which the student in the fixed-mindset condition allocated a numerically greater proportion of the total time to difficult problems compared with their growth mindset condition counterpart.
Two students in the growth mindset condition were missing values for the number of seconds taken on certain easy problems, which affected the recorded total time allocated to easy problems. These students were excluded from the analyses.
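For readers who wish to see the computations spelled out, the sketch below mirrors the two analyses just described, using hypothetical timing values; the column names and numbers are ours, not Ehrlinger et al.’s.

```python
# Hypothetical attention-allocation summaries for a study like Ehrlinger et al. (2016).
import numpy as np
import pandas as pd

# Hypothetical per-student timing data for the growth mindset condition.
growth = pd.DataFrame({
    "easy_s":      [22.0, 30.5, 18.0],     # seconds spent on easy problems
    "difficult_s": [25.0, 19.5, 18.0],     # seconds spent on difficult problems
    "total_s":     [340.0, 410.0, 290.0],  # total seconds on the practice test
})

# Within-subjects question: did the student devote a larger share of total time
# to difficult problems than to easy problems?
diff_share = growth["difficult_s"] / growth["total_s"]
easy_share = growth["easy_s"] / growth["total_s"]
print("more on difficult:", 100 * np.mean(diff_share > easy_share))
print("identical shares: ", 100 * np.mean(diff_share == easy_share))
print("more on easy:     ", 100 * np.mean(easy_share > diff_share))

# Between-conditions question: across all fixed-growth pairs, did the fixed-mindset
# student allocate a smaller share of total time to difficult problems?
fixed_diff_share = np.array([0.040, 0.060])  # hypothetical proportions for fixed condition
pairs = fixed_diff_share[:, None] - diff_share.to_numpy()[None, :]
print("fixed allocates less:", 100 * np.mean(pairs < 0))
print("identical:           ", 100 * np.mean(pairs == 0))
print("fixed allocates more:", 100 * np.mean(pairs > 0))
```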
4.3. Descriptive Results
Ehrlinger et al. (
2016) reported that the overall time students spent on the test ranged from 159 to 929 s, with an average of 356.90 s. Students spent between 7.60 and 80.00 s on the easy problems (M = 25.40 s) and between 9.60 and 48.40 s on the difficult problems (M = 21.19 s).
Ehrlinger et al. (
2016) reported that students in the fixed-mindset condition allocated less attention to difficult problems,
M = 15.70 (
SE = 1.026), than those in the growth mindset condition,
M = 17.22 (
SE = 1.023). Those in the fixed-mindset condition also allocated more attention to easy items,
M = 20.94 (
SE = 1.028), than those in the growth mindset condition,
M = 18.54 (
SE = 1.026). In both conditions, on average, students directed more time and attention toward easy problems than difficult problems.
4.4. Growth Mindset Condition Attention Allocation
Ehrlinger et al. (
2016) claimed that “[t]eaching a growth mindset makes students open to difficulty” (p. 94). To assess the strength of the evidence that teaching a growth mindset makes students open to difficulty, we examined the amount of time students who received the growth mindset intervention spent on difficult, relative to easy, problems. As can be seen in
Table 2, the results of this analysis indicate that the majority of students taught a growth mindset spent more time on the easy problems, contradicting the implication that these students would take more time to focus on the difficult problems than on the easy ones.
Ehrlinger et al.’s aggregate results also do not support their claim that teaching a growth mindset makes students open to difficulty. Students in the growth mindset condition spent a similar amount of time on easy and difficult problems. The claim appears to be based on attention allocation relative to students who were taught a fixed mindset. Ehrlinger et al. observed a significant mindset manipulation × attention allocation interaction, with students in the fixed-mindset condition spending more time on easy compared with difficult problems (p < 0.001, ηp² = 0.32) than students taught a growth mindset (p < 0.10, ηp² = 0.04). Without a control group, we cannot know whether a growth mindset manipulation makes students (relatively more) open to difficulty or if a fixed-mindset manipulation makes students less open to difficulty.
4.5. Fixed vs. Growth Mindset Condition Pair Comparisons
To assess the claim that being taught a fixed mindset reduces attention to difficult problems compared with being taught a growth mindset, we examined how often students in the fixed-mindset condition spent numerically less of their total time on difficult problems compared with students in the growth mindset condition. As shown in
Table 2, there is some evidence to support the claim that those in the fixed-mindset condition tended to allocate less attention to difficult problems than students in the growth mindset condition. Student behavior was in line with prediction for about two-thirds of the pairs and counter to prediction for about one-third of the pairs.
4.6. Discussion
Upon examining the individual-level data, we found that the majority of participants taught a growth mindset spent more time on easy problems than on difficult problems. However, when comparing students taught a fixed vs. a growth mindset, the majority of pairs, about two-thirds, behaved according to Ehrlinger et al.’s claim. Future researchers may wish to disentangle whether the growth mindset manipulation, the fixed-mindset manipulation, or both are driving the effect. Additionally, future researchers may wish to investigate Ehrlinger et al.’s suggestion that attention allocation to difficult problems is critical for students’ long-term learning. No evidence is reported in
Ehrlinger et al. (
2016) to support this suggestion; therefore, we cannot evaluate that claim here. Taken together, more evidence is needed to evaluate the importance of mindset on attention allocation.
6. General Discussion
Our study of three mindset interventions is a step toward answering the call for a heterogeneity revolution. We considered the full range of treatment effect heterogeneity at the student level, from benefits to detriments. Examining the complete range at this level allows readers to better understand heterogeneity of outcomes.
We used a recently developed individual-level effect size. The persons as effect sizes method developed by
Grice et al. (
2020) is designed to calculate the percentage of participants who behaved or responded according to theory and the percentage who did not. We extended this method by differentiating the percentage of participants who demonstrated no numerical change/difference in response or performance from the percentage of participants who behaved or responded counter to expectations. The full range of outcomes at the individual level is often obscured in aggregate analyses, yet it is important for understanding variation in outcomes and for making well-informed policy decisions.
We examined three papers that claimed, based on aggregate analyses, that growth mindset interventions can improve academic performance of students. Using aggregate analyses,
Yeager et al. (
2016) claimed that the intervention improved students’ grades. Examining individual-level data, we found that the majority of students who received the intervention conducted by
Yeager et al. (
2016,
https://osf.io/fm5c2/ accessed on 15 July 2022) experienced a decline in grades from pre- to post-intervention. When comparing treatment and control participants, we found that nearly half the pairs showed that the treatment student performed worse than their control student counterpart. These results suggest that either change in grades is largely due to chance, or that the intervention is about as likely to benefit some students as it is to harm other students.
Likewise, using aggregate analyses,
Ehrlinger et al. (
2016) claimed that students taught a growth mindset spend more time solving difficult problems. Examining individual-level data, we found that nearly two-thirds of students taught a growth mindset spent less time on difficult problems than on easy problems. When comparing growth-mindset and fixed-mindset condition pairs, in about two-thirds of pairs the growth-mindset condition student allocated more of their time to difficult problems than their fixed-mindset condition counterpart, and in about one-third of pairs the growth-mindset condition student allocated less time to difficult problems than their fixed-mindset counterpart. These results suggest that teaching a growth mindset may influence some students to shift their focus toward difficult problems, though it may instead be that teaching a growth mindset has no impact and that teaching a fixed mindset is what shifts students’ focus toward easy problems.
Porter et al. (
2022) made five major claims. In most cases, the individual-level effect sizes did not strongly corroborate their conclusions. For example, their aggregate analyses led them to claim that administering the intervention shifted teachers’ mindsets toward growth. However, the largest share of teachers had identical mindsets before and after the intervention. Likewise, Porter et al. claimed that undergoing the intervention shifted students’ mindsets toward growth. However, nearly half the students reported no change or reported less of a growth mindset following the growth mindset intervention. In addition, when comparing treatment–control student pairs in
Porter et al. (
2022,
https://osf.io/z2nvy/ accessed on 20 July 2022), in most of the classes, around half the time the treatment student performed worse than their control student counterpart. These results suggest that either results are due to chance, or that the intervention benefits some students while harming others.
For one claim, some of the individual-level effect sizes supported the aggregate analyses. We found that a large majority of lower-achieving students, defined as students with pre-intervention grades at least one standard deviation below the mean, improved from pre- to post-intervention. However, many lower-achieving students in the control group also saw an improvement. When comparing changes in grades between the two groups, the effect sizes no longer strongly supported Porter et al.’s conclusions.
Finally, some of Porter et al.’s claims were based on treating teachers who disagreed with fixed-mindset statements, but not strongly, as though they agreed with fixed-mindset statements. The most notable claim made by
Porter et al. (
2022) was that the student intervention was most impactful when teachers initially held a fixed mindset. There was only one teacher who initially agreed with fixed-mindset statements. Within this teacher’s classroom, only a quarter of the students improved their grades from pre- to post-intervention. Furthermore, Porter et al. claimed that lower-achieving students benefited most from the intervention when their teachers initially held fixed mindsets, but there were zero lower-achieving students as defined by Porter et al. in the class of the sole teacher who initially endorsed a fixed mindset.
Across the evaluations of the claims in these three papers, we have shown that when individual-level effect sizes are considered, the evidence is not as strong as is implied when one examines aggregate results alone, even when examining sub-samples hypothesized to see the most benefit. When considering students and teachers with disparate backgrounds, beliefs, and needs, our effect sizes allow us to conclude that some individuals who are promised to see the greatest improvements may, in fact, experience no positive outcomes or even detriments.
Limitations and Future Directions
Our study of persons as effect sizes within the growth mindset intervention literature is not a comprehensive or systematic review. Rather, it is based on published literature with preexisting, openly available data. We included only published studies so that we could evaluate public claims; claims in unpublished manuscripts may evolve prior to publication, and unpublished datasets without an accompanying manuscript would not have claims for us to evaluate.
The primary barrier to including more studies was the literature itself: of the relevant studies, few shared open data at the individual level. Though we searched for other relevant studies with open individual-level data, there may be studies that we missed; we know of no studies beyond these three that met our inclusion criteria. Future researchers may want to contact study authors to request individual-level data that were not publicly posted, as including only studies with openly posted individual-level data may not be representative of the wider literature.
There are multiple levels of heterogeneity one can examine. The present study primarily focused on heterogeneity of individual responses (i.e., mindset and academic outcomes) to the intervention. However, some occupational researchers, educational researchers, and psychologists (e.g.,
Burnette et al. 2022;
Domitrovich et al. 2008;
Klein and Sorra 1996;
Macnamara and Burgoyne 2022) have advocated for considering the quality of the intervention itself and/or the strength of the support system surrounding the intervention. The studies that we examined did not provide sufficient information about the support system at the school or macro levels for us to evaluate. We present this as an avenue for examination by researchers in the future, to explore heterogeneity at all levels of the implementation of mindset interventions (macro, school, and individual levels).
The three studies reported here used disparate methods, samples, and approaches. We recommend that researchers attempt to replicate these and other studies.
Bryan et al. (
2021) suggest that attempted replications fail because the replication does not take into account heterogeneity. Replication and heterogeneity need not be at odds with one another. Rather, researchers should attempt to replicate studies and examine heterogeneity systematically.
Currently, intervention implementation and subgroups often vary from study to study, or, in the case of
Porter et al. (
2022), intervention implementation varied from teacher to teacher in the same study. With little consistency in domain, groups, method of delivery, or setting, it is difficult to establish replicable effects and difficult to examine heterogeneity systematically. In the present paper alone, the three studies differed in mindset domain (intelligence, personality), age group (ninth grade, university, or sixth and seventh grade), and outcomes (GPA, attention allocation). In addition, methods of delivery and settings for the intervention differed across these three studies: two 25-min interventions delivered to classrooms; an article intervention delivered in the laboratory; and an intervention delivered by teachers within a course subject. As it stands, understanding the underlying mechanisms and the conditions under which effects appear is hampered when methods and outcomes of mindset interventions vary from study to study. Researchers should strive to replicate effects under specific conditions to better understand heterogeneity of effects.
Finally, we recommend that researchers report individual-level effect sizes along with aggregate analytical results. The persons as effect sizes approach is not a replacement for inferential statistics and cannot test for statistical significance. Rather, this approach provides an effect size based on individual-level responses to consider how many participants are responding as expected alongside other statistics. Aggregate analyses, aggregate effect sizes, and individual-level effect sizes can be reported together to provide an additional layer of evidence for researchers and readers to evaluate.
7. Conclusions
We considered heterogeneity in two ways. First, we analyzed three growth mindset interventions at the individual level. How does this increase our understanding of heterogeneity? Aggregating across students or subsamples of students leads to information loss and can obfuscate variance in responses to treatments. We used a straightforward, easy-to-understand approach to reveal how many students in each study or subgroup responded to the intervention in line with the study authors’ claims, and how many did not.
Second, we considered the full range of individual-level heterogeneity. We examined how many students responded positively (i.e., according to claims), how many appeared unaffected (i.e., no numerical change), and how many responded negatively (i.e., counter to claims). Understanding the full spectrum of effects informs educators’ decisions when weighing the costs, benefits, and risks of implementing interventions.
Our analyses suggest that growth mindset interventions might benefit some students while harming others. Academic performance decrements must be acknowledged and investigated to better understand the heterogeneity of growth mindset effects on academic outcomes. Interventions that improve the outcomes of some students—at the expense of others—are not ideal solutions, especially when considering disparities in academic outcomes.
We propose a call for a transparent and complete view of heterogeneity. First, we encourage mindset researchers in the future to openly share their data at the individual level. Second, we urge researchers to consider examining heterogeneity in multiple ways, and, along with aggregate statistical tests and aggregate effect sizes, to include the percentage of participants who behaved according to theory and those who behaved counter to theory. Providing both aggregate and individual effect sizes allows readers to contextualize the heterogeneity of the effects that mindset interventions can elicit—the beneficial effects, the lack of effects, and the detrimental effects—as well as whether these person-to-person effects are consistent with claims from aggregate analyses. Only by fully addressing heterogeneity can we deepen our theoretical understanding and improve interventions.