1. Introduction
Science education is an important part of the curriculum in many countries [1,2]. Starting in primary school, children learn about the underlying principles and causal relationships of science domains as well as the processes through which this knowledge is created. This process of intentional knowledge-seeking is known as scientific reasoning [3,4] and is important for children because it prepares them for a society where science and the outcomes of scientific research are embedded in the culture [5]. In a school setting, scientific reasoning skills are particularly important for successful inquiry learning: ‘minds-on’ scientific reasoning skills [6] are instrumental to achieving meaningful outcomes from a ‘hands-on’ inquiry.
Scientific reasoning consists of multiple component skills, namely, hypothesizing, experimenting and evaluating evidence, the latter of which can be further divided into inferencing, evaluating data and drawing conclusions [7,8]. These component skills emerge at different ages, tend to develop at different paces and are known to vary greatly between same-age children (e.g., [9]). However, most existing research either treats scientific reasoning as a unitary construct or looks at one specific component of scientific reasoning, most often experimenting [10]. Therefore, the inter- and intra-individual differences are not yet well understood, and to date, few guidelines exist for addressing these differences in primary science classrooms.
An important challenge in understanding individual differences in scientific reasoning is the valid measurement of its component skills. Even though scientific reasoning is often taught in hands-on settings, it is mostly measured with paper-and-pencil tests. As performance-based testing circumvents many of the problems typically associated with written tests (see Harlen [11] for an overview), it might shed new light on the development of scientific reasoning in children. Using one such performance-based test, the current study set out to advance our insight into children’s proficiency in the different component skills of scientific reasoning as applied in a practical, coherent inquiry setting, in order to ultimately aid the development of teaching materials for various groups of learners in primary education.
1.1. Variation in Scientific Reasoning
As mentioned above, scientific reasoning comprises the skills of hypothesizing (the articulation of ideas about possible outcomes of an investigation), experimenting (the skills to design and perform experiments to test these hypotheses) and evaluating evidence (i.e., drawing valid conclusions). Evidence evaluation, in turn, involves inferencing (making a verbal interpretation of the gathered data), evaluating data (assessing measurement quality, for instance, to decide whether there are enough data to base a conclusion on), and drawing conclusions (using this information to make causal statements to answer the research question).
This multidimensionality is confirmed by psychometric models [12], and studies investigating one or more component skills point to substantial variation. Experimenting, for example, is relatively easy for children to learn: most pre-schoolers are capable of some systematic testing [13,14], and older children can be taught this skill successfully by both direct instruction [15,16] and guided inquiry [17,18]. Hypothesizing is more difficult for young children to learn [9,13,19], whilst evidence evaluation is the most difficult skill for them to acquire [4] and is also experienced as such [20].
Most of the studies on which this tentative order of difficulty is based examined a single skill at a single time point. A positive exception is the study by Piekny and Maehler [9], who inferred the age at which children learn hypothesizing, experimenting and evidence evaluation from cross-sectional data collected with children from kindergarten to grade 5 and found a similar build-up as described above. Still, this study used different types of tasks for the different component skills rather than one task that encompassed all component skills. Thus, the relative difficulty of the component skills of scientific reasoning is not yet fully understood.
Other studies indicate that not all children develop scientific reasoning proficiency at the same pace. In a large-scale cross-sectional study using written tests in grades 4–6, Koerber, Mayer, Osterhaus, Schwippert and Sodian [21] distinguished between naïve, intermediate and advanced conceptions of scientific reasoning and found that, although older children more often had advanced conceptions and less often naïve conceptions than younger children, all proficiency levels were present at all participating grade levels. The results of Piekny and Maehler [9] further suggest that this variation increases with age. For example, both the means and standard deviations of ‘hypothesizing’ were low in kindergarten but increased from grade 1 onward. This finding indicates that, as children’s hypothesizing skills grow, the inter-individual variation increases accordingly. Thus, although children improve in scientific reasoning over the years, not all children improve equally or at the same time as their peers. Acknowledging and understanding these differences is vital for good science education.
To conclude, the component skills of scientific reasoning improve considerably during the primary school years [9,21], albeit with substantial variation. As not all subskills emerge at the same point in time and not all children develop their scientific reasoning proficiency at the same pace, the teaching of scientific reasoning in primary education is a challenging task. A profound understanding of how the component scientific reasoning skills develop can help teachers make scientific reasoning accessible for all children.
1.2. Explaining Variation in Scientific Reasoning
Although differences in the development of scientific reasoning are known to exist, the roots of the differences between children as well as differences in developmental patterns within children (i.e., differences across skills) are less clear. Children’s cognitive characteristics account for part of the variation in scientific reasoning proficiency. Previous research provides evidence that reading comprehension, numerical ability and problem-solving skills contribute to scientific reasoning.
Reading comprehension most consistently explains children’s overall scientific reasoning performance on written tests [21,22] as well as their ability to set up unconfounded experiments using the Control-of-Variables Strategy [23,24]. Van de Sande, Kleemans, Verhoeven and Segers [25] found that reading comprehension explained variance in all component scientific reasoning skills, albeit not to the same extent: effect sizes ranged from medium (r = 0.30) for experimentation and drawing conclusions to large (r = 0.47) for hypothesis validation. Why reading comprehension is such a strong predictor is not entirely clear. Possibly, reasoning ability transcends the domains of reading and science [24,25], or a general understanding of the language of science is important for science learning [26]. However, it is also possible that the influence of reading comprehension is a consequence of test item format: most of the studies cited above used written tests that likely call upon children’s reading skills, even though questions were sometimes read out loud. In light of these findings, reading comprehension can be considered an important predictor of scientific reasoning, but because past research relied heavily on paper-and-pencil tests, further scrutiny of its role is warranted.
Numerical ability is often named as a prerequisite for scientific reasoning by national curriculum agencies [27,28] as well as scientists [29], likely because scientific reasoning, in particular the evidence evaluation skills, involves reasoning about numerical data [9,22,30,31]. Yet, empirical evidence for this relation is scarce. Early work by Bullock and Ziegler [32] demonstrated that numerical intelligence predicts the growth of experimentation skills in primary schoolchildren, explaining almost 35 percent of the variance in a quadratic growth model. More recent studies found significant correlations between numerical ability and scientific reasoning [10,33]. However, as the latter studies treated scientific reasoning as a unitary construct, it is as yet unclear whether numerical ability predicts the separate component skills of scientific reasoning, and if so, whether it predicts them all to the same extent.
Children’s problem-solving skill is another possible predictor of scientific reasoning. Klahr and Dunbar [34] characterized scientific reasoning as a process of rule induction, which inherently involves problem-solving. One could even argue that scientific reasoning is a form of problem-solving in itself: the problem is a need for specific knowledge, which is resolved through a systematic process of knowledge-seeking. Furthermore, as with the previous predictors, it seems plausible that problem-solving calls upon a person’s reasoning skills and therefore predicts scientific reasoning. Although upper-primary schoolchildren are still incapable of formal abstract reasoning, they can solve problems that involve reasoning with concrete objects such as the nine-dots problem and the Tower of Hanoi [35]. Recent research supports these ideas: Mayer et al. [22] found that problem-solving predicted a substantial portion of the variance in children’s scientific reasoning. Van de Sande et al. [25] further showed that this effect does not apply to all subskills: hypothesis validation and experimenting depended on problem-solving, whereas generating conclusions did not. As such, problem-solving may explain some but not all component scientific reasoning skills, and the extent to which the different component skills are predicted remains unclear.
1.3. Research Questions and Hypotheses
Although the cited literature points to notable differences in children’s scientific reasoning, most studies either addressed scientific reasoning as a single, albeit multifaceted, construct or examined one of its subskills in isolation. Furthermore, most extant research has been conducted using written tests. These instruments resemble neither the learning context nor scientific practice and therefore may not accurately gauge children’s true ability in scientific reasoning [36]. Moreover, written tests of scientific reasoning may be confounded with reading comprehension: children who read well might perform better on such tests simply because the tests themselves involve reading. In order to extend our understanding of the relations between scientific reasoning and the cognitive characteristics discussed above, the subskills should be studied in tandem, preferably in an authentic whole-task setting that does not require children to read.
This study, therefore, aimed to identify and explain differences in children’s ability to reason scientifically by means of a performance-based task so as to maximize authenticity and minimize the influence of reading skills. A sample of 160 upper-primary schoolchildren performed this task to gauge their proficiency in five scientific reasoning skills: hypothesizing, experimenting, making inferences, evaluating data and drawing conclusions. Performance differences were related to reading comprehension, numerical ability and problem-solving skills in order to answer the following research questions:
What amount of variation can be found in children’s scientific reasoning?
To what extent is this variation explained by reading comprehension, numerical ability and problem-solving skills?
Based on previous research using written tests, it was expected that children would differ considerably in their overall scientific reasoning proficiency. Differences across the five subskills were also predicted to occur. Specifically, children were expected to be most proficient in experimentation, less proficient in hypothesizing and least proficient in the three evidence evaluation skills (inferencing, evaluating data and drawing conclusions). Reading comprehension, numerical ability and problem-solving skills were each expected to explain a unique portion of the variance in scientific reasoning. Considering the presumed differences across subskills, these characteristics were expected to have differential effects.
3. Results
In order to determine the extent to which scientific reasoning ability differs between children, the means and standard deviations of children’s test scores were examined. Overall test scores ranged from 2 to 13 points with an average of 8.00 (SD = 2.23). Scores on the subskills ranged from 0 to 3, except for inferencing, where the minimum score was 1 point. These means and standard deviations confirmed that children differ in scientific reasoning proficiency and warranted further exploration of what could explain these differences.
The mean scores in Table 2 point to variation in proficiency on the different subskills: on average, children appeared to be most proficient in experimenting and least proficient in evaluating data and drawing conclusions, while hypothesizing and inferencing held the middle ranks. A within-subject ANOVA, controlling for gender and grade, was conducted to test whether these differences were statistically significant. Multivariate results revealed an overall effect of subskill (Pillai’s trace = 0.46, F(4, 153) = 32.50, p < 0.001), but no interaction effects of subskill with gender (Pillai’s trace = 0.02, F(4, 153) = 0.60, p = 0.665) or grade (Pillai’s trace = 0.03, F(4, 153) = 1.23, p = 0.300). The differences between subskills were further explored in univariate analyses. Scores on experimenting were significantly higher than scores on all other subskills (p < 0.01). Scores on hypothesizing were significantly higher than scores on inferencing, evaluating data and drawing conclusions (p < 0.05). Scores on inferencing were significantly higher than scores on evaluating data (p < 0.01), but not scores on drawing conclusions (p = 0.214). Drawing conclusions and evaluating data, the two subskills with the lowest scores, did not differ significantly from one another (p = 0.993).
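For illustration, the sketch below shows how a comparison of this type could be reproduced in Python with the pingouin library. It is a minimal example on hypothetical long-format data (the file and column names are ours, not the study’s) and, for brevity, omits the gender and grade control factors included in the reported analysis.

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format data: one row per child x subskill,
# with each subskill score ranging from 0 to 3 as in the reported task.
long_df = pd.read_csv('scores_long.csv')  # columns: child_id, subskill, score

# Omnibus within-subject test of the five-level subskill factor.
aov = pg.rm_anova(data=long_df, dv='score', within='subskill',
                  subject='child_id', detailed=True)
print(aov)

# Bonferroni-corrected pairwise comparisons between subskills,
# analogous to the univariate follow-up analyses reported above.
posthoc = pg.pairwise_tests(data=long_df, dv='score', within='subskill',
                            subject='child_id', padjust='bonf')
print(posthoc[['A', 'B', 'p-corr']])
```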
Having established that there is variation in the extent to which children master the five scientific reasoning subskills, the next set of analyses sought to explain these differences from children’s reading comprehension, numerical ability and problem-solving skills. As shown in Table 3, the total scientific reasoning score correlated with all three factors, albeit moderately. Correlations at the subskill level paint a mixed picture: reading comprehension was associated with all subskills except hypothesizing, numerical ability correlated only with evaluating data, and problem-solving did not correlate with any of the subskills.
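A correlation table of this kind is straightforward to compute. The sketch below, again on hypothetical wide-format data with assumed column names, produces Pearson correlations between the (sub)scores and the three child characteristics.

```python
import pandas as pd

# Hypothetical wide-format data: one row per child (column names assumed).
df = pd.read_csv('scores_wide.csv')

cols = ['total_score', 'hypothesizing', 'experimenting', 'inferencing',
        'evaluating_data', 'drawing_conclusions',
        'reading_comprehension', 'numerical_ability', 'problem_solving']

# Pearson correlation matrix of scores and predictors, as in Table 3.
print(df[cols].corr().round(2))
```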
Multivariate multiple regression was used to further scrutinize the relations between the three predictor variables and the five scientific reasoning subskills. Multivariate test results showed no main effect of the control variables gender, Pillai’s trace = 0.01, F(5, 150) = 0.35, p = 0.882, partial η² = 0.01, and grade, Pillai’s trace = 0.05, F(5, 150) = 1.60, p = 0.164, partial η² = 0.05. Regarding the explanatory variables, a significant contribution of reading comprehension to scientific reasoning was found, Pillai’s trace = 0.17, F(5, 150) = 6.28, p < 0.001, partial η² = 0.17. Neither numerical ability, Pillai’s trace = 0.02, F(5, 150) = 0.57, p = 0.725, partial η² = 0.02, nor problem-solving skills, Pillai’s trace = 0.02, F(5, 150) = 0.61, p = 0.694, partial η² = 0.02, explained scientific reasoning to a significant degree. The between-subject effects of reading comprehension in Table 4 showed that reading comprehension accounted for a significant proportion of the variance in experimenting, inferencing, evaluating data and drawing conclusions, but not in hypothesizing. The regression coefficients further indicate that experimenting was most influenced by reading comprehension, while, of the significantly predicted subskills, inferencing was least influenced. Thus, although reading comprehension remains an important explanatory factor, it did not explain all scientific reasoning subskills uniformly.
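As a hedged sketch of how such a multivariate multiple regression could be specified, the Python fragment below uses the MANOVA interface of statsmodels on the same hypothetical wide-format data; its mv_test() output reports Pillai’s trace for each model term, analogous to the statistics reported above. Column names are assumptions, not the study’s actual variables.

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Hypothetical wide-format data: one row per child (column names assumed).
df = pd.read_csv('scores_wide.csv')

# Five subskill scores regressed jointly on the three predictors,
# with gender and grade entered as categorical control variables.
model = MANOVA.from_formula(
    'hypothesizing + experimenting + inferencing + evaluating_data '
    '+ drawing_conclusions ~ reading_comprehension + numerical_ability '
    '+ problem_solving + C(gender) + C(grade)',
    data=df)

# Prints multivariate tests (including Pillai's trace) per model term.
print(model.mv_test())
```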
4. Discussion
This study aimed to identify and explain differences in children’s ability to reason scientifically. To this end, a performance-based scientific reasoning task was administered, and measures of reading comprehension, numerical ability and problem-solving skills were collected in a sample of 160 upper-primary children. Their scientific reasoning scores varied considerably, which indicates that not all children are equally proficient in performing these skills. Observed differences within children further suggest that the five scientific reasoning skills are not equally difficult to perform. These intra-individual differences were partially explained by reading comprehension but not by numerical ability or problem-solving skills.
Results regarding the first research question confirm the existence of variation in children’s scientific reasoning: the inter-individual spread in total scores was considerable, and marked intra-individual differences were found for some subskills. The hypothesized proficiency pattern was confirmed: children in our sample were most proficient in experimenting, less proficient in hypothesizing and least proficient in inferencing, evaluating data and drawing conclusions. This is particularly important because, as Koerber and Osterhaus [10] argued, previous research has studied these component skills separately, often through written tests [22,25]. The present study thus confirms the differences in subskill difficulty during a comprehensive performance-based scientific reasoning task and suggests that children’s relative proficiency at the subskill level is stable across test modalities (cf. [6]).
Of particular interest is that the component scientific reasoning skills were consistently but moderately associated with total task scores. This result raises the question as to what accounts for the error variance in these correlations. Part of it could be due to the psychometric qualities of the scientific reasoning task. As mentioned in Section 2.2.1, each component skill was assessed by only three items, so a meaningful analysis of internal scale consistency was deemed impossible. In the absence of this information, the magnitude of the correlations should be considered with some caution. A more substantive interpretation is that the proficiency pattern described above does not apply similarly to all children: some will develop the component skills in the indicated order, whereas others will show a deviating developmental trajectory. As a consequence, fine-grained measures of separate component skills, if reliably measured, give a more accurate impression of children’s proficiency in scientific reasoning than global measures and should be the preferred approach when assessment serves diagnostic purposes, for instance, to inform the design of instruction.
The observed variation in scientific reasoning was independent of children’s grade level. This equivalence of task performance might be due to the fact that our sample had few opportunities to practice their scientific reasoning skills: the school offered them only five inquiry projects per year, whereas the daily language and math classes did lead to grade differences in reading comprehension and numerical ability. A related explanation is that scientific reasoning develops slowly in general and in the upper-primary grades in particular (e.g., [9]). Although most children at this age advance in scientific reasoning [19], the inter-individual variation is considerable and prevents the minor cross-grade growth differences from becoming statistically significant. Alternative research methods such as longitudinal designs and person-centered approaches to data analysis are more sensitive to capturing developmental growth and are increasingly being applied in scientific reasoning research [41].
Reading comprehension explained part of the variance in scientific reasoning. This result is consistent with our hypotheses and complements previous research that administered written tests of scientific reasoning (e.g., [22,25,26]). Why, then, did reading comprehension predict scientific reasoning on a performance-based test that makes minimal demands on reading skills? One explanation is that scientific reasoning and reading comprehension both draw on general language comprehension processes, in particular when scientific reasoning is measured through interactive dialogue. Another interpretation could be that reading comprehension is a proxy for general intelligence or academic attainment, which, in turn, is associated with scientific reasoning (e.g., [42]). In addition, relations have been found between scientific reasoning and verbal reasoning [24], as well as nonverbal reasoning [25] and conditional sentence comprehension [43]. In line with these findings, language-centered scientific reasoning interventions have been proposed [25,43] and have been found to be effective [44].
Our results further show that reading comprehension does not explain all component scientific reasoning skills to the same extent, which underscores the importance of assessing the constituent skills separately rather than merging them into a single overarching construct. The most striking finding in this regard is that hypothesizing was not related to reading comprehension, even though one would intuitively expect verbal reasoning to be associated with this skill. Although it is not entirely clear why hypothesizing and reading comprehension were not related, a possible explanation may lie in what children need to reason about: their own ideas about the world (as in hypothesizing) as opposed to building a situation model from given information (as in reading [45] as well as in interpreting outcomes). In hypothesizing, misconceptions and naive beliefs may interfere with the reasoning process, whereas the chance of such ‘illogical’ thoughts could be less pronounced when reasoning with given information.
Numerical ability did not predict children’s scientific reasoning. Although there were sound theoretical reasons to assume that numerical ability would predict scientific reasoning, empirical evidence on this relation is either scarce and relatively recent [10] or involved a different math strand [32]. Thus, while numerical ability as operationalized in this study does not explain individual differences in scientific reasoning, future research might examine whether this independence generalizes across tasks and settings. Future research could also investigate whether different math skills (e.g., number sense, measurement) contribute to performance on a scientific reasoning task.
Children’s problem-solving skills did not predict scientific reasoning either, possibly because of task incongruence. Jonassen [46] argued that the ease with which a problem is solved depends on both individual differences between problem solvers and the characteristics of the problem. A scientific inquiry is an ill-defined problem that requires a problem solver to combine strategies and rules to come to an unknown solution, whereas the Tower of Hanoi is a well-defined problem with a constrained set of rules and a known solution. Thus, although the Tower of Hanoi does involve problem-solving, it may be insufficiently sensitive to distinguish weak from strong problem solvers. Beyond problem characteristics, the problem representation [46] might explain why Mayer et al. [22] found that the very similar Tower of London problem explained scientific reasoning. Mayer et al. [22] used a multiple-choice paper-and-pencil version of this problem in which all manipulations had to be completed mentally, thus making a relatively straightforward problem rather difficult to solve. As such, this test may not have identified all children who could solve a Tower of London problem, but only those who were sufficiently good at reasoning to complete the problem mentally. The current study, by contrast, used a less demanding task that allowed for real-time manipulation and was programmed to make invalid moves impossible. This difference in task demands might explain why the current study did not show a relation between problem-solving and scientific reasoning while previous research did. As understanding what explains specific subskills is a recent endeavor [10,25], more research is needed to establish which component skills can be explained and why differential effects are found.
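To make the notion of a task that is ‘programmed to make invalid moves impossible’ concrete, the sketch below shows one way such a constraint could be implemented for a digital Tower of Hanoi. It is a hypothetical illustration, not the software used in this study.

```python
# Minimal Tower of Hanoi state in which illegal moves are rejected,
# illustrating how a digital task can rule out invalid configurations.
class TowerOfHanoi:
    def __init__(self, n_disks: int = 3):
        # Pegs hold disk sizes; the last element of a list is the top disk.
        self.pegs = [list(range(n_disks, 0, -1)), [], []]

    def is_legal(self, src: int, dst: int) -> bool:
        # A move is legal if the source peg is non-empty and its top
        # disk is smaller than the destination peg's top disk (if any).
        if not self.pegs[src]:
            return False
        return not self.pegs[dst] or self.pegs[src][-1] < self.pegs[dst][-1]

    def move(self, src: int, dst: int) -> bool:
        # The interface simply refuses illegal moves instead of
        # letting the child produce an invalid configuration.
        if not self.is_legal(src, dst):
            return False
        self.pegs[dst].append(self.pegs[src].pop())
        return True


game = TowerOfHanoi()
assert game.move(0, 2)      # smallest disk to the rightmost peg: allowed
assert not game.move(0, 2)  # larger disk onto a smaller one: rejected
```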
4.1. Limitations
This study has some limitations, one of which is the homogeneous sample in terms of parental background and education, with highly educated parents being overrepresented. As these parents are more likely to intellectually stimulate their children, for example, by taking them to science museums [47], this might have given the participants in the current study a certain advantage compared to children whose parents are less educated. The observed variation in scientific reasoning was nevertheless considerable and would probably have been even larger if a more heterogeneous sample had been used. Future research should therefore incorporate more diverse samples to find out whether the present conclusions generalize to more typical groups of upper-primary schoolchildren.
Another limitation lies in the task used to assess numerical ability. Because there was no precedent as to what type of math skills would predict scientific reasoning, a lean task assessing basic numerical operations was chosen, as it matched the type of operations children had to carry out during the scientific reasoning task (e.g., counting, direct comparisons). A further advantage of this task was that it made no demands on reading skills, which is particularly important because previous studies did not allow for untangling scientific reasoning and reading comprehension. However, although the current task resembled the types of operations children had to carry out during the scientific reasoning task, no reasoning was required. The absence of any significant results suggests that numerical ability may not be the most relevant math skill for predicting scientific reasoning, and further research is needed to identify whether and which math skills relate to scientific reasoning.
4.2. Implications
The current study confirms that scientific reasoning is a multifaceted construct. This is evident not only from differences in children’s proficiency in the component skills but also from the asymmetry in the extent to which reading comprehension predicts these skills. How children of different proficiency levels learn scientific reasoning in a classroom setting, and how they can be taught to reach their best potential, needs to be attended to in future research. Studying all scientific reasoning skills together is particularly important. Previous research has predominantly focused on a single skill, most often experimenting [48], which stands to reason because experimenting is such a fundamental skill. At the same time, such focused investigations do not capture the complexity of scientific inquiry, the relative proficiency of children in the different subskills and the relations between these skills. Therefore, future research should focus more on scientific reasoning in authentic inquiry settings while still distinguishing subskills.
The absence of grade-level differences suggests that scientific reasoning develops slowly in the upper-primary years and implies that sustained practice is needed to boost this development. In preparing weekly or bi-weekly inquiry-based science lessons, teachers should attend to differences between children and among subskills. Most children will be able to perform the relatively easy skill of experimenting themselves with minimal guidance, whereas more teacher guidance is needed in generating hypotheses. Inferencing, evaluating data and drawing conclusions, which are the most difficult subskills, should initially be taken over by the teacher, who can demonstrate the skills to the class and gradually decrease their involvement as the lesson series progresses.
Results of the multiple regression analysis imply that teachers who start an inquiry-based curriculum can infer children’s entry levels from their reading comprehension scores. Children’s basic numerical skills and their ability to solve mind puzzles resembling the Tower of Hanoi (e.g., tangrams, sudokus) should not be used for this purpose, because both are poor predictors of scientific reasoning. The regression data also suggest that proficient readers need less guidance in scientific reasoning, so teachers can devote more attention to the average and poor readers in the class. Teachers should, of course, monitor the progress of all children and adjust the level of guidance as needed. A final practical suggestion concerns the scheduling of inquiry-based science classes. As these lessons are often taught by specialist teachers with part-time contracts, schools can opt for flexible scheduling and combine the fifth- and sixth-grade lessons because the proficiency levels in these classes are comparable. Alternatively, the same lessons can be delivered in both grades, perhaps with minor adjustments in the amount of guidance, which will ease the teachers’ burden of lesson preparation.