*Article* **Professional Knowledge and Self-Efficacy Expectations of Pre-Service Teachers Regarding Scientific Reasoning and Diagnostics**

**Dagmar Hilfert-Rüppell <sup>1,</sup>\*, Monique Meier <sup>2</sup>, Daniel Horn <sup>2</sup> and Kerstin Höner <sup>1</sup>**


**Abstract:** Understanding of and knowledge about scientific reasoning skills are key abilities of pre-service teachers. In a written survey (open response format), biology and chemistry pre-service teachers (*n* = 51) from two German universities named the central decisions or actions that school students have to perform in scientific reasoning during open inquiry instruction of an experiment. The participants' answers were assessed in a qualitative content analysis using a rubric system generated from a theoretical background. Instruments in a closed response format were used to measure attitudes towards the importance of diagnostics in teacher training and the domain-specific expectations of self-efficacy. The pre-service teachers lacked pedagogical (didactics) content knowledge about potential student difficulties and also exhibited a low level of content methodological (procedural) knowledge. There was no correlation between the knowledge of student difficulties and the approach to experimenting with expectations of self-efficacy for diagnosing student abilities regarding scientific reasoning. Self-efficacy expectations concerning their own abilities to successfully cope with general and experimental diagnostic activities were significantly lower than the attitude towards the importance of diagnostics in teacher training. The results are discussed with regard to practical implications, as they imply that scientific reasoning should be promoted in university courses, emphasising the importance of understanding the science-specific procedures (knowing how) and epistemic constructs in scientific reasoning (knowing why).

**Keywords:** professional knowledge; scientific reasoning skills; self-efficacy; students' difficulties; diagnostic competencies

### **1. Introduction**

Inquiry-based teaching is seen as contributing to content, procedural, and epistemic learning goals of science education [1]. Therefore, a basic understanding of the systematic approach to conducting science investigations is required in the competencies of scientific inquiry (e.g., in the US [2] and in the UK [3]). In Germany, these are reported for biology, chemistry, and physics with similarly formulated educational standards [4–6], allowing these competencies to be promoted in a networked manner both vertically within a subject and horizontally across subjects [7]. This applies both to school education and to the training of pre-service teachers. Both areas are in turn directly related, i.e., the knowledge and skills (for scientific inquiry) of the teacher shape the teaching with learning opportunities (for inquiry-based science education) and thus the potential learning success on the part of the students [8]. As a result, at the international and national levels, standards for the teaching profession are also being formulated and science teaching competencies described for teacher education [9–11]. Common to them are the requirements for future teachers to build up science education about and through scientific inquiry as their own competence, as well as to learn how to teach scientific inquiry in order to be able to promote corresponding competencies in students. For example, (prospective) teachers have to be competent themselves in designing empirical approaches to test hypotheses, and need knowledge as well as skills in hypothesis-led experimentation [12]. Studies on the development of pre-service science teachers' scientific reasoning competencies show that explicit reflections about scientific reasoning (i.e., learning about science; [13]) contribute more to the development of scientific reasoning competencies than only doing science without reflecting about it [14,15]. In the course of this, knowledge about and understanding of scientific inquiry and scientific reasoning is relevant [16], and can also be used as constructs for the analysis and assessment of learning activities and learning outcomes [17].

**Citation:** Hilfert-Rüppell, D.; Meier, M.; Horn, D.; Höner, K. Professional Knowledge and Self-Efficacy Expectations of Pre-Service Teachers Regarding Scientific Reasoning and Diagnostics. *Educ. Sci.* **2021**, *11*, 629. https://doi.org/10.3390/educsci11100629

Academic Editors: Moritz Krell, Andreas Vorholzer and Andreas Nehring

Received: 8 August 2021 Accepted: 6 October 2021 Published: 11 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### *1.1. Scientific Reasoning*

In many studies, the terms scientific reasoning and scientific thinking are used interchangeably because the boundary between reasoning and thinking is blurred (e.g., [18–20]). Scientific reasoning thus forms the basis for critical thinking and is only one, albeit very significant, aspect in the process of thinking about (scientific) facts (e.g., [21–24]). Scientific reasoning can therefore be interpreted as a subset of critical thinking skills (cognitive and metacognitive processes and dispositions) that are essential for scientific procedures in the problem-solving process, the evidence of information in scientific disciplines, and the epistemological incorporation of scientific methods and paradigms [25]. There is a range of views on the structure of scientific reasoning and on the number of its components. Two groupings can be distinguished: while one emphasises scientific reasoning as a broad and complex component representing a particular skill, understanding, or competence [26], the other grouping opposes the advocacy of multidimensional theories [27].

Since the 1970s, according to Dunbar and Fugelsang [28], scientific reasoning has also been viewed as a way of solving problems, with great efforts being made to identify strategies that scientists use to solve problems. Competencies required for scientific reasoning are viewed as a complex construct, encompassing both the skills required for scientific problem solving and the ability to reflect on problem solving at a meta-level [29,30]. The scientific discovery process is best conceptualised as involving both reasoning and problem-solving skills, with the ultimate goal of generating, testing, and then evaluating a hypothesis about a causal or categorical relationship based on the results. Both of these skills (strategy development and reasoning processes) require knowledge to identify key features of a problem at hand [31]. Problem solving requires three types of knowledge in a complex construct (knowing that, knowing how, knowing why; cf. [29,32,33]). Explicit procedural knowledge, in turn, addresses the execution level and thus the "knowing how" and practical implementation of actions to solve problems [32,33]. In his structural model of scientific reasoning, Mayer [34] identified, in addition to personal variables, such as (prior) knowledge and cognition, four process-related skills/subcompetencies:


In the execution of these process steps in problem-oriented and inquiry-based teaching, content knowledge is to be applied and methodological knowledge is to be developed and applied in equal measure [19]. Experimentation is considered to play a central role in the process of scientific inquiry as a content and method in science education [36]. "Scientific reasoning is defined as the inquiry processes [...] [and] the reasoning skills involved in experimentation, evidence evaluation, and inference making addressed to scientific understanding" [37] (p. 106). In the present study, scientific reasoning is predominantly defined from a procedural perspective and thus addressed in the most open-ended problem-solving process possible using the method of experimentation.

In order to perform diagnostics and scaffolding in the (experimental) inquiry process, knowledge about scientific reasoning, as well as about the implementation and associated difficulties on the part of the learners of scientific reasoning, plays a central role for (pre-service) teachers. Given this significance, the key difficulties faced by students will be outlined below in a summary of the literature.

#### *1.2. Literature Summary on Student Difficulties in the Experimental Problem Solving Process*

Experimentation demands and promotes a wide range of cognitive, psychomotor, and social skills in students [35]. In this context, the SDDS model (Scientific Discovery as Dual Search; [22]) as well as the structural model for scientific reasoning [34] with their anchoring in problem-solving research is relevant for both the conceptualisation and the assessment of competencies in the science subjects of biology [37–41], chemistry [42–44], and physics [45–47], respectively, and is central in natural science studies [7,48]. In promoting the competencies underlying each model, the inquiry-based learning approach is shown to be superior to direct instruction on these [49]. In inquiry-based learning, the degree of student activity or, respectively, the open-endedness in the experimentation can be designed differently (levels of inquiry [50]) and thus influence learning success. In the literature, on the one hand, the high value of independence in experimentation compared to teacher demonstration experiments is emphasised (e.g., [51,52]) and, on the other hand, the guided inquiry approach with targeted support of learners in the phases for experimental problem solving is attributed the highest effectiveness [53]. Vorholzer and von Aufschnaiter [54] identify three main dimensions in which the implementation of guidance can vary: (a) the degree of autonomy, (b) the degree of conceptual information, and (c) the cognitive domain of guidance. Independent experimentation can be understood as a relatively complex cognitive problem-solving process that is particularly challenging for students (cf., [55,56]) and consequently requires scaffolding in the different dimensions, process steps, and/or competencies. Accordingly, pre-service science teachers must be provided with learning opportunities to acquire these competencies, to foster their own reasoning in science alongside how to teach students how to reason. 
They should also be enabled to plan and implement targeted lessons that enable their students to acquire the scientific reasoning competencies for experimentation and experimental problem solving required by the educational standards of many countries [4–6,57]. Neither the mere imitation of experiments according to prescribed instructions nor students' participation in teacher demonstration experiments leads to the desired mastery of the scientific reasoning process [58]; nor are approaches that are opened up too early effective [59,60]. The latter can create numerous difficulties/barriers for students to overcome during experimentation. Following the structural areas/phases of hypothesis, planning/execution, and testing, as well as conclusions and evaluation of results in the models of Klahr [22] and Mayer [34], the misconceptions and student difficulties that have been described and empirically documented in the literature are summarised in Table 1.

**Table 1.** Overview of process-related misconceptions and difficulties of students in experimentation published in the literature (references in brackets on the right side) structured and summarised according to the three phases for scientific reasoning of the SDDS model [22].




#### 1.2.1. Formulating/Search Hypothesis

When generating hypotheses, students often find it difficult to make educated guesses. They do not consider preconditions such as justification or verifiability, whether in principle or with the given experimental materials. They are mostly unaware that several hypotheses should be set up, or they consider only the one they think is correct (e.g., [64,81]).

#### 1.2.2. Design and Execution of the Experiment

It is often observed that students do not plan their approach to the experiment, but instead carry it out immediately and instinctively with no predetermined method in mind. This can result in experimental steps that are not purposeful and in the procedure being changed several times [72]. Hammann and colleagues [40] describe this unstructured trial-and-error approach as "no plan, change all". The control-of-variables strategy is described as the most important scientific reasoning skill [70]. Students' difficulties here lie in identifying the dependent and independent variables ([82,83], among others), and in the fact that variable control is often not considered and confounding variables are not excluded [63]. Moreover, a control approach is usually missing, and measurement repetitions are rarely performed [39,69,82,83]. Sometimes students try to create an effect rather than conduct a goal-directed experiment [75,78]. Inefficient experimentation is also evident, e.g., the same experiment is repeated multiple times [81]. Kraeva [44] was able to identify six different approaches of students in conducting chemistry experiments through video analysis, classified in terms of the attributes "plan" versus "try" as well as "maintain", "develop", and "discard". "Revision" as a planning-reviewing strategy was described as relatively successful, whereas "imitation" (exploratory-alternative-less) was described as a relatively less successful strategy. This corresponds with the courses of action taken by students when experimenting to clarify a biological phenomenon in the sense of a process-oriented or explorative type [39].

#### 1.2.3. Evaluation of the Evidence

The generated data are partly disregarded, while conclusions are drawn illogically and for the most part not related back to the hypothesis [67]. Hammann ([84], p. 200) refers to the hasty termination of the search for possible hypotheses, and thus to the wrong conclusion, as "positive capture". He calls "confirmation bias" [62,79], i.e., the tendency to confirm hypotheses while ignoring contradictory data, the "most robust finding in the literature" [84]. Dunbar [85] distinguishes between two approaches to data evaluation: the "find-evidence-goal" approach, i.e., looking for results that confirm the hypothesis, and the "find-hypothesis-goal" approach, i.e., looking for new hypotheses after non-confirming results that then reflect the results. Referring the evidence back to the hypothesis rarely occurs, and the same holds for the discussion of errors [39].

The probability of success for a specific experimentation process depends, on the one hand, on the requirements resulting from the individual phases of the experimentation process [86] and, on the other hand, on personal characteristics, such as interest in the subject or general cognitive performance of the students [55]. For diagnosis, in this case the assessment of students' abilities and performance in experimental problem solving, the discrepancy between students' conceptions and students' actions on scientific concepts for experimentation is used [87,88]. Draude [89] was able to demonstrate deficits among physics teachers regarding the diagnosis of student difficulties in experimentation; the necessary prerequisites are hardly developed and promoted in teacher training programmes [75,90]. In a longitudinal two-year study with biology pre-service teachers, insights into the structure and change of their diagnostic competence and possible influencing factors were obtained [91]. Diagnoses of student difficulties and performance in experimental problem solving can only succeed if pre-service teachers themselves have the appropriate (professional) knowledge of the subject matter to be diagnosed. Otherwise, diagnostic processes will be impaired by the existing knowledge gaps [92]. Before support in this area can be targeted and discussed as a training element in teacher education, this study aims to describe pre-service science teachers' scientific reasoning competencies in order to derive the relevance of possible curricular implementations in the subject, subject didactics, and educational science for diagnostics in experimentation. As described, the latter is of importance in all study elements, but is located differently and requires a theory-related consideration, which follows.

#### *1.3. Relevance of Diagnostic Competencies for (Pre-Service) Teachers*

Diagnostic competence of teachers describes both the ability to successfully cope with the diagnostic tasks arising in the teaching profession and the quality of the diagnostic performance [93]. Making efficient instructionally relevant decisions is impossible without being able to identify, understand, and even predict instructionally relevant situations and events [94]. Thus, the development of (subject-related) diagnostic competence in (pre-service) teachers seems to be an indispensable prerequisite for the teaching profession [95]. In Shulman's three main domains of professional knowledge (content knowledge (CK), pedagogical content knowledge (PCK), and pedagogical knowledge (PK)) [96], the most widely used classification in the literature, knowledge about assessment and diagnosis is classified in the domain of general didactical knowledge (pedagogical knowledge, PK) [97]. Kramer et al. [95] describe either PK (more generic: e.g., teaching disorder [98]) or the subject-specific facets CK or PCK (e.g., diagnosing biology instruction [99]) as relevant to the application of diagnostic activities and diagnostic accuracy, depending on the diagnostic focus. Results of path analyses utilising Rasch measures showed that both PCK and PK were statistically significantly related to pre-service teachers' diagnostic activities. Additionally, biology teachers' PCK was positively related to diagnostic accuracy [95].

Divergent assumptions exist about what comprises teachers' diagnostic competence, stemming from the fact that different aspects such as subject matter, method, and target are modeled. The conceptualisation of diagnostic activities in which knowledge is applied in order to solve specific problems can be seen as equivalent to scientific reasoning skills [100]. Crucial for a sustainable diagnostic cycle is the transformation of competence into performance mediated by situational skills of perception (P), interpretation (I), and (action) decision (D) in the sense of Blömeke, Gustafsson, and Shavelson's [101] model, in which teachers' competence is viewed as a continuum with multiple transitions (P-I-D model of competence transformation). In this context, diagnostic competence presupposes the correct perception of relevant classroom features (noticing) and their evaluation with reference to theoretically grounded, pedagogical action knowledge (reasoning) [102]. Draude [89] distinguishes between predictive and action diagnostic competence of physics teachers. While the predictive competence measured the extent to which teachers could predict students' experimental difficulties in a particular physics experiment, the action-accompanying diagnostic competence measured the extent to which teachers diagnosed difficulties during the students' experimentation process. He also found deficits in both areas, so that a promotion of (pre-service) teachers' skills in this regard seems to be indicated [89]. In the present study, therefore, the predictive diagnostic competence of biology and chemistry pre-service teachers with regard to student difficulties, misconceptions, and the necessary central decisions students have to make during open-ended experimentation was assessed by means of a text-based description of a teaching scenario for a student experiment. In terms of examining diagnostic competence, self-report is common in research, so there is a need for tools to survey diagnostic and reflective skills in a natural setting [103,104].

#### *1.4. Self-Efficacy Expectations*

Self-efficacy expectations are considered another major aspect of teachers' professional competence; following Baumert and Kunter's [105] model of professional competence, self-efficacy expectations are considered relevant in addition to knowledge and attitudes. Self-assessments related to motivation, personal engagement, or self-efficacy also appear to be of value in better understanding the interplay between motivational and affective states and diagnostic activities. Self-efficacy, first introduced by Bandura [106] as an aspect of social cognitive learning theory, is described as the strength of one's belief in one's ability to perform a particular task or achieve a particular outcome. Thus, assessing self-efficacy is less about what skills and abilities individuals possess and more about what they believe they can do with the skills and abilities they possess [107]. In this regard, competent performance is guided in part by higher-level self-regulatory abilities [108]. These include general abilities to diagnose task demands, construct and evaluate alternative courses of action, set proximal goals to guide one's efforts, and create self-incentives to maintain engagement in stressful activities and manage stress and distracting thoughts [109]. Self-efficacy correlates with academic performance [110,111], task persistence, motivation [112], and resilience in academic contexts [113]. Self-efficacy varies depending on the situation and therefore needs to be considered or captured in a domain- and context-specific manner [114]. Students' self-efficacy in science education has been studied in the subjects of mathematics [115–118], physics [117], and chemistry [118,119]. With regard to problem solving in mathematics, it was shown that even when students have the ability to solve problems, those who have a strong self-efficacy expectancy are more effective problem solvers [116,117].

Regarding the self-efficacy expectancy of (pre-service) teachers in science, there are only a few empirical findings [120–122]. In a study by Yürük [120], pre-service teachers who had taken more science courses in college felt better prepared to teach science content and had higher levels of self-efficacy. Riese and Reinhold [123] addressed the relationship between physics teachers' CK and PCK (compare [96], see above) and their general and classroom self-efficacy. They found a significant positive correlation between teaching-related self-efficacy and CK. Kurbanoglu and Akim [121], meanwhile, show that low self-efficacy expectancy regarding the subject of chemistry predicts chemistry laboratory anxiety and has a negative effect on freshmen's attitudes toward chemistry. In biology, there have been very few studies on self-efficacy (compare [114,122,124]). The findings of Mahler, Großschedl, and Harms [124] indicate that teacher education in college, attending professional development courses, and self-study provide learning opportunities to promote self-efficacy and enthusiasm for teaching. In addition, the authors found that self-efficacy and subject-specific enthusiasm were positively related to PCK.

While self-efficacy or self-efficacy expectations are increasingly a focus of inquiry in teacher education, very few studies can be identified that address the application of self-efficacy theory to diagnostic skills or self-reported perceptions of self-efficacy as a predictor of actual diagnostic skills in pre-service teachers. Motivation, attitude, and knowledge were found to be significant positive predictors of diagnostic skills with respect to learning behavior [125], whereas reflection on experience and self-efficacy were not found to be relevant. In a study of German secondary mathematics teachers at two measurement points, a causal effect of teacher self-efficacy expectations on subsequent instructional quality (self-reported teachers' self-efficacy and instructional quality) was partially found [126]. Given the primarily heterogeneous and partly contradictory findings, the need for research on self-efficacy or self-efficacy expectations in the context of diagnostic skills becomes clear.

#### *1.5. Claim and Research Questions*

In addition to content knowledge about the addressed context in an experiment and about experimentation (CK), the teacher needs pedagogical content knowledge about typical difficulties of students in the experimental implementation of the context as well as about possibilities for action in the experimental instructional setting (PCK) [127,128]. In this study, these dimensions of professional knowledge are addressed as important components in the formation of diagnostic competence. Consequently, the promotion of diagnostic competence requires that these areas of knowledge are either developed or are already present in the pre-service teachers. Especially in the first third of university studies, it is important to clarify which prior knowledge relevant to developing diagnostic competence for assessing students' experimentation skills pre-service teachers bring with them, and which knowledge they lack or have not developed in their basic subject didactic training. On this basis, university teaching-learning programmes can be optimised and tailored to the target group. Since research studies in science education with a focus on the evaluation of diagnostic competence in combination with subject-specific pedagogical knowledge are rare so far [129,130], an explorative approach was chosen for the present study. In a chemistry course and a biology course at two institutions, the University of Braunschweig and the University of Kassel, respectively, the first step was to qualitatively investigate the professional knowledge of methodological difficulties and central decisions/actions of students in science experimentation among pre-service teachers in the first third of their university teaching studies. For this purpose, the following explorative-qualitative research questions were considered:

RQ1: Which difficulties in experimentation are pre-service teachers with the subject biology and/or chemistry able to describe on the basis of their pedagogical content knowledge (PCK) in the first third of their university studies?

RQ2: To what extent can pre-service teachers with the subject biology and/or chemistry describe central methodological contents and actions for experimentation in the first third of their university studies, and which (methodological) content knowledge (CK) is predominant here?

A number of individual and contextual factors may influence the willingness and ability of (pre-service) teachers to implement diagnostic activities (while experimenting) in the teaching profession (cf., e.g., [125,131]). Attitudes toward the relevance and importance of diagnostic content in teacher education and, more broadly, the teaching profession are difficult to predict due to overlaps in the addressed knowledge domains. As a content element of pedagogical/didactic and/or educational study elements, both higher and lower attitude expressions would be expected according to the findings of Cramer [132]. Similarly, for the area of self-efficacy expectations with the specification of diagnostic competence in subject-related settings of experimentation, there is a lack of empirical findings that can be used to make an educated guess about the expression in the sample studied here. Consequently, the following additional descriptive-quantitative questions were examined in order to draw statements about the expression of diagnosis-related attitudes and self-efficacy expectations among pre-service teachers:

RQ3: What attitudes towards the relevance of diagnostics in teacher training and what self-efficacy expectations for diagnostics in experimental settings do pre-service teachers show in the first third of their university studies, and how are they related in terms of expression?

RQ4: Is there a relation between pre-service teachers' attitudes towards the relevance of diagnostics in teacher training or, respectively, their self-efficacy expectations for diagnostic activities in experimental settings:


#### **2. Materials and Methods**

#### *2.1. Procedure*

The written survey on which this paper is based was conducted at the beginning of two regular obligatory courses identified in the module plan from the 2nd semester of the (bachelor/teacher) degree programme in chemistry at the University of Braunschweig and biology at the University of Kassel. The procedure and information provided, as well as the time frame for completion (approximately 30 min) were identical in the cohorts. An online questionnaire was used to collect (a) demographic and academic information, (b) pedagogical content-related diagnostic knowledge about student difficulties and central actions or decisions of students in experimentation, and (c) the relevance of diagnostics in teacher training as well as domain-specific self-efficacy expectations. When recording the self-efficacy expectations, the participants were free to specify them in relation to the subject biology or chemistry. They had to make a selection beforehand and were assigned to the chemistry or biology sub-sample according to this selection (see Section 2.2).

Participation in the survey was anonymous and voluntary. For the release of the socio-demographic, quantitative, and qualitative data of the closed-ended and open-ended questions, the participants provided a declaration of consent for anonymous analysis and publication.

#### *2.2. Participants*

The sample consisted of 51 pre-service teachers of biology and/or chemistry. Of these, 66.7% were female and the average age was 22 ± 2.6 years. Of the participants, 34 were studying to become teachers at grammar schools and 17 were studying to become teachers at secondary schools. The relevant sociodemographic data for both the total sample and the sub-samples are presented in Table 2. In relation to the sub-sample, the proportions of pre-service teachers in their second or fourth semester were 47% in biology and 62% in chemistry. While a further 43% of pre-service teachers in biology were in their 6th semester, the remaining proportion in chemistry was distributed among the higher semesters. The average number of semesters is similar in both sub-samples, as is the distribution of gender and the type of school targeted in the study programme (see Table 2).


**Table 2.** Demographic characteristics of the participants.

Annotation: *n* = number; *M* = mean; *SD* = standard deviation; GYM = "Gymnasium" (grammar schools); HR = "Haupt-/Realschule" (secondary schools).

Approximately three-quarters of the participants had taken one or two courses in subject didactics at the time of the survey (biology 67%, chemistry 77%); all others had already taken three or more courses in subject didactics. At the University of Braunschweig, the largest proportion of pre-service teachers recruited belonged to the chemistry sub-sample. Only two participants of this sub-sample completed the survey in the context of a chemistry didactics course at the University of Kassel. Further participants from this course could not be included in the analysis due to too high a semester number and missing data. Accordingly, the description of the university teaching and preconditions for the chemistry sub-sample focuses on the curricular structures at the University of Braunschweig. Here, pre-service teachers take subject courses in the first semesters, in which investigations tend to follow detailed instructions in laboratory practicals. Content knowledge and skills in scientific working methods and techniques and in handling laboratory materials are to be developed. In the 4th semester, pre-service teachers usually attend courses with didactic content for the first time, in which, among other things, the hypothetical-deductive processes for scientific inquiry and experimental problem solving are addressed. The data collection took place at the beginning of the seminar "Simple scientific experiments".

The biology sub-sample mainly consists of pre-service teachers with biology as a subject from the University of Kassel. Here, too, there is an exception: two participants from the University of Braunschweig chose biology rather than chemistry in the survey section on domain-specific self-efficacy expectations. However, since more than 90% participated in the survey at the beginning of a course in the didactics of biology at the University of Kassel, the curricular structures there are used to describe the university teaching and pre-conditions for the biology sub-sample. The first two semesters of the biology teaching programme at the University of Kassel are also dominated by subject-related biology courses (incl. laboratory). In addition, the basic module "Introduction to Biology Didactics", with lecture and exercise, should be completed at this stage. In this module, two individual sessions provide basic background and information on the scientific inquiry process and the associated subject-methodological (procedural) knowledge. In each subsequent semester there are further subject didactic courses with different emphases. The survey took place at the beginning of the course "Scientific inquiry methods and lab techniques in biology teaching", which is central to the content area of scientific inquiry; it therefore cannot be assumed that the participants had received in-depth training in this area.

#### *2.3. Instruments*

Some authors critique a lack of validity evidence for instruments to assess scientific reasoning competencies (e.g., [133,134]) and point out that multiple-choice assessments can hardly be seen as situations closely representing real life (e.g., [135]). In some studies with students, response processes have also been examined qualitatively (through thinking aloud, eye-tracking studies, video recordings and written recordings), and these studies confirmed that respondents use procedural and epistemic knowledge (e.g., [136,137]).

Moreover, as science is constituted, among other things, by specialized language [138], a central part of this study is a qualitative instrument to determine pre-service teachers' scientific reasoning knowledge concerning students' difficulties.

#### 2.3.1. Student Difficulties/Misconceptions and Actions in Experimentation

The central concern of the open-ended questions, posed here in two parts, is to record the pre-service teachers' Pedagogical (didactic) Content Knowledge (PCK) about typical students' difficulties/misconceptions as well as their Content (methodological) Knowledge (CK) about the processing of an experimental task in an authentic teaching scenario. The scenario is presented to the subjects in the form of a short progression plan from the perspective of the teacher, together with the experimental task given to the students via a task sheet (see Supplementary Materials S1). The phenomenon to be investigated, the dissolving of sugar, is visualized for the (fictitious) students as well as for the pre-service teachers by a video with a dialogue between two friends at and about a cup of tea ('tea conversation') and is used to derive the question "How does the time needed to dissolve sugar depend on different influencing factors?" This task stem is followed by two open-ended questions or subtasks. Part 1 asks about the central difficulties of the students in accomplishing the set experimental task, i.e., in planning and executing an experiment on the dissolving time of sugar, with regard to content knowledge and experimental implementation. Part 2 explicitly aims to elicit methodological (procedural) knowledge for the design of experiments in the sense of the scientific inquiry process and thereby also indirectly addresses possible obstacles students face in mastering this process. In this part, the pre-service teachers are asked to articulate in writing four central decisions or necessary actions of the students to accomplish the experimental task.

#### 2.3.2. Relevance of Diagnostics in the Teacher Training Program

The measurement instrument for assessing attitudes toward the importance of diagnostics in teacher education comprises five items based on Lorenz [139], the language and content of which were adapted to fit the focus of the present study. In this sample, Cronbach's alpha [140] (pp. 281–302) for the five items was α = 0.64, and the item "it is important for teachers to be able to correctly assess students' performance in experimentation" showed low discriminatory power (*r*it = 0.165); the scale was therefore reduced to four items (α = 0.70). A complete overview of all items can be found in Appendix A.
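The two reliability statistics reported here can be illustrated with a minimal sketch. It assumes the standard definitions of Cronbach's alpha and of the corrected item-total correlation (an item's scores correlated with the sum of the remaining items); the text does not state which exact formulas the software applied. The response matrix is invented toy data, not data from this study, and the function names are hypothetical.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows):
    """Cronbach's alpha; rows = one list of item scores per respondent."""
    k = len(rows[0])
    item_variances = [variance(col) for col in zip(*rows)]
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def corrected_item_total(rows, j):
    """Correlation of item j with the sum of all other items."""
    item = [r[j] for r in rows]
    rest = [sum(r) - r[j] for r in rows]
    return pearson(item, rest)


# Invented toy responses on a 4-point scale (6 respondents x 5 items).
# The fifth item is deliberately inconsistent with the other four.
TOY = [
    [4, 4, 3, 4, 1],
    [3, 3, 3, 3, 4],
    [2, 2, 2, 3, 2],
    [4, 3, 4, 4, 3],
    [1, 2, 1, 2, 3],
    [3, 4, 3, 3, 2],
]

if __name__ == "__main__":
    print("alpha, 5 items:", round(cronbach_alpha(TOY), 2))
    print("r_it of item 5:", round(corrected_item_total(TOY, 4), 2))
    reduced = [r[:4] for r in TOY]
    print("alpha, 4 items:", round(cronbach_alpha(reduced), 2))
```

In the toy data the fifth item correlates negatively with the rest of the scale, so removing it raises alpha. This mirrors the reasoning behind reducing the scale from five to four items, without reproducing the study's values.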

#### 2.3.3. Domain-Specific Self-Efficacy Expectations

Based on the concept of self-efficacy as devised by Schwarzer and Jerusalem [141], an instrument was developed for the domain-specific assessment of subjective certainty in coping with diagnosis-related teaching activities in general, as well as in instructional experimental settings. In addition to an explicit focus on this selected activity domain, the item formulation describes the perceived ability to be assessed here as "proficiency". Moreover, the effectiveness of an action is specified by including challenges, obstacles or barriers to action in the item formulation [109]. The items used in the present survey [142] were piloted with 98 pre-service teachers majoring in biology over three semester cohorts. Based on the piloting data, the dimensionality of the scale was tested using exploratory factor analysis (principal component analysis with varimax rotation) and internal consistency was tested using Cronbach's alpha. Factor analysis revealed two factors with an eigenvalue > 1 that explained 59% of the variance. Four items loaded highly on the first factor (factor loading (ajq) > 0.649). One additional item loaded similarly on both factors; based on content, this item was assigned to the second factor. Thus, factor one (experiment-related diagnostic activities) is represented by four items whose content is explicitly diagnostic and/or experimental. Another three items loaded similarly highly on the second factor (ajq > 0.709). Together with the item assigned on the basis of content, this factor describes diagnostic activities implicitly and without the inclusion of experimentation/processes (general diagnostic activities). The reliability of the two scales formed according to the two-factor model was in the good range (α ≥ 0.70). The items and (factor analytic) findings for piloting the instrument on domain-specific self-efficacy expectations can be found in Appendix B.

In the survey presented in this article, the item formulation and content orientation were further specified by subject. In accordance with the sub-samples, the subjects were provided with either biology or chemistry items to assess their self-efficacy expectations. The pre-service teachers' self-reports on the respective items were recorded on a 4-point Likert scale ranging from "does not apply at all" (1) to "fully applies" (4). The empirically derived and validated scales from the pilot were adopted; Cronbach's alpha values were in the very good range for the biology sub-sample (α ≥ 0.83) and in the acceptable range for the chemistry sub-sample (α ≥ 0.61).

#### *2.4. Data Analysis*

#### 2.4.1. Qualitative Analysis Methods

The pre-service teachers' written responses on knowledge about student difficulties and actions/procedures during experimentation were analysed qualitatively following the procedure of a summarising content analysis with deductive-inductive category formation [143], using the programmes MAXQDA 2020 and Excel 2016. For the first part, with written pre-service teacher responses on student difficulties and misconceptions, the content categories were initially formed deductively, with a theoretical foundation based on research findings, which primarily explicate the prior knowledge of the researchers involved in the study [39,47], as well as on results from a wide range of literature sources (see Table 1). Where possible, the references were classified according to the process-related sub-steps of experimentation in Mayer's [34] structural model of scientific reasoning, whereby the formulation of the question was already specified in the material. In the second part (procedure for experimenting), the central decisions listed were sorted into subcategories formed inductively from the material on the basis of similarities in content. These were then classified by subsumption into supercategories based on the hypothetico-deductive procedure in the experiment [34]. This systematic inclusion of deductively and inductively formed categories served to identify and explore further meaning components in the process of creating the category system, up to the category definition through anchor examples [144]. The responses were assigned to the categories for the 1st and 2nd part independently by two trained raters. Interrater reliability was estimated using Cohen's kappa with the programme IBM SPSS Statistics (version 27) and is 'substantial' in the 1st part (Cohen's κ = 0.80; *p* ≤ 0.001) and 'almost perfect' in the 2nd part (Cohen's κ = 0.903; *p* ≤ 0.001) [145]. Finally, the relative frequency for each code in each category was calculated for each group of biology or chemistry students.
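The interrater agreement statistic named above can be sketched as follows. This is a minimal illustration of unweighted Cohen's kappa for two raters, assuming each rater assigns exactly one category per statement; the category labels and codings are invented for illustration and are not taken from the study's category system.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters coding the same units."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of units")
    n = len(rater_a)
    # Observed agreement: share of units with identical codes.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codings of ten statements by two raters.
RATER_1 = ["planning", "planning", "variables", "variables", "evidence",
           "evidence", "planning", "variables", "evidence", "planning"]
RATER_2 = ["planning", "planning", "variables", "variables", "evidence",
           "evidence", "planning", "variables", "evidence", "variables"]

if __name__ == "__main__":
    print(f"kappa = {cohens_kappa(RATER_1, RATER_2):.3f}")
```

Because kappa corrects the raw agreement rate for agreement expected by chance, it is lower than the simple proportion of matching codes (here 9 of 10).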

#### 2.4.2. Quantitative Methods of Analysis

In order to analyse the attitudes towards the importance of diagnostics in teacher education and domain-specific self-efficacy expectations towards diagnostic skills/actions in teaching-learning settings for science experimentation, descriptive procedures, representations, and associated characteristic values were used and analysed with the programme IBM SPSS Statistics (version 27). In the first step, deductively derived as well as newly constructed items for the assessment of self-efficacy expectations were administered to a pilot sample in order to test their factor structure and reliability (see Section 2.3.3). A further reliability test was carried out in the sample on which this study is based, both for the two empirically derived scales on domain-specific self-efficacy expectations and for the newly constructed scale on the significance of diagnostics in teacher training. Taber [146] was used to assess the Cronbach's alpha values for internal consistency. In the next step, the sample was analysed by subject and by construct (Mann–Whitney U test; Wilcoxon test) in order to uncover any differences in attitudes and self-assessments that could influence possible correlations with the results on the open-ended task and allow conclusions to be drawn about subject specifics. According to Cohen [147], the associated effects are rated as insignificant for *r* < 0.10, weak for *r* = 0.10–0.30, medium for *r* = 0.30–0.50 and strong for *r* > 0.50. In accordance with the goal of describing a comprehensive picture of the extent of subject-specific methodological (procedural) knowledge of experimentation in combination with diagnosis-related competence assessments in this area, frequencies and mean values in the respective characteristics are reported and, in the last step, the associations between the quantitative and qualitative data are tested exploratively via Spearman rank correlation. Due to the small samples, especially in the subject sub-samples, and a violation of the criterion for normal distribution in selected scales, non-parametric procedures were used throughout.
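The effect size *r* used with the rank-based tests can be illustrated with a short sketch. It assumes the common conversion *r* = |*z*| / √*N*, which the text does not state explicitly, and it applies the thresholds quoted above with the lower bound of each band treated as inclusive (also an assumption, since the quoted bands share their end points). The *z* values and sample sizes are those reported in the Results section; the function names are hypothetical.

```python
import math


def effect_size_r(z: float, n: int) -> float:
    """Effect size r = |z| / sqrt(N) for a rank-based test statistic z."""
    if n <= 0:
        raise ValueError("n must be positive")
    return abs(z) / math.sqrt(n)


def classify_effect(r: float) -> str:
    """Label r using the bands quoted in the text (lower bounds inclusive)."""
    if r < 0.10:
        return "insignificant"
    if r < 0.30:
        return "weak"
    if r <= 0.50:
        return "medium"
    return "strong"


# (description, reported z, assumed N) taken from the Results section.
REPORTED = [
    ("difficulties, all categories", -2.377, 49),
    ("process steps in decisions", -2.639, 50),
    ("process-related decisions", -1.931, 50),
]

if __name__ == "__main__":
    for label, z, n in REPORTED:
        r = effect_size_r(z, n)
        print(f"{label}: r = {r:.2f} ({classify_effect(r)})")
```

Under these assumptions the sketch yields 0.34, 0.37 and 0.27, matching the effect sizes reported for the subject comparisons. This suggests, but does not confirm, that this conversion was used.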

#### **3. Results**

In total, 49 fully completed questionnaires by the pre-service teachers for the first subtask (students' difficulties) and 50 for the second subtask (procedure for experimenting/key decisions) were included in the predictive diagnostics.

(RQ1) To analyze participants' responses to the first subtask, process-related difficulties and misconceptions among students described in the theoretical literature (see Table 1) were applied to the material. The pre-service teachers' responses were paraphrased prior to analysis, which involved rewriting the responses' core content in a concise descriptive form [143]. This then allowed them to be reliably assigned to the categories. For example, the statement "... The students do not know what things are important/what one needs to pay attention to—no connection to the research question—no hypotheses, ..." was reformulated into the core components "creating a link to the research question" and "no hypothesis generated". The statement "... should be clarified and potentially what a conjecture/thesis entails" was reduced to "technical term conjecture/thesis not known". A total of 293 statements could be coded from the 49 answer sheets examined. Of these, 166 statements referred to process-related difficulties and/or misconceptions by students, which corresponds to roughly 57% (Table 3). In addition, the respondents frequently mentioned difficulties arising from the "instructional setting", specifically from the open-ended task structure and materials pool, which overwhelmed students and required them to make decisions. Statements referring to students' decisions about how to divide up tasks and/or disagreements within the group were assigned to the category "social format". Statements referring to teachers' instructional planning were assigned to the category "teacher". A total of 94 statements (32%) were assigned to these non-process-related categories. Difficulties related to subject-specific content knowledge or technical terms were mentioned in 33 statements (11%).

In the following, the difficulties that students have to deal with during each sub-step of the hypothetico-deductive scientific inquiry process, according to the pre-service teachers' statements, are explained (see Table 3).

In terms of the **phenomenon**, the pre-service teachers mentioned a lack of or differences in prior knowledge ("The students will not yet be familiar with the phenomenon of diffusion"; "[...] was not worked through as a group, meaning that the students will not be able to think deeply about the phenomenon. It also seems that no ideas were taken up or discussed. Hence, there is no bridge to their prior knowledge [...]").

Potential difficulties in dealing with the provided **research question** to be investigated in the fictitious experimental instruction setting (see Supplementary Materials S1) were also identified. These difficulties concerned the students failing to understand or refer back to the research question ("The students might not be able to make connections between the conducted experiment and the research question").

**Table 3.** Overview of the students' process-related misconceptions and difficulties faced during experimentation identified by the pre-service teachers (incl. paraphrases) according to the three phases for scientific reasoning of the SDDS model [22].




Annotation: *h* = absolute frequency of mentions from a total of *n* = 49 pre-service science teachers.

Only a few statements mentioned difficulties regarding **hypothesis generation** compared to the later steps of the process. Three participant statements referred to the aspects of having no hypothesis, hypothesis formulation, and generating multiple hypotheses, while one statement referred to the link between the supposition and research question. No pre-service teachers mentioned formulating justifications for hypotheses as a difficulty faced by students. One pre-service teacher wrote: "To me, the connection between technical terms and phenomena observed in everyday life does not seem to be pronounced enough in sixth grade in order to bridge the gap from multiple hypotheses [...] to independently conducting an experiment to test these hypotheses". This statement addresses the association between the hypothesis and the need to plan an experiment that successfully tests the hypothesis.

With regard to **planning**, no respondents mentioned potential problems connecting the hypothesis and experiment, i.e., planning an experiment that is able to actually test the proposed hypothesis. In contrast, difficulties with planning a meaningful experiment and selecting appropriate materials (e.g., utilization of the materials pool) were mentioned very frequently ("It is possible that the students might become overwhelmed by the various materials on offer"). Application of the control-of-variables strategy was also described by many respondents ("Moreover, various factors might be unwittingly coupled with one another, meaning that only a single factor is not investigated"). Three mentioned difficulties concerned trying things out in an unstructured way ("no plan-change all variables") ("that the students change their minds while conducting the experiment if they get the feeling they have selected the 'wrong' factor").

**Variable selection** was included as a separate overarching category, since it accounted for 25% of process-related statements and was mentioned by 28 participants ("It could be the case that no influencing factors are selected, but only changes that exert no influence"; "Ultimately, the students need to know and/or be able to identify the influencing factors in order to independently develop an experimental approach"; "A difficulty for students would be recognizing the difference between loose sugar and sugar cubes").

With respect to **conducting the experiment**, general problems were mentioned: "Difficulties arise in conducting the experiment". Statements on students failing to operationalize variables, or operationalizing them incorrectly, covered imprecise measurements and measurement errors. They ranged from imprecise measurements with the stopwatch to the use of different amounts and/or volumes of the chemicals and measurement differences between students ("When conducting the experiment, replicability or precision of measurements will pose problems for the students."; "constantly switch the person doing the measurement, which can lead to imprecise measurements"). Students' difficulties regarding documentation also fall under this category ("that the results need to be documented; the learners might not do this and forget the times they measured, for example"). With respect to evaluating evidence, only one respondent mentioned that students might attribute unexpected results to an error made in conducting the experiment ("They might change their results when they have the feeling that something is not right or they have done something incorrectly."). Insufficient reflection by the students on the experimental results was mentioned once ("Errors are not considered—No analysis takes place (causal conclusions are not possible)."). The most frequently mentioned difficulty was that no conclusions are possible because the variables were varied unsystematically ("Students conduct the experiment with two influencing factors simultaneously, which does not lead to an unambiguous result"; "Focusing on just one aspect is probably difficult for some groups of students, so they try to investigate multiple factors simultaneously and only realize at the end that they cannot draw any conclusions from the investigation").

Problems with **evaluating evidence** due to lack of a comparison or blinded sample were also mentioned ("Furthermore, students often forget to create blinded or comparison samples and thus cannot draw any concrete conclusions from their results").

Overall, these problems were attributed to students being overwhelmed by the "lack of guidance" in the self-directed procedure, which did not involve following externally prescribed experimental steps ("Moreover, it is not precisely described how the experiment should proceed"; "that the students can become overwhelmed by this freedom and autonomy"). These statements were assigned to a non-process-related overarching category, the "instructional setting", which the pre-service teachers referred to 71 times, with particular focus on obstacles stemming from the open-ended nature of the task. Also included in this category was students' lack of experience in dealing with experimental materials, which was mentioned 10 times ("[...] difficulty correctly using the given materials"; "[...] that the students are not familiar with all the materials and how they are used"). Overall, 94 statements were coded into one of three such non-process-related categories: the "instructional setting", "social format" and "teacher". Twelve statements concerning students' decisions about how to divide up tasks and/or disagreements within the group were assigned to the category "social format" ("Students cannot come to an agreement within the group"). Eleven statements addressing the teacher's instructional planning were assigned to the "teacher" category ("[ . . . ] handouts to assist students or opportunities to repeat explanations are not included in the experiment, which could lead to excessive questions"; "Since no thermometer is available, the students cannot make any statements about the water temperature"). Beyond these overarching categories and subcategories, difficulties related to subject-specific content knowledge and technical terms were also reported.
Lack of familiarity with technical terms was mentioned 14 times, with the term "influencing factor" coming up seven times, "conjecture" three times and other technical terms four times ("have problems finding out what influencing factors are exactly"; "even the term 'influencing factor' should be clarified and possibly also what a conjecture/thesis entails"; "Effect on dissolution speed is difficult in sixth grade"). That subject-specific content knowledge might be lacking, insufficient or fragmentary was mentioned 19 times ("With respect to subject-specific content knowledge, the difficulty might arise that some students do not know that sugar dissolves more quickly and easily at higher temperatures"; "Background subject-related content knowledge is not yet present").

To summarize, the respondents considered here (*n* = 49) mentioned four of the nine overarching categories on average (*M* = 4.00, *SD* = 1.26; min = 2; max = 7) in their written responses asking about potential difficulties faced by students, including roughly two of the five process-related overarching categories (phenomenon, hypothesis generation/research question, planning, variable selection, conducting the experiment, and evaluating evidence) (*M* = 2.33, *SD* = 1.18; min = 0; max= 4). Three pre-service teachers did not mention any process-related difficulties. Statements falling under the overarching category of hypothesis generation were by far the least frequent (2% of the total number of statements in the process-related categories). Considered together with the phenomenon and the research question, this rose to 10% of all statements in the process-related categories, which is similar to the share of statements referring to evaluating evidence (15%). Planning was mentioned most frequently by the respondents (29% of all statements in the process-related categories), with the lion's share referring to selecting appropriate materials and the control-of-variables strategy, each of which made up 11% of all statements in the process-related categories.

Differences in the quality of participants' statements were related to their subjects of study, with the pre-service chemistry teachers using technical terms like "familiarity with the RGT rule" (authors' note: reaction velocity-temperature rule), "materials surface", "solubility product", and "law of mass action" more frequently than the pre-service biology teachers, who used more general formulations like "dissolution time depends on the temperature", "recognizing the difference between loose sugar and a sugar cube", "factors like the solubility or saturation of a liquid and corresponding effect on the dissolution speed". The only further difference uncovered between the biology (Mdn = 6.00) and chemistry pre-service teachers (Mdn = 4.50) concerned the total number of difficulties identified in all nine overarching categories (Mann–Whitney U-test: *z* = −2.377, *p* = 0.017; *r* = 0.34). Examining the frequencies of the process-related and non-process-related categories revealed that this difference reflected a larger share of non-process-related categories in the biology pre-service teachers' statements. Consequently, no significant differences in the number of process-related difficulties mentioned were found. A total of 24% of the biology pre-service teachers (*n* = 29) referred to the groups' social fabric as it related to completing the tasks, compared to just 10% of the chemistry pre-service teachers (*n* = 20). A similar pattern was found for the "teacher" category, which was mentioned by 24% of biology pre-service teachers but only 5% of chemistry pre-service teachers.

(RQ2) The university pre-service teacher statements about four key decisions students need to make when solving problems experimentally could be assigned to seven superordinate categories and 26 subcategories (Table 4). A total of 325 statements were analyzed, of which 298 were assigned to the following process-related superordinate categories, which referred to phases of the experimentation process [22,34]: decisions about the "phenomenon", the "research question and/or hypothesis", "working with and identifying variables", "planning", "conducting inclusive documentation", "analysis and interpretation". All statements by participants were analysed, regardless of the number of decisions a respondent mentioned, which ranged from 2 to 14 in total (both process-related and non-process-related).

On average, the pre-service teachers mentioned decisions about a bit more than three (3.34) process phases/elements (=superordinate categories). In addition, five pre-service teachers made six mentions of decisions by teachers, such as ensuring appropriate group composition or an appropriate task. None of the participants mentioned all of the process phases/elements considered here in their responses. Turning to the subcategories, the pre-service teachers most frequently mentioned decisions related to the influencing factors (around 20% of the 298 total mentions of process-related decisions) and the experimental materials (12%), in line with the difficulties students were considered to face in this area (see Table 3). Among decisions about "variables" and "planning", the aspects of avoiding confounders, mentioned twice by two different pre-service teachers, and including a control group, mentioned three times by two different pre-service teachers, were grossly underrepresented. This was also the case for estimating the experimental validity within the superordinate category of "analysis and interpretation", which was mentioned by three pre-service teachers one time each.




**Table 4.** *Cont.*

Annotation: *h* = absolute frequency of mentions from a total of *n* = 50 pre-service science teachers; *f* = relative frequency of mentions in the subordinate category (%); each respondent could identify multiple influencing factors.

Overall, the pre-service teachers frequently adopted a results-focused perspective: "[...] After conducting the experiment, the students should collect the results, discuss them as a group [...]"; "How can I answer my question for others. How can I be sure of my result, does the experiment I have conducted really answer my question". Around half of the pre-service teachers adopted an understanding-oriented perspective and explicitly mentioned the need for students to "understand the research question/assigned task" and the need to make decisions about "selecting appropriate materials" or the "order of work steps". In this context, references were made to the open-ended structure of the task, the third most commonly mentioned difficulty for students in Part 1 (see Table 3) ("The learners need to familiarize themselves with which work steps they will conduct in which order"; "... it could be difficult if they do not have sufficient practice in conducting experiments and the students forget steps like documenting the results"; "How do I conduct the experiment, which work steps logically follow one another"), as well as to decisions about selecting materials from the materials pool ("They need to ascertain what materials are necessary to match the selected influencing factor"; "They need to decide what materials they want to use and which ones they don't need or don't want to use"). Three pre-service teachers focused exclusively on deciding on an influencing factor, such as water temperature or water volume, or deciding whether to stir or shake the samples during the experiment.

When comparing the two subsamples, it was found that the biology pre-service teachers (Mdn = 4.00) made reference to more process steps (=superordinate categories) when discussing decisions that need to be made in carrying out the example experiment than the chemistry pre-service teachers (Mdn = 3.00; Mann–Whitney U-test: *z* = −2.639, *p* = 0.008; *r* = 0.37). A similar pattern was found for the total number of process-related decisions across all subcategories among the biology (Mdn = 6.00) and chemistry pre-service teachers (Mdn = 5.00). The biology students reported more unique process-related decisions here as well; the difference just failed to reach significance and represented a small effect (*r* = 0.27; Mann–Whitney U-test: *z* = −1.931, *p* = 0.056).

(RQ3) The university pre-service teachers' attitudes toward including diagnostic elements and promoting diagnostic skills in university teacher education were positive and quite pronounced. The full-sample mean was substantially above the scale midpoint, at *M* = 3.13 (*SD* = 0.395; see Appendix B). No significant differences in the strength of these attitudes were found between the two subsamples (MdnChemistry = 3.00; MdnBiology = 3.25; *z* = −1.414, *p* = 0.157). Likewise, there were no significant differences between the two subsamples of pre-service teachers in perceived domain-specific self-efficacy with respect to general (MdnChemistry = 2.50; MdnBiology = 2.50) as well as experimentation-related diagnostic activities (MdnChemistry = 2.50; MdnBiology = 2.25). However, there was a significant difference in the full sample between the two perceived self-efficacy scales and attitudes towards the importance of diagnostics in university teacher education. The pre-service teachers' subjective perception of their own skills in successfully carrying out general and experimentation-related diagnostic activities (MdnSelf-efficacy factor1/2 = 2.50; MdnRelative importance of diagnostics = 3.00) was lower than their perception of the importance of this teaching and learning topic for university teacher education (Self-efficacy factor 1: *z* = 5.612, *p* < 0.001; Self-efficacy factor 2: *z* = 5.345, *p* < 0.001; *n* = 50). The effect sizes can be considered large, *r* > 0.75. However, no correlation between these two motivational constructs was found. The pre-service teachers saw potential for improvement in the diagnostic competences referred to in the perceived self-efficacy scale, as they tended to "somewhat disagree" or "somewhat agree" with these items (e.g., *M*Self-efficacy factor1-Bio. = 2.38, *SD* = 0.61; Table 5).

**Table 5.** Item scores for the domain-specific perceived self-efficacy scales.


Annotation: *n* = sample size, *M* = mean, *SD* = standard deviation, *r*it = discriminatory power.

(RQ4) Quantifying the qualitative findings into frequency scores for the difficulties students face and the different process-related decisions/actions during experimentation made it possible to test the associations between these and self-efficacy expectations and attitudes. (a) With respect to the first (qualitative) subtask, the <sup>a</sup>number of difficulties mentioned in all superordinate categories, number of difficulties in the process-related superordinate categories, and number of process-related superordinate categories addressed were included in the analysis. There are no significant correlations between these knowledge-based expressions and attitudes toward the importance of diagnostics (e.g., <sup>a</sup>Spearman's *ρ* = 0.005, *p* = 0.974). The same is found in comparison to the domain-specific self-efficacy expectations. In this sample, the expression of self-efficacy expectations with respect to general as well as experimentation-related diagnostic activities is not related to the pre-service teachers' knowledge of student difficulties in experimentation. (b) For the second (qualitative) subtask, the <sup>b</sup>number of process-related superordinate categories addressed in the mentioned decisions and the number of decisions mentioned in the process-related superordinate categories were included. These results also did not correlate with perceived domain-specific self-efficacy or <sup>b</sup>attitudes towards the importance of diagnostics in university teacher education in the present sample (e.g., <sup>b</sup>Spearman's *ρ* = 0.038, *p* = 0.790).
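To illustrate how frequency scores from the qualitative coding can be related to scale scores, the following minimal sketch computes Spearman's *ρ* as the Pearson correlation of average ranks. It uses only the standard library and does not compute a *p* value. All data values and the names `average_ranks`, `spearman_rho`, `difficulties_named`, and `self_efficacy` are hypothetical and serve illustration only; they are not the study's data or analysis script.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical example: number of difficulties named per participant and
# that participant's mean score on a four-point self-efficacy scale.
difficulties_named = [2, 5, 3, 4, 1, 3]
self_efficacy = [2.5, 2.0, 3.0, 2.5, 2.75, 2.25]

rho = spearman_rho(difficulties_named, self_efficacy)
print(f"Spearman's rho = {rho:.3f}")
```

A rank-based coefficient is a natural choice here because both the frequency counts and the four-point scale scores are ordinal and contain many ties, which the average-rank step handles explicitly.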

#### **4. Discussion**

What began in the 1990s with a call for "Science for All" [148] has led to the setting of obligatory educational goals in countries' and states' curricula and standards regarding scientific inquiry and scientific reasoning as a component of scientific literacy (e.g., [4–6,149]). In this respect, school curricula in the natural sciences (e.g., NGSS [149], University location federal state 1 [150,151], University location federal state 2 [152,153]) require students to be able to conduct and reflect on scientific inquiry processes. Subject-specific content on promoting scientific reasoning in the classroom is also anchored in curricular standards for teacher education (e.g., [9,11]) and thus must be taught in university teacher education. Appropriate teaching-learning concepts for scientific inquiry in university education can give future science teachers a better understanding of the difficulties and the (mis)understandings, alternative ideas or misconceptions students experience during experimentation, in order to be able to include those diagnosis-related aspects. Accordingly, there are lines of research focusing on (pre-service) teachers, e.g., measuring and assessing pre-service teachers' scientific reasoning competencies in higher education [133,154,155], verification of validity [156], evaluation of translated versions [157], but these are rather limited and predominantly of a quantitative nature (e.g., [158]). Many studies also focus predominantly on developing and testing concepts and materials to promote subject-specific, pedagogical content knowledge and pedagogical knowledge related to scientific reasoning (e.g., video vignettes as a support for scientific reasoning, including video vignettes as a tool to promote students' learning: e.g., [48,98,99,159]; seminar concepts: e.g., [160]).
However, the first step is to identify pre-service teachers' knowledge state in order to appropriately adapt university courses and to develop and/or employ alternative teaching approaches. Consequently, the present study did not focus on developing teaching-learning concepts for the subject didactic training of pre-service teachers; instead, the primary focus lies on potential prerequisites for learning such diagnostic activities, not only in the knowledge areas mentioned (CK; PCK), but also in terms of attitudes and self-efficacy regarding diagnostics and scientific inquiry (with a focus on experimentation). To summarize, it became clear that the pre-service teachers in the present study were only able to identify and cite a varying yet small number of potential student difficulties in the experimental problem-solving process from the comprehensive catalogue, presumably based on their own experience in school and basic training in subject didactics. The same was true of procedural knowledge in carrying out an investigation of an illustrative experimental phenomenon ("knowing how" [29,30]). In contrast, the teacher education students had a strong sense of the importance of diagnosing students' experimentation skills and the hurdles students face, as well as a moderate level of self-efficacy in carrying out such diagnostic activities. The research questions underlying these findings can be answered and discussed in detail as follows:

(RQ1) The pre-service teachers cited difficulties students face in all steps of the experimentation process (see Table 1), but with very different frequencies (see Table 3). They predominantly described difficulties in the planning phase, including variable selection, followed by the implementation phase. Contrary to the usually higher self-assessment of scientific reasoning abilities [133], only a small number of unique difficulties were described with respect to evaluating evidence and most of all, formulating a research question and hypothesis, and the number of participants mentioning these areas was very low as well. With respect to the research question, this is possibly due to the structure of the task and teaching scenario applied in the study, in which students were provided with an overarching research question (see Supplementary Material S1). Moreover, it can be concluded that pre-service teachers not only lack pedagogical (didactics) content knowledge about potential student difficulties in these phases, but also exhibit a lower level of content methodological (procedural) knowledge, e.g., with respect to formulating hypotheses. Students in the early semesters of university teacher education are still largely unfamiliar with the procedure and content of hypothesis formulation [155], which is at least partially due to the lack of or minimal opportunities to learn how to formulate research questions and hypotheses in school-based science instruction [42].

Overall, the pre-service teachers in the present study frequently attribute students' difficulties to the instructional setting: the open-ended nature of the task, working with the experimental materials, and coming to an agreement and dividing up roles within the group during experimentation are mentioned most frequently. The participants' PCK with regard to potential student difficulties and misconceptions hardly extends to aspects of subject-specific methodological concepts, such as planning an experiment that actually tests the hypothesis(es), the question of confounding variables or dealing with unexpected data. Overall, the respondents used rather general terminology, with scientific terminology [161] used only to a very small extent. Comparing the sub-samples, however, the pre-service chemistry teachers used technical terms more frequently, which may be due to the specific experiment selected in the example teaching unit (sugar's dissolution time) at the university location Braunschweig at the time of the survey. In contrast, the pre-service biology teachers cite more difficulties students face than the pre-service chemistry teachers. However, this difference is only due to a higher proportion of non-process-related categories mentioned by the pre-service biology teachers. As with the use of scientific terminology, this may be due to the number of subject didactics courses already completed by each group at the universities in which the study was conducted. Even though the study was conducted before any courses on scientific reasoning at both locations, the pre-service biology students in Kassel had already briefly dealt with this content area in their introductory lecture and tutorial.

(RQ2) With regard to key decisions (procedure for experimenting), the university pre-service science teachers tended to return to aspects they had also mentioned with respect to difficulties and misconceptions. The majority of respondents referred to decisions about working with and identifying variables in the experiment, planning, and actually conducting the experiment (including documentation). Interestingly, however, more than a third of the pre-service teachers also mentioned decisions about the research question or hypothesis, i.e., experimental steps that were underrepresented in the responses to the first subtask. It is possible that the pre-service teachers made greater reference here to hurdles they had experienced themselves in their practical laboratory training at university. For example, one participant writes: "Now the students should not be thrown into the experiment like that. Forming hypotheses is often still a sticking point in tasks, even for university students, albeit at a higher level. [...] However, gaining knowledge should always be the goal of the experimental phase. Finally, in order to consolidate knowledge, the hypotheses must be verified or falsified based on the knowledge gained during the experiment." Results- and understanding-oriented statements are also made here, presumably due to the fictitious, open-ended experimental situation. With appropriate adaptive support (scaffolding), pre-service science teachers could be encouraged to further develop their process-oriented and results-oriented thinking patterns in this regard [162]. Considering the statements from both subtasks together, the pre-service biology teachers describe significantly more decisions in the process-related superordinate categories and also tend to describe more process-related decisions in the subcategories than the pre-service chemistry teachers.
Nevertheless, subject-specific methodological (procedural) knowledge regarding scientific reasoning is rather low in the overall sample, similar to other studies in this area (e.g., [160]).

(RQ3) The pre-service teachers rated diagnostics or diagnostic activities concerning experimentation as highly important for university teacher education, corresponding to the high perceived importance of diagnostics in teacher training more generally [163]. However, in-service teachers have a divergent view. Only around one-third of the in-service teachers surveyed by Lorenz [139] believed that diagnostic training is required to correctly assess students' competencies. It is possible that in-service teachers see their university studies in retrospect as focused on theoretical content and content related to their school subject, while diagnostics can only be performed when actually teaching in schools and thus can be learned in practice. Despite an increase in diagnostic knowledge during their studies compared to the beginning of their studies, pre-service teachers still rate it as below average compared to their knowledge of their subject and pedagogical knowledge [91,164]. In addition, diagnosing students' performance and difficulties in experimentation is more challenging than in other areas of science education, as it can only be assessed to a rather limited extent through written tasks. However, knowledge of the experimental competencies of the students being assessed (in particular, explicit knowledge of difficulties and misconceptions, which enables science teachers to assess student achievement against curricular expectations) is fundamental in science subjects [163]. This knowledge goes along with higher self-efficacy [124], which is in turn positively related to the successful transfer of content from teacher training (e.g., [165–167]) and ultimately to student achievement [168].

(RQ4a/b) This study's results concerning self-efficacy show that pre-service science teachers' subjective perceptions of their ability to successfully carry out general and experimentation-related diagnostic activities are significantly lower than their attitudes regarding the importance of diagnostics in teacher education. No correlations between knowledge of students' difficulties and key decisions in experimentation procedures and self-efficacy in diagnosing students' abilities in scientific reasoning acquisition were found in the sample studied here. In a study of 495 pre-service biology teachers [161], advanced students' (Master's or State Examination degree ≥ 7) self-efficacy to plan and conduct biology lessons correlated with the knowledge of what data to collect when assessing experimentation skills. No such correlation was found for pre-service teacher students at the undergraduate level. The present study examined pre-service teachers at an early stage of university teacher education, in the first third of their studies, which may be associated with a low level of knowledge in assessing self-efficacy and its associated dimensions. It is also possible that our findings were due to the surveyed students' low assessment of their own scientific inquiry competence. Conducting such surveys at the beginning of students' studies is therefore highly relevant in order to be able to promote such competences in a targeted way based on the obtained results. Khan and Krell [169] investigated the scientific reasoning competencies of pre-service Canadian science teachers and how they improved as a result of an intensive methods course including a 15-week internship. 
This intervention significantly improved the students' competencies in planning experiments and testing models, demonstrating that these are trainable through instruction on the scientific method including an internship with the opportunity to model, engage in, and reflect upon inquiry instruction in the science classroom. Further intervention studies demonstrate an increase in scientific reasoning competences as a result of combining different scaffolding formats [12,48,170]. In this context, it also seems interesting that, in addition to training (through scaffolds), greater background knowledge of scientific reasoning may influence pre-service teachers' competencies. Participants with prior university degrees (in other subjects) performed better in a multiple-choice questionnaire surveying scientific reasoning competencies than participants with no prior university degrees, perhaps because the former group could draw on greater background knowledge [169].

#### *Limitations*

Although the sample size was small, especially when it came to the biology and chemistry subsamples, and the findings require replication among physics education students and students at other universities, the study was able to provide insight on the views of pre-service biology and chemistry teachers in the first third of university teacher education on students' difficulties, misconceptions and key decisions in experimentation. This sample was specifically selected because the pre-service science teachers were still at a relatively early phase of their teacher training, i.e., they were attending their first course on subject didactic training, and prior internships in their subject of study (biology/chemistry) and introductory lectures were not expected to have had much influence. The present study's sample therefore made it possible to assess the status quo of pre-service science teachers' professional knowledge and self-efficacy regarding scientific reasoning and diagnostics, in order to derive possible support measures.

At both university locations where the surveys took place, teacher education in the natural sciences focuses on scientific inquiry in research and teaching, and this context is likely to have significantly affected the students' specific content and pedagogical content knowledge. The second subject the pre-service teachers were studying might have also exerted an influence. For example, a measurement instrument for upper secondary students uncovered a distinction between "experimental ways of thinking and working" and content knowledge [170]. Pre-service teachers studying two science subjects have more opportunities to learn how to conduct scientific work, i.e., how to apply and reflect on scientific reasoning skills in different contexts and with different strategies [68], which leads to higher levels of competence [14,171].

The present survey of perceived difficulties and key decisions students face in experimentation had respondents refer to an example experimentation context (see Supplementary Material S1). While it would be possible to evaluate and carry out scientific reasoning skills in a context-free manner, it cannot be generally assumed that the same pre-service science teachers will provide the same responses on a task with the same format referring to a different context. Even though declarative knowledge of the context has only a minor influence on the successful completion of scientific reasoning test items, students with similar levels of procedural knowledge perform differently in such tasks (for an example regarding hypothesis-testing skills: [171]), which may be due to their different levels of declarative knowledge. Similarly, a different strategy can be used with each phenomenon or scientific problem to be discovered and researched, even in the same domain [68].

Another methodological limitation that deserves mention here concerns the scales measuring the students' attitudes and self-efficacy, which should be interpreted in comparison to other findings on these constructs. Neither follows the recommendation to apply scales with a large number of points [108], as they only contain four levels, potentially limiting the range of respondents' assessments. However, other studies have assessed self-efficacy with equally short Likert scales (e.g., [172]), while some studies have applied more comprehensive scales (e.g., [173]). In the future, it should be examined whether the number of scale points influences assessments of abilities and attitudes in a given area.

#### **5. Conclusions**

This study surveyed knowledge of students' difficulties, misconceptions, and key decisions in experimental problem-solving in the context of an authentic lesson plan, attitudes concerning the importance of diagnostics in teacher training, and self-efficacy related to diagnostics in experimental settings via an online questionnaire with a sample of pre-service biology and chemistry teachers. The developed instruments/tasks make it possible to survey pre-service teachers' knowledge and attitudes towards diagnosing experimentation skills within university teacher education in the subjects of biology and chemistry. They might also be transferable to physics education, and could serve as a foundation for conceptualizing university teaching-learning settings concerning scientific reasoning as well as for future intervention studies within subject didactics training regarding diagnostic knowledge of experimentation and attitudes. The results indicate that knowledge about students' misunderstandings, difficulties and problems during experimentation must be imparted to pre-service science teachers during university education. It is essential to understand scientific procedures (knowing how) and epistemic constructs (knowing why) regarding scientific reasoning in experimental problem-solving. Imparting this procedural and epistemic knowledge to teacher education students at an early phase of their studies via appropriate adaptive support (scaffolding) could help pre-service science teachers develop process-oriented and results-oriented thinking patterns as well as diagnostic skills in this regard [162]. To put this in perspective, knowledge of students' difficulties and frequent sticking points can help teachers design lessons in a student-oriented way (cf., [174]). Studies on subject-specific teaching quality in biology with respect to diagnostic competencies have shown that PCK and PK are statistically significantly related to pre-service teachers' diagnostic activities, and biology teachers' PCK is positively related to diagnostic accuracy [95]. Teachers are expected to provide students with learning opportunities that help them develop 21st-century skills, including core competencies in subject areas such as science. Zimmermann [19] concludes from her review of the literature on scientific reasoning that it is possible to teach both key features of science, i.e., the subject-specific content of scientific disciplines (e.g., biology, physics) and skills in experimentation and evaluating evidence. Previous studies on scientific reasoning show that progress has been made in research on how to help students become scientifically literate adults by applying their scientific reasoning skills (cf., [19]). Therefore, it is necessary that (pre-service) science teachers are taught concepts and theories important for experimental inquiry processes, i.e., the key decisions that must be made, based on concrete examples, in an open-ended experimentation process in order to conduct a successful experiment. In addition to knowledge about individual student learning characteristics that may be relevant to the learning process of scientific reasoning [175], procedural knowledge of scientific reasoning enables teachers to diagnose students' difficulties and misconceptions in the experimentation process.
Further research using the instruments employed in this study on a larger sample of pre-service science teachers would contribute to the development of models of diagnostic competence acquisition (e.g., [176,177]) as well as professional vision (cf., [178]) for experimental problem-solving in competence-oriented science instruction and identify similarities and differences across subjects.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/educsci11100629/s1, Task-Sheet S1: Tasks to analyse a classroom setting for experimentation—Difficulties and Approach of students.

**Author Contributions:** Conceptualization, D.H.-R. and M.M.; formal analysis, D.H.-R. and M.M.; investigation, D.H.-R., M.M., D.H.; writing—original draft, D.H.-R.; writing—review and editing, D.H.-R., M.M., D.H., K.H.; project administration, D.H.-R., M.M.; funding acquisition, K.H., D.H.-R., M.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Bundesministerium für Bildung und Forschung (Federal Ministry of Education and Research), at the Technische Universität Braunschweig in the project Diagonal-NaWi, grant number: 01JA1909 and at the Universität Kassel in the project PRONET<sup>2</sup>, grant number: 01JA1805. Both projects are part of the "Qualitätsoffensive Lehrerbildung", a joint initiative of the Federal Government and the Länder which aims to improve the quality of teacher training. The authors are responsible for the content of this publication. Funding of APC: The authors acknowledge support by the Open Access Publication Funds of Technische Universität Braunschweig.

**Institutional Review Board Statement:** All participants were students at two German universities. They took part voluntarily and signed an informed consent form. Pseudonymization of participants was guaranteed during the study, and the survey was completed online in a stress-free environment at home. Given these measures in the implementation of the study, an audit by an ethics committee was waived.

**Informed Consent Statement:** Written informed consent was obtained from all participants involved in the study.

**Data Availability Statement:** Information and queries on the data used can be obtained from the authors of this article.

**Acknowledgments:** We would like to thank Femke Sander for her support in rating as well as all pre-service teachers who participated in the study. We also thank Di Fuccia (University of Kassel) for his support in recruiting pre-service teachers with chemistry as a subject. We thank the academic editors and two anonymous reviewers whose recommendations substantially improved the quality of the article.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

**Table A1.** Item parameters for the scale relevance of diagnostics in the teacher training program (*n* = 50).


#### **Appendix B**

**Table A2.** Item analysis on domain-specific self-efficacy expectations with a pilot sample of *n* = 98.


Annotation: Principal component analysis with Varimax rotation; \* adapted from [142].

#### **References**

