The efficacy of MLMs was determined by comparing the performance of the two student groups on a validated assessment instrument and on course exams. Student affect was measured via surveys. Specifically, learning gains on electricity and magnetism content were measured via pre/post-test scores on the CSEM [28]. In-class summative assessments in the form of exams were held constant across both groups, and each group’s performance was compared. Finally, student course evaluations were administered at the end of each course for both groups and compared.
4.1. Student Learning Gains
The CSEM is a 32-item multiple choice assessment designed to test student understanding of electricity and magnetism concepts covered in the average introductory physics course. For all groups, we administered the CSEM both before instruction and after instruction. In particular, students completed the CSEM initially during the first class meeting of the semester, and then again during the last class meeting of the semester. No course credit was assigned for completing the assessment.
Table 3 shows the pre- and post-test scores on the CSEM for the two different student groups across four semesters of instruction. The non-MLM group is composed of students enrolled in the fall 2010 (Fa10) and spring 2011 (Sp11) semesters. The MLM group is composed of students enrolled in the summer 2011 (Su11) and fall 2011 (Fa11) semesters. There is no statistically significant difference between any of the groups on the pre-test, suggesting that there is no measurable selection bias with respect to initial content knowledge as measured by the CSEM.
Each student’s pre-test score $S_\text{pre}$ and post-test score $S_\text{post}$ were used along with the maximum possible score $S_\text{max}$ to calculate the individual normalized learning gain $g$. Normalized gain is the ratio of the actual assessment score gain to the maximum possible gain, as follows [29]:

$$
g = \frac{S_\text{post} - S_\text{pre}}{S_\text{max} - S_\text{pre}}
$$
To compare results across groups, we averaged the individual normalized gains for the members of each group. We only present data for students who completed both the pre- and post-test. Since the assessment was not a required, graded component of the course, unsurprisingly fewer students completed both administrations than were enrolled in the courses.
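As an illustration, a minimal sketch of this calculation is shown below; the score arrays are hypothetical, and the 32-point maximum follows the CSEM description above.

```python
import numpy as np

def normalized_gain(pre, post, max_score=32):
    """Individual normalized gain g = (post - pre) / (max - pre) for each student."""
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    return (post - pre) / (max_score - pre)

# Hypothetical raw CSEM scores for students who completed both the pre- and post-test
pre_scores = [10, 12, 9, 15]
post_scores = [20, 25, 18, 28]

g = normalized_gain(pre_scores, post_scores)  # individual normalized gains
g_bar = g.mean()                              # group average normalized gain
print(f"average normalized gain: {g_bar:.1%}")
```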
We have chosen to report normalized gain, since this metric is commonly used within the physics education research community. With respect to the literature on the CSEM, normalized gain, rather than effect size, has historically been used as the figure of merit when evaluating treatments, and we continue this tradition to allow comparison across studies. It should be noted that there are significant criticisms concerning the limitations of normalized gain as a metric. Specifically, Colletta and Phillips and Moore and Rubbo question the independence of normalized gain from pre-test score and from scientific reasoning ability, respectively [16,30]. Miller et al. further highlight the inability of normalized gain to capture information about potential conceptual losses, since the measurement “implicitly assumes that losses are zero” [31]. These criticisms should be considered when evaluating a comparison of normalized gains.
Table 3 shows the average normalized gain (ḡ) on the CSEM for all four semesters of instruction. The non-MLM group was made up of students from the Fa10 and Sp11 sections of PHYS II, while the MLM group was composed of students in the Su11 and Fa11 sections. The average normalized gain on the CSEM for both groups is reported in Table 4. Interestingly, the non-MLM group had an average normalized gain of %, which is lower than the national average gain observed for courses not utilizing active engagement pedagogies (23%) [28]. The MLM group had a significantly higher average normalized gain of %. A two-sample location t-test was used to determine whether or not the means of the two populations were equal. The MLM group had a significantly greater average CSEM learning gain than the non-MLM group, significant at the p < 1% level. We found no statistically significant difference in average learning gains between the Fa10 and Sp11 groups, or between the Su11 and Fa11 groups.
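For reference, a two-sample t-test of this kind can be sketched as follows; the gain arrays are hypothetical, and SciPy’s ttest_ind is assumed as the implementation, which may differ from the statistical software actually used.

```python
from scipy import stats

# Hypothetical individual normalized gains for each group
g_non_mlm = [0.15, 0.22, 0.10, 0.30, 0.18]
g_mlm = [0.35, 0.42, 0.28, 0.50, 0.38]

# Two-sample (unpaired) t-test of the null hypothesis that the group means are equal
t_stat, p_value = stats.ttest_ind(g_non_mlm, g_mlm)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # p < 0.01 would indicate a significant difference
```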
Initially, the below-average performance of the non-MLM group was surprising. In these two semesters, we were utilizing TIP and JiTT pedagogies, which are both research-verified methods that have been shown to produce normalized gains on the CSEM considerably above those of “traditional” pedagogy. As discussed, successful implementation of new pedagogies requires caution, specifically because research shows that different populations can respond very differently to reformed instruction [13,16,17]. However, we caution the reader to first consider the limitations of normalized gain, specifically with respect to the relatively small population sizes within the groups under study. In addition to the criticisms already mentioned, normalized gain does not account for the size of the class or intra-class variations.
In education research outside of domain-specific physics education research, it is more common to report Cohen’s d effect size, which normalizes the average raw gain for a population by the pooled standard deviation. Table 5 shows the effect size for both groups. Both the non-MLM and MLM groups demonstrate very large (>1.2) effect sizes, with the MLM treatment group demonstrating an effect size close to what Sawilowsky describes as “huge” [32]. A large effect size on the CSEM is not surprising for either group, since both the MLM and non-MLM groups used research-verified pedagogies. It should be pointed out that we implemented an adaptation of TIP in both situations that strays in significant ways from the intentions of the curriculum designers. Therefore, the data should not be interpreted as a condemnation or endorsement of any particular pedagogy for any particular group. Our main point is certainly not that some specific pedagogy or collection of pedagogies fails or succeeds in impacting learning beyond the national average. We report normalized gain and effect size here as comparison metrics between the non-MLM and MLM treatment groups to determine the efficacy of pre-class MLM-based instruction.
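As an illustration, Cohen’s d for a single group’s pre/post scores can be computed as sketched below (hypothetical data; the pooled-standard-deviation form follows the definition given above).

```python
import numpy as np

def cohens_d(pre, post):
    """Cohen's d: average raw gain divided by the pooled standard deviation of pre and post scores."""
    pre, post = np.asarray(pre, dtype=float), np.asarray(post, dtype=float)
    n1, n2 = len(pre), len(post)
    pooled_sd = np.sqrt(((n1 - 1) * pre.var(ddof=1) + (n2 - 1) * post.var(ddof=1)) / (n1 + n2 - 2))
    return (post.mean() - pre.mean()) / pooled_sd

# Hypothetical CSEM scores for one group, pre- and post-instruction
pre_scores = [10, 12, 9, 15, 11]
post_scores = [20, 25, 18, 28, 22]
print(f"effect size d = {cohens_d(pre_scores, post_scores):.2f}")
```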
4.2. Course Examinations
To further compare cognitive achievement across the two groups, we compared scores on common exams. During all four semesters under study, students completed five closed-book exams: four two-hour exams on content within specific learning units, and one two-hour cumulative exam at the end of the semester. All exams were composed of five free-response questions. Two questions were analytical problems similar to assigned homework problems, and three questions were free-response concept questions either taken directly or slightly modified from the sample exam questions in the Instructor’s Guide to Tutorials in Introductory Physics [15].
One exam focused on concepts in electrostatics, and another focused on topics in electromagnetism and DC circuits, consistent with the content described in Table 1. These exams were the summative assessments of learning for these units, and for all four semesters studied (Fa10, Sp11, Su11, and Fa11), they were the last two in-semester exams assigned. The same course instructor graded all of the exams. The exams for the Fa10 and Su11 semesters were identical. Likewise, the exams for the Sp11 and Fa11 semesters were identical, though different from those of the other two semesters. Reusing exams in non-consecutive semesters helped prevent old exams from being circulated while still allowing a common assessment across both groups.
Figure 3 shows the average exam grade for the MLM and non-MLM groups for the exams on electrostatics and electromagnetism. For the electrostatics exam, the non-MLM group had an average of , while the MLM group had an average of . For the electromagnetism exam, the non-MLM group had an average of , while the MLM group had an average of . A one-tailed unpaired t-test was used to compare the means of the two populations. The MLM group had a significantly greater exam average for both exams compared to the non-MLM group, significant at the p < 1% level. We found no statistically significant difference in exam grades between the two semesters within either the MLM or non-MLM group, which suggests that the use of two different exams had little effect on the average exam grade. There were also no measurable between-group differences with respect to initial content knowledge as measured by the CSEM (as discussed in the previous section).
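A one-tailed version of the same comparison might be sketched as follows; the exam scores are hypothetical, and SciPy ≥ 1.6 is assumed for the alternative keyword.

```python
from scipy import stats

# Hypothetical exam scores (percent) for each group
exam_non_mlm = [72, 68, 75, 70, 74, 69]
exam_mlm = [80, 78, 83, 79, 82, 77]

# One-tailed unpaired t-test: is the MLM group mean greater than the non-MLM group mean?
t_stat, p_value = stats.ttest_ind(exam_mlm, exam_non_mlm, alternative="greater")
print(f"t = {t_stat:.2f}, one-tailed p = {p_value:.4f}")
```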
On both exams, the MLM group scored between 8 and 9% higher than the non-MLM group. Both groups performed better on the electromagnetism exam than on the electrostatics exam, which is consistent with observations from prior semesters. This is not well understood, but could be due to students’ poor performance with Gauss’ Law compared to their surprisingly consistent success with applications of the right-hand rule in magnetism. This is also evident in the CSEM scores, where students score approximately 5–8% higher on the magnetism questions than on the electrostatics questions post-instruction (not shown).
The overall increase in exam scores with MLMs is consistent with the observed increase in CSEM normalized learning gains and effect size. A similar increase in exam scores as a result of MLM use was observed for mechanics and electromagnetism content in other studies [6,9]. However, this is the first study of MLM efficacy showing a consistent link between increased gains in content knowledge, as measured by a nationally validated instrument, and performance on in-class examinations.
4.3. Student Perceptions of the Instruction
A survey on student attitudes towards the instructor was administered at the end of the semester for all courses discussed in this study. The survey consisted of seven questions, all answered on a 7-point Likert scale ranging from “Strongly Disagree” to “Strongly Agree”. It was administered during the last week of classes and was part of the standardized faculty evaluations used in all courses at the university.
For this study, we were interested in students’ attitudes with respect to the following areas: (1) instructor clarity; (2) class atmosphere; (3) effective use of class time; and (4) instructor effectiveness. Table 6 shows the four survey questions used to elicit responses in these four areas. The other three questions asked students to assess the instructor on preparation for class, knowledge of subject matter, and enjoyment of teaching. These areas are not relevant to this study and are therefore not discussed. It should be mentioned that the survey was designed as a student evaluation of the instructor, not an assessment of the course. There were also no specific questions concerning the MLMs themselves. Student attitudes concerning specific course components, including MLMs, have been described elsewhere in the literature [9].
Responses to survey questions were scored on a 7-point scale and the average for each class was normalized to a percentage scale for comparison across groups. For example, a response of “Strongly Disagree” would be scored as a 1, and “Strongly Agree” would be scored as a 7. An average score of 5 would be normalized to 71.4%.
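A minimal sketch of this normalization, assuming a simple mean divided by the 7-point maximum (which reproduces the 5 → 71.4% example), is shown below.

```python
def normalize_likert(responses, scale_max=7):
    """Average the 1-7 Likert responses and express the mean as a percentage of the scale maximum."""
    mean_score = sum(responses) / len(responses)
    return 100 * mean_score / scale_max

print(f"{normalize_likert([5, 5, 5]):.1f}%")  # 71.4% for an average score of 5
```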
Table 7 shows the average normalized scores on the end-of-semester evaluations for all categories for both the non-MLM and MLM groups. With respect to instructor clarity, an average score of was reported by the non-MLM group, and a score of was reported by the MLM group. For class atmosphere, an average score of was reported by the non-MLM group, and a score of was reported by the MLM group. For effective use of class time, an average score of was reported by the non-MLM group, and a score of was reported by the MLM group. For instructor effectiveness, an average score of was reported by the non-MLM group, and a score of was reported by the MLM group.
A one-tailed unpaired t-test indicates a significant difference at the p < 1% level between the MLM and non-MLM groups with respect to students’ attitudes in all areas except instructor effectiveness. There was no significant difference in any area between the Fa10 and Sp11 non-MLM groups, or between the Su11 and Fa11 MLM groups.
The largest improvement in student attitudes towards the instructor was in the effective use of class time, where a 20% increase was observed. Also of interest is the large 11% increase in instructor clarity. These improvements are particularly notable considering that all classes had the same instructor and that there was no significant difference in the face-to-face activities, with the exception of an instructor-led passive lecture session for the non-MLM group.