Communication

Guess for Success? Application of a Mixture Model to Test-Wiseness on Multiple-Choice Exams

by Steven B. Caudill 1 and Franklin G. Mixon, Jr. 2,*

1 Department of Economics, Florida Atlantic University, Boca Raton, FL 33431, USA
2 Center for Economic Education, Columbus State University, Columbus, GA 31907, USA
* Author to whom correspondence should be addressed.
Stats 2023, 6(3), 734-739; https://doi.org/10.3390/stats6030046
Submission received: 22 May 2023 / Revised: 11 June 2023 / Accepted: 22 June 2023 / Published: 26 June 2023

Abstract

The use of large lecture halls in business and economic education often dictates the use of multiple-choice exams to measure student learning. This study asserts that student performance on these types of exams can be viewed as the result of the process of elimination of incorrect answers, rather than the selection of the correct answer. More specifically, how students respond on a multiple-choice test can be broken down into the fractions of questions where no wrong answers can be eliminated (i.e., random guessing), one wrong answer can be eliminated, two wrong answers can be eliminated, and all wrong answers can be eliminated. Using an empirical model representing a mixture of binomials, in which the probability of a correct choice depends on the number of incorrect choices eliminated, and student performance data from a final exam in principles of microeconomics consisting of 100 multiple-choice questions, we find that the responses to all of the questions on the exam can be characterized by some form of guessing, with more than 26 percent of questions being completed using purely random guessing.

1. Introduction

Course enrollment often dictates the format of examinations in business and economics courses. For large lecture sections, multiple-choice exams are often preferred by instructors and students. This study asserts that student performance on these types of exams can be viewed as the result of the process of elimination of incorrect answers, rather than the selection of the correct answer. Viewed in this way, the elimination by a student of all of the incorrect answers to a particular exam question results in choosing the correct answer. However, if no wrong answers are eliminated, the response to a particular exam question is a fully uninformed guess, which, with four answer choices, has a 0.25 probability of being correct. Thus, the more wrong answers one eliminates, the higher the probability that the correct answer to a given exam question is selected by a student.
In this study, we assert that how students respond on a multiple-choice test can be broken down into the fractions of questions where no wrong answers can be eliminated (i.e., random guessing), one wrong answer can be eliminated, two wrong answers can be eliminated, and all wrong answers can be eliminated. Using an empirical model representing a mixture of binomials, in which the probability of a correct choice depends on the number of incorrect choices eliminated, and performance data from a final exam in principles of microeconomics, we find that the responses to about 26 percent of the questions on the exam can be characterized as random guessing, while none of the questions on the exam is completed after eliminating all of the incorrect choices. Before delving further into our results, we first describe our mixture-model approach to performance on multiple-choice exams.

2. Mining for Correct Answers on Multiple-Choice Exams

Given the widespread use of large lecture sections in introductory courses in economics, management and marketing, multiple-choice questions are the basis of a significant portion of assessment by college and university instructors [1,2,3]. The prevalence of multiple-choice testing has led to studies examining the impact of exam structure on student performance. One branch of the literature examines the possibility that the chronological ordering of exam questions (i.e., exam questions are presented in the same chronological order as the course content was delivered) has some bearing on exam performance in economics [4,5,6,7,8]. A second branch extends research in cognitive psychology on disfluency, defined as the subjective experience of difficulty associated with cognitive operations [9,10,11], by testing whether font disfluency improves exam performance in economics principles [12].
This paper examines student performance on multiple-choice exams by focusing on what the educational measurement literature describes as a popular “test-wiseness” strategy, whereby students reach the correct answer by eliminating some distractors, depending on their partial knowledge of the test content [13,14,15,16]. In the absence of such partial knowledge, students tend to use blind guessing, which has a probability of choosing the correct answer equal to one divided by the number of alternatives [13,17]. This paper assumes that economics principles students adopt this approach to multiple-choice exams, where each multiple-choice exam consists of N questions, each with k alternatives. The process of scoring well on the exam can be viewed as the elimination of incorrect answers, rather than the selection of the correct answer. For example, suppose that each multiple-choice question has four answers, that is, k = 4. Thus, the exam consists of 4N possible answers, of which N are correct and 3N are incorrect. (In this simple case, we are assuming that there is no question for which “all of the above” or “none of the above” is the correct answer.) A perfect score on the exam involves the elimination of 3N, or, more generally, (k − 1)N incorrect answers. Eliminating anything less than 3N incorrect answers involves some guessing, with the guesses being better, on average, when a larger number of incorrect answers can be eliminated for each choice. For any individual exam question i, $w_i$ is the number of incorrect choices eliminated out of k total choices, with $0 \le w_i \le k - 1$. That is, if each multiple-choice question has four (five) choices, the number of incorrect choices eliminated must fall between zero and three (four).
To illustrate, suppose there are four possible answers for each question. In this case, the elimination of three incorrect answers results in choosing the correct answer. However, if no wrong answers are eliminated, the response is a fully uninformed guess, which has a 0.25 probability of being correct. The more wrong answers one eliminates, the higher the probability that the correct answer is selected. If one wrong answer can be eliminated with certainty, the probability of answering correctly rises to 0.33 or, more generally, to

$$\frac{1}{k - w_i} \qquad (1)$$
On an exam with k = 4 choices, the probabilities of a correct answer, C, given the number of incorrect answers eliminated, $w_i$, are given by:

$$P_e = P(C_e) = \frac{1}{k - w_i}, \quad w_i = 0, \ldots, k - 1 \qquad (2)$$
Thus, on an exam with k = 4 choices for each question, the probabilities as a function of the number of wrong answers eliminated are, respectively,
$$P_0 = \frac{1}{4}, \quad P_1 = \frac{1}{3}, \quad P_2 = \frac{1}{2}, \quad \text{and} \quad P_3 = \frac{1}{1} = 1 \qquad (3)$$
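To make the mapping from eliminated distractors to success probabilities concrete, the following minimal Python sketch reproduces Equations (1) through (3); the function name prob_correct is ours, for illustration only.

```python
# Probability of a correct answer on a k-choice question after a student
# eliminates w incorrect alternatives with certainty: P = 1 / (k - w).
def prob_correct(k: int, w: int) -> float:
    if not 0 <= w <= k - 1:
        raise ValueError("w must satisfy 0 <= w <= k - 1")
    return 1.0 / (k - w)

# For k = 4, this reproduces P0 = 1/4, P1 = 1/3, P2 = 1/2, and P3 = 1.
for w in range(4):
    print(f"w = {w}: P = {prob_correct(4, w):.4f}")
```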
On any given exam for any student, there are multiple-choice questions where the correct answer is known with certainty, other questions where random guesses are the response, and all cases in between.
In this framework, we have assumed that incorrect answers are equally plausible and not correlated or interconnected. That is, we do not consider the case where some incorrect answers are clearly implausible, as that would increase the probability of choosing the correct answer. Neither do we consider the situation where several questions refer to the same table or graph and an incorrect answer on the first question in the group increases the probability of an incorrect answer on subsequent questions in the group.
In this paper, we wish to investigate how students respond on a multiple-choice test, which can be broken down into the fractions of responses where no wrong answers can be eliminated (i.e., random guessing), one wrong answer can be eliminated, two wrong answers can be eliminated, and all wrong answers can be eliminated. In order to investigate this issue, we examine student responses on a final exam consisting of 100 questions (N = 100) with four choices each (k = 4). Thus, a perfect score involves the elimination of 300 incorrect answers.

3. Data, Empirical Strategy and Evidence, and Future Research

The data for our study come from student performance on a final exam in a principles of microeconomics course. These consist of 94 exam grades, each over 100 questions. The final exam consists of multiple-choice questions, with four answer choices offered for each question. Our empirical model is a mixture of binomials in which the probability of a correct choice depends on the number of incorrect choices eliminated. Thus, for each of the 100 questions on the exam, students can eliminate zero, one, two, or three incorrect answers. Eliminating zero incorrect answers is random guessing, and eliminating three incorrect answers results in a correct response. Between these two extremes, we have what we call “informed guessing”, where some incorrect answers have been eliminated.
Let B(n, p) represent a binomial distribution for the number of successes in n trials, each with success probability p. Here, we have 100 trials or questions. The probabilities change with the number of incorrect choices eliminated. Thus, the probability function for each observation is based on a mixture of binomials given by:
$$S = \lambda_1 B\left(100, \tfrac{1}{4}\right) + \lambda_2 B\left(100, \tfrac{1}{3}\right) + \lambda_3 B\left(100, \tfrac{1}{2}\right) + \lambda_4 B\left(100, \tfrac{1}{1}\right) \qquad (4)$$
and the resulting log-likelihood function is given by:
$$\ln L(\lambda_1, \lambda_2, \lambda_3, \lambda_4) = \sum_i \ln S_i \qquad (5)$$
Given that the mixing weights must sum to one, one of the mixing weights is not identified. We set $\lambda_4 = 1 - (\lambda_1 + \lambda_2 + \lambda_3)$. Formulated this way, the mixing weights indicate the fraction of exam responses consistent with eliminating zero, one, two, or three incorrect answers. Thus, $\lambda_1$ is the fraction of responses associated with random guessing, for which the probability of a correct answer is 0.25, $\lambda_2$ is the fraction of responses associated with the elimination of one incorrect answer, $\lambda_3$ is the fraction of responses associated with the elimination of two incorrect answers, and $\lambda_4$ is the fraction associated with the elimination of three incorrect answers (that is, the correct answer is chosen).
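As a rough illustration of how such mixing weights can be estimated, the following Python sketch maximizes the log-likelihood in (5) over the three identified weights, with the fourth implied by the adding-up restriction. It assumes the raw data are the 94 per-student counts of correct answers (the array scores); the variable and function names are ours, not the authors' code.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import binom

N = 100                             # questions per exam
P = np.array([1/4, 1/3, 1/2, 1.0])  # success probabilities for w = 0, ..., 3

def neg_log_likelihood(lam3, scores):
    # lam3 = (lambda_1, lambda_2, lambda_3); lambda_4 follows from the
    # adding-up restriction lambda_4 = 1 - (lambda_1 + lambda_2 + lambda_3).
    lam = np.append(lam3, 1.0 - lam3.sum())
    mix = sum(l * binom.pmf(scores, N, p) for l, p in zip(lam, P))
    return -np.log(mix + 1e-300).sum()  # small guard against log(0)

def fit_mixing_weights(scores):
    res = minimize(
        neg_log_likelihood,
        x0=np.full(3, 0.25),            # start from equal weights
        args=(np.asarray(scores),),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * 3,
        constraints=[{"type": "ineq", "fun": lambda l: 1.0 - l.sum()}],
    )
    return np.append(res.x, 1.0 - res.x.sum()), -res.fun  # weights, ln L
```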
As the success probabilities are known, our estimated model parameters are the mixing weights, $\lambda_i$, i = 1 to 4, although, due to the adding-up restriction, only three are identified. Our fitted model is thus:
$$\hat{S} = 0.264\, B\left(100, \tfrac{1}{4}\right) + 0.124\, B\left(100, \tfrac{1}{3}\right) + 0.612\, B\left(100, \tfrac{1}{2}\right) + 0.000\, B\left(100, \tfrac{1}{1}\right) \qquad (6)$$
The maximized value of the likelihood function is −791.390. This value necessarily must fall between the likelihood value associated with pure guessing on every question by every student (i.e., −2166.71) and the likelihood value associated with correct answers provided by every student to every question (i.e., 0.000). The value of the likelihood function associated with pure guessing (i.e., −2166.71) corresponds to estimating the model in (4) above subject to the constraint that $\lambda_2 = \lambda_3 = \lambda_4 = 0$. Thus, we use a likelihood ratio test to evaluate the null hypothesis of pure guessing against the alternative hypothesis that students did not guess all the time. The test statistic is 2750.64 (i.e., 2(−791.390 + 2166.71)), with three degrees of freedom. This leads to the rejection of the null hypothesis of pure guessing at any of the usual levels of significance.
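In code, the likelihood ratio test amounts to a few lines; the sketch below uses the log-likelihood values reported above, with scipy's chi-squared survival function supplying the p-value.

```python
from scipy.stats import chi2

ll_unrestricted = -791.390   # maximized mixture log-likelihood
ll_pure_guessing = -2166.71  # restricted model: lambda_2 = lambda_3 = lambda_4 = 0
lr_statistic = 2 * (ll_unrestricted - ll_pure_guessing)  # = 2750.64
p_value = chi2.sf(lr_statistic, df=3)
print(f"LR = {lr_statistic:.2f}, p = {p_value:.3g}")     # p is effectively zero
```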
These estimation results are revealing. The first mixing weight indicates that the responses to more than 26 percent of the questions on the exam can be characterized as random guessing. The fourth mixing weight is 0.000, indicating that no questions on the exam were answered correctly 100 percent of the time, which is not a great surprise. The two middle cases of informed guessing differ considerably. More than 12 percent of the questions appear to be answered with one incorrect answer eliminated, while more than 61 percent of the questions are answered as if two incorrect answers had been eliminated.
Recall that scoring 100 percent on the exam requires the elimination of (k − 1)N, or, in this case, 300 incorrect answers, and our results indicate that the estimated number of incorrect answers eliminated is:

$$100\left[(0.264 \times 0) + (0.124 \times 1) + (0.612 \times 2)\right] = 134.8 \qquad (7)$$
That is, students eliminated an estimated 134.8 of the 300 incorrect answers, obviously less than half of the incorrect answers that need to be eliminated in order to be successful on the exam. As a result, the exam average is unsurprisingly low. The responses to over 26 percent of the questions were pure guesses and, at best, students were able to whittle the choices down to two for most questions.
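The back-of-the-envelope calculation in (7) is easily reproduced from the fitted mixing weights; the snippet below is a sketch using the estimates reported above.

```python
# Expected number of incorrect answers eliminated, out of (k - 1) * N = 300,
# implied by the fitted mixing weights (Equation (7)).
weights = [0.264, 0.124, 0.612, 0.000]   # lambda_1, ..., lambda_4
eliminated = 100 * sum(lam * w for lam, w in zip(weights, range(4)))
print(f"{eliminated:.1f} of 300 incorrect answers eliminated")  # 134.8
```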
There are a number of future research opportunities related to pedagogical choices that would extend or broaden the approach taken in this study. In terms of economics, prior studies have examined whether or not classroom experiments improve student learning. The seminal study in this genre [18] provides mixed results. Perhaps an extension of [18] and subsequent studies [19,20,21] that focuses on the in-class use of experimental economics to reduce student guessing on exams would shed additional light on the relationship between pedagogical choices and student performance. Next, prior research has focused on the relationship between instructor attractiveness and pedagogical choices [22], as well as the relationship between instructor appearance and academic performance [23,24]. Rejoining this line of research, an examination of the relationship between instructor appearance and academic performance that concentrates on a reduction in guessing by students would perhaps provide a new angle on the study of the beauty premium in academia. There are other avenues for future research to explore in efforts to improve student performance on multiple-choice exams. Some of these are related to prior research discussed above. For example, a re-examination of the possibility that the chronological ordering of exam questions matters, focused on student guessing, may provide a useful addition to the prior literature on the ordering of test questions [7,8]. Finally, another possibility is additional exploration of the behavioral economics implications of disfluency in exam preparation [25,26]. In this regard, one might revisit the [11] study of student performance by employing separate exam review handouts formatted in either an easy-to-read or a difficult-to-read font in order to investigate how font disfluency relates to the mixing weights discussed above.

4. Concluding Comments

The use of large lecture halls for instruction in business and economics courses often results in multiple-choice-based assessments of learning. This study asserts that student performance on these types of assessment instruments can be viewed as the result of the process of elimination of incorrect answers, rather than the selection of the correct answer. Viewed in this way, how students respond on a multiple-choice test can be broken down into the fractions of responses where no wrong answers can be eliminated, one wrong answer can be eliminated, two wrong answers can be eliminated, and all wrong answers can be eliminated. The first three of these categories represent some form of guessing by students. The first indicates “random guessing”, while the second and third constitute varying degrees of “informed guessing”.
Using data on student performance on a final exam in principles of microeconomics, the results from an empirical model representing a mixture of binomials, in which the probability of a correct choice depends on the number of incorrect choices eliminated, indicate that some form of guessing accounts for performance on all exam questions, with performance on about 74 percent of all exam questions depending upon some degree of informed guessing. In all, purely random guessing accounts for student performance on about 26 percent of all exam questions. Given that scoring 100 percent on the exam requires the elimination of 300 incorrect answers in our empirical approach, our results indicate that the estimated number of incorrect answers eliminated is only 134.8, obviously less than half of the needed eliminations. Thus, encounters with relatively low performance statistics in assessments of learning in principles of economics are not surprising.
Useful and different information about exam performance is available at a glance from the mixing weights in our procedure. Instructors would likely prefer that all students mark the correct answer for every question on an exam. As this is rarely the case, instructors would prefer less guessing and the elimination of more incorrect answers on exams. These behaviors can easily be gleaned from the mixing weights in our model. That is, instructors would prefer to observe increasing mixing weights, such that $\lambda_1 < \lambda_2 < \lambda_3 < \lambda_4$. This pattern indicates a shift away from guessing and toward the elimination of more incorrect answers, thus leading to the provision of more correct answers. On the other hand, mixing weights that are skewed in the other direction, such that $\lambda_1 > \lambda_2 > \lambda_3 > \lambda_4$, indicate an abundance of guessing and the elimination of few incorrect answers. Of course, this is not a desirable educational outcome.

Author Contributions

Conceptualization, S.B.C. and F.G.M.J.; methodology, S.B.C.; data curation, S.B.C.; writing-original draft preparation, S.B.C. and F.G.M.J.; writing-review and editing, S.B.C. and F.G.M.J.; project administration, F.G.M.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available from the authors upon reasonable request.

Acknowledgments

The authors thank two anonymous reviewers for helpful comments on a prior version. The usual caveat applies.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Chidomere, R.C. Test item arrangement and student performance in principles of marketing examination: A replication study. J. Mark. Educ. 1989, 11, 36–40.
2. Buckles, S.; Siegfried, J.J. Using multiple-choice questions to evaluate in-depth learning of economics. J. Econ. Educ. 2006, 36, 48–57.
3. Moncada, S.M.; Moncada, T.P. Assessing student learning with conventional multiple-choice exams: Design and implementation considerations for business faculty. Int. J. Educ. Res. 2010, 5, 15–30.
4. Taub, A.T.; Bell, E.B. A bias in scores on multiple-form exams. J. Econ. Educ. 1975, 7, 58–59.
5. Gohmann, S.F.; Spector, L.C. Test scrambling and student performance. J. Econ. Educ. 1989, 20, 235–238.
6. Bresnock, A.E.; Graves, P.E.; White, N. Multiple-choice testing: Question and response position. J. Econ. Educ. 1989, 20, 239–245.
7. Caudill, S.B.; Gropper, D.M. Test structure, human capital, and student performance on economics exams. J. Econ. Educ. 1991, 22, 303–306.
8. Kagundu, P.; Ross, G. The impact of question order on multiple choice exams on student performance in an unconventional introductory economics course. J. Econ. Educ. 2015, 15, 19–36.
9. Alter, A.L.; Oppenheimer, D.M.; Epley, N.; Eyre, R. Overcoming intuition: Metacognitive difficulty activates analytic reasoning. J. Exp. Psychol. 2007, 136, 569–576.
10. Alter, A.L.; Oppenheimer, D.M. Uniting the tribes of fluency to form a metacognitive nation. Personal. Soc. Psychol. Rev. 2009, 13, 219–235.
11. Diemand-Yauman, C.; Oppenheimer, D.M.; Vaughan, E.B. Fortune favors the bold (and the italic): Effects of disfluency on educational outcomes. Cognition 2011, 118, 111–115.
12. Mendez-Carbajo, D.; Mixon, F.G., Jr. Obstacles and building blocks: Font disfluency and performance on economics exams. Int. J. Appl. Behav. Econ. 2021, 10, 1–11.
13. Alnasraween, M.A.; Alsmadi, M.S.; Al-Zboon, H.S.; Alkurshe, T.O. The level of universities students’ test wiseness in Jordan during distance learning in light of some variables. Educ. Res. Int. 2022, 12, 6381857.
14. Rukthong, A. MC listening questions vs. integrated listening-to-summarize tasks: What listening abilities do they assess? System 2021, 97, 102439.
15. Krimmel, H.T. Dear professor: Why do I ace essay exams but bomb multiple choice ones? J. Leg. Educ. 2014, 63, 431–446.
16. Butler, A.C.; Karpicke, J.D.; Roediger, H.L., III. Correcting a metacognitive error: Feedback increases retention of low-confidence correct responses. J. Exp. Psychol. Learn. Mem. Cogn. 2008, 34, 918–928.
17. Parkes, J.; Zimmaro, D. Learning and Assessing with Multiple-Choice Questions in College Classrooms; Routledge: London, UK, 2016.
18. Dickie, M. Do classroom experiments increase learning in introductory microeconomics? J. Econ. Educ. 2006, 37, 267–288.
19. Durham, Y.; McKinnon, T.; Schulman, C. Classroom experiments: Not just fun and games. Econ. Inq. 2007, 45, 162–178.
20. Cartwright, E.; Stepanova, A. What do students learn from a classroom experiment: Not much, unless they write a report on it. J. Econ. Educ. 2012, 43, 48–57.
21. Emerson, T.L.N.; English, L.K. Classroom experiments: Teaching specific topics or promoting the economic way of thinking? J. Econ. Educ. 2016, 47, 288–299.
22. Mixon, F.G., Jr.; Smith, K. Instructor attractiveness and academic rigour: Examination of student evaluation data. Australas. J. Econ. Educ. 2013, 10, 1–13.
23. Craig, J.D.; Savage, S. Instructor attire and student performance: Evidence from an undergraduate industrial organization experiment. Int. Rev. Econ. Educ. 2014, 17, 55–65.
24. Craig, J.D.; Savage, S. Does instructor appearance affect student learning of principles of economics? Australas. J. Econ. Educ. 2015, 12, 30–49.
25. Kahneman, D. Thinking, Fast and Slow; Farrar, Straus and Giroux: New York, NY, USA, 2011.
26. Harford, T. How Frustration Can Make Us More Creative. TED Talk, 2 February 2016. Available online: https://www.ted.com/talks/tim_harford_how_frustration_can_make_us_more_creative/up-next (accessed on 10 June 2023).