Psychometric Methods: Theory and Practice
Topic Information
Dear Colleagues,
Measurement and quantification are ubiquitous in modern society. Psychometrics arose historically from the need to measure human abilities through suitable tests, and the discipline then grew rapidly as it incorporated advanced mathematical and statistical methods. Today, psychometrics not only covers virtually all statistical methods but also draws on machine learning and data mining techniques useful to the behavioral and social sciences, including the handling of missing data, the combination of multiple-source information with measured data, measurement obtained from special experiments, visualization of statistical outcomes, and measurement that discloses underlying problem-solving strategies. Psychometric methods now apply across a wide range of disciplines, including education, psychology, the social sciences, behavioral genetics, neuropsychology, clinical psychology, medicine, and even the visual arts and music.
The dramatic development of psychometric methods, together with the rigorous integration of psychometrics, data science, and even artificial intelligence techniques in interdisciplinary fields, has attracted significant attention and prompted pressing discussions about the future of measurement.
The aim of this Special Topic is to gather studies on the latest developments in psychometric methods, covering a broad range of approaches from traditional statistical methods to advanced data-driven techniques, and to highlight discussions of different approaches (e.g., theory-driven vs. data-driven) to challenges in psychometric theory and practice.
This Special Topic consists of two subtopics: (1) theory-driven psychometric methods that showcase advances in psychometric and statistical modeling in measurement and contribute to the development of psychological theories and hypotheses; and (2) data-driven computational methods that leverage new data sources and machine learning/data mining/artificial intelligence techniques to address new psychometric challenges.
In this issue, we seek original empirical or methodological studies, thematic/conceptual review articles, and discussion and comment papers highlighting pressing topics related to psychometrics.
Interested authors should submit a letter of intent including (1) a working title for the manuscript, (2) names, affiliations, and contact information for all authors, and (3) an abstract of no more than 500 words detailing the content of the proposed manuscript to the topic editors.
There is a two-stage submission process. Initially, interested authors are requested to submit only abstracts of their proposed papers. Authors of the selected abstracts will then be invited to submit full papers. Please note that the invitation to submit does not guarantee acceptance/publication in the Special Topic. Invited manuscripts will be subject to the usual review standards of the participating journals, including a rigorous peer review process.
Dr. Qiwei He
Dr. Yunxiao Chen
Prof. Dr. Carolyn Jane Anderson
Topic Editors
Participating Journals
Journal Name | Impact Factor | CiteScore | Launched Year | First Decision (median) | APC
---|---|---|---|---|---
Behavioral Sciences | 2.5 | 2.6 | 2011 | 27 Days | CHF 2200
Education Sciences | 2.5 | 4.8 | 2011 | 26.8 Days | CHF 1800
Journal of Intelligence | 2.8 | 2.8 | 2013 | 36.5 Days | CHF 2600
Preprints.org is a multidisciplinary platform providing a preprint service dedicated to sharing research from the start and empowering the research journey.
MDPI Topics cooperates with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to enjoy the benefits of posting a preprint at Preprints.org prior to publication:
- Immediately share your ideas ahead of publication and establish your research priority;
- Protect your ideas with a time-stamped preprint article;
- Enhance the exposure and impact of your research;
- Receive feedback from your peers in advance;
- Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.
Published Papers (10 papers)
Planned Papers
The list below represents planned manuscripts only; some have not yet been received by the Editorial Office. Papers submitted to MDPI journals are subject to peer review.
Title: Psychometric Modeling to Identify Examinee Strategy Differences Over the Course of Testing
Authors: Susan Embretson (1); Clifford E. Hauenstein (2)
Affiliation: (1) Georgia Institute of Technology; (2) Johns Hopkins University
Abstract: Aptitude test scores are typically interpreted similarly for examinees with the same overall score. However, research has found evidence of strategy differences between examinees, as well as differences in examinees' application of appropriate procedures over the course of testing. Research has also shown that strategy differences can affect the correlates of test scores; hence, the relevancy of identical interpretations for equivalent scores can be questionable. The purpose of this study is to present several item response theory (IRT) models that are relevant to identifying examinee differences in strategies and in the understanding of test-taking procedures. First, mixture IRT models identify latent clusters of examinees with different patterns of item responses. Early mixture IRT models (e.g., Rost & von Davier, 1995; Mislevy & Wilson, 1996) identify latent classes that differ in patterns of item difficulty. More recently, item response times have been combined with item accuracy in joint IRT models to identify latent clusters of examinees with distinct response patterns. Although mixture IRT models have long been available, they are not routinely applied. Second, more recent IRT-based models can also identify strategy shifts over the course of testing (e.g., de Boeck & Jeon, 2019; Hauenstein & Embretson, 2022; Molenaar & de Boeck, 2018); that is, within-person differences in item-specific strategies are identified. In this study, the relevant IRT models will be illustrated on tests measuring various aspects of intelligence, including items on non-verbal reasoning, spatial ability, and mathematical problem solving.
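As a toy illustration of the mixture IRT idea described in this abstract (a sketch with invented parameter values, not the authors' implementation), the following Python snippet generates responses from a two-class mixture Rasch model in which the latent classes differ in their item-difficulty ordering, which is exactly the kind of structure mixture IRT estimation would aim to recover:

```python
import math
import random

def rasch_p(theta, b):
    """Rasch model: probability of a correct response for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Two latent classes with different item-difficulty orderings
# (illustrative values; a real application would estimate these from data)
difficulties = {0: [-1.0, 0.0, 1.0, 2.0],   # class 0: items get progressively harder
                1: [2.0, 1.0, 0.0, -1.0]}   # class 1: the reverse ordering

def simulate_examinee(rng, mix=0.5):
    """Draw one examinee: latent class, ability, and item responses."""
    cls = 0 if rng.random() < mix else 1          # latent class membership
    theta = rng.gauss(0.0, 1.0)                   # ability
    resp = [int(rng.random() < rasch_p(theta, b)) for b in difficulties[cls]]
    return cls, resp

rng = random.Random(7)
sample = [simulate_examinee(rng) for _ in range(200)]
```

Fitting a mixture IRT model to such data would attempt to recover the class memberships and the class-specific difficulty patterns from the responses alone.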
Title: Investigating Pre-knowledge and Speed Effects in an IRTree Modeling Framework
Authors: Justin L. Kern; Hahyeong Kim
Affiliation: University of Illinois at Urbana-Champaign
Abstract: Pre-knowledge in testing refers to the situation in which examinees have gained access to exam questions or answers prior to taking an exam; the items exposed in this way are called compromised items. Exposure to compromised items can artificially boost exam scores, jeopardizing test validity and reliability, test security, and test fairness. Furthermore, it has been argued that pre-knowledge may result in quicker responses. A better understanding of the effects of pre-knowledge can help test creators and psychometricians overcome the problems it can cause. A growing literature in psychometrics has focused on pre-knowledge, primarily on the detection of person pre-knowledge. However, the majority of this work has used data for which it is unknown whether a person had prior exposure to items. This research aims to explore the effects of pre-knowledge with experimentally obtained data using the Revised Purdue Spatial Visualization Test (PSVT:R). To collect these data, we carried out an online experiment manipulating pre-knowledge levels among groups of participants by exposing a varying number of compromised items to participants in a practice session prior to test administration. Recently, there has also been a growing modeling paradigm using tree-based item response theory models, called IRTree models, to embed cognitive theories into a model for responding to test items. One such model examined the role of speed on intelligence tests, positing differentiated fast and slow test-taking processes (DiTrapani et al., 2016). To investigate this, the authors proposed a two-level IRTree model with the first level controlled by speed (i.e., is the item answered quickly or slowly?) and the second level controlled by an intelligence trait.
This approach allows for separate parameters at the second level depending upon whether responses were fast or slow; these can be separate item parameters, person parameters, or both. Building on this literature, we are interested in determining whether and how item pre-knowledge impacts item properties. The effects to be studied include (1) whether pre-knowledge impacts the first-level IRTree parameters, affecting response time; (2) whether pre-knowledge impacts the second-level IRTree parameters, affecting response accuracy; and (3) whether the first-level response (i.e., fast or slow) impacts the second-level IRTree parameters. In all cases, an interesting sub-question is whether any of these effects are constant across items. The models will be estimated using the mirt package in R. To determine the efficacy of the IRTree modeling approach for answering these questions, a simulation study will be run under various conditions, with factors including sample size, effect size, and model; outcomes will include empirical Type I error and power rates. The approach will then be applied to the collected pre-knowledge data.
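The two-level IRTree response process described in this abstract can be sketched with a small simulation. This is an illustrative sketch only, assuming 2PL models on both branches and invented parameter values (not the authors' specification): level 1 decides whether a response is fast or slow, and level 2 draws accuracy using branch-specific item parameters.

```python
import math
import random

def irf(theta, a, b):
    """2PL item response function: probability of endorsing the branch/answering correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def simulate_response(theta, speed, item, rng):
    """One two-level IRTree draw: level 1 decides fast vs. slow from a speed trait,
    level 2 draws accuracy from an ability trait with branch-specific item parameters."""
    fast = rng.random() < irf(speed, item["a_speed"], item["b_speed"])   # level 1
    branch = "fast" if fast else "slow"
    correct = rng.random() < irf(theta, item["a_" + branch], item["b_" + branch])  # level 2
    return fast, correct

rng = random.Random(42)
item = {"a_speed": 1.2, "b_speed": 0.0,   # level-1 (speed) parameters
        "a_fast": 0.8, "b_fast": 0.5,     # level-2 parameters, fast branch
        "a_slow": 1.5, "b_slow": -0.2}    # level-2 parameters, slow branch
data = [simulate_response(theta=0.3, speed=0.1, item=item, rng=rng)
        for _ in range(1000)]
fast_acc = sum(c for f, c in data if f) / max(1, sum(f for f, _ in data))
slow_acc = sum(c for f, c in data if not f) / max(1, sum(1 for f, _ in data if not f))
```

Pre-knowledge effects of the kind listed above would correspond to shifts in the level-1 parameters (responses become faster on compromised items), the level-2 parameters (accuracy rises on compromised items), or both.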
Title: Bayesian Monte Carlo Simulation Studies in Psychometrics: Practice and Implications
Authors: Allison J. Ames; Brian C. Leventhal; Nnamdi C. Ezike; Kathryn S. Thompson
Affiliation: Amazon
Abstract: Data simulation and Monte Carlo simulation studies (MCSS) are important skills for researchers and practitioners of educational and psychological measurement. Harwell et al. (1996) and Feinberg and Rubright (2016) outline an eight-step process for MCSS: (1) specifying the research question(s); (2) defining and justifying conditions; (3) specifying the experimental design and outcome(s) of interest; (4) simulating data under the specified conditions; (5) estimating parameters; (6) comparing true and estimated parameters; (7) replicating the procedure a specified number of times; and (8) analyzing results based on the design and research questions. There are a few didactic resources for psychometric MCSS (e.g., Leventhal & Ames, 2020) and software demonstrations; for example, Ames et al. (2020) demonstrate how to operationalize the eight steps for IRT using SAS software, and Feinberg and Rubright (2016) demonstrate similar concepts in R. Despite these resources, there is no current accounting of MCSS practice in psychometrics. For example, there are no resources describing the typical number of replications for MCSS (step 7), or whether this varies by outcome of interest (step 3) or number of conditions (step 2). Further, there are no resources describing how Bayesian MCSS differ from frequentist MCSS. To understand the current practice of MCSS and provide a resource for researchers using MCSS, we reviewed six journals focusing on educational and psychological measurement from 2015-2019, examining a total of 1004 journal articles. Across all published manuscripts in those six journals, 55.8% contained an MCSS (n=560), of which 18.8% contained Bayesian simulations (n=105). Full results of the review will be presented in the manuscript. Because there is little guidance for Bayesian MCSS, the practice of Bayesian MCSS often utilizes frequentist techniques. This fails, in our opinion, to leverage the benefits of Bayesian methodology.
We examined the outcomes of interest in frequentist and Bayesian MCSS. One trend that emerged from our review is the use of Bayesian posterior point estimates alone, disregarding other aspects of the posterior distribution. Specifically, while 58.72% examined some form of bias (e.g., absolute, relative) relying upon a posterior point estimate, only 10.09% examined coverage rates, defined as the proportion of times the true (generating) value was covered by a specified posterior interval. To address the gap in information specific to Bayesian MCSS, this study focuses on current practice and Bayesian-specific decisions within the MCSS steps. Related to current practice, we ask the following: (1) What are the current practices in psychometric Bayesian MCSS across six journals during a five-year period? (2) How are the philosophical differences between frequentist and Bayesian practice operationalized in MCSS? (3) What overlap exists between the practice of MCSS in the Bayesian and frequentist frameworks? Regarding Bayesian decisions in MCSS, we ask: (4) What are the implications of differing decisions across the eight steps on common MCSS types (e.g., parameter recovery)?
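As a minimal illustration of MCSS steps 4 through 8 and of the coverage-rate outcome defined in this abstract, the sketch below assumes (purely for illustration, not the authors' setup) a normal-mean model with known variance and a conjugate normal prior, so the posterior is available in closed form:

```python
import math
import random
import statistics

def one_replication(rng, true_mu=1.0, sigma=1.0, n=50,
                    prior_mu=0.0, prior_var=100.0):
    """Steps 4-6: simulate data under known truth, compute the posterior,
    and record bias of the point estimate and coverage of a 95% interval."""
    data = [rng.gauss(true_mu, sigma) for _ in range(n)]
    # Conjugate update for a normal mean with known variance
    post_var = 1.0 / (1.0 / prior_var + n / sigma**2)
    post_mu = post_var * (prior_mu / prior_var + sum(data) / sigma**2)
    half = 1.96 * math.sqrt(post_var)             # central 95% posterior interval
    bias = post_mu - true_mu
    covered = (post_mu - half) <= true_mu <= (post_mu + half)
    return bias, covered

rng = random.Random(2024)
results = [one_replication(rng) for _ in range(500)]    # step 7: replicate
mean_bias = statistics.mean(b for b, _ in results)      # step 8: summarize
coverage = sum(c for _, c in results) / len(results)
```

With a well-calibrated interval, the coverage rate should land near the nominal 95%; reporting it alongside bias is exactly the posterior-aware practice the abstract argues is underused.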
Title: Using keystroke log data to detect non-genuine behaviors in writing assessment: A subgroup analysis
Authors: Yang Jiang; Mo Zhang; Jiangang Hao; Paul Deane
Affiliation: Educational Testing Service
Abstract: In this paper, we explore the use of keystroke logs (recordings of every keypress) in detecting non-genuine writing behaviors in writing assessment, with a particular focus on fairness issues across demographic subgroups. When writing assessments are delivered online and remotely, so that tests can be taken anywhere outside of a well-proctored and monitored testing center, threats to test security arise accordingly. While writing assessments usually require candidates to produce original text in response to a prompt, there are many possible ways to cheat, especially in at-home testing: candidates may hire an impostor to write responses for them; they may memorize a concealed script or generic shell text and apply it to whatever prompt they receive; or they may copy text, entirely or partially, directly from other sources. Predicting non-genuine writing behaviors and texts is therefore of great interest to test developers and administrators. Deane et al. (2022) reported that, using keystroke log patterns, various machine learning prediction models produced an overall prediction accuracy between .85 and .90, with ROC analysis indicating a true-positive rate of around 80% and a false-negative rate of roughly 10%. In this paper, we plan to apply similar machine learning methods to predict non-genuine writing but, in addition to prediction accuracy, we will focus on subgroup invariance: it is an important validity concern whether non-genuine writing can be predicted equally well across different demographic groups (e.g., race, gender, country). We will use a large-scale operational data set for this exploration.