Software report
Special article on the 20th anniversary of the journal
-
The irtQ R package: a user-friendly tool for item response theory-based test data analysis and calibration
-
Hwanggyu Lim
, Kyungseok Kang
-
J Educ Eval Health Prof. 2024;21:23. Published online September 12, 2024
-
DOI: https://doi.org/10.3352/jeehp.2024.21.23
-
-
Abstract
Computerized adaptive testing (CAT) has become a widely adopted test design for high-stakes licensing and certification exams, particularly in the health professions in the United States, due to its ability to tailor test difficulty in real time, reducing testing time while providing precise ability estimates. A key component of CAT is item response theory (IRT), which facilitates the dynamic selection of items based on examinees' ability levels during a test. Accurate estimation of item and ability parameters is essential for successful CAT implementation, necessitating convenient and reliable software to ensure precise parameter estimation. This paper introduces the irtQ R package (http://CRAN.R-project.org/), which simplifies IRT-based analysis and item calibration under unidimensional IRT models. While it does not directly simulate CAT, it provides essential tools to support CAT development, including parameter estimation using marginal maximum likelihood estimation via the expectation-maximization algorithm, pretest item calibration through fixed item parameter calibration and fixed ability parameter calibration methods, and examinee ability estimation. The package also enables users to compute item and test characteristic curves and information functions necessary for evaluating the psychometric properties of a test. This paper illustrates the key features of the irtQ package through examples using simulated datasets, demonstrating its utility in IRT applications such as test data analysis and ability scoring. By providing a user-friendly environment for IRT analysis, irtQ significantly enhances the capacity for efficient adaptive testing research and operations. Finally, the paper highlights additional core functionalities of irtQ, emphasizing its broader applicability to the development and operation of IRT-based assessments.
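As a brief illustration of the workflow the abstract describes, the R sketch below simulates dichotomous 3PL response data, calibrates the items with marginal maximum likelihood estimation via the EM algorithm, and scores examinees. This is a minimal sketch, not an example reproduced from the paper: the function names and arguments (simdat, est_irt, est_score, info) follow our reading of the irtQ CRAN documentation and should be verified against the current manual.

# Minimal irtQ sketch: simulate, calibrate, and score under the 3PL model.
# Function names/arguments are taken from the irtQ CRAN documentation as we
# understand it; check the current manual before use.
library(irtQ)

set.seed(123)

# True item parameters for 40 three-parameter logistic (3PL) items
n.item <- 40
a <- rlnorm(n.item, meanlog = 0, sdlog = 0.3)   # discrimination
b <- rnorm(n.item, mean = 0, sd = 1)            # difficulty
g <- rbeta(n.item, shape1 = 5, shape2 = 20)     # pseudo-guessing

# True abilities for 1,000 examinees
theta <- rnorm(1000, mean = 0, sd = 1)

# Simulate dichotomous responses (D is the logistic scaling constant)
resp <- simdat(theta = theta, a.drm = a, b.drm = b, g.drm = g, D = 1)

# Calibrate all items with marginal maximum likelihood via the EM algorithm
fit <- est_irt(data = resp, D = 1, model = "3PLM", cats = 2)

# Estimate examinee abilities from the calibrated item parameters
score <- est_score(x = fit, data = resp, D = 1, method = "ML")
str(score)  # inspect ability estimates and their standard errors

# Test information function over a grid of ability values
tif <- info(x = fit, theta = seq(-4, 4, 0.1), D = 1, tif = TRUE)

As we read the documentation, the same workflow extends to pretest item calibration: est_irt exposes a fixed item parameter calibration option, and est_item performs fixed ability parameter calibration, so new items can be placed on an existing scale.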
Technical Report
-
Calibrating the Medical Council of Canada’s Qualifying Examination Part I using an integrated item response theory framework: a comparison of models and designs
-
Andre F. De Champlain
, Andre-Philippe Boulais
, Andrew Dallas
-
J Educ Eval Health Prof. 2016;13:6. Published online January 20, 2016
-
DOI: https://doi.org/10.3352/jeehp.2016.13.6
-
-
33,718 views · 199 downloads · 4 Web of Science citations · 4 Crossref citations
-
Abstract
Purpose
The aim of this research was to compare different item response theory-based methods of calibrating the multiple-choice question (MCQ) and clinical decision-making (CDM) components of the Medical Council of Canada’s Qualifying Examination Part I (MCCQEI).
Methods
Our data consisted of test results from 8,213 first-time applicants to the MCCQEI in the spring and fall 2010 and 2011 test administrations. The dataset contained several thousand multiple-choice items and several hundred CDM cases. Four dichotomous calibrations were run using BILOG-MG 3.0, and all 3 mixed-format calibrations (dichotomous MCQ responses and polytomous CDM case scores) were conducted using PARSCALE 4.
Results
The dichotomous 2-PL calibrations yielded identical numbers of items with chi-square values at or below a Type I error rate of 0.01 (83/3,499, or 2%). In all 3 polytomous models, whether the MCQs were anchored or run concurrently with the CDM cases, the results suggested very poor fit. All IRT abilities estimated from the dichotomous calibration designs correlated very highly with each other. IRT-based pass-fail rates were extremely similar, not only across calibration designs and methods but also with the actual decisions reported to candidates. The largest difference in pass rates was 4.78%, between the mixed-format concurrent 2-PL graded response model calibration (pass rate = 80.43%) and the dichotomous anchored 1-PL calibration (pass rate = 85.21%).
Conclusion
Simpler calibration designs with dichotomized items should be implemented, as the dichotomous calibrations fit the item response matrix better than the more complex polytomous calibrations.
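For readers less familiar with the models compared above, their standard forms are sketched below in LaTeX. These are the textbook definitions (with D the usual logistic scaling constant), not equations reproduced from the paper.

% 2-PL item response function; the 1-PL constrains all a_i to a common value:
P_i(\theta) = \frac{1}{1 + \exp\left[-D a_i (\theta - b_i)\right]}

% Samejima's graded response model for an item with ordered categories k = 0, 1, \dots, m_i:
P_{ik}^{*}(\theta) = \frac{1}{1 + \exp\left[-D a_i (\theta - b_{ik})\right]},
\qquad P_{i0}^{*}(\theta) = 1, \quad P_{i,\,m_i+1}^{*}(\theta) = 0

% Probability of responding in category k:
P_{ik}(\theta) = P_{ik}^{*}(\theta) - P_{i,\,k+1}^{*}(\theta)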
-
Citations
Citations to this article, as recorded by Crossref:
-
Plus ça change, plus c’est pareil: Making a continued case for the use of MCQs in medical education
Debra Pugh, André De Champlain, Claire Touchie
Medical Teacher. 2019;41(5):569.
-
Identifying the Essential Portions of the Skill Acquisition Process Using Item Response Theory
Saseem Poudel, Yusuke Watanabe, Yo Kurashima, Yoichi M. Ito, Yoshihiro Murakami, Kimitaka Tanaka, Hiroshi Kawase, Toshiaki Shichinohe, Satoshi Hirano
Journal of Surgical Education. 2019;76(4):1101.
-
Fuzzy Classification of Dichotomous Test Items and Social Indicators Differentiation Property
Aleksandras Krylovas, Natalja Kosareva, Julija Karaliūnaitė
Technological and Economic Development of Economy. 2018;24(4):1755.
-
Analysis of the suitability of the Korean Federation of Science and Technology Societies journal evaluation tool
Geum-Hee Jeong, Sun Huh
Learned Publishing. 2016;29(3):193.