JEEHP : Journal of Educational Evaluation for Health Professions

2 "Calibration"
Software report
Special article on the 20th anniversary of the journal
The irtQ R package: a user-friendly tool for item response theory-based test data analysis and calibration  
Hwanggyu Lim, Kyungseok Kang
J Educ Eval Health Prof. 2024;21:23.   Published online September 12, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.23
  • 2,258 View
  • 250 Download
Abstract
Computerized adaptive testing (CAT) has become a widely adopted test design for high-stakes licensing and certification exams, particularly in the health professions in the United States, due to its ability to tailor test difficulty in real time, reducing testing time while providing precise ability estimates. A key component of CAT is item response theory (IRT), which facilitates the dynamic selection of items based on examinees' ability levels during a test. Accurate estimation of item and ability parameters is essential for successful CAT implementation, necessitating convenient and reliable software to ensure precise parameter estimation. This paper introduces the irtQ R package (http://CRAN.R-project.org/), which simplifies IRT-based analysis and item calibration under unidimensional IRT models. While it does not directly simulate CAT, it provides essential tools to support CAT development, including parameter estimation using marginal maximum likelihood estimation via the expectation-maximization algorithm, pretest item calibration through fixed item parameter calibration and fixed ability parameter calibration methods, and examinee ability estimation. The package also enables users to compute item and test characteristic curves and information functions necessary for evaluating the psychometric properties of a test. This paper illustrates the key features of the irtQ package through examples using simulated datasets, demonstrating its utility in IRT applications such as test data analysis and ability scoring. By providing a user-friendly environment for IRT analysis, irtQ significantly enhances the capacity for efficient adaptive testing research and operations. Finally, the paper highlights additional core functionalities of irtQ, emphasizing its broader applicability to the development and operation of IRT-based assessments.
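As a rough illustration of the workflow described in this abstract, the R sketch below simulates dichotomous responses, calibrates the items with marginal maximum likelihood via the EM algorithm, and scores examinees. It is a minimal sketch that assumes the irtQ functions shape_df(), simdat(), est_irt(), and est_score() with the argument names shown; exact signatures should be verified against the package manual, and the FIPC/FAPC options mentioned in the abstract are not shown.

library(irtQ)

set.seed(123)

# Hypothetical 3PL item pool (20 items); shape_df() builds the item metadata
pool <- shape_df(par.drm = list(a = runif(20, 0.8, 2.0),
                                b = rnorm(20),
                                g = rep(0.2, 20)),
                 cats = 2, model = "3PLM")

# Simulate dichotomous responses for 500 examinees from the pool
theta <- rnorm(500)
resp  <- simdat(x = pool, theta = theta, D = 1)

# Calibrate the items with marginal maximum likelihood estimation (EM algorithm)
fit <- est_irt(data = resp, D = 1, model = "3PLM", cats = 2)

# Score examinees (EAP); the generating pool is reused here for brevity,
# whereas in practice the calibrated parameters from 'fit' would be used
scores <- est_score(x = pool, data = resp, D = 1, method = "EAP")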
Technical Report
Calibrating the Medical Council of Canada’s Qualifying Examination Part I using an integrated item response theory framework: a comparison of models and designs  
Andre F. De Champlain, Andre-Philippe Boulais, Andrew Dallas
J Educ Eval Health Prof. 2016;13:6.   Published online January 20, 2016
DOI: https://doi.org/10.3352/jeehp.2016.13.6
  • 33,718 View
  • 199 Download
  • 4 Web of Science
  • 4 Crossref
Abstract
Purpose
The aim of this research was to compare different methods of calibrating the multiple-choice question (MCQ) and clinical decision-making (CDM) components of the Medical Council of Canada’s Qualifying Examination Part I (MCCQEI) based on item response theory.
Methods
Our data consisted of test results from 8,213 first-time applicants to the MCCQEI in the spring and fall 2010 and 2011 test administrations. The data set contained several thousand multiple-choice items and several hundred CDM cases. Four dichotomous calibrations were run using BILOG-MG 3.0. All 3 mixed-format calibrations (dichotomous MCQ responses and polytomous CDM case scores) were conducted using PARSCALE 4.
Results
The 2-PL model had identical numbers of items with chi-square values at or below a Type I error rate of 0.01 (83/3,499, or 2%). In all 3 polytomous models, whether the MCQs were anchored or run concurrently with the CDM cases, the results suggested very poor fit. All IRT abilities estimated from the dichotomous calibration designs correlated very highly with each other. IRT-based pass-fail rates were extremely similar, not only across calibration designs and methods but also with respect to the decisions actually reported to candidates. The largest difference in pass rates was 4.78%, which occurred between the mixed-format concurrent 2-PL graded response model (pass rate = 80.43%) and the dichotomous anchored 1-PL calibrations (pass rate = 85.21%).
Conclusion
Simpler calibration designs with dichotomized items should be implemented. The dichotomous calibrations provided a better fit to the item response matrix than the more complex, polytomous calibrations.
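Once ability estimates from two calibration designs and a common cut score are available, the comparisons reported in the Results reduce to simple arithmetic. The R sketch below is purely illustrative: it uses simulated ability estimates (not the study’s data or the authors’ code) to show how the correlation between designs and the difference in pass rates would be computed; the hypothetical cut score and error term are assumptions.

# Hypothetical ability estimates from two calibration designs; the real study
# compared 8,213 candidates across 7 calibrations (4 dichotomous, 3 mixed-format)
set.seed(2016)
n <- 8213
theta_anchored_1pl   <- rnorm(n)                                   # e.g., dichotomous anchored 1-PL
theta_concurrent_grm <- theta_anchored_1pl + rnorm(n, sd = 0.2)    # e.g., mixed-format concurrent 2-PL GRM

# Agreement between designs: correlation of ability estimates
cor(theta_anchored_1pl, theta_concurrent_grm)

# Pass rates at a common cut score on the theta scale, and their difference
cut_score <- -1.0
pass_anchored   <- 100 * mean(theta_anchored_1pl   >= cut_score)
pass_concurrent <- 100 * mean(theta_concurrent_grm >= cut_score)
round(c(anchored = pass_anchored,
        concurrent = pass_concurrent,
        difference = pass_anchored - pass_concurrent), 2)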

Citations

Citations to this article, as recorded by Crossref:
  • Plus ça change, plus c’est pareil: Making a continued case for the use of MCQs in medical education
    Debra Pugh, André De Champlain, Claire Touchie
    Medical Teacher. 2019;41(5):569.
  • Identifying the Essential Portions of the Skill Acquisition Process Using Item Response Theory
    Saseem Poudel, Yusuke Watanabe, Yo Kurashima, Yoichi M. Ito, Yoshihiro Murakami, Kimitaka Tanaka, Hiroshi Kawase, Toshiaki Shichinohe, Satoshi Hirano
    Journal of Surgical Education. 2019;76(4):1101.
  • Fuzzy classification of dichotomous test items and social indicators differentiation property
    Aleksandras Krylovas, Natalja Kosareva, Julija Karaliūnaitė
    Technological and Economic Development of Economy. 2018;24(4):1755.
  • Analysis of the suitability of the Korean Federation of Science and Technology Societies journal evaluation tool
    Geum‐Hee Jeong, Sun Huh
    Learned Publishing. 2016;29(3):193.
