¹Research & Development, Medical Council of Canada, Ottawa, Ontario, Canada
²Educational Research Methodology Department, School of Education, University of North Carolina at Greensboro, Greensboro, North Carolina, USA
© 2016, National Health Personnel Licensing Examination Board of the Republic of Korea
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Model | Summary |
---|---|
1-PL Concurrent | 3,499 dichotomous items: 3,229 MCQ+270 CDM calibrated concurrently in BILOG-MG 3.0 with the 1-PL IRT model |
2-PL Concurrent | 3,499 dichotomous items: 3,229 MCQ+270 CDM calibrated concurrently in BILOG-MG 3.0 with the 2-PL IRT model |
Anchored 1-PL | 3,499 dichotomous items: 3,229 MCQ items initially calibrated using a 1-PL model in BILOG-MG 3.0; CDM questions then calibrated in BILOG-MG 3.0 by anchoring to the MCQ item parameter values. The abilities (thetas) were then computed using a 75% (MCQ) to 25% (CDM) weighting scheme. |
Anchored 2-PL | 3,499 dichotomous items: 3,229 MCQ items initially calibrated using a 2-PL model in BILOG-MG 3.0; CDM questions then calibrated in BILOG-MG 3.0 by anchoring to the MCQ item parameter values. The abilities were then computed using a 75% (MCQ) to 25% (CDM) weighting scheme. |
Concurrent 2-PL + GRM | 3,100 dichotomous MCQs calibrated with the 2-PL model in BILOG-MG 3.0 and 178 polytomous CDM cases calibrated with the GRM concurrently in PARSCALE 4.0 |
Anchored 2-PL + GRM | 3,229 MCQs calibrated in BILOG-MG 3.0 with the 2-PL model; the CDM case parameters were then estimated by anchoring to the MCQs using the GRM in PARSCALE 4.0. The abilities were weighted using a 75% (MCQ) to 25% (CDM) weighting scheme. |
Anchored 2-PL + GPCM | 3,229 MCQs calibrated in BILOG-MG 3.0 with the 2-PL model; the CDM case parameters were then estimated by anchoring to the MCQs using the GPCM in PARSCALE 4.0. The abilities were estimated using a 75% (MCQ) to 25% (CDM) weighting scheme. |
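
The dichotomous calibrations above rest on the logistic item response function, and the anchored variants combine section abilities with a 75%/25% weighting. As a minimal sketch, assuming the weighting is a simple linear combination of the MCQ and CDM ability estimates (the summary does not specify the exact procedure), both steps can be written in Python. The function names `p_correct` and `weighted_theta` are hypothetical, and the scaling constant D is omitted (D = 1), although calibration software may apply one such as D = 1.7.

```python
import math


def p_correct(theta: float, b: float, a: float = 1.0) -> float:
    """2-PL probability of a correct response.

    Holding the discrimination `a` at one common value for every item
    gives the 1-PL model. No scaling constant D is applied (D = 1).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def weighted_theta(theta_mcq: float, theta_cdm: float, w_mcq: float = 0.75) -> float:
    """Hypothetical composite ability: linear 75/25 weighting of section estimates."""
    return w_mcq * theta_mcq + (1.0 - w_mcq) * theta_cdm


print(p_correct(0.0, 0.0))        # 0.5: ability equals item difficulty
print(weighted_theta(1.0, -1.0))  # 0.5
```

Letting `a` vary by item is what separates the 2-PL rows of the table from the 1-PL rows; everything else in those summaries (item counts, software) is the same.
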

Number (proportion) of items for which an IRT model did not fit (α = 0.01)

Calibration | MCQs | CDMs | Total |
---|---|---|---|
1-PL Concurrent | 327 (0.10) | 86 (0.32) | 413 (0.12) |
2-PL Concurrent | 74 (0.02) | 9 (0.03) | 83 (0.02) |
1-PL Anchored | 424 (0.13) | 74 (0.27) | 498 (0.14) |
2-PL Anchored | 68 (0.02) | 15 (0.06) | 83 (0.02) |
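
The misfit flags themselves come from the calibration software's item-fit tests at α = 0.01; the proportions in the table are ratios of misfit counts to section sizes (3,229 MCQs, 270 CDM items, 3,499 items in total). The short Python check below reproduces that arithmetic from the reported counts only; it does not recompute any fit statistic, and the names `MISFIT_COUNTS` and `misfit_proportions` are illustrative.

```python
# Section sizes and misfit counts (MCQ, CDM) as reported in the table above.
MCQ_ITEMS, CDM_ITEMS = 3229, 270
MISFIT_COUNTS = {
    "1-PL Concurrent": (327, 86),
    "2-PL Concurrent": (74, 9),
    "1-PL Anchored": (424, 74),
    "2-PL Anchored": (68, 15),
}


def misfit_proportions(mcq: int, cdm: int) -> tuple[float, float, float]:
    """Proportions relative to each section and to all items, rounded as in the table."""
    return (
        round(mcq / MCQ_ITEMS, 2),
        round(cdm / CDM_ITEMS, 2),
        round((mcq + cdm) / (MCQ_ITEMS + CDM_ITEMS), 2),
    )


for name, (mcq, cdm) in MISFIT_COUNTS.items():
    print(f"{name}: total={mcq + cdm}, proportions={misfit_proportions(mcq, cdm)}")
```

Running it gives a total of 413 (0.12) for the 1-PL concurrent calibration and 83 (0.02) for both 2-PL calibrations, matching the Total column.
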

Number (proportion) of misfitting items (α = 0.01)

Calibration | MCQs | CDMs | Total |
---|---|---|---|
GRM Concurrent | 2,810 (0.91) | 173 (0.97) | 2,983 (0.91) |
GRM Anchored | 2,931 (0.91) | 157 (0.88) | 3,088 (0.91) |
GPCM Anchored | 2,989 (0.88) | 27 (0.15) | 2,989 (0.88) |
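
The two polytomous models compared here build category probabilities differently: the GRM (Samejima) takes differences of cumulative 2-PL curves at ordered thresholds, while the GPCM (Muraki) uses adjacent-category logits. The Python sketch below illustrates both for a single CDM case; the function names and example parameters are hypothetical, no scaling constant is applied, and it does not reproduce PARSCALE 4.0's own parameterization (for example, an item location combined with category thresholds).

```python
import math


def grm_probs(theta, a, thresholds):
    """Samejima's graded-response model: category probabilities from cumulative 2-PL curves."""
    if any(b2 <= b1 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("GRM thresholds must be strictly increasing")
    # P*(X >= k) for k = 1..m, bracketed by P*(X >= 0) = 1 and P*(X >= m + 1) = 0.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def gpcm_probs(theta, a, steps):
    """Muraki's generalized partial-credit model: adjacent-category logits a * (theta - b_v)."""
    z = [0.0]  # category 0 has an empty sum of step terms
    for b in steps:
        z.append(z[-1] + a * (theta - b))
    top = max(z)  # subtract the maximum before exponentiating, for numerical stability
    expz = [math.exp(v - top) for v in z]
    total = sum(expz)
    return [v / total for v in expz]


# A hypothetical CDM case scored 0-3 with three thresholds (GRM) or steps (GPCM).
print([round(p, 3) for p in grm_probs(0.5, 1.2, [-1.0, 0.0, 1.0])])
print([round(p, 3) for p in gpcm_probs(0.5, 1.2, [-1.0, 0.0, 1.0])])
```

With a single threshold both functions reduce to the 2-PL curve, which is a quick sanity check; with three or more categories they generally give different probabilities for the same parameter values.
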

Score | 1-PL | 2-PL | Anchored 1-PL | Anchored 2-PL | Concurrent 2-PL GRM | Anchored 2-PL GRM | Anchored 2-PL GPCM | Reported z-score |
---|---|---|---|---|---|---|---|---|
1-PL | 1.00 | 0.98 | 0.99 | 0.98 | 0.97 | 0.74 | 0.74 | 0.91 |
2-PL | | 1.00 | 0.97 | 0.99 | 0.99 | 0.76 | 0.77 | 0.89 |
Anchored 1-PL | | | 1.00 | 0.98 | 0.97 | 0.74 | 0.74 | 0.91 |
Anchored 2-PL | | | | 1.00 | 0.99 | 0.77 | 0.77 | 0.90 |
Concurrent 2-PL GRM | | | | | 1.00 | 0.78 | 0.78 | 0.90 |
Anchored 2-PL GRM | | | | | | 1.00 | 0.99 | 0.69 |
Anchored 2-PL GPCM | | | | | | | 1.00 | 0.69 |
Reported z-score | | | | | | | | 1.00 |
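
This matrix reports correlations among the scores produced by the seven calibrations and the reported z-score; only the upper triangle is shown because a correlation matrix is symmetric. The coefficient is not named here, so the Python sketch below assumes Pearson's product-moment correlation; the function name `pearson_r` and the ability vectors are hypothetical illustration data, not values from this study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least two values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ability estimates for five examinees under two calibrations.
theta_a = [-1.2, -0.4, 0.0, 0.6, 1.5]
theta_b = [-1.1, -0.5, 0.1, 0.5, 1.6]
print(round(pearson_r(theta_a, theta_b), 2))  # 0.99
```

Filling each cell of the matrix amounts to calling such a function on every pair of score vectors computed for the same examinees.
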

Decision classification rates and inconsistencies

Score | P (κ) | False positives | False negatives |
---|---|---|---|
1-PL | 0.963 (0.86) | 0.028 | 0.01 |
2-PL | 0.963 (0.87) | 0.033 | 0.004 |
Anchored 1-PL | 0.970 (0.88) | 0.014 | 0.016 |
Anchored 2-PL | 0.969 (0.88) | 0.022 | 0.009 |
Concurrent GRM | 0.952 (0.83) | 0.046 | 0.001 |
Anchored GRM | 0.867 (0.49) | 0.072 | 0.062 |
Anchored GPCM | 0.867 (0.49) | 0.071 | 0.061 |
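
In each row the three reported rates sum to approximately 1 (for example, 0.963 + 0.028 + 0.01 ≈ 1), which is consistent with P being the proportion of examinees classified the same way by two decision rules and the false positives and false negatives being the two kinds of disagreement. The Python sketch below computes these quantities from two pass/fail vectors under stated assumptions: the parenthesized value is read as Cohen's kappa, a false positive is taken to mean a pass under the model-based score but a fail under the reference decision, and the function name `decision_consistency` and the example data are hypothetical.

```python
def decision_consistency(model_pass, reference_pass):
    """Return (P, kappa, false-positive rate, false-negative rate) for two pass/fail vectors."""
    n = len(model_pass)
    if n == 0 or n != len(reference_pass):
        raise ValueError("need two non-empty decision vectors of equal length")
    pairs = list(zip(model_pass, reference_pass))
    agree = sum(1 for m, r in pairs if m == r) / n
    fp = sum(1 for m, r in pairs if m and not r) / n  # assumed: model passes, reference fails
    fn = sum(1 for m, r in pairs if r and not m) / n  # assumed: model fails, reference passes
    # Chance agreement from the marginal pass rates of each decision rule.
    p_m = sum(1 for m in model_pass if m) / n
    p_r = sum(1 for r in reference_pass if r) / n
    chance = p_m * p_r + (1 - p_m) * (1 - p_r)
    kappa = float("nan") if chance == 1 else (agree - chance) / (1 - chance)
    return agree, kappa, fp, fn


# Hypothetical decisions for ten examinees (True = pass).
model = [True, True, True, False, False, True, True, False, True, True]
reference = [True, True, False, False, False, True, True, True, True, True]
p, kappa, fp, fn = decision_consistency(model, reference)
print(p, round(kappa, 2), fp, fn)  # 0.8 0.52 0.1 0.1
```

Kappa discounts the agreement expected by chance, which helps explain why rows with similar P can still differ in κ when their pass rates differ.
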

MCQ: multiple-choice question, CDM: clinical decision making, 1-PL IRT: 1-parameter logistic item response theory, 2-PL IRT: 2-parameter logistic item response theory, GRM: graded-response model, GPCM: generalized partial-credit model.
All 4 calibrations contained 3,499 items. Proportions are relative to the number of items in each section (MCQ = 3,229 items, CDM = 270 items); totals are relative to all 3,499 items. PL: parameter logistic, MCQ: multiple-choice question, CDM: clinical decision making.
MCQ: multiple-choice question, CDM: clinical decision making, GRM: graded-response model, GPCM: generalized partial-credit model.
PL: parameter logistic, GRM: graded-response model, GPCM: generalized partial-credit model.
PL: parameter logistic, GRM: graded-response model, GPCM: generalized partial-credit model.