This study introduces LIVECAT, a web-based computerized adaptive testing (CAT) platform. The platform provides many functions, including writing item content, managing an item bank, creating and administering tests, reporting test results, and providing information about tests and examinees. LIVECAT gives examination administrators an easy and flexible environment for composing and managing examinations. It is available at http://www.thecatkorea.com/. The following tools were used to program LIVECAT: operating system, Amazon Linux; web server, nginx 1.18; web application server, Apache Tomcat 8.5; database, Amazon RDS (MariaDB); and languages, Java 8, HTML5/CSS, JavaScript, and jQuery. LIVECAT can implement several item response theory (IRT) models, such as the Rasch model and the 1-, 2-, and 3-parameter logistic models, and the administrator can choose a specific model for test construction. Multimedia data such as images, audio files, and movies can be uploaded to items. Two scoring methods (maximum likelihood estimation and expected a posteriori) are available, and the maximum Fisher information item selection method is applied to every IRT model. LIVECAT showed equal or better performance compared with a conventional test platform, and it enables users without psychometric expertise to easily implement and administer computerized adaptive testing at their institutions. The most recent version of LIVECAT provides only dichotomous item response models and the basic components of CAT. In the near future, LIVECAT will include advanced functions, such as polytomous item response models, the weighted likelihood estimation method, and content balancing methods.
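For readers unfamiliar with how these pieces fit together, the sketch below illustrates, under simplifying assumptions, how a CAT engine of this kind can combine expected a posteriori (EAP) scoring with maximum Fisher information item selection under a 3-parameter logistic model. It is a minimal illustration, not LIVECAT's actual implementation, and the item bank values are invented.

```python
import numpy as np

# Minimal CAT step under a 3-parameter logistic (3PL) model.
# Hypothetical item bank: columns a (discrimination), b (difficulty), c (guessing).
bank = np.array([
    [1.2, -0.5, 0.20],
    [0.8,  0.3, 0.25],
    [1.5,  1.1, 0.15],
    [1.0, -1.2, 0.20],
])

def p3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def fisher_info(theta, a, b, c):
    """Item information at ability theta for a 3PL item."""
    p = p3pl(theta, a, b, c)
    q = 1.0 - p
    return (a ** 2) * (q / p) * ((p - c) / (1.0 - c)) ** 2

def eap_estimate(responses, items, n_nodes=61):
    """EAP ability estimate with a standard normal prior on a quadrature grid."""
    nodes = np.linspace(-4, 4, n_nodes)
    prior = np.exp(-0.5 * nodes ** 2)
    like = np.ones_like(nodes)
    for x, (a, b, c) in zip(responses, items):
        p = p3pl(nodes, a, b, c)
        like *= p ** x * (1.0 - p) ** (1 - x)
    post = like * prior
    post /= post.sum()
    return float((nodes * post).sum())

def next_item(theta, administered):
    """Select the unadministered item with maximum Fisher information at theta."""
    info = [(-np.inf if i in administered else fisher_info(theta, *bank[i]))
            for i in range(len(bank))]
    return int(np.argmax(info))

# Example: after answering item 0 correctly and item 2 incorrectly.
administered, responses = [0, 2], [1, 0]
theta_hat = eap_estimate(responses, bank[administered])
print("EAP estimate:", round(theta_hat, 3), "-> next item:", next_item(theta_hat, administered))
```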
Citations to this article as recorded by
A Systematic Review on Computerized Adaptive Testing Hümeyra Demir, Selahattin Gelbal Erzincan Üniversitesi Eğitim Fakültesi Dergisi.2025; 27(1): 137. CrossRef
Comparison of real data and simulated data analysis of a stopping rule based on the standard error of measurement in computerized adaptive testing for medical examinations in Korea: a psychometric study Dong Gi Seo, Jeongwook Choi, Jinha Kim Journal of Educational Evaluation for Health Professions.2024; 21: 18. CrossRef
Educational Technology in the University: A Comprehensive Look at the Role of a Professor and Artificial Intelligence Cheolkyu Shin, Dong Gi Seo, Seoyeon Jin, Soo Hwa Lee, Hyun Je Park IEEE Access.2024; 12: 116727. CrossRef
The irtQ R package: a user-friendly tool for item response theory-based test data analysis and calibration Hwanggyu Lim, Kyungseok Kang Journal of Educational Evaluation for Health Professions.2024; 21: 23. CrossRef
Presidential address: improving item validity and adopting computer-based testing, clinical skills assessments, artificial intelligence, and virtual reality in health professions licensing examinations in Korea Hyunjoo Pai Journal of Educational Evaluation for Health Professions.2023; 20: 8. CrossRef
Patient-reported outcome measures in cancer care: Integration with computerized adaptive testing Minyu Liang, Zengjie Ye Asia-Pacific Journal of Oncology Nursing.2023; 10(12): 100323. CrossRef
Development of a character qualities test for medical students in Korea using polytomous item response theory and factor analysis: a preliminary scale development study Yera Hur, Dong Gi Seo Journal of Educational Evaluation for Health Professions.2023; 20: 20. CrossRef
Purpose In a sequential objective structured clinical examination (OSCE), all students initially take a short screening OSCE. Examinees who pass are excused from further testing, but an additional OSCE is administered to the remaining examinees. Previous investigations of sequential OSCE were based on classical test theory. We aimed to design and evaluate screening OSCEs based on item response theory (IRT).
Methods We carried out a retrospective observational study. At each station of a 10-station OSCE, the students’ performance was graded on a Likert-type scale. Since the data were polytomous, the difficulty parameters, discrimination parameters, and students’ ability were calculated using a graded response model. To design several screening OSCEs, we identified the 5 most difficult stations and the 5 most discriminative ones. For each test, 5, 4, or 3 stations were selected. Normal and stringent cut-scores were defined for each test. We compared the results of each of the 12 screening OSCEs to the main OSCE and calculated the positive and negative predictive values (PPV and NPV), as well as the exam cost.
Results A total of 253 students (95.1%) passed the main OSCE, while 72.6% to 94.4% of examinees passed the screening tests. The PPV values ranged from 0.98 to 1.00, and the NPV values ranged from 0.18 to 0.59. Two tests effectively predicted the results of the main exam, resulting in financial savings of 34% to 40%.
Conclusion If stations with the highest IRT-based discrimination values and stringent cut-scores are utilized in the screening test, sequential OSCE can be an efficient and convenient way to conduct an OSCE.
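As a worked illustration of the evaluation described in the Methods, the following sketch computes the positive and negative predictive values of a screening OSCE against the main OSCE outcome from a 2x2 cross-tabulation. The counts are invented for illustration and are not the study's data; "pass" on the screening test is treated as the positive prediction.

```python
# Cross-tabulate screening decisions against main-OSCE decisions and
# compute positive/negative predictive values (PPV, NPV).
# The counts below are invented for illustration only.

def predictive_values(pass_pass, pass_fail, fail_pass, fail_fail):
    """PPV/NPV of the screening test, treating 'pass' as the positive prediction.

    pass_pass: passed screening and passed the main OSCE (true positives)
    pass_fail: passed screening but failed the main OSCE (false positives)
    fail_pass: failed screening but passed the main OSCE (false negatives)
    fail_fail: failed screening and failed the main OSCE (true negatives)
    """
    ppv = pass_pass / (pass_pass + pass_fail)
    npv = fail_fail / (fail_fail + fail_pass)
    return ppv, npv

ppv, npv = predictive_values(pass_pass=240, pass_fail=2, fail_pass=13, fail_fail=11)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```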
Citations to this article as recorded by
Utility of eye-tracking technology for preparing medical students in Spain for the summative objective structured clinical examination Francisco Sánchez-Ferrer, J.M. Ramos-Rincón, M.D. Grima-Murcia, María Luisa Sánchez-Ferrer, Francisco Sánchez-del Campo, Antonio F. Compañ-Rosique, Eduardo Fernández-Jover Journal of Educational Evaluation for Health Professions.2017; 14: 27. CrossRef
Purpose The aim of this research was to compare different methods of calibrating multiple choice question (MCQ) and clinical decision making (CDM) components for the Medical Council of Canada’s Qualifying Examination Part I (MCCQEI) based on item response theory.
Methods Our data consisted of test results from 8,213 first-time applicants to the MCCQEI in the spring and fall 2010 and 2011 test administrations. The data set contained several thousand multiple-choice items and several hundred CDM cases. Four dichotomous calibrations were run using BILOG-MG 3.0. All 3 mixed-format (dichotomous MCQ responses and polytomous CDM case scores) calibrations were conducted using PARSCALE 4.
Results The 2-PL model had identical numbers of items with chi-square values at or below a Type I error rate of 0.01 (83/3,499, or 2%). In all 3 polytomous models, whether the MCQs were anchored or run concurrently with the CDM cases, the results suggested very poor fit. All IRT abilities estimated from the dichotomous calibration designs correlated very highly with each other. IRT-based pass-fail rates were extremely similar, not only across calibration designs and methods but also with respect to the actual decisions reported to candidates. The largest difference in pass rates was 4.78%, which occurred between the mixed-format concurrent 2-PL graded response model (pass rate = 80.43%) and the dichotomous anchored 1-PL calibrations (pass rate = 85.21%).
Conclusion Simpler calibration designs with dichotomized items should be implemented. The dichotomous calibrations provided a better fit to the item response matrix than the more complex polytomous calibrations.
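The comparisons reported above amount to correlating ability estimates from competing calibration designs and checking how often their pass-fail decisions agree at a fixed cut-score. The sketch below shows that comparison on invented, artificially correlated ability estimates rather than the MCCQEI data; the cut-score is likewise hypothetical.

```python
import numpy as np

# Invented ability estimates for the same candidates under two calibration designs.
rng = np.random.default_rng(0)
theta_design_a = rng.normal(0.0, 1.0, size=1000)
theta_design_b = theta_design_a + rng.normal(0.0, 0.2, size=1000)  # highly correlated by construction

cut_score = -0.8  # hypothetical theta cut-score

pass_a = theta_design_a >= cut_score
pass_b = theta_design_b >= cut_score

correlation = np.corrcoef(theta_design_a, theta_design_b)[0, 1]
agreement = np.mean(pass_a == pass_b)
pass_rate_diff = abs(pass_a.mean() - pass_b.mean())

print(f"ability correlation = {correlation:.3f}")
print(f"pass/fail agreement = {agreement:.1%}, pass-rate difference = {pass_rate_diff:.2%}")
```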
Citations to this article as recorded by
Plus ça change, plus c’est pareil: Making a continued case for the use of MCQs in medical education Debra Pugh, André De Champlain, Claire Touchie Medical Teacher.2019; 41(5): 569. CrossRef
Identifying the Essential Portions of the Skill Acquisition Process Using Item Response Theory Saseem Poudel, Yusuke Watanabe, Yo Kurashima, Yoichi M. Ito, Yoshihiro Murakami, Kimitaka Tanaka, Hiroshi Kawase, Toshiaki Shichinohe, Satoshi Hirano Journal of Surgical Education.2019; 76(4): 1101. CrossRef
FUZZY CLASSIFICATION OF DICHOTOMOUS TEST ITEMS AND SOCIAL INDICATORS DIFFERENTIATION PROPERTY Aleksandras Krylovas, Natalja Kosareva, Julija Karaliūnaitė Technological and Economic Development of Economy.2018; 24(4): 1755. CrossRef
Analysis of the suitability of the Korean Federation of Science and Technology Societies journal evaluation tool Geum‐Hee Jeong, Sun Huh Learned Publishing.2016; 29(3): 193. CrossRef
We developed a program to estimate examinees' abilities in order to provide freely available access to a web-based computerized adaptive testing (CAT) program. We used PHP and JavaScript as the programming languages, PostgreSQL as the database management system on an Apache web server, and Linux as the operating system. A system was constructed that allows users to input items, search the inputted items, and create tests. Ability estimation for each test was based on the Rasch model and the 2- or 3-parameter logistic models. Our system provides an algorithm for web-based CAT, replacing previous personal computer-based ones, and makes it possible to estimate an examinee's ability immediately at the end of the test.
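As an illustration of the kind of ability estimation such a program performs, the sketch below computes a maximum likelihood estimate of ability under the Rasch model using Newton-Raphson iteration. It is a simplified stand-in, not the program's source code (which was written in PHP), and the item difficulties and response pattern are invented.

```python
import numpy as np

def rasch_mle(responses, difficulties, max_iter=50, tol=1e-6):
    """Maximum likelihood ability estimate under the Rasch model.

    responses: 0/1 item scores; difficulties: item difficulty parameters.
    Returns None for all-correct or all-incorrect patterns, where the MLE diverges.
    """
    responses = np.asarray(responses, dtype=float)
    difficulties = np.asarray(difficulties, dtype=float)
    if responses.sum() in (0, len(responses)):
        return None
    theta = 0.0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(theta - difficulties)))
        grad = np.sum(responses - p)   # first derivative of the log-likelihood
        info = np.sum(p * (1.0 - p))   # test information (negative second derivative)
        step = grad / info
        theta += step
        if abs(step) < tol:
            break
    return theta

# Example with invented difficulties and a response pattern.
print(round(rasch_mle([1, 1, 0, 1, 0], [-1.0, -0.5, 0.0, 0.5, 1.0]), 3))
```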
Citations to this article as recorded by
Analysis on Validity and Academic Competency of Mock Test for Korean Medicine National Licensing Examination Using Item Response Theory Han Chae, Eunbyul Cho, SeonKyoung Kim, DaHye Choi, Seul Lee Keimyung Medical Journal.2023; 42(1): 7. CrossRef
Accuracy and Efficiency of Web-based Assessment Platform (LIVECAT) for Computerized Adaptive Testing Do-Gyeong Kim, Dong-Gi Seo The Journal of Korean Institute of Information Technology.2020; 18(4): 77. CrossRef
Computer‐Based Testing and Construction of an Item Bank Database for Medical Education in Korea Sun Huh Korean Medical Education Review.2014; 16(1): 11. CrossRef
Can computerized tests be introduced to the Korean Medical Licensing Examination? Sun Huh Journal of the Korean Medical Association.2012; 55(2): 124. CrossRef
Application of Computerized Adaptive Testing in Medical Education Sun Huh Korean Journal of Medical Education.2009; 21(2): 97. CrossRef
The passing rate of the Medical Licensing Examination has varied from year to year, which probably originated from differences in item difficulty and/or in the ability level of examinees. We tried to explain the origin of the difference using a test equating method based on item response theory. The numbers of items and examinees were 500 and 3,647 in 2003 and 550 and 3,879 in 2004. A common-item nonequivalent groups design was used with 30 common items. Item and ability parameters were calculated with the 3-parameter logistic model using ICL. Scale transformation and true score equating were executed using ST and PIE. The mean difficulty index was -0.957 (SD 2.628) in 2003 and -0.456 (SD 3.399) in 2004 after equating. The mean discrimination index was 0.487 (SD 0.242) in 2003 and 0.363 (SD 0.193) in 2004. The mean ability parameter was 0.00617 (SD 0.96605) in 2003 and 0.94636 (SD 1.32960) in 2004. The difference in equated true scores at the same ability level was greatest in the score range of 200–250. The difference in passing rates over the two consecutive years arose because the examination in 2004 was easier and the examinees in 2004 had higher ability. In addition, the passing rates of examinees with scores of 270–294 in 2003 and of 322–343 in 2004 were affected by the examination year.
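Scale transformation in a common-item nonequivalent groups design places the new form's parameter estimates on the old form's scale with a linear transformation estimated from the common items. The study used the ST and PIE programs; the sketch below shows only the simpler mean/sigma method as an illustration of the idea, with invented common-item difficulties.

```python
import numpy as np

# Mean/sigma linear scale transformation for a common-item nonequivalent groups design.
# b_old, b_new: difficulty estimates of the same common items on the old and new forms
# (invented values for illustration).
b_old = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])
b_new = np.array([-1.0, -0.1, 0.4, 1.2, 1.9])

# Slope A and intercept B such that A * b_new + B is on the old form's scale.
A = b_old.std(ddof=1) / b_new.std(ddof=1)
B = b_old.mean() - A * b_new.mean()

# Transform the new form's parameters: difficulties (and abilities) are rescaled as
# A * value + B, while discriminations are divided by A.
b_new_on_old_scale = A * b_new + B
print(f"A = {A:.3f}, B = {B:.3f}")
print("transformed difficulties:", np.round(b_new_on_old_scale, 3))
```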
Citations to this article as recorded by
Comparison of proficiency in an anesthesiology course across distinct medical student cohorts: Psychometric approaches to test equating Shu-Wei Liao, Kuang-Yi Chang, Chien-Kun Ting, Mei-Yung Tsou, En-Tzu Chen, Kwok-Hon Chan, Wen-Kuei Chang Journal of the Chinese Medical Association.2014; 77(3): 150. CrossRef
Can computerized tests be introduced to the Korean Medical Licensing Examination? Sun Huh Journal of the Korean Medical Association.2012; 55(2): 124. CrossRef
Over the last two decades, there have been a number of significant changes in the evaluation system in medical education in Korea. One major improvement in this respect has been the listing of learning objectives at medical schools and the construction of a content outline for the Korean Medical Licensing Examination that can be used as a basis of evaluation. Item analysis has become a routine method for obtaining information that often provides valuable feedback concerning test items after the completion of a written test. The use of item response theory in analyzing test items has been spreading in medical schools as a way to evaluate performance tests and computerized adaptive testing. A series of recent studies have documented an upward trend in the adoption of the objective structured clinical examination (OSCE) and clinical practice examination (CPX) for measuring skill and attitude domains, in addition to tests of the knowledge domain. There has been an obvious increase in regional consortiums involving neighboring medical schools that share the planning and administration of the OSCE and CPX; this includes recruiting and training standardized patients. Such consortiums share common activities, such as case development and program evaluation. A short history and the pivotal roles of four organizations that have brought about significant changes in the examination system are discussed briefly.
Citations to this article as recorded by
Presidential address: Adoption of a clinical skills examination for dental licensing, implementation of computer-based testing for the medical licensing examination, and the 30th anniversary of the Korea Health Personnel Licensing Examination Institute Yoon-Seong Lee Journal of Educational Evaluation for Health Professions.2022; 19: 1. CrossRef
Effectiveness of Medical Education Assessment Consortium Clinical Knowledge Mock Examination (2011‐2016) Sang Yeoup Lee, Yeli Lee, Mi Kyung Kim Korean Medical Education Review.2018; 20(1): 20. CrossRef
Long for wonderful leadership in a new era of the Korean Association of Medical Colleges Young Hwan Lee Korean Journal of Medical Education.2014; 26(3): 163. CrossRef
Major Reforms and Issues of the Medical Licensing Examination Systems in Korea Sang-Ho Baik Korean Medical Education Review.2013; 15(3): 125. CrossRef
A Study on the Feasibility of a National Practical Examination in the Radiologic Technologist Soon-Yong Son, Tae-Hyung Kim, Jung-Whan Min, Dong-Kyoon Han, Sung-Min Ahn Journal of the Korea Academia-Industrial cooperation Society.2011; 12(5): 2149. CrossRef
The Relationship between Senior Year Examinations at a Medical School and the Korean Medical Licensing Examination Ki Hoon Jung, Ho Keun Jung, Kwan Lee Korean Journal of Medical Education.2009; 21(1): 17. CrossRef
What Qualities Do Medical School Applicants Need to Have? - Secondary Publication Yera Hur, Sun Kim Yonsei Medical Journal.2009; 50(3): 427. CrossRef
To test the applicability of item response theory (IRT) to the Korean Nurses' Licensing Examination (KNLE), item analysis was performed after testing unidimensionality and goodness of fit, and the results were compared with those based on classical test theory. The results of the 330-item KNLE administered to 12,024 examinees in January 2004 were analyzed. Unidimensionality was tested using DETECT, and goodness of fit was tested using WINSTEPS for the Rasch model and Bilog-MG for the two-parameter logistic model. Item analysis and ability estimation were done using WINSTEPS. Using DETECT, Dmax ranged from 0.1 to 0.23 for each subject. The mean square infit and outfit values of all items from WINSTEPS ranged from 0.1 to 1.5, except for one item in pediatric nursing, which scored 1.53. Of the 330 items, 218 (42.7%) showed misfit under the two-parameter logistic model in Bilog-MG. The correlation coefficients between the difficulty parameter from the Rasch model and the difficulty index from classical test theory ranged from 0.9039 to 0.9699, and the correlation between the ability parameter from the Rasch model and the total score from classical test theory ranged from 0.9776 to 0.9984. Therefore, the KNLE results satisfied unidimensionality and showed good fit to the Rasch model. The KNLE is thus well suited to analysis with the Rasch model, and further research using IRT is feasible.
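The infit and outfit values referred to above are mean-square summaries of standardized residuals under the Rasch model. The sketch below shows how they can be computed for each item from a scored response matrix, assuming ability and difficulty estimates are already available; the data and estimates are invented, and this is not the WINSTEPS computation itself.

```python
import numpy as np

def rasch_fit_statistics(X, theta, b):
    """Item infit and outfit mean-square statistics under the Rasch model.

    X: persons x items matrix of 0/1 responses
    theta: person ability estimates; b: item difficulty estimates
    """
    P = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))   # expected scores
    W = P * (1.0 - P)                                           # response variances
    Z2 = (X - P) ** 2 / W                                       # squared standardized residuals
    outfit = Z2.mean(axis=0)                                    # unweighted mean square
    infit = ((X - P) ** 2).sum(axis=0) / W.sum(axis=0)          # information-weighted mean square
    return infit, outfit

# Invented example: 5 persons, 3 items.
X = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 1, 1],
              [0, 1, 0],
              [1, 1, 1]])
theta = np.array([-0.5, 0.0, 1.2, -1.0, 0.8])
b = np.array([-1.0, 0.0, 1.0])
infit, outfit = rasch_fit_statistics(X, theta, b)
print("infit:", np.round(infit, 2), "outfit:", np.round(outfit, 2))
```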
Citations to this article as recorded by
Item difficulty index, discrimination index, and reliability of the 26 health professions licensing examinations in 2022, Korea: a psychometric study Yoon Hee Kim, Bo Hyun Kim, Joonki Kim, Bokyoung Jung, Sangyoung Bae Journal of Educational Evaluation for Health Professions.2023; 20: 31. CrossRef
Study on the Academic Competency Assessment of Herbology Test using Rasch Model Han Chae, Soo Jin Lee, Chang-ho Han, Young Il Cho, Hyungwoo Kim Journal of Korean Medicine.2022; 43(2): 27. CrossRef
Can computerized tests be introduced to the Korean Medical Licensing Examination? Sun Huh Journal of the Korean Medical Association.2012; 55(2): 124. CrossRef
To evaluate the usefulness of computerized adaptive testing (CAT) in medical school, the General Examination for senior medical students was administered both as a paper-and-pencil test (P&P) and as a CAT. The General Examination is a graduation examination that also serves as a preliminary examination for the Korean Medical Licensing Examination (KMLE). The correlations among the results of the CAT, the P&P, and the KMLE were analyzed. The correlation between the CAT and the P&P was 0.8013 (p=0.000); that between the P&P and the KMLE was 0.7861 (p=0.000); and that between the CAT and the KMLE was 0.6436 (p=0.000). Six of the 12 students with an ability estimate below -0.52 failed the KMLE. The results showed that CAT could replace the P&P in medical school. The ability of the CAT to predict whether students would pass the KMLE was 0.5 when the theta criterion was set at -0.52, a value chosen arbitrarily for the prediction of pass or failure.
Citations to this article as recorded by
Analysis on Validity and Academic Competency of Mock Test for Korean Medicine National Licensing Examination Using Item Response Theory Han Chae, Eunbyul Cho, SeonKyoung Kim, DaHye Choi, Seul Lee Keimyung Medical Journal.2023; 42(1): 7. CrossRef
Application of Computerized Adaptive Testing in Medical Education Sun Huh Korean Journal of Medical Education.2009; 21(2): 97. CrossRef
Estimation of an Examinee's Ability in the Web-Based Computerized Adaptive Testing Program IRT-CAT Yoon-Hwan Lee, Jung-Ho Park, In-Yong Park Journal of Educational Evaluation for Health Professions.2006; 3: 4. CrossRef
An examinee's ability can be evaluated precisely using computerized adaptive testing (CAT), which is shorter than written tests and more efficient in terms of examination duration. We used CAT for the second General Examination of 98 senior students at a medical college on November 27, 2004. We prepared 1,050 test items pre-calibrated according to item response theory, which had been used for the General Examination administered to senior students in 2003. The computer was programmed to pose questions until the standard error of the ability estimate was smaller than 0.01. To determine the students' attitudes toward and evaluation of CAT, we conducted web-based surveys before and after the examination. The mean of the students' ability estimates was 0.3513 and the standard deviation was 0.9097 (range, -2.4680 to +2.5310). There were no significant differences in ability estimates according to the students' responses to items concerning their experience with CAT, their ability to use a computer, or their anxiety before and after the examination (p>0.05). Many students were unhappy that they could not recheck their responses (49%), and some stated that there were too few examination items (24%). Of the students, 79% had no complaints about using a computer, and 63% wanted to expand the use of CAT. These results indicate that CAT can be implemented in medical schools without causing difficulties for users.
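The stopping rule described above ends the test once the standard error of the ability estimate falls below a threshold. Because that standard error is the reciprocal of the square root of the accumulated test information, a check of this kind might look like the sketch below; the threshold and item parameters are invented (the study's threshold of 0.01 would require a very large item pool), and a 2-parameter logistic model is assumed.

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2-parameter logistic item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

def should_stop(theta, administered_items, se_threshold=0.30):
    """Stop when the standard error of the ability estimate drops below the threshold.

    administered_items: list of (a, b) tuples already given to the examinee.
    The threshold here (0.30) is illustrative; the study used a much smaller value.
    """
    test_info = sum(item_information(theta, a, b) for a, b in administered_items)
    se = 1.0 / math.sqrt(test_info) if test_info > 0 else float("inf")
    return se < se_threshold, se

# Example: standard error after 4 invented items at theta = 0.35.
stop, se = should_stop(0.35, [(1.2, 0.0), (0.9, -0.5), (1.5, 0.4), (1.1, 0.8)])
print(f"SE = {se:.3f}, stop = {stop}")
```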
Citations to this article as recorded by
Computer‐Based Testing and Construction of an Item Bank Database for Medical Education in Korea Sun Huh Korean Medical Education Review.2014; 16(1): 11. CrossRef
Can computerized tests be introduced to the Korean Medical Licensing Examination? Sun Huh Journal of the Korean Medical Association.2012; 55(2): 124. CrossRef
Application of Computerized Adaptive Testing in Medical Education Sun Huh Korean Journal of Medical Education.2009; 21(2): 97. CrossRef
The results of the 64th and 65th Korean Medical Licensing Examinations were analyzed according to both classical test theory and item response theory in order to assess the applicability of item response theory to item analysis and to suggest its usefulness for computerized adaptive testing. The correlation coefficients of the difficulty index, discrimination index, and ability parameter between the two kinds of analysis were obtained using the computer programs Analyst 4.0, Bilog, and Xcalibre. The correlation coefficients for the difficulty index were 0.75 or higher; those for the discrimination index were between -0.023 and 0.753; and those for the ability parameter were 0.90 or higher. These results suggest that item analysis according to item response theory yields results comparable to those of classical test theory, except for the discrimination index. Since the ability parameter is most widely used in criterion-referenced tests, the high correlation between the ability parameter and the total score supports the validity of computerized adaptive testing based on item response theory.
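The comparison above rests on computing the classical difficulty index (proportion correct) and discrimination index (item-total correlation) for each item and correlating them with the corresponding IRT parameter estimates. The sketch below shows the mechanics on an invented scored response matrix; the "IRT" difficulties are a crude logit transform standing in for Bilog or Xcalibre output, so the sign of the resulting correlation depends on whether the difficulty index is defined as the proportion correct or incorrect.

```python
import numpy as np

# Invented scored response matrix: 200 persons x 10 items (0/1), simulated from a Rasch model.
rng = np.random.default_rng(1)
ability = rng.normal(size=(200, 1))
difficulty = np.linspace(-1.5, 1.5, 10)
X = (rng.random((200, 10)) < 1 / (1 + np.exp(-(ability - difficulty)))).astype(int)

# Classical test theory indices.
p_value = X.mean(axis=0)                      # difficulty index: proportion correct per item
rest_total = X.sum(axis=1, keepdims=True) - X
discrimination_ctt = np.array([               # discrimination index: corrected item-total correlation
    np.corrcoef(X[:, j], rest_total[:, j])[0, 1] for j in range(X.shape[1])
])

# IRT difficulty estimates would normally come from Bilog or Xcalibre; a logit transform
# of the clipped p-values stands in for them here, purely for illustration.
p_clipped = np.clip(p_value, 0.01, 0.99)
b_irt = -np.log(p_clipped / (1 - p_clipped))

print("CTT difficulty vs IRT difficulty r =", round(np.corrcoef(p_value, b_irt)[0, 1], 3))
print("CTT discrimination indices:", np.round(discrimination_ctt, 2))
```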
Citations to this article as recorded by
Journal of Educational Evaluation for Health Professions received the top-ranking Journal Impact Factor―9.3—in the category of Education, Scientific Disciplines in the 2023 Journal Citation Ranking by Clarivate Sun Huh Journal of Educational Evaluation for Health Professions.2024; 21: 16. CrossRef
Analysis on Validity and Academic Competency of Mock Test for Korean Medicine National Licensing Examination Using Item Response Theory Han Chae, Eunbyul Cho, SeonKyoung Kim, DaHye Choi, Seul Lee Keimyung Medical Journal.2023; 42(1): 7. CrossRef
Item difficulty index, discrimination index, and reliability of the 26 health professions licensing examinations in 2022, Korea: a psychometric study Yoon Hee Kim, Bo Hyun Kim, Joonki Kim, Bokyoung Jung, Sangyoung Bae Journal of Educational Evaluation for Health Professions.2023; 20: 31. CrossRef
Can computerized tests be introduced to the Korean Medical Licensing Examination? Sun Huh Journal of the Korean Medical Association.2012; 55(2): 124. CrossRef