JEEHP : Journal of Educational Evaluation for Health Professions

10 "Medical licensure"
Research article
Improving item pool utilization for health professions examinations under variable-length computerized adaptive testing designs: a shadow-test approach  
Hwanggyu Lim, Kyung (Chris) Tyek Han
J Educ Eval Health Prof. 2025;22:35.   Published online November 3, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.35    [Epub ahead of print]
  • 1,270 View
  • 145 Download
Purpose
The shadow-test approach to computerized adaptive testing (CAT) ensures content validity in health professions examinations but may suffer from poor item pool utilization in variable-length designs, increasing operational costs and security risks. This study aimed to address this challenge by developing algorithms that enhance the sustainability of shadow CAT in variable-length designs.
Methods
A simulation study was conducted to evaluate 3 proposed modifications of the α-stratification method designed to improve item pool utilization. These methods, which integrated randomesque selection and multiple-form strategies, were compared with 2 baseline algorithms within a variable-length shadow CAT framework. Performance was assessed in terms of measurement precision, pool utilization, and test efficiency.
Results
The proposed modifications significantly outperformed the baseline methods across all measures of item pool utilization and exposure control. The most effective method (Modification 2) reduced the proportion of unused items from 35.6% to 5.0% and produced more uniform item exposure rates. These substantial gains in operational sustainability were achieved while maintaining measurement precision comparable to the baseline methods.
Conclusion
The proposed algorithms effectively mitigate poor item pool utilization in shadow CAT under variable-length design. This enhanced framework provides a robust, secure, and sustainable solution for high-stakes adaptive assessments in the health professions that remain content-valid, precise, and operationally efficient.
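The Methods above combine α-stratification with randomesque selection. For readers unfamiliar with the randomesque idea, a minimal sketch follows: instead of always administering the single most informative item at the current ability estimate, which overexposes a handful of items, the selector draws at random among the k most informative eligible items. The 3PL item schema, the scaling constant D=1.7, and the pool below are illustrative assumptions, not the authors' implementation.

```python
import math
import random

def item_information(item, theta, D=1.7):
    # Fisher information of a 3PL item at ability theta (standard 3PL formula).
    a, b, c = item["a"], item["b"], item["c"]
    p = c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2

def randomesque_select(pool, theta, k=5):
    # Draw at random among the k most informative items rather than taking
    # the single best one, which flattens item exposure rates.
    top_k = sorted(pool, key=lambda it: item_information(it, theta), reverse=True)[:k]
    return random.choice(top_k)

# Hypothetical 3PL item pool.
pool = [{"a": random.uniform(0.5, 2.0), "b": random.uniform(-2.0, 2.0), "c": 0.2}
        for _ in range(200)]
print(randomesque_select(pool, theta=0.3))
```

In a shadow-test design, the same draw would apply only to items eligible under the current shadow test's content constraints; the sketch omits that optimization step.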
Review
Prompt engineering for single-best-answer multiple-choice questions in licensing examinations: a narrative review with a case study involving the Korean Medical Licensing Examination  
Bokyoung Kim, Junseok Kang, Min-Young Kim, Jihyun Ahn
J Educ Eval Health Prof. 2025;22:34.   Published online October 27, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.34
  • 1,088 View
  • 161 Download
The emergence of large language models (LLMs) has generated growing interest in their potential applications for medical assessment and item development. This practice-oriented narrative review examines the potential of LLMs, particularly ChatGPT, for generating and validating single-best-answer multiple-choice questions in health professions licensing examinations, using a Korean Medical Licensing Examination (KMLE)-focused case perspective. We frame LLMs as human-in-the-loop tools rather than replacements for high-stakes testing. Recent applications of LLMs in assessment were reviewed, including prompting strategies such as few-shot, multi-stage, and chain-of-thought methods, as well as retrieval-augmented generation (RAG) to align outputs with exam blueprints. Approaches to enforcing formatting rules, checklist-based self-validation, and iterative refinement were analyzed for their role in supporting item development. Findings indicate that LLMs can perform near passing thresholds on high-stakes exams and assist with grading and feedback tasks. Prompt engineering enhances structural fidelity and clinical plausibility, while human oversight remains critical for accuracy, cultural appropriateness, and psychometric defensibility. The emerging multimodal generation of images, audio, and video suggests the feasibility of new item formats, provided robust validation safeguards are implemented. The most effective approach is a human-in-the-loop workflow that leverages artificial intelligence efficiency while embedding expert judgment, psychometric evaluation, and ethical governance. This practice-oriented roadmap—integrating strategic prompt selection, RAG-based blueprint alignment, rigorous validation gates, and KMLE-specific formatting—offers an implementable and methodologically defensible approach for licensing examinations.
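As a concrete illustration of the prompting workflow the review describes, few-shot exemplars for item generation followed by a checklist-based self-validation pass, here is a minimal sketch. The prompt wording, checklist rules, and function names are hypothetical illustrations, not the authors' or the KMLE's actual templates.

```python
# Sketch of a two-stage prompting workflow: (1) few-shot generation of a
# single-best-answer item, (2) checklist-based self-validation of the draft.

FEW_SHOT_EXAMPLE = """Stem: A 65-year-old man presents with ...
Options: A) ... B) ... C) ... D) ... E) ...
Answer: C"""

CHECKLIST = [
    "Exactly one unambiguously best answer",
    "Homogeneous, plausible distractors",
    "Stem answerable with the options covered (cover-the-options rule)",
    "Topic and difficulty match the exam blueprint",
]

def generation_prompt(topic: str) -> str:
    # Stage 1: ask the model for a new item in the exemplified format.
    return (
        "You write single-best-answer items for a medical licensing examination.\n"
        f"Example item:\n{FEW_SHOT_EXAMPLE}\n\n"
        f"Write one new item on: {topic}."
    )

def validation_prompt(draft_item: str) -> str:
    # Stage 2: ask the model to audit its own draft against explicit rules;
    # a human reviewer still makes the final accept/revise decision.
    rules = "\n".join(f"- {rule}" for rule in CHECKLIST)
    return (
        "Review the draft item below against each rule and flag any failure:\n"
        f"{rules}\n\nDraft item:\n{draft_item}"
    )
```

The human-in-the-loop step the review emphasizes sits after stage 2: flagged and passing items alike go to content experts and psychometric review before operational use.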
Research articles
Performance of GPT-4o and o1-Pro on United Kingdom Medical Licensing Assessment-style items: a comparative study  
Behrad Vakili, Aadam Ahmad, Mahsa Zolfaghari
J Educ Eval Health Prof. 2025;22:30.   Published online October 10, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.30
  • 1,207 View
  • 160 Download
Purpose
Interest is growing in large language models (LLMs) such as ChatGPT and in their potential to support autonomous learning for licensing examinations such as the UK Medical Licensing Assessment (UKMLA). However, empirical evaluations of artificial intelligence (AI) performance against the UKMLA standard remain limited.
Methods
We evaluated the performance of 2 recent ChatGPT versions, GPT-4o and o1-Pro, on a curated set of 374 UKMLA-style single-best-answer items spanning diverse medical specialties. Statistical comparisons using McNemar’s test assessed the significance of differences between the 2 models. Specialties were analyzed to identify domain-specific variation. In addition, 20 image-based items were evaluated.
Results
GPT-4o achieved an accuracy of 88.8%, while o1-Pro achieved 93.0%. McNemar’s test revealed a statistically significant difference in favor of o1-Pro. Across specialties, both models demonstrated excellent performance in surgery, psychiatry, and infectious diseases. Notable differences arose in dermatology, respiratory medicine, and imaging, where o1-Pro consistently outperformed GPT-4o. Nevertheless, isolated weaknesses in general practice were observed. The analysis of image-based items showed 75% accuracy for GPT-4o and 90% for o1-Pro (P=0.25).
Conclusion
ChatGPT shows strong potential as an adjunct learning tool for UKMLA preparation, with both models achieving scores above the calculated pass mark. This underscores the promise of advanced AI models in medical education. However, specialty-specific inconsistencies suggest AI tools should complement, rather than replace, traditional study methods.
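McNemar's test, as used above, compares two models graded on the same items; only the discordant items (answered correctly by one model but not the other) carry information. A minimal sketch using statsmodels, with made-up outcome vectors standing in for the 374 per-item results:

```python
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical per-item outcomes (1 = correct, 0 = incorrect), paired by item.
gpt4o = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
o1pro = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1]

# 2x2 contingency table of agreement/disagreement between the two models.
both     = sum(a and b for a, b in zip(gpt4o, o1pro))
only_gpt = sum(a and not b for a, b in zip(gpt4o, o1pro))
only_o1  = sum(b and not a for a, b in zip(gpt4o, o1pro))
neither  = sum(not a and not b for a, b in zip(gpt4o, o1pro))
table = [[both, only_gpt], [only_o1, neither]]

# Exact (binomial) McNemar test on the discordant cells.
result = mcnemar(table, exact=True)
print(f"statistic={result.statistic}, p={result.pvalue:.4f}")
```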
Performance of large language models on Thailand’s national medical licensing examination: a cross-sectional study  
Prut Saowaprut, Romen Samuel Wabina, Junwei Yang, Lertboon Siriwat
J Educ Eval Health Prof. 2025;22:16.   Published online May 12, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.16
  • 3,974 View
  • 303 Download
  • 3 Web of Science
  • 3 Crossref
Purpose
This study aimed to evaluate the feasibility of general-purpose large language models (LLMs) in addressing inequities in medical licensure exam preparation for Thailand’s National Medical Licensing Examination (ThaiNLE), which currently lacks standardized public study materials.
Methods
We assessed 4 multi-modal LLMs (GPT-4, Claude 3 Opus, Gemini 1.0/1.5 Pro) using a 304-question ThaiNLE Step 1 mock examination (10.2% image-based), applying deterministic API configurations and 5 inference repetitions per model. Performance was measured via micro- and macro-accuracy metrics compared against historical passing thresholds.
Results
All models exceeded passing scores, with GPT-4 achieving the highest accuracy (88.9%; 95% confidence interval, 88.7–89.1), surpassing Thailand’s national average by more than 2 standard deviations. Claude 3.5 Sonnet (80.1%) and Gemini 1.5 Pro (72.8%) followed hierarchically. Models demonstrated robustness across 17 of 20 medical domains, but variability was noted in genetics (74.0%) and cardiovascular topics (58.3%). While models demonstrated proficiency with images (Gemini 1.0 Pro: +9.9% vs. text), text-only accuracy remained superior (GPT-4o: 90.0% vs. 82.6%).
Conclusion
General-purpose LLMs show promise as equitable preparatory tools for ThaiNLE Step 1. However, domain-specific knowledge gaps and inconsistent multi-modal integration warrant refinement before clinical deployment.
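Micro- and macro-accuracy, the two metrics reported above, differ only in weighting: micro-accuracy pools all items into one proportion correct, while macro-accuracy averages per-domain accuracies so that small domains such as genetics weigh as much as large ones. A minimal sketch with toy data (domain labels and counts are illustrative):

```python
from collections import defaultdict

def micro_macro_accuracy(records):
    # records: list of (domain, is_correct) pairs, one per exam item.
    micro = sum(ok for _, ok in records) / len(records)  # pooled over all items
    by_domain = defaultdict(list)
    for domain, ok in records:
        by_domain[domain].append(ok)
    # Unweighted mean of per-domain accuracies.
    macro = sum(sum(v) / len(v) for v in by_domain.values()) / len(by_domain)
    return micro, macro

# Toy data: 3/4 correct in cardiology, 1/2 correct in genetics.
records = [("cardiology", True), ("cardiology", True), ("cardiology", True),
           ("cardiology", False), ("genetics", True), ("genetics", False)]
print(micro_macro_accuracy(records))  # (0.6667, 0.625)
```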

Citations

Citations to this article as recorded by  
  • Performance of GPT-4o and o1-Pro on United Kingdom Medical Licensing Assessment-style items: a comparative study
    Behrad Vakili, Aadam Ahmad, Mahsa Zolfaghari
    Journal of Educational Evaluation for Health Professions.2025; 22: 30.     CrossRef
  • Large Language Models for the National Radiological Technologist Licensure Examination in Japan: Cross-Sectional Comparative Benchmarking and Evaluation of Model-Generated Items Study
    Toshimune Ito, Toru Ishibashi, Tatsuya Hayashi, Shinya Kojima, Kazumi Sogabe
    JMIR Medical Education.2025; 11: e81807.     CrossRef
  • Technologies, opportunities, challenges, and future directions for integrating generative artificial intelligence into medical education: a narrative review
    Junseok Kang, Jihyun Ahn
    Ewha Medical Journal.2025; 48(4): e53.     CrossRef
A nationwide survey on the curriculum and educational resources related to the Clinical Skills Test of the Korean Medical Licensing Examination: a cross-sectional descriptive study  
Eun-Kyung Chung, Seok Hoon Kang, Do-Hoon Kim, MinJeong Kim, Ji-Hyun Seo, Keunmi Lee, Eui-Ryoung Han
J Educ Eval Health Prof. 2025;22:11.   Published online March 13, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.11
  • 3,718 View
  • 301 Download
  • 1 Web of Science
  • 1 Crossref
Purpose
The revised Clinical Skills Test (CST) of the Korean Medical Licensing Exam aims to provide a better assessment of physicians’ clinical competence and ability to interact with patients. This study examined the impact of the revised CST on medical education curricula and resources nationwide, while also identifying areas for improvement within the revised CST.
Methods
This study surveyed faculty responsible for clinical clerkships at 40 medical schools throughout Korea to evaluate the status and changes in clinical skills education, assessment, and resources related to the CST. The researchers distributed the survey via email through regional consortia between December 7, 2023 and January 19, 2024.
Results
Nearly all schools implemented preliminary student–patient encounters during core clinical rotations. Schools primarily conducted clinical skills assessments in the third and fourth years, with a simplified form introduced in the first and second years. Remedial education was conducted through various methods, including one-on-one feedback from faculty after the assessment. All schools established clinical skills centers and made ongoing improvements. Faculty members did not perceive the CST revisions as significantly altering clinical clerkship or skills assessments. They suggested several improvements, including assessing patient records to improve accuracy and increasing the objectivity of standardized patient assessments to ensure fairness.
Conclusion
Following the CST revision, students’ involvement in patient encounters and clinical skills education increased, improving the assessment and feedback processes for clinical skills within the curriculum. To enhance students’ clinical competencies and readiness, strengthening the validity and reliability of the CST is essential.

Citations

Citations to this article as recorded by  
  • Nationwide cross-sectional survey on the necessity of including a clinical skills assessment in the national licensure examination for Doctors of Korean Medicine
    Aram Jeong, Eunbyul Cho, Chan-Young Kwon, Sanghoon Lee, Chungsik Cho, Sangwoo Shin, Min Hwangbo, Dong-Hyeon Kim, Hye-Yoon Lee
    Medicine.2025; 104(45): e45366.     CrossRef
History article
History of the medical licensure system in Korea from the late 1800s to 1992
Sang-Ik Hwang
J Educ Eval Health Prof. 2024;21:36.   Published online December 9, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.36
  • 4,113 View
  • 114 Download
The introduction of modern Western medicine in the late 19th century, notably through vaccination initiatives, marked the beginning of governmental involvement in medical licensure, with the licensing of doctors who performed vaccinations. The establishment of the national medical school “Euihakkyo” in 1899 further formalized medical education and licensure, granting graduates the privilege to practice medicine without additional examinations. The enactment of the Regulations on Doctors in 1900 by the Joseon government aimed to define doctor qualifications, including modern and traditional practitioners, comprehensively. However, resistance from the traditional medical community hindered its full implementation. During the Japanese colonial occupation of the Korean Peninsula from 1910 to 1945, the medical licensure system was controlled by colonial authorities, leading to the marginalization of traditional Korean medicine and the imposition of imperial hierarchical structures. Following liberation in 1945 from Japanese colonial rule, the Korean government undertook significant reforms, culminating in the National Medical Law, which was enacted in 1951. This law redefined doctor qualifications and reinstated the status of traditional Korean medicine. The introduction of national examinations for physicians increased state involvement in ensuring medical competence. The privatization of the Korean Medical Licensing Examination led to the establishment of the Korea Health Personnel Licensing Examination Institute in 1992, which assumed responsibility for administering licensing examinations for all healthcare workers. This shift reflected a move towards specialized management of professional standards. The evolution of the medical licensure system in Korea illustrates a dynamic process shaped by the historical context, balancing the protection of public health with the rights of medical practitioners.
Review
The legality and appropriateness of keeping Korean Medical Licensing Examination items confidential: a comparative analysis and review of court rulings  
Jae Sun Kim, Dae Un Hong, Ju Yoen Lee
J Educ Eval Health Prof. 2024;21:28.   Published online October 15, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.28
  • 3,619 View
  • 229 Download
  • 1 Web of Science
  • 1 Crossref
This study examines the legality and appropriateness of keeping the multiple-choice question items of the Korean Medical Licensing Examination (KMLE) confidential. Through an analysis of cases from the United States, Canada, and Australia, where medical licensing exams are conducted using item banks and computer-based testing, we found that exam items are kept confidential to ensure fairness and prevent cheating. In Korea, the Korea Health Personnel Licensing Examination Institute (KHPLEI) has been disclosing KMLE questions despite concerns over exam integrity. Korean courts have consistently ruled that multiple-choice question items prepared by public institutions are non-public information under Article 9(1)(v) of the Korea Official Information Disclosure Act (KOIDA), which exempts disclosure if it significantly hinders the fairness of exams or research and development. The Constitutional Court of Korea has upheld this provision. Given the time and cost involved in developing high-quality items and the need to accurately assess examinees’ abilities, there are compelling reasons to keep KMLE items confidential. As a public institution responsible for selecting qualified medical practitioners, KHPLEI should establish its disclosure policy based on a balanced assessment of public interest, without influence from specific groups. We conclude that KMLE questions qualify as non-public information under KOIDA, and KHPLEI may choose to maintain their confidentiality to ensure exam fairness and efficiency.

Citations

Citations to this article as recorded by  
  • Halted medical education and medical residents’ training in Korea, journal metrics, and appreciation to reviewers and volunteers
    Sun Huh
    Journal of Educational Evaluation for Health Professions.2025; 22: 1.     CrossRef
Research article
Is it possible to introduce an interview to the Korean Medical Licensing Examination to assess professional attributes?: a survey-based observational study  
Seung-Joo Na, HyeRin Roh, Kyung Hee Chun, Kyung Hye Park, Do-Hwan Kim
J Educ Eval Health Prof. 2022;19:10.   Published online May 10, 2022
DOI: https://doi.org/10.3352/jeehp.2022.19.10
  • 5,177 View
  • 303 Download
Purpose
This study aimed to gather opinions from medical educators on the possibility of introducing an interview to the Korean Medical Licensing Examination (KMLE) to assess professional attributes. Specifically, the following topics were addressed: the appropriate timing and tools for assessing unprofessional conduct; whether introducing an interview to the KMLE could prevent unprofessional conduct; and the feasibility of implementing such an interview in the KMLE.
Methods
A cross-sectional study approach based on a survey questionnaire was adopted. We analyzed 104 pieces of news about doctors’ unprofessional conduct to determine the deficient professional attributes. We derived 24 items of unprofessional conduct and developed the questionnaire and surveyed 250 members of the Korean Society of Medical Education 2 times. Descriptive statistics, cross-tabulation analysis, and Fisher’s exact test were applied to the responses. The answers to the open-ended questions were analyzed using conventional content analysis.
Results
Forty-nine members (19.6%) responded to the first survey, and 24 of them (49.5%) responded to the second. For the timing of assessing unprofessional conduct, no single stage dominated among basic medical education (BME), the KMLE, and continuing professional development (CPD), and no single tool dominated among written examination, objective structured clinical examination, practice observation, and interview. On whether introducing an interview to the KMLE could prevent unprofessional conduct, “impossible” (49.0%) was chosen about as often as “possible” (42.9%). Regarding implementation, “impossible” (50.0%) was selected more often than “possible” (33.3%).
Conclusion
Professional attributes should be assessed with various tools over the whole period from BME to CPD. Introducing an interview to the KMLE to assess professional attributes may therefore not be feasible; instead, a system such as self-regulation by the professional body, rather than the licensing examination, is needed.
History article
History of the medical licensing examination (uieop) in Korea’s Goryeo Dynasty (918-1392)  
Kyung-Lock Lee
J Educ Eval Health Prof. 2015;12:19.   Published online May 26, 2015
DOI: https://doi.org/10.3352/jeehp.2015.12.19
  • 35,489 View
  • 191 Download
  • 1 Crossref
This article aims to describe the training and medical licensing system (uieop) for becoming a physician officer (uigwan) during Korea’s Goryeo Dynasty (918-1392). In the Goryeo Dynasty, although no license was necessary to provide medical services to the common people, there was a licensing examination to become a physician officer. No other national licensing system for healthcare professionals existed in Korea at that time. The medical licensing examination was administered beginning in 958. Physician officers who passed the medical licensing examination worked in two main healthcare institutions: the Government Hospital (Taeuigam) and Pharmacy for the King (Sangyakguk). The promotion and expansion of medical education differed depending on the historical period. Until the reign of King Munjong (1046-1083), medical education as a path to licensure was encouraged in order to increase the number of physician officers qualifying for licensure by examination; thus, the number of applicants sitting for the examination increased. However, in the late Goryeo Dynasty, after the officer class of the local authorities (hyangri) showed a tendency to monopolize the examination, the Goryeo government limited the examination applications by this group. The medical licensing examination was divided into two parts: medicine and ‘feeling the pulse and acupuncture’ (jugeumeop). The Goryeo Dynasty followed the Chinese Tang Dynasty’s medical system while also taking a strong interest in the Chinese Song Dynasty’s ideas about medicine.

Citations

Citations to this article as recorded by  
  • LİYAKAT TEMELLİ BÜROKRASİ: KORE KAMU SINAVLARI (GWAGEO) (958-1894) - THE MERIT-BASED BUREAUCRACY: THE CIVIL SERVICE EXAMINATION (GWAGEO) IN KOREA (958-1894)
    Murat KAÇER
    Mehmet Akif Ersoy Üniversitesi Sosyal Bilimler Enstitüsü Dergisi.2018; 10(26): 754.     CrossRef
Technical Report
Best-fit model of exploratory and confirmatory factor analysis of the 2010 Medical Council of Canada Qualifying Examination Part I clinical decision-making cases  
André F. Champlain
J Educ Eval Health Prof. 2015;12:11.   Published online April 15, 2015
DOI: https://doi.org/10.3352/jeehp.2015.12.11
  • 35,154 View
  • 221 Download
  • 2 Web of Science
  • 2 Crossref
Purpose
This study aimed to assess the fit of a number of exploratory and confirmatory factor analysis models to the 2010 Medical Council of Canada Qualifying Examination Part I (MCCQE1) clinical decision-making (CDM) cases. The outcomes of this study have important implications for a range of domains, including scoring and test development.
Methods
The examinees included all first-time Canadian medical graduates and international medical graduates who took the MCCQE1 in spring or fall 2010. The fit of one- to five-factor exploratory models was assessed for the item response matrix of the 2010 CDM cases. Five confirmatory factor analytic models were also examined with the same CDM response matrix. The structural equation modeling software program Mplus was used for all analyses.
Results
Out of the five exploratory factor analytic models that were evaluated, a three-factor model provided the best fit. Factor 1 loaded on three medicine cases, two obstetrics and gynecology cases, and two orthopedic surgery cases. Factor 2 corresponded to pediatrics, and the third factor loaded on psychiatry cases. Among the five confirmatory factor analysis models examined in this study, three- and four-factor lifespan period models and the five-factor discipline models provided the best fit.
Conclusion
The results suggest that knowledge of broad disciplinary domains best accounts for performance on CDM cases. In test development, particular effort should be placed on developing CDM cases according to broad discipline and patient age domains; CDM testlets should be assembled largely using the criteria of discipline and age.
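The fit comparison above was run in Mplus. For readers who want a rough open-source analogue, scikit-learn's FactorAnalysis exposes a held-out log-likelihood via score(), so one- to five-factor exploratory models can be compared by cross-validation. The random matrix below merely stands in for the CDM item response matrix, and a Gaussian factor model is a coarse approximation for scored case data:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.model_selection import cross_val_score

# Stand-in for the CDM response matrix (examinees x cases); illustrative only.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))

# Compare 1- to 5-factor exploratory models by cross-validated log-likelihood;
# the highest-scoring model is the best-fitting exploratory solution.
for n_factors in range(1, 6):
    fa = FactorAnalysis(n_components=n_factors)
    mean_ll = cross_val_score(fa, X).mean()
    print(f"{n_factors}-factor model: mean held-out log-likelihood = {mean_ll:.2f}")
```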

Citations

Citations to this article as recorded by  
  • Exploratory Factor Analysis of a Computerized Case-Based F-Type Testlet Variant
    Yavuz Selim Kıyak, Işıl İrem Budakoğlu, Dilara Bakan Kalaycıoğlu, Özlem Coşkun
    Medical Science Educator.2023; 33(5): 1191.     CrossRef
  • The key-features approach to assess clinical decisions: validity evidence to date
    G. Bordage, G. Page
    Advances in Health Sciences Education.2018; 23(5): 1005.     CrossRef
