JEEHP : Journal of Educational Evaluation for Health Professions

OPEN ACCESS
SEARCH
Search

Search

Page Path
HOME > Search
179 "Education"
Filter
Filter
Article category
Keywords
Publication year
Authors
Funded articles
Research articles
A nationwide survey on the curriculum and educational resources related to the Clinical Skills Test of the Korean Medical Licensing Examination: a cross-sectional descriptive study
Eun-Kyung Chung, Seok Hoon Kang, Do-Hoon Kim, MinJeong Kim, Ji-Hyun Seo, Keunmi Lee, Eui-Ryoung Han
J Educ Eval Health Prof. 2025;22:11.   Published online March 13, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.11    [Epub ahead of print]
Purpose
The revised Clinical Skills Test (CST) of the Korean Medical Licensing Exam aims to provide a better assessment of physicians’ clinical competence and ability to interact with patients. This study examined the impact of the revised CST on medical education curricula and resources nationwide, while also identifying areas for improvement within the revised CST.
Methods
This study surveyed faculty responsible for clinical clerkships at 40 medical schools throughout Korea to evaluate the status and changes in clinical skills education, assessment, and resources related to the CST. The researchers distributed the survey via email through regional consortia between December 7, 2023 and January 19, 2024.
Results
Nearly all schools implemented preliminary student–patient encounters during core clinical rotations. Schools primarily conducted clinical skills assessments in the third and fourth years, with a simplified form introduced in the first and second years. Remedial education was conducted through various methods, including one-on-one feedback from faculty after the assessment. All schools established clinical skills centers and made ongoing improvements. Faculty members did not perceive the CST revisions as significantly altering clinical clerkship or skills assessments. They suggested several improvements, including assessing patient records to improve accuracy and increasing the objectivity of standardized patient assessments to ensure fairness.
Conclusion
With the revised CST, students’ involvement in patient encounters and clinical skills education increased, improving the assessment and feedback processes for clinical skills within the curriculum. To enhance students’ clinical competencies and readiness, strengthening the validity and reliability of the CST is essential.
Simulation-based teaching versus traditional small group teaching for first-year medical students among high and low scorers in respiratory physiology, India: a randomized controlled trial
Nalini Yelahanka Channegowda, Dinker Ramanand Pai, Shivasakthy Manivasakan
J Educ Eval Health Prof. 2025;22:8.   Published online February 21, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.8    [Epub ahead of print]
Purpose
Although simulation-based education (SBE) is widely utilized for skill training in clinical subjects, its use for teaching basic science concepts to phase 1 (pre-clinical) medical students remains limited. SBE is preferred in cardiovascular and respiratory physiology over other systems because both normal physiology and its alterations are easy to recreate in a simulated environment, thus promoting a deep understanding of core concepts.
Methods
A block randomized study was conducted among 107 phase 1 (first-year) medical undergraduate students at a Deemed to be University in India. Group A received SBE, while Group B received traditional small group teaching. The effectiveness of the teaching intervention was assessed using pre- and post-tests. Student feedback was obtained through a self-administered structured questionnaire via an anonymous online survey and through in-depth interviews.
Results
The intervention group showed a statistically significant improvement in post-test scores compared to the control group. A sub-analysis revealed that high scorers performed better than low scorers in both groups, but the knowledge gain among low scorers was more significant in the intervention group.
Conclusion
This teaching strategy offers a valuable supplement to traditional methods, fostering a deeper comprehension of clinical concepts from the outset of medical training.
Reliability and construct validation of the Blended Learning Usability Evaluation–Questionnaire with interprofessional clinicians in Canada: a methodological study  
Anish Kumar Arora, Jeff Myers, Tavis Apramian, Kulamakan Kulasegaram, Daryl Bainbridge, Hsien Seow
J Educ Eval Health Prof. 2025;22:5.   Published online January 16, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.5    [Epub ahead of print]
Purpose
To generate Cronbach’s alpha and further mixed methods construct validity evidence for the Blended Learning Usability Evaluation–Questionnaire (BLUE-Q).
Methods
Forty interprofessional clinicians completed the BLUE-Q after finishing a 3-month-long blended learning professional development program in Ontario, Canada. Reliability was assessed with Cronbach’s α for each of the 3 sections of the BLUE-Q and for all quantitative items together. Construct validity was evaluated through the Grand-Guillaume-Perrenoud et al. framework, which consists of 3 elements: congruence, convergence, and credibility. To compare quantitative and qualitative results, descriptive statistics, including means and standard deviations for each Likert scale item of the BLUE-Q, were calculated.
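As a rough sketch of the reliability statistic used here, Cronbach’s α can be computed from the item variances and the total-score variance. The ratings below are invented for illustration and are not the study’s data:

```python
# Hypothetical sketch: Cronbach's alpha from first principles.
# 'scores' holds one list of item ratings per respondent (invented data).

def cronbach_alpha(scores):
    k = len(scores[0])                       # number of items
    def var(xs):                             # sample variance (ddof=1)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = [var([resp[i] for resp in scores]) for i in range(k)]
    total_var = var([sum(resp) for resp in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

ratings = [
    [5, 4, 5, 4],
    [4, 4, 4, 5],
    [5, 5, 5, 5],
    [3, 3, 4, 3],
    [4, 5, 4, 4],
]
print(round(cronbach_alpha(ratings), 2))  # → 0.84
```

In practice a dedicated statistics package would be used, but the formula itself is this simple.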
Results
Cronbach’s α was 0.95 for the pedagogical usability section, 0.85 for the synchronous modality section, 0.93 for the asynchronous modality section, and 0.96 for all quantitative items together. Mean ratings (with standard deviations) were 4.77 (0.506) for pedagogy, 4.64 (0.654) for synchronous learning, and 4.75 (0.536) for asynchronous learning. Of the 239 qualitative comments received, 178 were identified as substantive, of which 88% were considered congruent and 79% were considered convergent with the high means. Among all congruent responses, 69% were considered confirming statements and 31% were considered clarifying statements, suggesting appropriate credibility. Analysis of the clarifying statements assisted in identifying 5 categories of suggestions for program improvement.
Conclusion
The BLUE-Q demonstrates high reliability and appropriate construct validity in the context of a blended learning program with interprofessional clinicians, making it a valuable tool for comprehensive program evaluation, quality improvement, and evaluative research in health professions education.
Empathy and tolerance of ambiguity in medical students and doctors participating in art-based observational training at the Rijksmuseum in Amsterdam, the Netherlands: a before-and-after study  
Stella Anna Bult, Thomas van Gulik
J Educ Eval Health Prof. 2025;22:3.   Published online January 14, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.3
Purpose
This research presents an experimental study using validated questionnaires to quantitatively assess the outcomes of art-based observational training in medical students, residents, and specialists. The study tested the hypothesis that art-based observational training would lead to measurable effects on judgement skills (tolerance of ambiguity) and empathy in medical students and doctors.
Methods
An experimental cohort study with pre- and post-intervention assessments was conducted using validated questionnaires and qualitative evaluation forms to examine the outcomes of art-based observational training in medical students and doctors. Between December 2023 and June 2024, 15 art courses were conducted in the Rijksmuseum in Amsterdam. Participants were assessed on empathy using the Jefferson Scale of Empathy (JSE) and tolerance of ambiguity using the Tolerance of Ambiguity in Medical Students and Doctors (TAMSAD) scale.
Results
In total, 91 participants were included; 29 completed the JSE and 62 completed the TAMSAD scale. The results showed statistically significant post-test increases in mean JSE and TAMSAD scores (3.71 points on the JSE, which ranges from 20 to 140, and 1.86 points on the TAMSAD, which ranges from 0 to 100). The qualitative findings were predominantly positive.
Conclusion
The results suggest that incorporating art-based observational training in medical education improves empathy and tolerance of ambiguity. This study highlights the importance of art-based observational training in medical education in the professional development of medical students and doctors.
Inter-rater reliability and content validity of the measurement tool for portfolio assessments used in the Introduction to Clinical Medicine course at Ewha Womans University College of Medicine: a methodological study  
Dong-Mi Yoo, Jae Jin Han
J Educ Eval Health Prof. 2024;21:39.   Published online December 10, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.39
Purpose
This study aimed to examine the reliability and validity of a measurement tool for portfolio assessments in medical education. Specifically, it investigated scoring consistency among raters and assessment criteria appropriateness according to an expert panel.
Methods
A cross-sectional observational study was conducted from September to December 2018 for the Introduction to Clinical Medicine course at the Ewha Womans University College of Medicine. Data were collected for 5 randomly selected portfolios scored by a gold-standard rater and 6 trained raters. An expert panel assessed the validity of 12 assessment items using the content validity index (CVI). Statistical analysis included Pearson correlation coefficients for rater alignment, the intraclass correlation coefficient (ICC) for inter-rater reliability, and the CVI for item-level validity.
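As a brief illustration of the item-level form of the content validity index used above (often called the I-CVI), it is simply the proportion of panel experts who rate an item as relevant (3 or 4 on a 4-point scale). The ratings below are invented, not the study’s panel data:

```python
# Hypothetical sketch of an item-level content validity index (I-CVI):
# the share of experts rating the item 3 or 4 on a 4-point relevance scale.

def item_cvi(ratings, relevant=(3, 4)):
    return sum(r in relevant for r in ratings) / len(ratings)

panel = [4, 4, 3, 4, 2, 4, 3, 4, 4]   # invented ratings from 9 experts
print(round(item_cvi(panel), 2))       # → 0.89
```

Against a threshold of ≥0.75, this invented item (I-CVI≈0.89) would be retained.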
Results
Rater 1 had the highest Pearson correlation with the gold-standard rater (0.8916), while Rater 5 had the lowest (0.4203). The ICC for all raters was 0.3821, improving to 0.4415 after excluding Raters 1 and 5, a 15.6% increase in reliability. Most assessment items met the CVI threshold of ≥0.75, with some achieving a perfect score (CVI=1.0); however, items such as “sources” and “level and degree of performance” showed lower validity (CVI=0.72).
Conclusion
The present measurement tool for portfolio assessments demonstrated moderate reliability and strong validity, supporting its use as a credible tool. For a more reliable portfolio assessment, more faculty training is needed.
Validation of the 21st Century Skills Assessment Scale for public health students in Thailand: a methodological study  
Suphawadee Panthumas, Kaung Zaw, Wirin Kittipichai
J Educ Eval Health Prof. 2024;21:37.   Published online December 10, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.37
Purpose
This study aimed to develop and validate the 21st Century Skills Assessment Scale (21CSAS) for Thai public health (PH) undergraduate students using the Partnership for 21st Century Skills framework.
Methods
A cross-sectional survey was conducted among 727 first- to fourth-year PH undergraduate students from 4 autonomous universities in Thailand. Data were collected using self-administered questionnaires between January and March 2023. Exploratory factor analysis (EFA) was used to explore the underlying dimensions of 21CSAS, while confirmatory factor analysis (CFA) was conducted to test the hypothesized factor structure using Mplus software (Muthén & Muthén). Reliability and item discrimination were assessed using Cronbach’s α and the corrected item-total correlation, respectively.
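The corrected item-total correlation mentioned above can be sketched as the Pearson correlation between one item and the sum of the remaining items (the item itself is excluded, hence “corrected”). This toy example uses invented data:

```python
# Hypothetical sketch: corrected item-total correlation for item discrimination.

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def corrected_item_total(scores, item):
    item_col = [resp[item] for resp in scores]          # the item itself
    rest = [sum(resp) - resp[item] for resp in scores]  # total minus the item
    return pearson_r(item_col, rest)

scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]   # invented, perfectly consistent
print(round(corrected_item_total(scores, 0), 2))  # → 1.0
```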
Results
EFA performed on a dataset of 300 students revealed a 20-item scale with a 6-factor structure: (1) creativity and innovation; (2) critical thinking and problem-solving; (3) information, media, and technology; (4) communication and collaboration; (5) initiative and self-direction; and (6) social and cross-cultural skills. The rotated eigenvalues ranged from 1.73 to 2.12. CFA performed on another dataset of 427 students confirmed a good model fit (χ2/degrees of freedom=2.67, comparative fit index=0.93, Tucker-Lewis index=0.91, root mean square error of approximation=0.06, standardized root mean square residual=0.06), explaining 34%–71% of the variance in the items. Item loadings ranged from 0.58 to 0.84. The 21CSAS had a Cronbach’s α of 0.92.
Conclusion
The 21CSAS proved to be a valid and reliable tool for assessing 21st century skills among Thai PH undergraduate students. These findings provide insights for educational systems to inform policy, practice, and research regarding 21st-century skills among undergraduate students.
History article
History of the medical licensure system in Korea from the late 1800s to 1992
Sang-Ik Hwang
J Educ Eval Health Prof. 2024;21:36.   Published online December 9, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.36
The introduction of modern Western medicine in the late 19th century, notably through vaccination initiatives, marked the beginning of governmental involvement in medical licensure, with the licensing of doctors who performed vaccinations. The establishment of the national medical school “Euihakkyo” in 1899 further formalized medical education and licensure, granting graduates the privilege to practice medicine without additional examinations. The enactment of the Regulations on Doctors in 1900 by the Joseon government aimed to comprehensively define the qualifications of doctors, including both modern and traditional practitioners. However, resistance from the traditional medical community hindered its full implementation. During the Japanese colonial occupation of the Korean Peninsula from 1910 to 1945, the medical licensure system was controlled by colonial authorities, leading to the marginalization of traditional Korean medicine and the imposition of imperial hierarchical structures. Following liberation from Japanese colonial rule in 1945, the Korean government undertook significant reforms, culminating in the National Medical Law, enacted in 1951. This law redefined doctor qualifications and reinstated the status of traditional Korean medicine. The introduction of national examinations for physicians increased state involvement in ensuring medical competence. The privatization of the Korean Medical Licensing Examination led to the establishment of the Korea Health Personnel Licensing Examination Institute in 1992, which assumed responsibility for administering licensing examinations for all healthcare workers. This shift reflected a move toward specialized management of professional standards. The evolution of the medical licensure system in Korea illustrates a dynamic process shaped by historical context, balancing the protection of public health with the rights of medical practitioners.
Research article
Effectiveness of ChatGPT-4o in developing continuing professional development plans for graduate radiographers: a descriptive study  
Minh Chau, Elio Stefan Arruzza, Kelly Spuur
J Educ Eval Health Prof. 2024;21:34.   Published online November 18, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.34
Purpose
This study evaluates the use of ChatGPT-4o in creating tailored continuing professional development (CPD) plans for radiography students, addressing the challenge of aligning CPD with Medical Radiation Practice Board of Australia (MRPBA) requirements. We hypothesized that ChatGPT-4o could support students in CPD planning while meeting regulatory standards.
Methods
A descriptive, experimental design was used to generate 3 unique CPD plans using ChatGPT-4o, each tailored to hypothetical graduate radiographers in varied clinical settings. Each plan followed MRPBA guidelines, focusing on computed tomography specialization by the second year. Three MRPBA-registered academics assessed the plans using criteria of appropriateness, timeliness, relevance, reflection, and completeness from October 2024 to November 2024. Ratings underwent analysis using the Friedman test and intraclass correlation coefficient (ICC) to measure consistency among evaluators.
Results
The CPD plans generated by ChatGPT-4o generally adhered to regulatory standards across scenarios. The Friedman test indicated no significant differences among raters (P=0.420, 0.761, and 0.807 for the 3 scenarios), suggesting consistent scores within scenarios. However, ICC values were low (–0.96, 0.41, and 0.058 for scenarios 1, 2, and 3), revealing variability among raters, particularly on the timeliness and completeness criteria, and suggesting limitations in ChatGPT-4o’s ability to address individualized and context-specific needs.
Conclusion
ChatGPT-4o demonstrates the potential to ease the cognitive demands of CPD planning, offering structured support in CPD development. However, human oversight remains essential to ensure plans are contextually relevant and deeply reflective. Future research should focus on enhancing artificial intelligence’s personalization for CPD evaluation, highlighting ChatGPT-4o’s potential and limitations as a tool in professional education.

Technical report
Increased accessibility of computer-based testing for residency application to a hospital in Brazil with item characteristics comparable to paper-based testing: a psychometric study  
Marcos Carvalho Borges, Luciane Loures Santos, Paulo Henrique Manso, Elaine Christine Dantas Moisés, Pedro Soler Coltro, Priscilla Costa Fonseca, Paulo Roberto Alves Gentil, Rodrigo de Carvalho Santana, Lucas Faria Rodrigues, Benedito Carlos Maciel, Hilton Marcos Alves Ricz
J Educ Eval Health Prof. 2024;21:32.   Published online November 11, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.32
Purpose
With the coronavirus disease 2019 pandemic, online high-stakes exams have become a viable alternative. This study evaluated the feasibility of computer-based testing (CBT) for medical residency applications in Brazil and its impacts on item quality and applicants’ access compared to paper-based testing.
Methods
In 2020, an online CBT was conducted at the Ribeirao Preto Clinical Hospital in Brazil. In total, 120 multiple-choice items were constructed. Two years later, the exam was administered as paper-based testing. Item construction processes were similar for both exams. Difficulty and discrimination indexes and point-biserial coefficients were measured based on classical test theory; difficulty, discrimination, and guessing parameters were estimated based on item response theory; and Cronbach’s α coefficients were calculated. Internet stability for applicants was monitored.
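As a sketch of two of the classical test theory statistics named above: item difficulty is the proportion of correct responses, and the point-biserial coefficient correlates a dichotomous item score with the total test score. The data here are invented, not from the exam:

```python
# Hypothetical sketch: classical test theory item statistics.

def difficulty(item):                 # item: 0/1 responses, one per applicant
    return sum(item) / len(item)

def point_biserial(item, totals):
    n = len(item)
    mt = sum(totals) / n
    st = (sum((t - mt) ** 2 for t in totals) / n) ** 0.5  # population SD
    p = difficulty(item)
    m1 = sum(t for x, t in zip(item, totals) if x) / sum(item)            # correct
    m0 = sum(t for x, t in zip(item, totals) if not x) / (n - sum(item))  # wrong
    return (m1 - m0) / st * (p * (1 - p)) ** 0.5

item = [1, 1, 0, 0]           # invented responses to one item
totals = [10, 8, 5, 3]        # invented total scores
print(difficulty(item))                         # → 0.5
print(round(point_biserial(item, totals), 2))   # → 0.93
```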
Results
In 2020, 4,846 individuals (57.1% female, mean age of 26.64±3.37 years) applied to the residency program, versus 2,196 individuals (55.2% female, mean age of 26.47±3.20 years) in 2022. For CBT, there was an increase of 2,650 applicants (120.7%), albeit with significant differences in demographic characteristics. There was a significant increase in applicants from more distant and lower-income Brazilian regions, such as the North (5.6% vs. 2.7%) and Northeast (16.9% vs. 9.0%). No significant differences were found in difficulty and discrimination indexes, point-biserial coefficients, and Cronbach’s α coefficients between the 2 exams.
Conclusion
Online CBT with multiple-choice questions was a viable format for a residency application exam, improving accessibility without compromising exam integrity and quality.
Research articles
Validation of the Blended Learning Usability Evaluation–Questionnaire (BLUE-Q) through an innovative Bayesian questionnaire validation approach  
Anish Kumar Arora, Charo Rodriguez, Tamara Carver, Hao Zhang, Tibor Schuster
J Educ Eval Health Prof. 2024;21:31.   Published online November 7, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.31
Purpose
The primary aim of this study was to validate the Blended Learning Usability Evaluation–Questionnaire (BLUE-Q) for use in the field of health professions education through a Bayesian approach. As Bayesian questionnaire validation remains rare in practice, a secondary aim of this article is to serve as a simplified tutorial for engaging in such validation practices in health professions education.
Methods
A total of 10 experts in blended learning within health education were recruited to participate in a 30-minute interviewer-administered survey. On a 5-point Likert scale, the experts rated how well they perceived each item of the BLUE-Q to reflect its underlying usability domain (i.e., effectiveness, efficiency, satisfaction, accessibility, organization, and learner experience). Ratings were descriptively analyzed and converted into beta prior distributions. Participants were also given the option to provide qualitative comments on each item.
Results
After reviewing the computed expert prior distributions, 31 quantitative items were identified as having a probability of “low endorsement” and were thus removed from the questionnaire. Additionally, qualitative comments were used to revise the phrasing and order of items to ensure clarity and logical flow. The BLUE-Q’s final version comprises 23 Likert-scale items and 6 open-ended items.
Conclusion
Questionnaire validation can generally be a complex, time-consuming, and costly process, inhibiting many from engaging in proper validation practices. In this study, we demonstrate that a Bayesian questionnaire validation approach can be a simple, resource-efficient, yet rigorous solution to validating a tool for content and item-domain correlation through the elicitation of domain expert endorsement ratings.

The effect of simulation-based training on problem-solving skills, critical thinking skills, and self-efficacy among nursing students in Vietnam: a before-and-after study  
Tran Thi Hoang Oanh, Luu Thi Thuy, Ngo Thi Thu Huyen
J Educ Eval Health Prof. 2024;21:24.   Published online September 23, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.24
Purpose
This study investigated the effect of simulation-based training on nursing students’ problem-solving skills, critical thinking skills, and self-efficacy.
Methods
A single-group pretest and posttest study was conducted among 173 second-year nursing students at a public university in Vietnam from May 2021 to July 2022. Each student participated in the adult nursing preclinical practice course, which utilized a moderate-fidelity simulation teaching approach. Instruments including the Personal Problem-Solving Inventory Scale, Critical Thinking Skills Questionnaire, and General Self-Efficacy Questionnaire were employed to measure participants’ problem-solving skills, critical thinking skills, and self-efficacy. Data were analyzed using descriptive statistics and the paired-sample t-test with the significance level set at P<0.05.
Results
The mean posttest score on the Personal Problem-Solving Inventory (127.24±12.11) was lower than the pretest score (131.42±16.95); because lower scores on this inventory indicate better perceived problem-solving, this suggests an improvement in participants’ problem-solving skills (t172=2.55, P=0.011). There was no statistically significant difference in critical thinking skills between the pretest and posttest (P=0.854). Self-efficacy among nursing students showed a statistically significant increase from the pretest (27.91±5.26) to the posttest (28.71±3.81), with t172=-2.26 and P=0.025.
Conclusion
The results suggest that simulation-based training can improve problem-solving skills and increase self-efficacy among nursing students. Therefore, the integration of simulation-based training in nursing education is recommended.

Review
Immersive simulation in nursing and midwifery education: a systematic review  
Lahoucine Ben Yahya, Aziz Naciri, Mohamed Radid, Ghizlane Chemsi
J Educ Eval Health Prof. 2024;21:19.   Published online August 8, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.19
Purpose
Immersive simulation is an innovative training approach in health education that enhances student learning. This study examined its impact on engagement, motivation, and academic performance in nursing and midwifery students.
Methods
A systematic search was conducted in 4 databases (Scopus, PubMed, Web of Science, and ScienceDirect) following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. The research protocol was pre-registered in the PROSPERO registry, ensuring transparency and rigor. The quality of the included studies was assessed using the Medical Education Research Study Quality Instrument.
Results
Out of 90 identified studies, 11 were included in the present review, involving 1,090 participants. Four out of 5 studies observed high post-test engagement scores in the intervention groups. Additionally, 5 out of 6 studies that evaluated motivation found higher post-test motivational scores in the intervention groups than in control groups using traditional approaches. Furthermore, among the 8 out of 11 studies that evaluated academic performance during immersive simulation training, 5 reported significant differences (P<0.001) in favor of the students in the intervention groups.
Conclusion
This review indicates that immersive simulation has significant potential to enhance student engagement, motivation, and academic performance beyond what traditional teaching methods achieve. This underscores the need for future research in various contexts to better integrate this innovative educational approach into nursing and midwifery curricula.

Research article
Performance of GPT-3.5 and GPT-4 on standardized urology knowledge assessment items in the United States: a descriptive study
Max Samuel Yudovich, Elizaveta Makarova, Christian Michael Hague, Jay Dilip Raman
J Educ Eval Health Prof. 2024;21:17.   Published online July 8, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.17
Purpose
This study aimed to evaluate the performance of Chat Generative Pre-Trained Transformer (ChatGPT) with respect to standardized urology multiple-choice items in the United States.
Methods
In total, 700 multiple-choice urology board exam-style items were submitted to GPT-3.5 and GPT-4, and responses were recorded. Items were categorized based on topic and question complexity (recall, interpretation, and problem-solving). The accuracy of GPT-3.5 and GPT-4 was compared across item types in February 2024.
Results
GPT-4 answered 44.4% of items correctly, compared to 30.9% for GPT-3.5 (P<0.00001). GPT-4 (vs. GPT-3.5) had higher accuracy on urologic oncology (43.8% vs. 33.9%, P=0.03), sexual medicine (44.3% vs. 27.8%, P=0.046), and pediatric urology (47.1% vs. 27.1%, P=0.012) items. Endourology (38.0% vs. 25.7%, P=0.15), reconstruction and trauma (29.0% vs. 21.0%, P=0.41), and neurourology (49.0% vs. 33.3%, P=0.11) items did not show significant differences in performance across versions. GPT-4 also outperformed GPT-3.5 on recall (45.9% vs. 27.4%, P<0.00001) and interpretation (45.6% vs. 31.5%, P=0.0005) items, whereas the difference for the higher-complexity problem-solving items (41.8% vs. 34.5%, P=0.56) was not statistically significant.
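The overall accuracy gap reported above can be checked with a two-proportion z-test; the counts below are reconstructed from the reported percentages (44.4% and 30.9% of 700 items) and are approximate:

```python
# Hypothetical sketch: two-proportion z-test on overall accuracy.
# 311/700 ≈ 44.4% and 216/700 ≈ 30.9% are back-calculated, approximate counts.
from math import sqrt, erf

def two_prop_z(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                      # pooled proportion
    z = (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    p_two = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided normal P
    return z, p_two

z, p = two_prop_z(311, 700, 216, 700)
print(round(z, 1))   # → 5.2
```

The resulting P value is far below 0.00001, consistent with the significance the authors report.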
Conclusions
ChatGPT performs relatively poorly on standardized multiple-choice urology board exam-style items, with GPT-4 outperforming GPT-3.5. The accuracy was below the proposed minimum passing standards for the American Board of Urology’s Continuing Urologic Certification knowledge reinforcement activity (60%). As artificial intelligence progresses in complexity, ChatGPT may become more capable and accurate with respect to board examination items. For now, its responses should be scrutinized.

Educational/Faculty development material
The 6 degrees of curriculum integration in medical education in the United States  
Julie Youm, Jennifer Christner, Kevin Hittle, Paul Ko, Cinda Stone, Angela D. Blood, Samara Ginzburg
J Educ Eval Health Prof. 2024;21:15.   Published online June 13, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.15
Despite explicit expectations and accreditation requirements for an integrated curriculum, there is still no accepted common definition, set of best practices for implementation, or criteria for successful curriculum integration. To address this lack of consensus, we reviewed the literature and herein propose a definition of curriculum integration for the medical education audience. We further believe that medical education is ready to move beyond “horizontal” (1-dimensional) and “vertical” (2-dimensional) integration, and we propose a model of “6 degrees of curriculum integration” to expand the 2-dimensional concept for future designs of medical education programs and to best prepare learners to meet the needs of patients. These 6 degrees include: interdisciplinary, timing and sequencing, instruction and assessment, incorporation of basic and clinical sciences, knowledge and skills-based competency progression, and graduated responsibilities in patient care. We encourage medical educators to look beyond 2-dimensional integration toward this holistic and interconnected representation of curriculum integration.
Research article
Redesigning a faculty development program for clinical teachers in Indonesia: a before-and-after study
Rita Mustika, Nadia Greviana, Dewi Anggraeni Kusumoningrum, Anyta Pinasthika
J Educ Eval Health Prof. 2024;21:14.   Published online June 13, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.14
Purpose
Faculty development (FD) is important for supporting teaching, including by clinical teachers. The Faculty of Medicine Universitas Indonesia (FMUI) has conducted a clinical teacher training program developed by its medical education department since 2008, both for FMUI teachers and for those at other centers in Indonesia. However, participation is often challenging due to clinical, administrative, and research obligations. The coronavirus disease 2019 pandemic amplified the need to transform this program. This study aimed to redesign and evaluate an FD program for clinical teachers that focuses on their needs and current situation.
Methods
A 5-step design thinking framework (empathizing, defining, ideating, prototyping, and testing) was used with a pre/post-test design. Design thinking made it possible to develop a participant-focused program, while the pre/post-test design enabled an assessment of the program’s effectiveness.
Results
Seven medical educationalists and 4 senior and 4 junior clinical teachers participated in a group discussion in the empathize phase of design thinking. The research team then formed a prototype of a 3-day blended learning course, with an asynchronous component using the Moodle learning management system and a synchronous component using the Zoom platform. Pre/post-testing was done in 2 rounds, with 107 and 330 participants, respectively. Evaluations from the first round provided feedback for improving the prototype for the second round.
Conclusion
Design thinking enabled an innovative-creative process of redesigning FD that emphasized participants’ needs. The pre/post-testing showed that the program was effective. Combining asynchronous and synchronous learning expands access and increases flexibility. This approach could also apply to other FD programs.
