JEEHP : Journal of Educational Evaluation for Health Professions

Search results: 19 articles for "Psychometrics"
Research articles
Evaluation of an infectious‑disease response training program for primary‑care physicians using Kirkpatrick’s four levels and the Context, Input, Process, and Product model: a mixed‑methods study
Kyung Hee Chun, Jin Seo Lee, Sun Young Jeong, Young Soon Park
J Educ Eval Health Prof. 2025;22:40.   Published online December 31, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.40    [Epub ahead of print]
  • 344 View
  • 86 Download
Purpose
This study systematically evaluated the effectiveness of a training program to enhance infectious disease response capabilities among primary care physicians. Employing a mixed-methods design, the evaluation utilized Kirkpatrick’s four-level model and the Context, Input, Process, and Product (CIPP) framework to assess the program and propose improvements.
Methods
The study targeted a 2022 national infectious disease training program for primary care physicians (N=1,718). We analyzed quantitative pre- and post-training data from 100 randomly selected participants and qualitative data from in-depth interviews with 10 participants. Validated tools, developed by psychometrics and content experts, were used to measure satisfaction (Kirkpatrick Level 1), learning achievement (Level 2), practical application (Level 3), and organizational contribution (Level 4).
Results
Overall training satisfaction was high (3.96±0.72). Learning achievement (Level 2) showed statistically significant improvements from pre-training to post-training assessments (F = 12.922, p < 0.001). Practical application (Level 3; 3.19±0.86) and organizational contribution (Level 4; 3.47±0.70) scores indicated motivation to apply learning and institutional readiness for response strategies.
Conclusion
This study confirmed that the training effectively enhanced individual competency and organizational response capacity across all four Kirkpatrick levels. The integrated application of Kirkpatrick and CIPP models provided a robust framework for assessing learning transfer and guiding program improvement. These findings underscore the need for continued efforts to develop diverse training programs, conduct systematic evaluations, and disseminate successful outcomes to the wider healthcare community.
Development and psychometric assessment of a scale for evaluating healthcare professionals’ attitudes toward interprofessional education and collaboration in the United States: a cross-sectional study  
Michael Christopher Banks, Ryan Brock Mutcheson, Maedot Ariaya Haymete, Serkan Toy
J Educ Eval Health Prof. 2025;22:32.   Published online October 20, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.32
  • 1,326 View
  • 184 Download
Purpose
Interprofessional education (IPE) is increasingly recognized as critical to preparing health professionals for collaborative practice, yet rigorous assessment remains limited by a lack of psychometrically sound instruments. Building on a previously developed questionnaire for physicians, this study aimed to expand the scale to include allied health professionals and to evaluate whether the factor structure remained consistent across professions. We hypothesized that a similar factor structure would emerge from the combined dataset, thereby supporting the scale’s generalizability.
Methods
This observational study included 930 healthcare professionals in the United States (379 physicians, 419 nurses, 76 pharmacists, and others) who completed a 35-item questionnaire addressing IPE competency domains. Data were collected between December 2019 and May 2020. Exploratory factor analysis was employed to examine the factor structure, followed by item response theory (IRT) analyses to assess item fit, reliability, and validity. Raw data are available upon request.
Results
Factor analysis of 22 retained items confirmed a 5-factor solution: teamwork and communication, patient-centered care, roles and responsibilities, ethics and attitudes, and reflective practice, explaining 59% of the variance. Subscale reliabilities ranged from α=0.65 to 0.87. IRT analyses supported construct validity and measurement precision, while identifying areas for refinement in reflective practice.
Conclusion
This study demonstrates that the scale is reliable, valid, and generalizable across diverse health professions. It provides a robust tool for assessing attitudes toward IPE, offering value for curriculum evaluation, institutional benchmarking, and future longitudinal research on professional identity formation and collaborative practice.
The impact of differential item functioning on ability estimation using the Korean Medical Licensing Examination with computerized adaptive testing: a post-hoc simulation study  
Dogyeong Kim, Jeongwook Choi, Dong Gi Seo
J Educ Eval Health Prof. 2025;22:31.   Published online October 10, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.31
  • 2,118 View
  • 155 Download
Purpose
This study examined the impact of differential item functioning (DIF) on ability estimation in a computerized adaptive testing (CAT) environment using real response data from the 2017 Korean Medical Licensing Examination (KMLE). We hypothesized that excluding gender-based DIF items would improve estimation accuracy, particularly for examinees at the extremes of the ability scale.
Methods
The study was conducted in 2 steps: (1) DIF detection and (2) post-hoc simulation. The analysis used data from 3,259 examinees who completed all 360 dichotomous items. Gender-based DIF was detected with the residual-based DIF method (reference group: males; focal group: females). Two CAT conditions (all items vs. DIF-excluded) were compared against a “true θ” estimated from a fixed-form test of 264 non-DIF items. Accuracy was evaluated using bias, root mean square error (RMSE), and correlation with true θ.
Results
In the CAT condition excluding DIF items, accuracy improved, with RMSE reduced and correlation with true θ increased. However, bias was slightly larger in magnitude. Gender-specific analyses showed that DIF removal reduced the underestimation of female ability but increased the underestimation of male ability, yielding estimates that were fairer across genders. When DIF items were included, estimation errors were more pronounced at both low and high ability levels.
Conclusion
Managing DIF in CAT-based high-stakes examinations can enhance fairness and precision. Using real examinee data, this study provides practical evidence of the implications of DIF for CAT-based measurement and supports fairness-oriented test design.
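The accuracy criteria this study reports (bias, RMSE, and correlation with true θ) can be sketched in a few lines. This is an illustrative computation, not the study's actual code; the function name and list-based input layout are assumptions:

```python
import math

def accuracy_metrics(theta_true, theta_est):
    """Bias, RMSE, and Pearson correlation between true and estimated abilities."""
    n = len(theta_true)
    errors = [e - t for t, e in zip(theta_true, theta_est)]
    bias = sum(errors) / n                               # mean signed error
    rmse = math.sqrt(sum(err ** 2 for err in errors) / n)
    mt, me = sum(theta_true) / n, sum(theta_est) / n
    cov = sum((t - mt) * (e - me) for t, e in zip(theta_true, theta_est)) / n
    sd_t = math.sqrt(sum((t - mt) ** 2 for t in theta_true) / n)
    sd_e = math.sqrt(sum((e - me) ** 2 for e in theta_est) / n)
    return bias, rmse, cov / (sd_t * sd_e)
```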
Technical report
Feasibility of applying computerized adaptive testing to the Clinical Medical Science Comprehensive Examination in Korea: a psychometric study
Jeongwook Choi, Sung-Soo Jung, Eun Kwang Choi, Kyung Sik Kim, Dong Gi Seo
J Educ Eval Health Prof. 2025;22:29.   Published online October 1, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.29
  • 1,061 View
  • 168 Download
Purpose
This study aimed to investigate the feasibility of transitioning the Clinical Medical Science Comprehensive Examination (CMSCE) to computerized adaptive testing (CAT) in Korea, thereby providing greater opportunities for medical students to accurately compare their clinical competencies with peers nationwide and to monitor their own progress.
Methods
A medical self-assessment using CAT was conducted from March to June 2023, involving 1,541 medical students who volunteered from 40 medical colleges in Korea. An item bank consisting of 1,145 items from previously administered CMSCE examinations (2019–2021) hosted by the Medical Education Assessment Corporation was established. Items were selected through 2-stage filtering, based on classical test theory (discrimination index above 0.15) and item response theory (discrimination parameter estimates above 0.6 and difficulty parameter estimates between –5 and +5). Maximum Fisher information was employed as the item selection method, and maximum likelihood estimation was used for ability estimation.
Results
The CAT was successfully administered without significant issues. The stopping rule was set at a standard error of measurement of 0.25, with a maximum of 50 items for ability estimation. The mean ability score was 0.55, with an average of 28 items administered per student. Students at extreme ability levels reached the maximum of 50 items due to the limited availability of items at appropriate difficulty levels.
Conclusion
The medical self-assessment CAT, the first of its kind in Korea, was successfully implemented nationwide without significant problems. These results indicate strong potential for expanding the use of CAT in medical education assessments.
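The 2-stage item filtering described in the Methods can be sketched as follows. This is a hypothetical illustration using the thresholds reported in the study; the dict keys `ctt_disc`, `a`, and `b` are assumed names, not taken from the item bank's actual schema:

```python
def filter_item_bank(items):
    """Two-stage item filter: stage 1 keeps items with a CTT discrimination
    index above 0.15; stage 2 keeps items with an IRT discrimination
    parameter above 0.6 and a difficulty parameter between -5 and +5."""
    stage1 = [it for it in items if it["ctt_disc"] > 0.15]
    return [it for it in stage1 if it["a"] > 0.6 and -5.0 < it["b"] < 5.0]
```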
Research article
Comparing generative artificial intelligence platforms and nursing student performance on a women’s health nursing examination in Korea: a Rasch model approach  
Eun Jeong Ko, Tae Kyung Lee, Geum Hee Jeong
J Educ Eval Health Prof. 2025;22:23.   Published online September 5, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.23
  • 1,374 View
  • 174 Download
  • 1 Web of Science
Purpose
This psychometric study aimed to compare the ability parameter estimates of generative artificial intelligence (AI) platforms with those of nursing students on a 50-item women’s health nursing examination at Hallym University, Korea, using the Rasch model. It also sought to estimate item difficulty parameters and evaluate AI performance across varying difficulty levels.
Methods
The exam, consisting of 39 multiple-choice items and 11 true/false items, was administered to 111 fourth-year nursing students in June 2023. In December 2024, 6 generative AI platforms (GPT-4o, ChatGPT free version, Claude.ai, Clova X, Mistral.ai, Google Gemini) completed the same items. The responses were analyzed using the Rasch model to estimate the ability and difficulty parameters. Unidimensionality was verified by the Dimensionality Evaluation to Enumerate Contributing Traits (DETECT), and analyses were conducted using the R packages irtQ and TAM.
Results
The items satisfied unidimensionality (DETECT=–0.16). Item difficulty parameter estimates ranged from –3.87 to 1.96 logits (mean=–0.61), with a mean difficulty index of 0.79. Examinees’ ability parameter estimates ranged from –0.71 to 3.15 logits (mean=1.17). GPT-4o, ChatGPT free version, and Claude.ai outperformed the median student ability (1.09 logits), scoring 2.68, 2.34, and 2.34, respectively, while Clova X, Mistral.ai, and Google Gemini exhibited lower scores (0.20, –0.12, 0.80). The test information curve peaked below θ=0, indicating suitability for examinees with low to average ability.
Conclusion
Advanced generative AI platforms approximated the performance of high-performing students, but outcomes varied. The Rasch model effectively evaluated AI competency, supporting its potential utility for future AI performance assessments in nursing education.
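Ability estimation under the Rasch model, as applied to both students and AI platforms above, can be sketched with a Newton-Raphson maximum likelihood routine. This is an illustrative sketch assuming known item difficulties, not the irtQ/TAM implementation the study used:

```python
import math

def rasch_mle_theta(responses, difficulties, iters=50):
    """Newton-Raphson MLE of ability (in logits) under the Rasch model.
    responses: list of 0/1 scores; difficulties: item b parameters.
    Note: the MLE is undefined for all-correct or all-wrong response
    patterns, so callers should screen those out first."""
    theta = 0.0
    for _ in range(iters):
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        grad = sum(x - p for x, p in zip(responses, probs))  # score residual
        info = sum(p * (1.0 - p) for p in probs)             # test information
        if info == 0:
            break
        theta += grad / info
    return theta
```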
Software report
Special article on the 20th anniversary of the journal
The irtQ R package: a user-friendly tool for item response theory-based test data analysis and calibration  
Hwanggyu Lim, Kyungseok Kang
J Educ Eval Health Prof. 2024;21:23.   Published online September 12, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.23
  • 4,826 View
  • 304 Download
  • 3 Web of Science
  • 2 Crossref
Computerized adaptive testing (CAT) has become a widely adopted test design for high-stakes licensing and certification exams, particularly in the health professions in the United States, due to its ability to tailor test difficulty in real time, reducing testing time while providing precise ability estimates. A key component of CAT is item response theory (IRT), which facilitates the dynamic selection of items based on examinees' ability levels during a test. Accurate estimation of item and ability parameters is essential for successful CAT implementation, necessitating convenient and reliable software to ensure precise parameter estimation. This paper introduces the irtQ R package (http://CRAN.R-project.org/), which simplifies IRT-based analysis and item calibration under unidimensional IRT models. While it does not directly simulate CAT, it provides essential tools to support CAT development, including parameter estimation using marginal maximum likelihood estimation via the expectation-maximization algorithm, pretest item calibration through fixed item parameter calibration and fixed ability parameter calibration methods, and examinee ability estimation. The package also enables users to compute item and test characteristic curves and information functions necessary for evaluating the psychometric properties of a test. This paper illustrates the key features of the irtQ package through examples using simulated datasets, demonstrating its utility in IRT applications such as test data analysis and ability scoring. By providing a user-friendly environment for IRT analysis, irtQ significantly enhances the capacity for efficient adaptive testing research and operations. Finally, the paper highlights additional core functionalities of irtQ, emphasizing its broader applicability to the development and operation of IRT-based assessments.

Citations

Citations to this article as recorded by  
  • Development of a CAT based Diagnostic System for Assessing Basic Academic Skills in Undergraduate Students
    Woo-Jin Han, Jeongwook Choi, Dong-Gi Seo
    The Korean Association of General Education.2025; 19(3): 177.     CrossRef
  • Feasibility of applying computerized adaptive testing to the Clinical Medical Science Comprehensive Examination in Korea: a psychometric study
    Jeongwook Choi, Sung-Soo Jung, Eun Kwang Choi, Kyung Sik Kim, Dong Gi Seo
    Journal of Educational Evaluation for Health Professions.2025; 22: 29.     CrossRef
Research articles
Special article on the 20th anniversary of the journal
Comparison of real data and simulated data analysis of a stopping rule based on the standard error of measurement in computerized adaptive testing for medical examinations in Korea: a psychometric study  
Dong Gi Seo, Jeongwook Choi, Jinha Kim
J Educ Eval Health Prof. 2024;21:18.   Published online July 9, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.18
  • 3,219 View
  • 360 Download
  • 2 Web of Science
  • 2 Crossref
Purpose
This study aimed to compare and evaluate the efficiency and accuracy of computerized adaptive testing (CAT) under 2 stopping rules (standard error of measurement [SEM]=0.3 and 0.25) using both real and simulated data in medical examinations in Korea.
Methods
This study employed post-hoc simulation and real data analysis to explore the optimal stopping rule for CAT in medical examinations. The real data were obtained from the responses of 3rd-year medical students during examinations in 2020 at Hallym University College of Medicine. Simulated data were generated in R using parameters estimated from a real item bank. Outcome variables included the number of examinees passing or failing under SEM values of 0.25 and 0.30, the number of items administered, and the correlation between ability estimates. The consistency of the real CAT results was evaluated by examining pass/fail agreement based on a cut score of 0.0. The efficiency of all CAT designs was assessed by comparing the average number of items administered under both stopping rules.
Results
Both SEM 0.25 and SEM 0.30 provided a good balance between accuracy and efficiency in CAT. The real data showed minimal differences in pass/fail outcomes between the 2 SEM conditions, with a high correlation (r=0.99) between ability estimates. The simulation results confirmed these findings, indicating similar average item numbers between real and simulated data.
Conclusion
The findings suggest that both SEM 0.25 and 0.30 are effective termination criteria in the context of the Rasch model, balancing accuracy and efficiency in CAT.
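The SEM-based termination criterion compared above can be sketched for the Rasch model, where the standard error of an ability estimate is 1/sqrt(test information). This is an illustrative check with assumed function and parameter names, not the study's operational CAT engine:

```python
import math

def should_stop(theta, administered_difficulties, sem_cutoff=0.25, max_items=50):
    """CAT stopping check: stop once SE(theta) = 1 / sqrt(I(theta)) falls
    at or below the cutoff, or the maximum test length is reached.
    Items are Rasch difficulty parameters, so item information is p*(1-p)."""
    info = 0.0
    for b in administered_difficulties:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        info += p * (1.0 - p)
    se = float("inf") if info == 0 else 1.0 / math.sqrt(info)
    return se <= sem_cutoff or len(administered_difficulties) >= max_items
```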

Citations

Citations to this article as recorded by  
  • AI-enhanced adaptive testing with cognitive diagnostic feedback and its association with performance in undergraduate surgical education: a pilot study
    Nuno Silva Gonçalves, Carlos Collares, José Miguel Pêgo
    Frontiers in Behavioral Neuroscience.2026;[Epub]     CrossRef
  • Feasibility of applying computerized adaptive testing to the Clinical Medical Science Comprehensive Examination in Korea: a psychometric study
    Jeongwook Choi, Sung-Soo Jung, Eun Kwang Choi, Kyung Sik Kim, Dong Gi Seo
    Journal of Educational Evaluation for Health Professions.2025; 22: 29.     CrossRef
Development and psychometric evaluation of a 360-degree evaluation instrument to assess medical students’ performance in clinical settings at the emergency medicine department in Iran: a methodological study  
Golnaz Azami, Sanaz Aazami, Boshra Ebrahimy, Payam Emami
J Educ Eval Health Prof. 2024;21:7.   Published online April 1, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.7
  • 4,686 View
  • 321 Download
Background
In the Iranian context, no 360-degree evaluation tool has been developed to assess the performance of prehospital medical emergency students in clinical settings. This article describes the development of a 360-degree evaluation tool and presents its first psychometric evaluation.
Methods
There were 2 steps in this study: step 1 involved developing the instrument (i.e., generating the items) and step 2 constituted the psychometric evaluation of the instrument. We performed exploratory and confirmatory factor analyses and also evaluated the instrument’s face, content, and convergent validity and reliability.
Results
The instrument contains 55 items across 6 domains, including leadership, management, and teamwork (19 items), consciousness and responsiveness (14 items), clinical and interpersonal communication skills (8 items), integrity (7 items), knowledge and accountability (4 items), and loyalty and transparency (3 items). The instrument was confirmed to be a valid measure, as the 6 domains had eigenvalues over Kaiser’s criterion of 1 and in combination explained 60.1% of the variance (Bartlett’s test of sphericity [1,485]=19,867.99, P<0.01). Furthermore, this study provided evidence for the instrument’s convergent validity and internal consistency (α=0.98), suggesting its suitability for assessing student performance.
Conclusion
We found good evidence for the validity and reliability of the instrument. Our instrument can be used to make future evaluations of student performance in the clinical setting more structured, transparent, informative, and comparable.
Experience of introducing an electronic health records station in an objective structured clinical examination to evaluate medical students’ communication skills in Canada: a descriptive study  
Kuan-chin Jean Chen, Ilona Bartman, Debra Pugh, David Topps, Isabelle Desjardins, Melissa Forgie, Douglas Archibald
J Educ Eval Health Prof. 2023;20:22.   Published online July 4, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.22
  • 6,380 View
  • 183 Download
  • 2 Web of Science
  • 2 Crossref
Purpose
There is limited literature related to the assessment of electronic medical record (EMR)-related competencies. To address this gap, this study explored the feasibility of an EMR objective structured clinical examination (OSCE) station to evaluate medical students’ communication skills by psychometric analyses and standardized patients’ (SPs) perspectives on EMR use in an OSCE.
Methods
An OSCE station that incorporated the use of an EMR was developed and pilot-tested in March 2020. Students’ communication skills were assessed by SPs and physician examiners. Students’ scores were compared between the EMR station and 9 other stations. A psychometric analysis, including item total correlation, was done. SPs participated in a post-OSCE focus group to discuss their perception of EMRs’ effect on communication.
Results
Ninety-nine 3rd-year medical students participated in a 10-station OSCE that included the use of the EMR station. The EMR station had an acceptable item total correlation (0.217). Students who leveraged graphical displays in counseling received higher OSCE station scores from the SPs (P=0.041). The thematic analysis of SPs’ perceptions of students’ EMR use from the focus group revealed the following domains of themes: technology, communication, case design, ownership of health information, and timing of EMR usage.
Conclusion
This study demonstrated the feasibility of incorporating EMR in assessing learner communication skills in an OSCE. The EMR station had acceptable psychometric characteristics. Some medical students were able to efficiently use the EMRs as an aid in patient counseling. Teaching students how to be patient-centered even in the presence of technology may promote engagement.

Citations

Citations to this article as recorded by  
  • Medical students’ perspectives on the role of OSPE and OSCE in the educational journey and contribution to career development: A cross-sectional study
    Fahad Abdulaziz Alrashed, Tauseef Ahmad, Abdulrahman M. Alsubiheen, Saad A. Alhammad, Mishal M. Aldaihan, Alaa M. Albishi, Zafrul Hasan
    Medicine.2026; 105(3): e47233.     CrossRef
  • Usage and perception of electronic medical records (EMR) among medical students in southwestern Nigeria
    A. A. Adeyeye, A. O. Ajose, O. M. Oduola, B. A. Akodu, A. Olufadeji
    Discover Public Health.2024;[Epub]     CrossRef
Development of a character qualities test for medical students in Korea using polytomous item response theory and factor analysis: a preliminary scale development study  
Yera Hur, Dong Gi Seo
J Educ Eval Health Prof. 2023;20:20.   Published online June 26, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.20
  • 4,247 View
  • 148 Download
  • 1 Web of Science
  • 2 Crossref
Purpose
This study aimed to develop a test scale to measure the character qualities of medical students as a follow-up study on the 8 core character qualities revealed in a previous report.
Methods
In total, 160 preliminary items were developed to measure 8 core character qualities. Twenty questions were assigned to each quality, and a questionnaire survey was conducted among 856 students in 5 medical schools in Korea. Using the partial credit model, polytomous item response theory analysis was carried out to analyze the goodness-of-fit, followed by exploratory factor analysis. Finally, confirmatory factor and reliability analyses were conducted with the final selected items.
Results
The preliminary items for the 8 core character qualities were administered to the participants. Data from 767 students were included in the final analysis. Of the 160 preliminary items, 25 were removed by classical test theory analysis and 17 more by polytomous item response theory assessment. A total of 118 items and sub-factors were selected for exploratory factor analysis. Finally, 79 items were selected, and the validity and reliability were confirmed through confirmatory factor analysis and intra-item relevance analysis.
Conclusion
The character qualities test scale developed through this study can be used to measure the character qualities corresponding to the educational goals and visions of individual medical schools in Korea. Furthermore, this measurement tool can serve as primary data for developing character qualities tools tailored to each medical school’s vision and educational goals.

Citations

Citations to this article as recorded by  
  • Development and validation of a fall health literacy scale for Chinese hospitals from the perspective of older adults
    Tianxin Miao, Ke Chen, Dianli Han, Yingna Zhao, Liran Duan, Lan Zhang, Ying Yao
    Frontiers in Public Health.2025;[Epub]     CrossRef
  • The Values of Local Wisdom in the Jong Racing Tradition as a Means of Character Education for Students
    Rika Komalasari, Pittanauli Sialagan, Ima Turyani, Zaitun Zaitun, Tety Kurmalasari, J. bin Surif, G.M. Jacobs, D. Wei Dai, M.V. Reddy, T. Yamamoto, H. Pardi
    SHS Web of Conferences.2024; 205: 03005.     CrossRef
Acceptability of the 8-case objective structured clinical examination of medical students in Korea using generalizability theory: a reliability study  
Song Yi Park, Sang-Hwa Lee, Min-Jeong Kim, Ki-Hwan Ji, Ji Ho Ryu
J Educ Eval Health Prof. 2022;19:26.   Published online September 8, 2022
DOI: https://doi.org/10.3352/jeehp.2022.19.26
  • 5,378 View
  • 247 Download
  • 1 Web of Science
  • 2 Crossref
Purpose
This study investigated whether the reliability was acceptable when the number of cases in the objective structured clinical examination (OSCE) decreased from 12 to 8 using generalizability theory (GT).
Methods
This psychometric study analyzed the OSCE data of 439 fourth-year medical students conducted in the Busan and Gyeongnam areas of South Korea from July 12 to 15, 2021. The generalizability study (G-study) considered 3 facets—students (p), cases (c), and items (i)—and designed the analysis as p×(i:c) due to items being nested in a case. The acceptable generalizability (G) coefficient was set to 0.70. The G-study and decision study (D-study) were performed using G String IV ver. 6.3.8 (Papawork, Hamilton, ON, Canada).
Results
All G coefficients except for July 14 (0.69) were above 0.70. The major sources of variance components (VCs) were items nested in cases (i:c), from 51.34% to 57.70%, and residual error (pi:c), from 39.55% to 43.26%. The proportion of VCs in cases was negligible, ranging from 0% to 2.03%.
Conclusion
The case numbers decreased in the 2021 Busan and Gyeongnam OSCE. However, the reliability was acceptable. In the D-study, reliability was maintained at 0.70 or higher if there were more than 21 items/case in 8 cases and more than 18 items/case in 9 cases. However, according to the G-study, increasing the number of items nested in cases rather than the number of cases could further improve reliability. The consortium needs to maintain a case bank with various items to implement a reliable blueprinting combination for the OSCE.
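The D-study projections above follow from the standard relative-error formula for a p×(i:c) design, where only the variance components interacting with persons (pc and pi:c) enter the error term. A generic sketch, not G String output; the variance-component values in the test are hypothetical:

```python
def g_coefficient(var_p, var_pc, var_pic, n_cases, n_items_per_case):
    """Relative G coefficient for a p x (i:c) D-study:
    Erho^2 = var_p / (var_p + var_pc / nc + var_pic / (nc * ni))."""
    rel_error = var_pc / n_cases + var_pic / (n_cases * n_items_per_case)
    return var_p / (var_p + rel_error)
```

The formula makes the study's point directly: with the number of cases fixed, raising the items-per-case count shrinks the dominant pi:c error term and raises the G coefficient.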

Citations

Citations to this article as recorded by  
  • From Agents to Governance: Essential AI Skills for Clinicians in the Large Language Model Era
    Weiping Cao, Qing Zhang, Jialin Liu, Siru Liu
    Journal of Medical Internet Research.2026; 28: e86550.     CrossRef
  • Applying the Generalizability Theory to Identify the Sources of Validity Evidence for the Quality of Communication Questionnaire
    Flávia Del Castanhel, Fernanda R. Fonseca, Luciana Bonnassis Burg, Leonardo Maia Nogueira, Getúlio Rodrigues de Oliveira Filho, Suely Grosseman
    American Journal of Hospice and Palliative Medicine®.2024; 41(7): 792.     CrossRef
The accuracy and consistency of mastery for each content domain using the Rasch and deterministic inputs, noisy “and” gate diagnostic classification models: a simulation study and a real-world analysis using data from the Korean Medical Licensing Examination  
Dong Gi Seo, Jae Kum Kim
J Educ Eval Health Prof. 2021;18:15.   Published online July 5, 2021
DOI: https://doi.org/10.3352/jeehp.2021.18.15
  • 6,683 View
  • 310 Download
  • 2 Web of Science
  • 3 Crossref
Purpose
Diagnostic classification models (DCMs) were developed to identify the mastery or non-mastery of the attributes required for solving test items, but their application has been limited to very low-level attributes, and the accuracy and consistency of high-level attributes using DCMs have rarely been reported compared with classical test theory (CTT) and item response theory models. This paper compared the accuracy of high-level attribute mastery between deterministic inputs, noisy “and” gate (DINA) and Rasch models, along with sub-scores based on CTT.
Methods
First, a simulation study explored the effects of attribute length (number of items per attribute) and the correlations among attributes with respect to the accuracy of mastery. Second, a real-data study examined model and item fit and investigated the consistency of mastery for each attribute among the 3 models using the 2017 Korean Medical Licensing Examination with 360 items.
Results
Accuracy of mastery increased with a higher number of items measuring each attribute across all conditions. The DINA model was more accurate than the CTT and Rasch models for attributes with high correlations (>0.5) and few items. In the real-data analysis, the DINA and Rasch models generally showed better item fits and appropriate model fit. The consistency of mastery between the Rasch and DINA models ranged from 0.541 to 0.633 and the correlations of person attribute scores between the Rasch and DINA models ranged from 0.579 to 0.786.
Conclusion
Although all 3 models provide a mastery decision for each examinee, the individual mastery profile using the DINA model provides more accurate decisions for attributes with high correlations than the CTT and Rasch models. The DINA model can also be directly applied to tests with complex structures, unlike the CTT and Rasch models, and it provides different diagnostic information from the CTT and Rasch models.
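The DINA item response function compared in this study can be written directly: an examinee answers correctly with probability 1 − slip only when they have mastered every attribute the item requires, and otherwise succeeds only by guessing. A minimal sketch; the slip and guess values in the test are hypothetical:

```python
def dina_prob(alpha, q_row, slip, guess):
    """DINA model response probability.
    alpha: examinee attribute-mastery vector of 0/1 flags.
    q_row: Q-matrix row of 0/1 flags for the attributes the item requires.
    Returns 1 - slip if all required attributes are mastered, else guess."""
    has_all_required = all(a == 1 for a, q in zip(alpha, q_row) if q == 1)
    return (1.0 - slip) if has_all_required else guess
```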

Citations

Citations to this article as recorded by  
  • Stable Knowledge Tracing Using Causal Inference
    Jia Zhu, Xiaodong Ma, Changqin Huang
    IEEE Transactions on Learning Technologies.2024; 17: 124.     CrossRef
  • Just When You Thought that Quantitizing Merely Involved Counting: A Renewed Call for Expanding the Practice of Quantitizing in Mixed Methods Research With a Focus on Measurement-Based Quantitizing
    Tony Onwuegbuzie
    Journal of Mixed Methods Studies.2024; (10): 99.     CrossRef
  • Development of a character qualities test for medical students in Korea using polytomous item response theory and factor analysis: a preliminary scale development study
    Yera Hur, Dong Gi Seo
    Journal of Educational Evaluation for Health Professions.2023; 20: 20.     CrossRef
Development and validation of a measurement scale to assess nursing students’ readiness for the flipped classroom in Sri Lanka  
Punithalingam Youhasan, Yan Chen, Mataroria Lyndon, Marcus Alexander Henning
J Educ Eval Health Prof. 2020;17:41.   Published online December 14, 2020
DOI: https://doi.org/10.3352/jeehp.2020.17.41
  • 9,702 View
  • 312 Download
  • 10 Web of Science
  • 8 Crossref
Purpose
The aim of this study was to develop and validate a scale to measure nursing students’ readiness for the flipped classroom in Sri Lanka.
Methods
A literature review provided the theoretical framework for developing the Nursing Students’ Readiness for Flipped Classroom (NSR-FC) questionnaire. Five content experts evaluated the NSR-FC, and content validity indices (CVI) were calculated. Cross-sectional surveys among 355 undergraduate nursing students from 3 state universities in Sri Lanka were carried out to assess the psychometric properties of the NSR-FC. Principal component analysis (PCA, n=265), internal consistency (using the Cronbach α coefficient, n=265), and confirmatory factor analysis (CFA, n=90) were done to test construct validity and reliability.
Results
Thirty-seven items were included in the NSR-FC for content validation, resulting in an average scale CVI of 0.94. Two items received item-level CVIs of less than 0.78. The factor structures of the 35 items were explored through PCA with orthogonal factor rotation, culminating in the identification of 5 factors. These factors were classified as technological readiness, environmental readiness, personal readiness, pedagogical readiness, and interpersonal readiness. The NSR-FC also showed an overall acceptable level of internal consistency (Cronbach α=0.9). CFA verified a 4-factor model (excluding the interpersonal readiness factor) and 20 items that achieved acceptable fit (standardized root mean square residual=0.08, root mean square error of approximation=0.08, comparative fit index=0.87, and χ2/degrees of freedom=1.57).
Conclusion
The NSR-FC, as a 4-factor model, is an acceptable measurement scale for assessing nursing students’ readiness for the flipped classroom in terms of its construct validity and reliability.
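The CVI figures reported above follow the standard content-validation arithmetic: the item-level CVI is the proportion of experts rating an item as relevant, and the scale-level average is the mean of those values. A minimal sketch, using invented expert ratings rather than the study's data:

```python
# Content validity index (CVI) computations. Five experts rate each item
# 1-4 for relevance; ratings of 3 or 4 count as "relevant". The ratings
# below are hypothetical, for illustration only.

def item_cvi(ratings):
    """Item-level CVI: proportion of experts rating the item 3 or 4."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)

def scale_cvi_ave(all_ratings):
    """Scale-level CVI (S-CVI/Ave): mean of the item-level CVIs."""
    icvis = [item_cvi(r) for r in all_ratings]
    return sum(icvis) / len(icvis)

items = [
    [4, 4, 3, 4, 4],   # I-CVI = 1.00
    [4, 3, 2, 4, 3],   # I-CVI = 0.80
    [2, 3, 2, 4, 3],   # I-CVI = 0.60 -> below the 0.78 cutoff, flag for revision
]
print([round(item_cvi(r), 2) for r in items])  # → [1.0, 0.8, 0.6]
print(round(scale_cvi_ave(items), 2))          # → 0.8
```

The 0.78 cutoff applied in the abstract is the commonly used criterion for panels of this size; items falling below it are revised or dropped before factor analysis.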

Citations to this article as recorded by
  • AI readiness scale for teachers: Development and validation
    Mehmet Ramazanoglu, Tayfun Akın
    Education and Information Technologies.2025; 30(6): 6869.     CrossRef
  • Design and validation of a preliminary instrument to contextualize interactions through information technologies of health professionals
    José Fidencio López Luna, Eddie Nahúm Armendáriz Mireles, Marco Aurelio Nuño Maganda, Hiram Herrera Rivas, Rubén Machucho Cadena, Jorge Arturo Hernández Almazán
    Health Informatics Journal.2024;[Epub]     CrossRef
  • Content validity of the Constructivist Learning in Higher Education Settings (CLHES) scale in the context of the flipped classroom in higher education
    Turki Mesfer Alqahtani, Farrah Dina Yusop, Siti Hajar Halili
    Humanities and Social Sciences Communications.2023;[Epub]     CrossRef
  • The intensivist's assessment of gastrointestinal function: A pilot study
    Varsha M. Asrani, Colin McArthur, Ian Bissett, John A. Windsor
    Australian Critical Care.2022; 35(6): 636.     CrossRef
  • Psychometric evidence of a perception scale about covid-19 vaccination process in Peruvian dentists: a preliminary validation
    César F. Cayo-Rojas, Nancy Córdova-Limaylla, Gissela Briceño-Vergel, Marysela Ladera-Castañeda, Hernán Cachay-Criado, Carlos López-Gurreonero, Alberto Cornejo-Pinto, Luis Cervantes-Ganoza
    BMC Health Services Research.2022;[Epub]     CrossRef
  • Implementation of a Web-Based Educational Intervention for Promoting Flipped Classroom Pedagogy: A Mixed-Methods Study
    Punithalingam Youhasan, Mataroria P. Lyndon, Yan Chen, Marcus A. Henning
    Medical Science Educator.2022; 33(1): 91.     CrossRef
  • Assess the feasibility of flipped classroom pedagogy in undergraduate nursing education in Sri Lanka: A mixed-methods study
    Punithalingam Youhasan, Yan Chen, Mataroria Lyndon, Marcus A. Henning, Gwo-Jen Hwang
    PLOS ONE.2021; 16(11): e0259003.     CrossRef
  • Newly appointed medical faculty members’ self-evaluation of their educational roles at the Catholic University of Korea College of Medicine in 2020 and 2021: a cross-sectional survey-based study
    Sun Kim, A Ra Cho, Chul Woon Chung
    Journal of Educational Evaluation for Health Professions.2021; 18: 28.     CrossRef
Software report
Introduction to the LIVECAT web-based computerized adaptive testing platform  
Dong Gi Seo, Jeongwook Choi
J Educ Eval Health Prof. 2020;17:27.   Published online September 29, 2020
DOI: https://doi.org/10.3352/jeehp.2020.17.27
  • 8,235 View
  • 158 Download
  • 7 Web of Science
  • 9 Crossref
Abstract · PDF · Supplementary Material
This study introduces LIVECAT, a web-based computerized adaptive testing platform. The platform provides many functions, including writing item content, managing an item bank, creating and administering a test, reporting test results, and providing information about a test and its examinees. LIVECAT gives examination administrators an easy and flexible environment for composing and managing examinations. It is available at http://www.thecatkorea.com/. LIVECAT was built with the following tools: operating system, Amazon Linux; web server, nginx 1.18; web application server, Apache Tomcat 8.5; database, Amazon RDS (MariaDB); and languages, Java 8, HTML5/CSS, JavaScript, and jQuery. The platform can implement several item response theory (IRT) models, such as the Rasch model and the 1-, 2-, and 3-parameter logistic models, and the administrator can choose a specific model for test construction. Multimedia data such as images, audio files, and movies can be uploaded to items. Two scoring methods (maximum likelihood estimation and expected a posteriori) are available, and the maximum Fisher information item selection method is applied to every IRT model. LIVECAT showed equal or better performance compared with a conventional test platform, and it enables users without psychometric expertise to easily implement and run computerized adaptive testing at their institutions. The most recent version provides only dichotomous item response models and the basic components of CAT; upcoming releases will add advanced functions such as polytomous item response models, the weighted likelihood estimation method, and content balancing.
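The maximum Fisher information selection rule mentioned above is straightforward to sketch for the 2-parameter logistic (2PL) model: at the current ability estimate, each candidate item's information is a²P(1−P), and the most informative unadministered item is chosen next. The item bank below is hypothetical; a real platform such as LIVECAT would add refinements like exposure control.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta, bank, administered):
    """Return the index of the unadministered item with maximum information."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: fisher_info(theta, *bank[i]))

bank = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.2), (1.0, 1.5)]  # hypothetical (a, b)
print(select_next_item(0.0, bank, administered={2}))      # → 0
```

Note that information peaks where the examinee's ability is near the item's difficulty, which is why adaptive selection converges on appropriately targeted items.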

Citations to this article as recorded by
  • A Systematic Review on Computerized Adaptive Testing
    Hümeyra Demir, Selahattin Gelbal
    Erzincan Üniversitesi Eğitim Fakültesi Dergisi.2025; 27(1): 137.     CrossRef
  • Development of a CAT based Diagnostic System for Assessing Basic Academic Skills in Undergraduate Students
    Woo-Jin Han, Jeongwook Choi, Dong-Gi Seo
    The Korean Association of General Education.2025; 19(3): 177.     CrossRef
  • Feasibility of applying computerized adaptive testing to the Clinical Medical Science Comprehensive Examination in Korea: a psychometric study
    Jeongwook Choi, Sung-Soo Jung, Eun Kwang Choi, Kyung Sik Kim, Dong Gi Seo
    Journal of Educational Evaluation for Health Professions.2025; 22: 29.     CrossRef
  • Comparison of real data and simulated data analysis of a stopping rule based on the standard error of measurement in computerized adaptive testing for medical examinations in Korea: a psychometric study
    Dong Gi Seo, Jeongwook Choi, Jinha Kim
    Journal of Educational Evaluation for Health Professions.2024; 21: 18.     CrossRef
  • Educational Technology in the University: A Comprehensive Look at the Role of a Professor and Artificial Intelligence
    Cheolkyu Shin, Dong Gi Seo, Seoyeon Jin, Soo Hwa Lee, Hyun Je Park
    IEEE Access.2024; 12: 116727.     CrossRef
  • The irtQ R package: a user-friendly tool for item response theory-based test data analysis and calibration
    Hwanggyu Lim, Kyungseok Kang
    Journal of Educational Evaluation for Health Professions.2024; 21: 23.     CrossRef
  • Presidential address: improving item validity and adopting computer-based testing, clinical skills assessments, artificial intelligence, and virtual reality in health professions licensing examinations in Korea
    Hyunjoo Pai
    Journal of Educational Evaluation for Health Professions.2023; 20: 8.     CrossRef
  • Patient-reported outcome measures in cancer care: Integration with computerized adaptive testing
    Minyu Liang, Zengjie Ye
    Asia-Pacific Journal of Oncology Nursing.2023; 10(12): 100323.     CrossRef
  • Development of a character qualities test for medical students in Korea using polytomous item response theory and factor analysis: a preliminary scale development study
    Yera Hur, Dong Gi Seo
    Journal of Educational Evaluation for Health Professions.2023; 20: 20.     CrossRef
Research article
Correlations between moral courage scores and social desirability scores among medical residents and fellows in Argentina  
Raúl Alfredo Borracci, Graciana Ciambrone, José María Alvarez Gallesio
J Educ Eval Health Prof. 2020;17:6.   Published online February 18, 2020
DOI: https://doi.org/10.3352/jeehp.2020.17.6
  • 9,272 View
  • 206 Download
  • 6 Web of Science
  • 6 Crossref
Abstract · PDF · Supplementary Material
Purpose
Moral courage refers to the conviction to take action on one’s ethical beliefs despite the risk of adverse consequences. This study aimed to evaluate correlations between social desirability scores and moral courage scores among medical residents and fellows, and to explore gender- and specialty-based differences in moral courage scores.
Methods
In April 2018, the Moral Courage Scale for Physicians (MCSP), the Professional Moral Courage (PMC) scale, and the Marlowe-Crowne scale (to measure social desirability) were administered to 87 medical residents from Hospital Alemán in Buenos Aires, Argentina.
Results
The Cronbach α coefficients were 0.78, 0.74, and 0.81 for the Marlowe-Crowne, MCSP, and PMC scales, respectively. Correlation analysis showed that moral courage scores were weakly correlated with social desirability scores, while both moral courage scales were strongly correlated with each other. Physicians who were training in a surgical specialty showed lower moral courage scores than nonsurgical specialty trainees, and men from any specialty tended to have lower moral courage scores than women. Specifically, individuals training in surgical specialties ranked lower on assessments of the “multiple values,” “endurance of threats,” and “going beyond compliance” dimensions of the PMC scale. Men tended to rank lower than women on the “multiple values,” “moral goals,” and “endurance of threats” dimensions.
Conclusion
There was a weak correlation between scores on 2 validated moral courage scales and social desirability scores among medical residents and fellows in Argentina. Conversely, the 2 moral courage tools were closely correlated and concordant with each other, suggesting that these scales are reasonably interchangeable.

Citations to this article as recorded by
  • Cross-cultural adaptation, reliability, and validity of the Turkish version of moral courage scale for physicians
    Şerife Yılmaz, Gamze Özbek Güven, Feyza İnceoğlu, Othman A. Alfuqaha
    PLOS One.2025; 20(10): e0333598.     CrossRef
  • Moral courage level of nurses: a systematic review and meta-analysis
    Hang Li, JuLan Guo, ZhiRong Ren, Dingxi Bai, Jing Yang, Wei Wang, Han Fu, Qing Yang, Chaoming Hou, Jing Gao
    BMC Nursing.2024;[Epub]     CrossRef
  • CESARET NEDİR? CESARET TANIMLARININ İÇERİK ANALİZİ
    İbrahim Sani MERT
    Uluslararası İktisadi ve İdari Bilimler Dergisi.2023; 9(2): 126.     CrossRef
  • The Impact of Active Bystander Training on Officer Confidence and Ability to Address Ethical Challenges
    Travis Taniguchi, Heather Vovak, Gary Cordner, Karen Amendola, Yukun Yang, Katherine Hoogesteyn, Martin Bartness
    Policing: A Journal of Policy and Practice.2022; 16(3): 508.     CrossRef
  • The Role of Academic Medicine in the Call for Justice
    Danielle Laraque-Arena, Ilene Fennoy, Leslie L. Davidson
    Journal of the National Medical Association.2021; 113(4): 388.     CrossRef
  • Can Careproviders Still Bond with Patients after They Are Turned Down for a Treatment They Need?
    Edmund G. Howe
    The Journal of Clinical Ethics.2021; 32(3): 185.     CrossRef
Review article
Overview and current management of computerized adaptive testing in licensing/certification examinations  
Dong Gi Seo
J Educ Eval Health Prof. 2017;14:17.   Published online July 26, 2017
DOI: https://doi.org/10.3352/jeehp.2017.14.17
  • 41,908 View
  • 400 Download
  • 16 Web of Science
  • 16 Crossref
Abstract · PDF
Computerized adaptive testing (CAT) has been implemented in high-stakes examinations such as the National Council Licensure Examination-Registered Nurses in the United States since 1994, and the National Registry of Emergency Medical Technicians in the United States adopted CAT for certifying emergency medical technicians in 2007. This review was written to facilitate the implementation of CAT in medical and health licensing examinations. Most implementations of CAT are based on item response theory, which hypothesizes that both the examinee and the items have their own characteristics that do not change. There are 5 steps for implementing CAT: first, determining whether the CAT approach is feasible for a given testing program; second, establishing an item bank; third, pretesting, calibrating, and linking item parameters via statistical analysis; fourth, determining the specifications for the final CAT in terms of the 5 components of the CAT algorithm; and finally, deploying the final CAT after specifying all the necessary components. The 5 components of the CAT algorithm are the item bank, starting item, item selection rule, scoring procedure, and termination criterion. CAT management includes content balancing, item analysis, item scoring, standard setting, practice analysis, and item bank updates. Remaining issues include the cost of constructing CAT platforms and of deploying the computer technology required to build an item bank. In conclusion, to ensure more accurate estimation of examinees' ability, CAT may be a good option for national licensing examinations, and measurement theory can support its implementation for high-stakes examinations.
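The 5 components of the CAT algorithm listed above can be sketched as a minimal loop. The 2PL item bank, uniform-prior EAP scoring grid, and fixed-length termination rule below are illustrative simplifications, not any particular program's implementation:

```python
import math

# Skeleton of the 5 CAT components: item bank, starting item, item
# selection rule, scoring procedure, and termination criterion.

GRID = [g / 10.0 for g in range(-40, 41)]  # ability grid for EAP scoring

def prob(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info(theta, a, b):
    """2PL item information at ability theta."""
    p = prob(theta, a, b)
    return a * a * p * (1.0 - p)

def eap(responses):
    """Expected a posteriori ability estimate under a uniform prior on GRID."""
    post = []
    for t in GRID:
        like = 1.0
        for (a, b), u in responses:
            p = prob(t, a, b)
            like *= p if u == 1 else 1.0 - p
        post.append(like)
    total = sum(post)
    return sum(t * w for t, w in zip(GRID, post)) / total

def run_cat(bank, answer, test_length=3):
    """Administer a fixed-length CAT and return the final ability estimate."""
    theta, used, responses = 0.0, set(), []          # start at theta = 0
    while len(used) < test_length:                   # termination criterion
        item = max((i for i in range(len(bank)) if i not in used),
                   key=lambda i: info(theta, *bank[i]))  # selection rule
        used.add(item)
        responses.append((bank[item], answer(item)))
        theta = eap(responses)                       # scoring procedure
    return theta

bank = [(1.2, -0.5), (1.0, 0.0), (1.4, 0.5), (0.9, 1.0)]  # (a, b) per item
print(round(run_cat(bank, answer=lambda i: 1), 2))   # all-correct examinee
```

Operational programs replace the fixed test length with a standard-error stopping rule and add the management functions the review describes (content balancing, exposure control, item bank maintenance).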

Citations to this article as recorded by
  • From Development to Validation: Exploring the Efficiency of Numetrive, a Computerized Adaptive Assessment of Numerical Reasoning
    Marianna Karagianni, Ioannis Tsaousis
    Behavioral Sciences.2025; 15(3): 268.     CrossRef
  • Global harmonization in advanced therapeutics: balancing innovation, safety, and access
    Ankit Dahiya, Kartikey Singh, Anunav Ashish, Nipun, Aayush Bhadyaria, Shubham Thakur, Manish Kumar, Ghanshyam Das Gupta, Balak Das Kurmi, Ravi Raj Pal
    Personalized Medicine.2025; 22(3): 181.     CrossRef
  • Development of a CAT based Diagnostic System for Assessing Basic Academic Skills in Undergraduate Students
    Woo-Jin Han, Jeongwook Choi, Dong-Gi Seo
    The Korean Association of General Education.2025; 19(3): 177.     CrossRef
  • Validation of the cognitive section of the Penn computerized adaptive test for neurocognitive and clinical psychopathology assessment (CAT-CCNB)
    Akira Di Sandro, Tyler M. Moore, Eirini Zoupou, Kelly P. Kennedy, Katherine C. Lopez, Kosha Ruparel, Lucky J. Njokweni, Sage Rush, Tarlan Daryoush, Olivia Franco, Alesandra Gorgone, Andrew Savino, Paige Didier, Daniel H. Wolf, Monica E. Calkins, J. Cobb S
    Brain and Cognition.2024; 174: 106117.     CrossRef
  • Comparison of real data and simulated data analysis of a stopping rule based on the standard error of measurement in computerized adaptive testing for medical examinations in Korea: a psychometric study
    Dong Gi Seo, Jeongwook Choi, Jinha Kim
    Journal of Educational Evaluation for Health Professions.2024; 21: 18.     CrossRef
  • The irtQ R package: a user-friendly tool for item response theory-based test data analysis and calibration
    Hwanggyu Lim, Kyungseok Kang
    Journal of Educational Evaluation for Health Professions.2024; 21: 23.     CrossRef
  • Implementing Computer Adaptive Testing for High-Stakes Assessment: A Shift for Examinations Council of Lesotho
    Musa Adekunle Ayanwale, Julia Chere-Masopha, Mapulane Mochekele, Malebohang Catherine Morena
    International Journal of New Education.2024;[Epub]     CrossRef
  • The current utilization of the patient-reported outcome measurement information system (PROMIS) in isolated or combined total knee arthroplasty populations
    Puneet Gupta, Natalia Czerwonka, Sohil S. Desai, Alirio J. deMeireles, David P. Trofa, Alexander L. Neuwirth
    Knee Surgery & Related Research.2023;[Epub]     CrossRef
  • Evaluating a Computerized Adaptive Testing Version of a Cognitive Ability Test Using a Simulation Study
    Ioannis Tsaousis, Georgios D. Sideridis, Hannan M. AlGhamdi
    Journal of Psychoeducational Assessment.2021; 39(8): 954.     CrossRef
  • Accuracy and Efficiency of Web-based Assessment Platform (LIVECAT) for Computerized Adaptive Testing
    Do-Gyeong Kim, Dong-Gi Seo
    The Journal of Korean Institute of Information Technology.2020; 18(4): 77.     CrossRef
  • Transformaciones en educación médica: innovaciones en la evaluación de los aprendizajes y avances tecnológicos (parte 2)
    Veronica Luna de la Luz, Patricia González-Flores
    Investigación en Educación Médica.2020; 9(34): 87.     CrossRef
  • Introduction to the LIVECAT web-based computerized adaptive testing platform
    Dong Gi Seo, Jeongwook Choi
    Journal of Educational Evaluation for Health Professions.2020; 17: 27.     CrossRef
  • Computerised adaptive testing accurately predicts CLEFT-Q scores by selecting fewer, more patient-focused questions
    Conrad J. Harrison, Daan Geerards, Maarten J. Ottenhof, Anne F. Klassen, Karen W.Y. Wong Riff, Marc C. Swan, Andrea L. Pusic, Chris J. Sidey-Gibbons
    Journal of Plastic, Reconstructive & Aesthetic Surgery.2019; 72(11): 1819.     CrossRef
  • Presidential address: Preparing for permanent test centers and computerized adaptive testing
    Chang Hwi Kim
    Journal of Educational Evaluation for Health Professions.2018; 15: 1.     CrossRef
  • Updates from 2018: Being indexed in Embase, becoming an affiliated journal of the World Federation for Medical Education, implementing an optional open data policy, adopting principles of transparency and best practice in scholarly publishing, and appreci
    Sun Huh
    Journal of Educational Evaluation for Health Professions.2018; 15: 36.     CrossRef
  • Linear programming method to construct equated item sets for the implementation of periodical computer-based testing for the Korean Medical Licensing Examination
    Dong Gi Seo, Myeong Gi Kim, Na Hui Kim, Hye Sook Shin, Hyun Jung Kim
    Journal of Educational Evaluation for Health Professions.2018; 15: 26.     CrossRef
Research Articles
Psychometric properties of a novel knowledge assessment tool of mechanical ventilation for emergency medicine residents in the northeastern United States  
Jeremy B. Richards, Tania D. Strout, Todd A. Seigel, Susan R. Wilcox
J Educ Eval Health Prof. 2016;13:10.   Published online February 16, 2016
DOI: https://doi.org/10.3352/jeehp.2016.13.10
  • 30,083 View
  • 182 Download
  • 5 Web of Science
  • 6 Crossref
Abstract · PDF
Purpose
Validated knowledge assessment tools that measure emergency medicine (EM) residents' understanding of the physiologic and clinical concepts of mechanical ventilation lack published descriptions of their psychometric properties. We therefore performed this study to describe the psychometric and performance properties of a novel tool that assesses EM residents' knowledge of mechanical ventilation.
Methods
Results from a multicenter, prospective survey study of 219 EM residents at 8 academic hospitals in the northeastern United States, administered over 3 weeks beginning in January 2013, were analyzed to quantify the reliability, item difficulty, and item discrimination of each of the 9 questions in the knowledge assessment tool.
Results
The response rate for residents completing the knowledge assessment tool was 68.6% (214 out of 312 EM residents). Reliability was assessed by both Cronbach’s alpha coefficient (0.6293) and the Spearman-Brown coefficient (0.6437). Item difficulty ranged from 0.39 to 0.96, with a mean item difficulty of 0.75 for all 9 questions. Uncorrected item discrimination values ranged from 0.111 to 0.556. Corrected item-total correlations were determined by removing the question being assessed from analysis, resulting in a range of item discrimination from 0.139 to 0.498.
Conclusion
Reliability, item difficulty, and item discrimination were within satisfactory ranges in this study, demonstrating acceptable psychometric properties of this knowledge assessment tool. These findings indicate that the tool is sufficiently rigorous for use in future research studies or for evaluative assessment of EM residents.
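The two classical item statistics reported above are simple to compute: item difficulty is the proportion answering correctly, and the corrected item-total correlation correlates each item with the total score excluding that item. A sketch on an invented 0/1 response matrix (rows = examinees, columns = items):

```python
def difficulty(scores, item):
    """Item difficulty: proportion of examinees answering the item correctly."""
    return sum(row[item] for row in scores) / len(scores)

def corrected_item_total(scores, item):
    """Pearson correlation of an item with the total score excluding that item."""
    x = [row[item] for row in scores]
    y = [sum(row) - row[item] for row in scores]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    return cov / (sx * sy)

scores = [  # hypothetical responses of 4 examinees to 3 items
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
]
print(difficulty(scores, 0))                      # → 0.75
print(round(corrected_item_total(scores, 0), 3))  # → 0.816
```

The "corrected" form avoids the inflation that occurs when an item is correlated with a total that includes itself, which is why the abstract reports both uncorrected and corrected discrimination values.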

Citations to this article as recorded by
  • Management of Mechanical Ventilation in Emergency Medicine: A Scoping Review
    Robert J. Klemisch, Mitchell S. Hymowitz, Ryan J. Alcantara, Brendan F. Mullan, Margaret L. Davis, Rachel Blume, Nicholas J. Johnson, Brian M. Fuller
    JACEP Open.2026; 7(1): 100297.     CrossRef
  • Educational Environment: Students’ Perceptions Using the Dundee Ready Educational Environment Measure Inventory
    Sneha Sathyan, Sicy Maria, Siroslin S Mahitha, Smiji Jinny, Sonia, Soumya Dsouza, Shycil Mathew
    Journal of Health and Allied Sciences NU.2025; 15: 444.     CrossRef
  • Comparison of three methods for teaching mechanical ventilation in an emergency setting to sixth-year medical students: a randomized trial
    Fernando Sabia Tallo, Letícia Sandre Vendrame, André Luciano Baitello
    Revista da Associação Médica Brasileira.2020; 66(10): 1409.     CrossRef
  • Critical Appraisal of Emergency Medicine Educational Research: The Best Publications of 2016
    Nicole M. Dubosh, Jaime Jordan, Lalena M. Yarris, Edward Ullman, Joshua Kornegay, Daniel Runde, Amy Miller Juve, Jonathan Fisher, Teresa Chan
    AEM Education and Training.2019; 3(1): 58.     CrossRef
  • Mechanical Ventilation Training During Graduate Medical Education: Perspectives and Review of the Literature
    Jonathan M. Keller, Dru Claar, Juliana Carvalho Ferreira, David C. Chu, Tanzib Hossain, William Graham Carlos, Jeffrey A. Gold, Stephanie A. Nonas, Nitin Seam
    Journal of Graduate Medical Education.2019; 11(4): 389.     CrossRef
  • Development and validation of a questionnaire to assess the knowledge of mechanical ventilation in urgent care among students in their last-year medical course in Brazil
    Fernando Sabia Tallo, Simone de Campos Vieira Abib, Andre Luciano Baitello, Renato Delascio Lopes
    Clinics.2019; 74: e663.     CrossRef
The validity and reliability of a problem-based learning implementation questionnaire  
Bhina Patria
J Educ Eval Health Prof. 2015;12:22.   Published online June 8, 2015
DOI: https://doi.org/10.3352/jeehp.2015.12.22
  • 53,859 View
  • 351 Download
  • 4 Web of Science
  • 2 Crossref
Abstract · PDF
Purpose
The aim of this paper is to provide evidence for the validity and reliability of a questionnaire for assessing the implementation of problem-based learning (PBL). This questionnaire was developed to assess the quality of PBL implementation from the perspective of medical school graduates.
Methods
A confirmatory factor analysis was conducted to assess the validity of the questionnaire. The analysis was based on a survey of 225 graduates of a problem-based medical school in Indonesia.
Results
The results showed that the confirmatory factor analysis model had a good fit to the data. Further, the standardized loading estimates, the squared inter-construct correlations, the average variances extracted, and the composite reliabilities all provided evidence of construct validity.
Conclusion
The PBL implementation questionnaire was found to be valid and reliable, making it suitable for evaluation purposes.
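Two of the construct-validity statistics named in the Results, average variance extracted (AVE) and composite reliability (CR), can be computed directly from standardized factor loadings. The loadings below are hypothetical, not the paper's estimates:

```python
def ave(loadings):
    """Average variance extracted: mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)

def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    s = sum(loadings)
    errors = sum(1.0 - l * l for l in loadings)  # error variance per indicator
    return s * s / (s * s + errors)

loads = [0.72, 0.68, 0.81, 0.75]          # hypothetical loadings, one construct
print(round(ave(loads), 3))               # → 0.55
print(round(composite_reliability(loads), 4))  # → 0.8295
```

Conventionally, AVE above 0.5 and CR above 0.7 are taken as evidence of convergent validity and internal consistency for a construct.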

Citations to this article as recorded by
  • Changes in Learning Outcomes of Students Participating in Problem-Based Learning for the First Time: A Case Study of a Financial Management Course
    Yung-Chuan Lee
    The Asia-Pacific Education Researcher.2025; 34(1): 511.     CrossRef
  • Prison education in the resocialization of incarcerated individuals
    Rafael Romero-Carazas, Fabrizio Del Carpio-Delgado, Roque Juan Espinoza-Casco, David Hugo Bernedo-Moreira, Wilter C. Morales-García, Renza Adriana Alexandra Rodríguez-Asto, Lorena Karolay Quiñones-Ormeño
    Frontiers in Education.2025;[Epub]     CrossRef
Review Article
Reconsidering the Cut Score of Korean National Medical Licensing Examination
Duck Sun Ahn, Sowon Ahn
J Educ Eval Health Prof. 2007;4:1.   Published online April 28, 2007
DOI: https://doi.org/10.3352/jeehp.2007.4.1
  • 44,386 View
  • 181 Download
  • 5 Crossref
Abstract · PDF
After briefly reviewing theories of standard setting, we analyzed the problems of the current cut scores. We then reported the results of a needs assessment on standard setting among medical educators and psychometricians, along with analyses of the standard-setting methods used in developed countries. Based on these findings, we suggested the Bookmark and modified Angoff methods as alternative approaches to standard setting, and discussed the possible problems and challenges of applying these methods to the National Medical Licensing Examination.
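The modified Angoff method suggested above reduces to simple arithmetic: each judge estimates, per item, the probability that a minimally competent examinee would answer correctly, and the cut score is the sum of the per-item means of those estimates. A sketch with invented judge ratings:

```python
def angoff_cut_score(ratings):
    """ratings[j][i] = judge j's probability estimate for item i.
    Returns the expected raw score of a borderline (minimally competent)
    examinee, used as the cut score."""
    n_items = len(ratings[0])
    item_means = [sum(judge[i] for judge in ratings) / len(ratings)
                  for i in range(n_items)]
    return sum(item_means)

ratings = [  # hypothetical estimates from 3 judges on a 4-item test
    [0.7, 0.5, 0.9, 0.6],   # judge 1
    [0.6, 0.6, 0.8, 0.5],   # judge 2
    [0.8, 0.4, 0.9, 0.6],   # judge 3
]
print(round(angoff_cut_score(ratings), 2))  # → 2.63 (out of 4 items)
```

In practice the panel discusses discrepant estimates and often iterates with impact data before the cut score is finalized.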

Citations to this article as recorded by
  • Licentiate Examinations in English Speaking Developed Countries USA, UK as Against Proposed National Exit Test (NExT) as a Licentiate Examination in India – A Narrative Review
    Neelam Mishra, Shubhada Gade, Vedprakash Mishra, Gaurav Mishra
    Journal of Datta Meghe Institute of Medical Sciences University.2025; 20(3): 470.     CrossRef
  • Predicting medical graduates’ clinical performance using national competency examination results in Indonesia
    Prattama Santoso Utomo, Amandha Boy Timor Randita, Rilani Riskiyana, Felicia Kurniawan, Irwin Aras, Cholis Abrori, Gandes Retno Rahayu
    BMC Medical Education.2022;[Epub]     CrossRef
  • Possibility of independent use of the yes/no Angoff and Hofstee methods for the standard setting of the Korean Medical Licensing Examination written test: a descriptive study
    Do-Hwan Kim, Ye Ji Kang, Hoon-Ki Park
    Journal of Educational Evaluation for Health Professions.2022; 19: 33.     CrossRef
  • Applying the Bookmark method to medical education: Standard setting for an aseptic technique station
    Monica L. Lypson, Steven M. Downing, Larry D. Gruppen, Rachel Yudkowsky
    Medical Teacher.2013; 35(7): 581.     CrossRef
  • Standard Setting in Student Assessment: Is a Defensible Method Yet to Come?
    A Barman
    Annals of the Academy of Medicine, Singapore.2008; 37(11): 957.     CrossRef
