Purpose This study aimed to examine the reliability and validity of a measurement tool for portfolio assessments in medical education. Specifically, it investigated scoring consistency among raters and the appropriateness of the assessment criteria as judged by an expert panel.
Methods A cross-sectional observational study was conducted from September to December 2018 in the Introduction to Clinical Medicine course at the Ewha Womans University College of Medicine. Data were collected for 5 randomly selected portfolios, each scored by a gold-standard rater and 6 trained raters. An expert panel assessed the validity of 12 assessment items using the content validity index (CVI). Statistical analysis included Pearson correlation coefficients to measure each rater's agreement with the gold-standard rater, the intraclass correlation coefficient (ICC) for inter-rater reliability, and the CVI for item-level validity.
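The three statistics named above can be illustrated with a short sketch. This is not the study's actual analysis code or data; the scores, the ICC(2,1) formulation, and the 4-point relevance scale used for the item-level CVI are assumptions for demonstration only.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def icc_2_1(scores):
    """Two-way random, single-rater, absolute-agreement ICC(2,1).
    `scores` is a list of rows (subjects) x columns (raters)."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

def item_cvi(expert_ratings):
    """Item-level CVI: proportion of experts rating the item 3 or 4
    on an assumed 4-point relevance scale."""
    return sum(1 for r in expert_ratings if r >= 3) / len(expert_ratings)

# Perfect agreement among raters yields ICC = 1.0
print(icc_2_1([[1, 1], [2, 2], [3, 3]]))        # 1.0
# Linearly related scores yield r = 1.0
print(pearson([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0
# 6 of 8 experts rate the item relevant -> CVI = 0.75
print(item_cvi([4, 4, 3, 3, 3, 4, 2, 1]))       # 0.75
```

In this framing, the CVI threshold of 0.75 reported in the Results corresponds to at least 6 of 8 panel members rating an item as relevant.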
Results Rater 1 had the highest Pearson correlation with the gold-standard rater (0.8916), while Rater 5 had the lowest (0.4203). The ICC for all raters was 0.3821, improving to 0.4415 after excluding Raters 1 and 5, a 15.6% increase in reliability. Most assessment items met the CVI threshold of ≥0.75, with some achieving a perfect score (CVI=1.0); however, the items "sources" and "level and degree of performance" fell below the threshold (CVI=0.72).
Conclusion The present measurement tool for portfolio assessments demonstrated moderate reliability and strong validity, supporting its use as a credible tool. To make portfolio assessment more reliable, additional rater training for faculty is needed.