
JEEHP : Journal of Educational Evaluation for Health Professions



Brief report
Are ChatGPT’s knowledge and interpretation ability comparable to those of medical students in Korea for taking a parasitology examination?: a descriptive study
Sun Huh*

Published online: January 11, 2023

Department of Parasitology and Institute of Medical Education, College of Medicine, Hallym University, Chuncheon, Korea

*Corresponding email:

Editor: Yera Hur, Hallym University, Korea

• Received: January 3, 2023   • Accepted: January 11, 2023

© 2023 Korea Health Personnel Licensing Examination Institute

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This study aimed to compare the knowledge and interpretation ability of ChatGPT, a language model of artificial general intelligence, with those of medical students in Korea by administering a parasitology examination to both ChatGPT and medical students. The examination consisted of 79 items and was administered to ChatGPT on January 1, 2023. The examination results were analyzed in terms of ChatGPT’s overall performance score, its correct answer rate by the items’ knowledge level, and the acceptability of its explanations of the items. ChatGPT’s performance was lower than that of the medical students, and ChatGPT’s correct answer rate was not related to the items’ knowledge level. However, there was a relationship between acceptable explanations and correct answers. In conclusion, ChatGPT’s knowledge and interpretation ability for this parasitology examination were not yet comparable to those of medical students in Korea.
O’Connor and ChatGPT [1] wrote an editorial, the opening paragraphs of which were written by ChatGPT, an artificial intelligence (AI) chatbot. ChatGPT was trained by a model using reinforcement learning from human feedback, using the same methods as InstructGPT (GPT: generative pre-trained transformer) [2]. AI chatbots such as ChatGPT could provide tutoring and homework help by answering questions and providing explanations to help students understand complex concepts. However, there are concerns that the use of AI software by students to write university assessments could diminish the value of the assessments and the overall quality of the university program [1]. After the release of ChatGPT to the public on November 30, 2022, it became a hot topic, particularly in education. Stokel-Walker [3] also noted that ChatGPT, an AI-powered chatbot that generates intelligent-sounding text in response to user prompts, including homework assignments and exam-style questions, has caused concern. Medical students must be able to evaluate the accuracy of medical information generated by AI and have the abilities to create reliable, validated information for patients and the public [4]. Therefore, it is necessary to determine how accurately ChatGPT, a recently developed AI chatbot, can solve questions on medical examinations. This comparison of ChatGPT’s abilities may provide insights into whether—and if so, how—medical students could use ChatGPT for their learning.
This study aimed to compare the knowledge and interpretation ability of ChatGPT with those of medical students in Korea by administering a parasitology examination. This subject is required in medical schools in Korea. Specifically, the following were investigated: (1) the scores of ChatGPT compared to those of the medical students; (2) the correct answer rate of ChatGPT according to items’ knowledge level; and (3) the acceptability of ChatGPT’s explanations as reflecting current parasitology knowledge, as evaluated by the author.
This was not a study of human subjects, but an analysis of the results of an educational examination routinely conducted at medical colleges. Therefore, neither receiving approval from the institutional review board nor obtaining informed consent was required.
This is a descriptive study to compare the ability of ChatGPT to answer questions with that of medical students.
On January 1, 2023 (Seoul time), ChatGPT (December 15, 2022 version) was given a parasitology examination containing items identical to those administered to first-year medical students at Hallym University via computer-based testing on December 12, 2022 (Supplement 1). The answers given by ChatGPT were compared to those of the medical students. Parasitology classes for medical students began on October 31, 2022, and ended on December 8, 2022, comprising 16 hours of lectures and 32 hours of laboratory practice.
Seventy-seven medical students took the parasitology examination on December 12, 2022. ChatGPT was counted as one examinee. There were no exclusion criteria.
The items’ knowledge level and the examinees’ scores were the variables.
The response data of the 77 medical students and of ChatGPT on the parasitology examination were compared. The correct answer rate according to the items’ level of knowledge was analyzed. The author also evaluated the acceptability of the explanations provided by ChatGPT (Supplement 2, Fig. 1) and classified each explanation as good, needing revision, or unacceptable.
There was no bias in the selection of examinees. All students who attended the parasitology lecture course were included.
Sample size estimation was not required because all target students were included, and one AI platform was added.
Descriptive statistics were used to analyze the chatbot’s score. A comparative analysis was conducted using DBSTAT version 5.0 (DBSTAT).
According to data from Dataset 1, ChatGPT correctly answered 48 out of 79 items (60.8%). This score was lower than the average score of the 77 medical students, which was 71.8 out of 79 (90.8%), with a minimum score of 65 (82.3%) and a maximum score of 74 (93.7%).
Table 1 shows ChatGPT’s responses according to the items’ knowledge level. The chi-square test yielded χ2=3.02 with 2 degrees of freedom (df), below the critical value of 5.99 at the 0.05 significance level. This result indicates that the relationship between the 2 variables was not significant (P=0.2206).
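The reported P-value can be checked directly from the reported statistic: for df=2, the chi-square survival function has the closed form P = exp(−χ²/2), and the 0.05 critical value is −2·ln(0.05) ≈ 5.99. A minimal stdlib-only Python sketch (the small gap between 0.2209 and the reported 0.2206 comes from rounding the statistic to 3.02):

```python
import math

def chi2_p_df2(stat: float) -> float:
    """P-value for a chi-square statistic with 2 degrees of freedom.

    For df=2 the chi-square distribution is exponential with mean 2,
    so the survival function is exactly exp(-x/2).
    """
    return math.exp(-stat / 2)

# Reported statistic for Table 1 (knowledge level vs. correctness)
p = chi2_p_df2(3.02)
print(round(p, 4))  # 0.2209, consistent with the reported P=0.2206

# The 0.05 critical value for df=2 follows from the same closed form
critical = -2 * math.log(0.05)
print(round(critical, 2))  # 5.99
```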
Table 2 shows the acceptability of ChatGPT’s explanations according to the correctness of the answer. The chi-square test yielded χ2=51.62 with df=2, far exceeding the critical value of 5.99 at the 0.05 significance level. This result indicates that the relationship between the 2 variables was significant (P<0.001).
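This statistic can be reproduced from the raw counts in Table 2 with a standard Pearson chi-square computation. A stdlib-only Python sketch (`scipy.stats.chi2_contingency` on the same table would give the same result):

```python
import math

# Acceptability of explanations (rows: good, needs revision, unacceptable)
# vs. correctness of the answer (columns: correct, incorrect), from Table 2
table = [[41, 3], [7, 8], [0, 20]]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected,
# where expected = row_total * column_total / n under independence
chi2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(len(table))
    for j in range(len(table[0]))
)
df = (len(table) - 1) * (len(table[0]) - 1)
p = math.exp(-chi2 / 2)  # exact survival function for df=2

print(round(chi2, 2), df)  # 51.62 2
print(p)  # ~6e-12, i.e., P < 0.001
```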
ChatGPT’s performance was lower than that of medical students. The correct answer rate shown by ChatGPT was not related to the items’ knowledge level. However, there was an association between acceptable explanations and correct answers.
ChatGPT’s correct answer rate of 60.8% was not necessarily an indicator of poor performance, as the questions were not easy for medical students to answer correctly. The considerably higher average score (89.6%) of the medical students may have been due to their prior learning of parasitology and the fact that the examination was administered 4 days after the class. If the examination had been taken 1 or 2 months after the class, the students’ performance scores might have been lower. Some incorrect answers may have been due to the following factors: first, ChatGPT is currently unable to interpret figures, graphs, and tables as a student can, so the author had to describe these materials in text form. Second, some epidemiological data unique to Korea were outside ChatGPT’s knowledge. Some of those data are only available in Korean or are not searchable online. Third, ChatGPT sometimes did not understand multiple-choice questions where the examinee must select the best answer out of multiple options. ChatGPT sometimes selected 2 or more options, as it has not yet been trained to do otherwise.
There was no significant difference in the correct answer rate according to the knowledge level of the items. However, this may vary in other examinations and may have been a unique phenomenon for this parasitology exam. ChatGPT’s explanations of the question items were generally acceptable if it made a correct selection. However, the explanations for 7 items needed to be updated or revised because they contained incorrect information. This finding suggests that ChatGPT’s knowledge in specific fields (e.g., parasitology) remains insufficient. If the incorrect option was selected, the explanation was unacceptable or needed revision in 90.0% of items. This result was anticipated, as students’ explanations for incorrect selections are also usually unacceptable. Sometimes ChatGPT did not select the best answer, yet its explanation was acceptable; item 39 is an example.
There have been no reported studies in the literature databases, including PubMed, Scopus, and Web of Science, on the comparability of ChatGPT’s performance to that of students on medical examinations.
The input for the question items for ChatGPT was not precisely the same as for the medical students. The chatbot cannot receive information in graphs, figures, and tables, so this information was re-described by the author. Additionally, the interpretation of the explanations and correct answers may vary according to the perspectives of different parasitologists, although the author has worked in the field of parasitology for 40 years (1982–2022) in Korea. Best practices for patient care may also vary according to the region and medical environment.
The above results cannot be generalized directly to other subjects or medical schools, as chatbots will likely continue to evolve rapidly through user feedback. A future trial with the same items may yield different results. The present results reflect the abilities of ChatGPT on January 1, 2023.
Currently, ChatGPT’s level of knowledge and interpretation is not sufficient for medical students to rely on, especially for medical school examinations. This may also be the case for high-stakes examinations, including health licensing examinations. However, I believe that ChatGPT’s knowledge and interpretation abilities will improve rapidly through deep learning, similar to AlphaGo’s ability [5]. Therefore, medical/health professors and students should consider how to incorporate this AI platform into medical/health education in the near future. Furthermore, AI should be integrated into the medical school curriculum, and some schools have already adopted it [6].
ChatGPT’s knowledge and interpretation ability in answering this parasitology examination are not yet comparable to those of medical students in Korea. However, these abilities will likely improve through deep learning. Medical/health professors and students should be aware of the progress of this AI chatbot and consider its potential adoption in learning and education.

Authors’ contributions

All work was done by Sun Huh.

Conflict of interest

Sun Huh has been the editor of the Journal of Educational Evaluation for Health Professions since 2005. He was not involved in the review process. Otherwise, no potential conflict of interest relevant to this article was reported.



Data availability

Data files are available from Harvard Dataverse:

Dataset 1. Raw data for analysis, including item number, knowledge level, correct answer, ChatGPT’s answers, and correctness of explanations for a parasitology examination taken by first-year medical students at Hallym University on December 12, 2022.


Supplementary files are available from Harvard Dataverse:
Supplement 1. Seventy-nine items from a parasitology examination taken by first-year medical students at Hallym University on December 12, 2022.
Supplement 2. ChatGPT’s responses to 79 items from a parasitology examination taken by first-year medical students at Hallym University on December 12, 2022, inputted on January 1, 2023, by the author. Figures and tables are removed and explained in the item stem. Explanations of the options selected by ChatGPT are also included.
Supplement 3. Audio recording of the abstract.
Fig. 1.
Screenshot of ChatGPT’s answer to a question item from a parasitology examination for medical students at Hallym University.
Table 1.
Correct responses by ChatGPT according to the knowledge level of 79 items
Knowledge level of items Correct responses Incorrect responses
Recall 17 15
Interpretation 20 12
Problem-solving 11 4
Table 2.
Acceptability of ChatGPT’s explanations of the 79 question items by correctness of the answer
Explanation Correct answers Incorrect answers
Good 41 3
Needs to be revised 7 8
Unacceptable 0 20
