
JEEHP : Journal of Educational Evaluation for Health Professions

Review
The impact of artificial intelligence-driven simulation on the development of non-technical skills in medical education: a systematic review
Sana Loubbairi*, Yasmine El Moussaoui, Laila Lahlou, Imad Chakri, Hicham Nassik

DOI: https://doi.org/10.3352/jeehp.2025.22.37
Published online: November 24, 2025

Research and Innovation Laboratory in Health Science, Faculty of Medicine and Pharmacy, Ibn Zohr University, Agadir, Morocco

*Corresponding email: sana.loubbairi@edu.uiz.ac.ma

Editor: A Ra Cho, The Catholic University of Korea, Korea

• Received: October 6, 2025   • Accepted: November 14, 2025

© 2025 Korea Health Personnel Licensing Examination Institute

This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

  • Purpose
    Artificial intelligence (AI)-driven simulation is an emerging approach in healthcare education that enhances learning effectiveness. This review examined its impact on the development of non-technical skills among medical learners.
  • Methods
    Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, a systematic review was conducted using the following databases: Web of Science, ScienceDirect, Scopus, and PubMed. The quality of the included studies was assessed using the Mixed Methods Appraisal Tool. The protocol was previously registered in PROSPERO (CRD420251038024).
  • Results
    Of the 1,442 studies identified in the initial search, 20 met the inclusion criteria, involving 2,535 participants. The simulators varied considerably, ranging from platforms built on symbolic AI methods to social robots powered by computational AI. Among the 15 AI-driven simulators, 10 used ChatGPT or its variants as virtual patients. Several studies evaluated multiple non-technical skills simultaneously. Communication and clinical reasoning were the most frequently assessed skills, appearing in 12 and 6 studies, respectively, which generally reported positive outcomes. Improvements were also noted in decision-making, empathy, self-confidence, critical thinking, and problem-solving. In contrast, emotional regulation, assessed in a single study, showed no significant difference. Notably, none of the studies examined reflection, reflective practice, teamwork, or leadership.
  • Conclusion
    AI-driven simulation shows substantial potential for enhancing non-technical skills in medical education, particularly communication and clinical reasoning. However, its effects on several other non-technical skills remain unclear. Given heterogeneity in study designs and outcome measures, these findings should be interpreted cautiously. These considerations highlight the need for further research to support integrating this innovative approach into medical curricula.
The integration of soft skills into medical education is increasingly recognized as essential for preparing future physicians to navigate the complexities of patient care and teamwork. Soft skills, defined as non-technical interpersonal abilities, enhance healthcare professionals’ capacity to connect with patients, collaborate with colleagues, and perform tasks safely and effectively in challenging environments [1]. These skills are critical not only for improving patient outcomes and satisfaction but also for fostering a culture of safety and professionalism within healthcare settings [2]. In this context, non-technical skills can be broadly categorized into 3 interrelated domains: cognitive, interpersonal, and emotional-social skills [3]. Each domain contributes meaningfully to personal and professional development, supporting individuals’ ability to function effectively in complex clinical settings [4].
Simulation-based education has emerged as a transformative pedagogical method for developing these non-technical skills. It offers experiential learning opportunities in a safe, controlled environment where students can practice clinical and interpersonal skills without jeopardizing patient safety [5].
The integration of artificial intelligence (AI) into simulation environments is revolutionizing medical education by enabling adaptive and personalized learning experiences. AI algorithms can dynamically adjust scenario complexity based on learner performance, provide detailed analytics on decision-making processes, and deliver individualized feedback [6]. AI-powered simulations therefore represent a promising strategy to optimize training outcomes and better prepare medical learners for the complexities of modern healthcare.
Beyond medical education, the incorporation of AI into healthcare simulators significantly enhances training and educational outcomes by providing more realistic, adaptive, and personalized learning experiences [7]. AI-driven simulators can create virtual patients that respond dynamically to learners’ actions, enabling the simulation of complex clinical scenarios, including rare or high-risk cases, in a safe environment conducive to skill refinement. AI algorithms further personalize instruction by analyzing performance, delivering tailored feedback, and identifying areas for improvement, ensuring that each learner progresses at an appropriate pace [8].
Although these benefits are substantial, challenges persist, including the need for technical expertise and the importance of addressing potential biases in AI systems to ensure effective and equitable implementation [9]. Despite growing recognition of the importance of non-technical skills, their systematic integration and assessment in medical education remain inconsistent. Traditional curricula often emphasize technical competencies, leaving gaps in the development of interpersonal and cognitive skills essential for holistic patient care.
Objectives
This study aimed to examine the impact of integrating AI within simulation environments on the development and reinforcement of non-technical skills—such as communication, critical thinking, and decision-making—among medical learners.
Ethics statement
This study is based solely on previously published literature; therefore, neither ethical committee approval nor informed consent was required.
Study design
This systematic review was conducted in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [10]. The protocol was registered and published in PROSPERO (ID: CRD420251038024).
Eligibility criteria
The selection criteria for this review were developed using the population, intervention, comparison, outcome, and study design (PICOS) framework, as shown in Table 1. Studies were eligible if they assessed at least one of the non-technical skills listed in Table 2.
Information sources
Four databases—PubMed, ScienceDirect, Scopus, and Web of Science—were searched from their inception through July 2025. Only studies published in English were considered eligible. Commentary articles and conference abstracts were excluded. The final search was completed on July 18, 2025.
Search strategy
Two authors (S.L. and L.L.) independently performed the database searches using a strategy developed according to the predefined inclusion criteria. The strategy incorporated relevant keywords and Medical Subject Headings (MeSH), including: “students, medical” [MeSH Term], “medical students,” “medicine students,” “intern,” “resident,” “simulation,” “patient simulation” [MeSH Term], “virtual human,” “simulation training” [MeSH Terms], “computer simulation” [MeSH Term], “virtual patient,” “virtual reality” [MeSH Term], “non-technical skills,” “soft skills,” “mental competency,” “cognitive skills,” “social skills,” “emotional skills,” “interpersonal skills,” “artificial intelligence” [MeSH Term], “AI,” “machine learning” [MeSH Term], “deep learning” [MeSH Term], “natural language processing” [MeSH Term], “intelligent tutoring systems,” “chatbot,” “virtual assistant,” and “ChatGPT.” The detailed search strategy for each database is provided in Supplement 1.
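For illustration only, the following minimal Python sketch shows how a Boolean query of this kind can be assembled from the concept blocks above; the term lists are abbreviated assumptions, and the exact database-specific strategies are those reported in Supplement 1.

```python
# Illustrative sketch only: assembles a PubMed-style Boolean query from the
# concept blocks described above. Term lists are abbreviated assumptions;
# the actual strategies for each database are given in Supplement 1.
population = ['"students, medical"[MeSH Terms]', '"medical students"', 'intern', 'resident']
intervention = ['"patient simulation"[MeSH Terms]', '"computer simulation"[MeSH Terms]',
                '"simulation training"[MeSH Terms]', '"virtual patient"']
ai = ['"artificial intelligence"[MeSH Terms]', '"machine learning"[MeSH Terms]',
      '"natural language processing"[MeSH Terms]', 'chatbot', 'ChatGPT']
outcomes = ['"non-technical skills"', '"soft skills"', '"interpersonal skills"',
            '"social skills"', '"cognitive skills"']

def or_block(terms):
    """Join synonyms of one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"

# Concept blocks are combined with AND, so a record must match every block.
query = " AND ".join(or_block(block) for block in (population, intervention, ai, outcomes))
print(query)
```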
Selection and data collection process
The screening and selection process was performed independently by 2 authors (S.L. and Y.E.). In cases of disagreement, the authors discussed the issue to reach a consensus; if no agreement was reached, a third author (H.N.) mediated the decision. Abstract and title screening, along with duplicate removal, was conducted using Rayyan QCRI (Rayyan) [11]. Full-text articles of potentially eligible studies were then reviewed. References of included articles were hand-searched to identify additional studies. All selected studies were imported into Zotero, a bibliographic management software system.
Data items
The extracted data included author names, year of publication, study location, study design, sample size, learner level, non-technical skills assessed, measurement tools or scales, AI-based simulator used, AI approach, intervention and control group methods (if applicable), and key findings.
Study risk of bias assessment
Two authors (S.L. and I.C.) independently evaluated the risk of bias using the Mixed Methods Appraisal Tool (MMAT) (Supplement 2) [12], which is designed to assess qualitative, randomized controlled, non-randomized, quantitative descriptive, and mixed-methods studies. The tool contains 2 screening questions and 5 core quality criteria tailored to each study design. For each study, the number of criteria met was reported (e.g., 4/5, 3/5), accompanied by a qualitative summary (e.g., “most criteria met,” “some criteria unmet”). Table 3 presents the methodological quality of all included studies.
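As a rough illustration of how the quality summaries in Table 3 can be tabulated (not the authors' actual tooling), the sketch below tallies the 5 design-specific MMAT criteria for a single study and maps the count to a qualitative label; the cut-offs (5 = all criteria met, 4 = most criteria met, 3 or fewer = some criteria unmet) are an assumption inferred from the labels used in this review.

```python
# Minimal sketch (not the authors' tooling): tally the 5 design-specific MMAT
# criteria for one study and map the count to the qualitative summary used in
# Table 3. The cut-offs (5 = all met, 4 = most met, <=3 = some unmet) are an
# assumption inferred from how the included studies are labelled.
def mmat_summary(criteria_met: list[bool]) -> str:
    if len(criteria_met) != 5:
        raise ValueError("MMAT applies 5 core criteria per study design")
    n = sum(criteria_met)
    if n == 5:
        label = "All criteria met"
    elif n == 4:
        label = "Most criteria met"
    else:
        label = "Some criteria unmet"
    return f"{n}/5 {label}"

# Example: a study judged to satisfy 4 of the 5 criteria.
print(mmat_summary([True, True, True, True, False]))  # -> "4/5 Most criteria met"
```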
Effect measures
Data were extracted according to the analytical methods used in each study, including statistical tests such as analysis of variance (ANOVA), t-tests, the Wilcoxon and Mann-Whitney tests, analysis of covariance, and repeated-measures ANOVA/multivariate analysis of covariance, as well as effect size indicators (partial η² and r), to facilitate comparisons across continuous, ordinal, and categorical outcomes.
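As a reference point, the sketch below shows the standard textbook conversions behind the effect size indicators mentioned here (partial η² from ANOVA sums of squares, and r from a Z statistic for Wilcoxon or Mann-Whitney results); the numeric inputs are hypothetical and do not come from the included studies.

```python
# Minimal sketch of the standard effect-size conversions referred to above;
# these are textbook formulas, not code from the included studies.
import math

def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Partial eta squared from ANOVA sums of squares: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)

def r_from_z(z: float, n: int) -> float:
    """r effect size for Wilcoxon/Mann-Whitney results reported as a Z statistic: r = Z / sqrt(N)."""
    return z / math.sqrt(n)

# Hypothetical numbers for illustration only.
print(round(partial_eta_squared(ss_effect=12.0, ss_error=48.0), 3))  # 0.2
print(round(r_from_z(z=2.1, n=56), 3))                               # ~0.281
```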
Synthesis methods
To address the aim of this review, a structured synthesis approach was applied. First, a theoretical framework was developed to explore the role of AI-driven simulation in promoting non-technical skills among medical learners. Two authors (S.L. and L.L.) independently performed data extraction and categorization. The extracted data were analyzed using narrative synthesis, beginning with the construction of a framework aligned with the review’s objectives.
Reporting bias assessment
To minimize reporting bias, 2 authors (I.C. and S.B.) reviewed the publication processes and policies of the journals in which the included studies appeared. They then compared the reported results with published protocols and registrations to identify any selective outcome reporting.
Certainty assessment
Not done.
Study selection
The search strategy identified 1,442 articles. After removing duplicates, 1,291 records were screened based on titles and abstracts, leading to the exclusion of 1,231 articles. A total of 60 articles were selected for full-text review, of which 15 met the inclusion criteria and were included in this review. Additionally, 5 studies were identified through manual searching of reference lists. Consequently, 20 studies were ultimately included in this systematic review. The study selection process is illustrated in Fig. 1.
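A quick arithmetic check of the flow reported above and in Fig. 1 (all counts taken from this paragraph):

```python
# Sanity check of the selection flow reported above (Fig. 1); counts are from the text.
identified = 1442
screened = 1291                                      # after duplicate removal
duplicates_removed = identified - screened           # 151
excluded_on_title_abstract = 1231
full_text_reviewed = screened - excluded_on_title_abstract   # 60
included_from_databases = 15
added_by_hand_search = 5
total_included = included_from_databases + added_by_hand_search   # 20

assert full_text_reviewed == 60 and total_included == 20
print(duplicates_removed, full_text_reviewed, total_included)
```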
Study characteristics
A total of 20 studies were included in this review [13-32]. Four were conducted in Germany [13,19,20,30], 3 in China [26,29,31], 2 each in Sweden [14,15], the United States [18,22], Iran [16,21], Japan [27,32], and the United Kingdom [24,25], and one each in Canada [17], Taiwan [28], and Portugal [23]. Ten studies used a quasi-experimental design, 5 employed mixed methods, 4 were randomized controlled trials (RCTs), and one was qualitative. The participants comprised 2,535 medical learners at varying levels, ranging from first-year students to specialty interns. The key characteristics of these studies are presented in Table 3.
Risk of bias in studies
The MMAT showed that the majority of studies (12 out of 20) met all the criteria, reflecting high methodological quality [15-18,21,23-25,27-29,31]. Three studies met most of the criteria [14,20,30], whereas 5 met only some, indicating a moderate risk of bias [13,19,22,26,32]. Overall, although most findings were derived from high-quality studies, minor limitations were observed, including incomplete outcome data in 2 randomized controlled trials [18,22], representativeness concerns in 3 non-randomized studies [13,19,26], and occasional difficulties in integrating quantitative and qualitative findings in 2 mixed-method studies [14,20] (see Supplement 3 for more details).
Results of individual studies

Types of AI-based simulations

The analyzed studies included 15 AI-based simulators, 11 of which used computational AI approaches spanning several types. Virtual conversational agents were the most common, powered by advanced language models such as ChatGPT and its variants (ChatGPT, ChatGPT 3.5, ChatGPT-4o), as well as GPT models integrated into simulators (GPT-3.5, GPT-4, GPT-4 Turbo via Miibo) [13,15,19,20,24,25,29-32]. Other modern AI systems included the Virtual Operative Assistant [17], AIteach [26], the AI-assisted procedural method “Waseda-Kyoto-Kagaka Suture No. 2 Refined II” [28], and a social robotic platform combined with large language models [14]. The 4 symbolic AI simulators comprised immersive virtual platforms such as Body Interact [23,27], MPathic-VR [18,22], and HAMTA [21], as well as virtual or humanoid social robots such as Safir and Medobot [16].

Traditional teaching methods

For comparative analysis, various methods were employed. Four studies used traditional or digital teaching approaches such as standard computer-based learning [18,22], theory courses supported by PowerPoint presentations [29], traditional lectures [21], or standard clinical training [28]. Another study used traditional simulation with role-play [31], while others employed virtual simulations without AI [15-17,30]. However, several studies did not report a control group [13,14,19,20,23-27].

Non-technical skills assessment tools

The studies identified several tools for assessing soft skills, with each study using either a single instrument or a combination of complementary approaches. Communication was assessed through tools such as the objective structured clinical examination [18,22,31,32], the mini-clinical evaluation exercise [16,29], question-answer pairs [20] and the Immersive Technology Evaluation Measure [24], generally using 5-point scales in which higher scores reflect better performance. Some tools incorporated AI, including automated assessment by GPT-4 for coding communication skills, and MPathic-VR, which combines immersive simulation with interaction analysis through a reverse scoring system (0 for the optimal response and higher scores for inappropriate responses) [18,22]. Empathic communication was measured by the Empathic Communication Coding System, which codes empathic expression according to 7 levels, as well as by objective structured video examinations, based on audio and video recordings and automatically scored from 0 (unacceptable) to 3 (totally acceptable) on 5 dimensions [16].
Clinical reasoning was assessed through various approaches. AIteach analyzed 5 key indicators—rigor, logic, systematicity, agility, and breadth of knowledge—expressed as percentages from 0 to 100 [26]. Semi-structured interviews were also conducted to evaluate clinical reasoning [15]. A 20-item multiple-choice questionnaire scored as “correct” or “incorrect” produced a total score out of 20, with higher scores indicating stronger reasoning skills [27]. Additionally, the Clinical Reasoning Indicator–History Taking Inventory, a validated 5-point Likert scale assessing focusing questions, creating context, and securing information, was used [30].
Decision-making and conflict management were measured using questionnaires [14,23] or scripted tests combining simulation and targeted assessment [21]. Self-confidence was explored through self-assessment questionnaires using a 5-point Likert scale [24]. Emotional regulation was assessed using the Medical Emotion Scale, based on a 5-point scale comprising 22 adjectives grouped into 4 categories (basic, success, epistemic, and social emotions) [17]. Critical thinking was measured using the Clinical Critical Thinking Scale, with a 5-point scoring system yielding an overall score out of 100 [29]. Finally, several cross-sectional questionnaires assessed multiple non-technical skills simultaneously [14,23,25]. These tools demonstrated content validity supported by expert review or theoretical frameworks, and acceptable reliability, measured using Cronbach’s α or test–retest methods when available.
Results of syntheses
The majority of studies reported a positive effect of AI-based simulation on the development of non-technical skills among medical learners. The most frequently observed improvement was in communication, identified in 12 studies [14,16,18-20,22-25,29,31,32]. Regarding clinical reasoning, all studies evaluating this skill reported significant improvement [14,23,26,27,30,31]. Empathy and empathic history-taking also showed significant gains [13,16,25], while 3 studies found significant improvements in self-confidence [23,24,28]. Similarly, both studies evaluating critical thinking reported significant enhancement [25,29]. Additionally, decision-making improved in 3 of the 4 studies assessing this outcome [14,23,30]. Only one study examined conflict management skills, reporting a positive effect [23]. In contrast, emotional regulation, evaluated in a single study, showed no statistically significant difference between the experimental and control groups [17]. None of the included studies examined other soft skills, such as reflection, reflective practice, teamwork, or leadership.
Reporting biases
All included studies were published in peer-reviewed journals; however, it could not be confirmed that every measured outcome was fully reported in the published manuscripts.
Interpretation
The cumulative evidence reviewed consistently indicates that AI-driven simulation represents a highly effective pedagogical innovation for fostering non-technical skills—including critical thinking, clinical reasoning, communication, decision-making, self-confidence, conflict management, and empathy. Communication and clinical reasoning were the most frequently assessed competencies, and both demonstrated significant improvement across studies.
ChatGPT and its variants were the predominant computational AI tools used, providing adaptive and interactive virtual patient experiences. Similarly, commercial platforms such as Body Interact and MPathic-VR, representing symbolic AI, offered structured virtual patient scenarios. These systems require learners to apply theoretical knowledge in real time, integrate analytical reasoning, and engage in decision-making under pressure. This active, experiential mode of learning has been shown to reinforce knowledge retention and deepen cognitive engagement when compared with traditional didactic approaches [18,22,23,27].
Comparison with previous studies
Twelve studies assessing communication skills generally reported improvements, highlighting the effectiveness of AI-based simulators. Zidoun and El Mardi [33] likewise found that AI-based simulators and simulated patients (SPs) enhanced history-taking skills, suggesting that conventional approaches may also be beneficial. In contrast, Harder et al. [34] reported that AI-driven virtual reality simulations were perceived as more realistic, interactive, and engaging than SPs. These findings align with the present review, which showed that control groups exposed to theory-based courses, computer-based learning, or standard clinical training demonstrated no significant improvements relative to AI-assisted groups [18,22,28,29].
Regarding clinical reasoning, all 6 studies in this review reported significant improvement. This is consistent with findings by García-Torres et al. [35], who confirmed the effectiveness of virtual patients in enhancing clinical reasoning. However, the present results differ from those of Liu et al. [36], who noted that generative AI did not consistently support this competency.
Critical thinking, a key cognitive ability, also demonstrated consistent improvements, with both studies assessing this outcome reporting positive effects. Problem-solving showed similar benefits in one study. These findings are consistent with a previous study demonstrating that AI-based tools enhanced analytical reasoning, critical thinking, and problem-solving skills in medical students [37].
Beyond critical thinking, decision-making showed improvement in 3 of the 4 studies evaluating this competency. A recent study similarly demonstrated that AI-based rule engines and machine learning tools effectively strengthen clinical decision-making [38]. Improvements in empathy were also consistently reported across 3 studies. Importantly, a study evaluating a GPT-powered conversational agent for empathic history-taking showed strong concordance between AI-generated assessments and expert evaluations, further supporting the pedagogical reliability of these systems [13].
Self-confidence consistently improved across the 3 studies evaluating this skill [23,24,28]. This finding is noteworthy, as confidence is among the most critical soft skills required in healthcare practice [39].
The positive impact of AI on soft skills can be largely attributed to its ability to create a psychologically safe learning environment in which learners can repeatedly engage in complex clinical encounters without fear of adverse consequences. Through this mechanism, AI tools allow learners to experiment with diagnostic reasoning, refine communication strategies, and receive immediate, individualized feedback on their performance [40]. Additionally, the provision of multidimensional feedback—spanning technical accuracy, communication quality, and reasoning processes—offers an advantage over passive instruction, enabling learners to identify deficiencies in both cognitive and behavioral strategies and make targeted improvements [18].
Limitations
This review, the first to examine the impact of AI-based simulation on multiple non-technical skills, nevertheless highlights several methodological limitations that temper the generalizability of its findings. The included studies demonstrated considerable heterogeneity in design, ranging from randomized controlled trials to quasi-experimental and mixed-methods investigations. The absence of control groups in many studies further weakens causal inference, making it difficult to attribute observed improvements solely to AI-driven interventions [13,14,19,20,23-27]. Compounding this limitation is the lack of standardized, validated assessment tools for non-technical skills, which introduces inconsistencies and complicates cross-study comparisons. This underscores the urgent need for consensus-based evaluation frameworks in future research. Another notable gap is the absence of studies addressing leadership, teamwork, reflection, and reflective practice—skills that are essential in medical education [41]. This omission represents a clear deficiency in the current literature, and future research must investigate how AI-based simulation can influence these domains. Emotional regulation was also underexplored, with only one study assessing this skill and reporting no significant intergroup differences [17]. This suggests that although AI systems are effective in modeling cognitive and interpersonal competencies, they may be less capable of cultivating deeper emotional skills—an area warranting further exploration. A recent study emphasized the ongoing “dearth of quantitative evidence” regarding the long-term, verifiable impacts of AI models on learner outcomes [42], highlighting the need for longitudinal, rigorously controlled research to validate AI’s contribution to sustained professional development.
Implications
Collectively, these findings position AI-driven simulation as a transformative tool for advancing medical education, particularly in the domain of non-technical skills. To fully leverage its potential, future research must address the methodological limitations identified in the existing literature. Specifically, there is a need for larger, multicenter studies with robust control groups, standardized assessment instruments, and long-term follow-up to establish both the efficacy and durability of outcomes. Research should also explore hybrid instructional models in which AI-based simulation is integrated with human-led debriefing, reflective practice, and emotional skills coaching, thereby targeting complex domains such as empathy, reflective practice, and emotional regulation. Ethical considerations—including algorithmic transparency, equitable access to AI-enhanced learning tools, and safeguards against bias in training content—must also be prioritized [43]. Ultimately, AI-driven simulation should be viewed not as a replacement for human educators but as a powerful complement that offers personalized, adaptive, and scalable learning opportunities [44]. When thoughtfully implemented, these technologies can help cultivate healthcare professionals who are not only clinically competent but also communicatively skilled, emotionally intelligent, and well-equipped to address the humanistic dimensions of patient care.
Conclusion
AI-powered medical simulations are transforming education by offering personalized, adaptive, and immersive learning experiences that enhance non-technical skills, particularly communication and clinical reasoning. These AI-based platforms allow learners to practice complex clinical cases safely, receive immediate feedback, and build confidence and competence more effectively than with traditional methods. They also support self-directed learning by tailoring scenarios to individual needs and enabling flexible, on-demand access. Despite their promise, important challenges remain, including variability in study designs, ethical concerns, and limited exploration of certain cognitive and emotional skills. Future research should prioritize robust, multisite studies and integrate AI tools with human-guided reflection to optimize learning outcomes. Overall, AI-enabled simulation represents a scalable, learner-centered approach that substantially enhances healthcare training and better prepares learners for the complexities of modern clinical practice.

Authors’ contributions

Conceptualization: SL, YE, LL, HN. Data curation: SL, YE, LL. Methodology/formal analysis/validation: SL, YE, LL, IC, HN. Project administration: SL, LL. Funding acquisition: SL, YE, LL. Writing–original draft: SL, YE, IC. Writing–review & editing: SL, YE, LL, IC, HN.

Conflict of interest

No potential conflict of interest relevant to this article was reported.

Funding

None.

Data availability

Not applicable.

Acknowledgments

None.

Supplement files are available from https://doi.org/10.7910/DVN/3ATEYC
Supplement 1. Search strategies for each database.
jeehp-22-37-suppl1.docx
Supplement 2. Mixed Methods Appraisal Tool.
jeehp-22-37-suppl2.docx
Supplement 3. Summary of MMAT Ratings.
jeehp-22-37-suppl3.docx
Supplement 4. Audio recording of the abstract.
jeehp-22-37-abstract-recording.avi
Fig. 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram. AI, artificial intelligence.
Table 1. Inclusion and exclusion criteria
• Population: Undergraduate and postgraduate medical learners, regardless of academic level or grades
• Intervention: Any type of simulation-based artificial intelligence
• Comparison: If a control group was included, it comprised learners exposed to either a simulation technique or traditional teaching methods
• Outcome: Included studies had to assess at least one of the non-technical skills under investigation
• Study design: Quantitative, qualitative, and mixed-methods studies

PICOS, population, intervention, comparison, outcome, and study design.

Table 2. Non-technical skills
• Cognitive skills: Critical thinking, reflection, reflective practice, decision-making, problem-solving, clinical reasoning
• Interpersonal skills: Communication, teamwork, leadership
• Emotional-social skills: Empathy, emotional regulation, self-confidence, stress management
Table 3.
Summary of the characteristics of the included studies
Authors (year) (country) Study design Sample size & population Intervention Study outcomes Assessment tool AI approach Key findings MMAT criteria met
Experimental Control
Kron et al. [22] (2017) (USA) RCT 421 second-year medical students MPathic-VR Self-directed online CBL module Interprofessional & intercultural communication - MPathic-VR Symbolic Significant effect on overall communication skills, with MPathic-VR students scoring higher (mean=0.806, SD=0.201) than CBL students (mean=0.752, SD=0.198); F(1,414)=6.09, P=0.014, η²=0.0145 3/5 Some criteria unmet
- OSCE
Fazlollahi et al. [17] (2022) (Canada) RCT 70 first- or second-year medical students VOA Simulation without feedback Emotional regulation MES Computational Positive emotions increased (mean difference=+0.36; P<0.001) and negative emotions decreased (mean difference=–0.59; P<0.001) after simulation training, with no significant difference between groups (P>0.05). 5/5 All criteria met
Wang et al. [26] (2022) (China) Quasi-experimental 15 medical graduate students Real records + AI feedback None Clinical thinking AIteach system evaluation Computational Significant improvement in clinical thinking scores between pre-test (mean=69.87, SD=14.69) and post-test (mean=85.6, SD=11.31) (P<0.01) 3/5 Some criteria unmet
Borg et al. [15] (2024) (Sweden) Qualitative 23 third-year medical students Social robot Computer-based VP Clinical reasoning Semi-structured interviews Computational The social robotics platform improved clinical reasoning through symptom-based reasoning, hypothesis generation, and adapting to new patient information. 5/5 All criteria met
Holderried et al. [20] (2024) (Germany) Mixed methods 26 medical students GPT-3.5 chatbot None Communication (history-taking) QAPs Computational GPT-3.5 chatbot enabled realistic patient interaction for practicing communication, with 97.9% of answers rated as plausible. 4/5 Most criteria met
Zheng et al. [29] (2024) (China) Quasi-experimental 66 first-year medical students ChatGPT Theory (PowerPoint) and lab sessions Critical thinking, communication - Mini-CEX Computational The experimental group scored significantly higher in clinical critical thinking (mean=92.94, SD=2.13 vs. mean=89.31, SD=2.53; P<0.001) and communication skills (mean=76.24, SD=12.30 vs. mean=70.19, SD=11.26; P<0.05) than the control group. 5/5 All criteria met
- Clinical Critical Thinking Scale
Yang & Shulruf [28] (2019) (Taiwan) Quasi-experimental 72 medical interns WKS-2RII AI-enabled suturing system Standard clinical training Self-confidence Questionnaire Computational Self-confidence in suturing/ligature skills increased in all groups, with significant gains in the AI-assisted group after repeated practice (1 session: mean=3.8, SD=0.3 → 3 sessions: mean=4.7, SD=1.2; out of 5 points; P<0.05 compared with regular and expert-led groups). 5/5 All criteria met
Guetterman et al. [18] (2019) (USA) Mixed methods 417 second-year medical students MPathic-VR Standard computer module Communication - MPathic-VR Symbolic Significant effects of the MPathic-VR intervention on communication were observed (intercultural: mean=11.7 → 5.9; interprofessional: mean=7.6 → 4.6; P<0.001), compared to the control group. 5/5 All criteria met
- OSCE
Mestre et al. [23] (2022) (Portugal) Quasi-experimental 293 medical students (1st–3rd year) Body Interact None Clinical reasoning, decision-making, communication, confidence, conflict management Questionnaire Symbolic Significant improvements in clinical reasoning (mean=5.09 → 5.46), decision-making (mean=4.72 → 5.20), communication (mean=4.62 → 5.03), confidence (mean=4.75 → 5.18), and conflict management (mean=4.34 → 4.89) 5/5 All criteria met
Watari et al. [27] (2020) (Japan) Quasi-experimental 169 fourth-year medical students Body Interact None Clinical reasoning Questionnaire Symbolic Significant improvements in clinical reasoning (mean=5.39 → 7.81; P<0.001). 5/5 All criteria met
Jeddi et al. [21] (2024) (Iran) Quasi-experimental 80 medical interns (6th–7th year) HAMTA computer case-based simulation Traditional lectures Decision-making Scenario-based tests Symbolic No statistically significant differences existed between the groups in clinical decision-making (diagnosis and antibiotic prescription) (P>0.21). 5/5 All criteria met
Borg et al. [14] (2025) (Sweden) Mixed methods 62 third-year medical students Social robotic platform + LLM None Clinical reasoning, communication, decision-making Questionnaire Computational The social robotic platform improved communication, clinical reasoning, and decision-making (clinical reasoning: mean=4.4 vs. 4.1, P=0.01; decision-making: mean=4.4 vs. 3.9, P=0.03). 4/5 Most criteria met
Mukadam et al. [24] (2025) (UK) Mixed methods 27 fourth- and fifth-year medical students ChatGPT-4o voice AI standardized patient None Communication, self-confidence - Self-reported assessments Computational ChatGPT improved the perceived usefulness of communication (median 3 → 4; P=0.010) and increased students’ confidence in managing difficult patients, delivering bad news, and counselling anxious patients (P<0.001). 5/5 All criteria met
- ITEM
Holderried et al. [19] (2024) (Germany) Quasi-experimental 106 third-year medical students GPT-4 virtual patient None Communication (history-taking) GPT-4–based communication skills assessment system Computational High communication performance, with strong agreement between GPT-4 and the human expert (κ=0.832) 3/5 Some criteria unmet
Aster et al. [13] (2025) (Germany) Quasi-experimental 35 third-year medical students ChatGPT 3.5 virtual patient None Empathic history-taking ECCS Computational Students demonstrated adequate history taking and empathic communication, with 14% of interactions identified as empathic (ECCS coding, ICC=0.770). 3/5 Some criteria unmet
Derakhshan et al. [16] (2025) (Iran) Quasi-experimental 404 third-semester medical students Safir humanoid robot Non-robotic virtual agent Medobot Empathy, communication - OSVEs Symbolic Interaction with the AI robot improved students' empathy and communication skills compared to the control group (mean listening score=15.86 vs. 13.65; mean speaking score 16.15 vs. 14.21; P<0.001). 5/5 All criteria met
- Mini-CEX
Pears et al. [25] (2024) (UK) Mixed methods 27 urology trainees GPT-4 None Communication, critical thinking, empathy - Self-reported questionnaires Computational Significantly higher scores were observed for communication (linguistic terminology: U=155.5, P=0.003; complexity: U=184, P=0.020), critical thinking (U=496, P=0.020, ES=0.398), and empathy (mean 3.63 vs. 3.04; P=0.021). 5/5 All criteria met
- Structured custom form (by expert)
Yamamoto et al. [32] (2024) (Japan) Quasi-experimental 145 fourth-year medical students GPT-4 Turbo via miibo Traditional program without AI Communication (medical interview) OSCE Computational Significant improvements in medical interview scores were observed in the AI group compared to the control group (mean=28.1 vs. 27.1; P<0.05). 3/5 Some criteria unmet
Brügge et al. [30] (2024) (Germany) RCT 21 medical students (mostly 3rd semester) ChatGPT 3.5 Simulated patient conversation (without AI feedback) Clinical reasoning, decision-making CRI-HTI Computational Significant improvements in clinical reasoning and decision-making were observed in the AI feedback group compared to the control group (F(1,20)=4.44, P=0.049, η²=0.198). 4/5 Most criteria met
Wang et al. [31] (2025) (China) RCT 56 fifth-year medical students GPT-4 Traditional role-playing with experienced instructors Communication (history taking), clinical reasoning OSCE Computational Significant improvements in history taking and clinical reasoning were observed in the GPT group (mean=86.79, SD=5.46) compared to the control group (mean=73.64, SD=4.76; t=9.60, P<0.001). 5/5 All criteria met

The “MMAT criteria met” column shows the number of criteria fulfilled; the qualitative summary reflects the extent to which criteria were met. No overall score or percentage was calculated, in line with MMAT guidance.

AI, artificial intelligence; MMAT, Mixed Methods Appraisal Tool; RCT, randomized controlled trial; VR, virtual reality; OSCE, objective structured clinical examination; SD, standard deviation; CBL, computer-based learning; VOA, Virtual Operative Assistant; MES, Medical Emotion Scale; VP, virtual patient; GPT-3, Generative Pre-trained Transformer 3; QAPs, question–answer pairs; Mini-CEX, Mini Clinical Evaluation Exercise; WKS-2RII, Waseda–Kyoto–Kagaka Suture No. 2 Refined II; LLM, large language model; ITEM, Immersive Technology Evaluation Measure; ECCS, Empathic Communication Coding System; ICC, intraclass correlation coefficient; OSVEs, Objective Structured Video Examinations; ES, effect size; CRI-HTI, Clinical Reasoning Indicator–History Taking Inventory.

  • 1. Adell-Lleixa M, Riba-Porquet F, Grau-Castell L, Sarrio-Colas L, Ginovart-Prieto M, Mulet-Aloras E, Reverte-Villarroya S. Transforming communication and non-technical skills in intermediate care nurses through ultra-realistic clinical simulation: a cross-sectional study. Nurs Rep 2025;15:272. https://doi.org/10.3390/nursrep15080272
  • 2. Sancho-Cantus D, Cubero-Plazas L, Botella Navas M, Castellano-Rioja E, Canabate Ros M. Importance of soft skills in health sciences students and their repercussion after the COVID-19 epidemic: scoping review. Int J Environ Res Public Health 2023;20:4901. https://doi.org/10.3390/ijerph20064901
  • 3. Prineas S, Mosier K, Mirko C, Guicciardi S. Non-technical skills in healthcare. In: Donaldson L, Ricciardi W, Sheridan S, Tartaglia R, editors. Textbook of patient safety and clinical risk management. Springer; 2021. p. 413-434. https://doi.org/10.1007/978-3-030-59403-9
  • 4. Mwita KM, Mwilongo NH, Mwamboma I. The role of soft skills, technical skills and academic performance on graduate employability. Int J Res Bus Soc Sci 2024;13:767-776. https://doi.org/10.20525/ijrbs.v13i5.3457
  • 5. Loutet MG, Zhang J, Varsaneux O, Ferguson A, Hulme J, Stone S, Oldenburger D, Piggott T. Using experiential simulation-based learning to increase engagement in global health education: an evaluation of self-reported participant experience. Med Sci Educ 2020;30:1245-1253. https://doi.org/10.1007/s40670-020-00999-w
  • 6. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med 2019;25:44-56. https://doi.org/10.1038/s41591-018-0300-7
  • 7. Rodgers DL, Needler M, Robinson A, Barnes R, Brosche T, Hernandez J, Poore J, VandeKoppel P, Ahmed R. Artificial intelligence and the simulationists. Simul Healthc 2023;18:395-399. https://doi.org/10.1097/SIH.0000000000000747
  • 8. Harder N. Advancing healthcare simulation through artificial intelligence and machine learning: exploring innovations. Clin Simul Nurs 2023;83:101456. https://doi.org/10.1016/j.ecns.2023.101456
  • 9. Hasanzadeh F, Josephson CB, Waters G, Adedinsewo D, Azizi Z, White JA. Bias recognition and mitigation strategies in artificial intelligence healthcare applications. NPJ Digit Med 2025;8:154. https://doi.org/10.1038/s41746-025-01503-7
  • 10. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L, Tetzlaff JM, Akl EA, Brennan SE, Chou R, Glanville J, Grimshaw JM, Hrobjartsson A, Lalu MM, Li T, Loder EW, Mayo-Wilson E, McDonald S, McGuinness LA, Stewart LA, Thomas J, Tricco AC, Welch VA, Whiting P, Moher D. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 2021;372:n71. https://doi.org/10.1136/bmj.n71
  • 11. Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan: a web and mobile app for systematic reviews. Syst Rev 2016;5:210. https://doi.org/10.1186/s13643-016-0384-4
  • 12. Hong QN, Fabregues S, Bartlett G, Boardman F, Cargo M, Dagenais P, Gagnon MP, Griffiths F, Nicolau B, O’Cathain A, Rousseau MC. The Mixed Methods Appraisal Tool (MMAT) version 2018 for information professionals and researchers. Educ Inf 2018;34:285-291. https://doi.org/10.3233/EFI-180221
  • 13. Aster A, Ragaller SV, Raupach T, Marx A. ChatGPT as a virtual patient: written empathic expressions during medical history taking. Med Sci Educ 2025;35:1513-1522. https://doi.org/10.1007/s40670-025-02342-7
  • 14. Borg A, Georg C, Jobs B, Huss V, Waldenlind K, Ruiz M, Edelbring S, Skantze G, Parodis I. Virtual patient simulations using social robotics combined with large language models for clinical reasoning training in medical education: mixed methods study. J Med Internet Res 2025;27:e63312. https://doi.org/10.2196/63312
  • 15. Borg A, Jobs B, Huss V, Gentline C, Espinosa F, Ruiz M, Edelbring S, Georg C, Skantze G, Parodis I. Enhancing clinical reasoning skills for medical students: a qualitative comparison of LLM-powered social robotic versus computer-based virtual patients within rheumatology. Rheumatol Int 2024;44:3041-3051. https://doi.org/10.1007/s00296-024-05731-0
  • 16. Derakhshan A, Teo T, Khazaie S. Investigating the usefulness of artificial intelligence-driven robots in developing empathy for English for medical purposes communication: the role-play of Asian and African students. Comput Hum Behav 2025;162:108416. https://doi.org/10.1016/j.chb.2024.108416
  • 17. Fazlollahi AM, Bakhaidar M, Alsayegh A, Yilmaz R, Winkler-Schwartz A, Mirchi N, Langleben I, Ledwos N, Sabbagh AJ, Bajunaid K, Harley JM, Del Maestro RF. Effect of artificial intelligence tutoring vs expert instruction on learning simulated surgical skills among medical students: a randomized clinical trial. JAMA Netw Open 2022;5:e2149008. https://doi.org/10.1001/jamanetworkopen.2021.49008
  • 18. Guetterman TC, Sakakibara R, Baireddy S, Kron FW, Scerbo MW, Cleary JF, Fetters MD. Medical students’ experiences and outcomes using a virtual human simulation to improve communication skills: mixed methods study. J Med Internet Res 2019;21:e15459. https://doi.org/10.2196/15459
  • 19. Holderried F, Stegemann-Philipps C, Herrmann-Werner A, Festl-Wietek T, Holderried M, Eickhoff C, Mahling M. A language model-powered simulated patient with automated feedback for history taking: prospective study. JMIR Med Educ 2024;10:e59213. https://doi.org/10.2196/59213
  • 20. Holderried F, Stegemann-Philipps C, Herschbach L, Moldt JA, Nevins A, Griewatz J, Holderried M, Herrmann-Werner A, Festl-Wietek T, Mahling M. A generative pretrained transformer (GPT)-powered chatbot as a simulated patient to practice history taking: prospective, mixed methods study. JMIR Med Educ 2024;10:e53961. https://doi.org/10.2196/53961
  • 21. Jeddi FR, Momen-Heravi M, Farrahi R, Nabovati E, Akbari H, Khodabandeh ME. Computer case-based reasoning simulation versus traditional lectures for medical interns teaching of diagnosis and antibiotic prescribing for acute respiratory infection: a comparative quasi-experimental study. BMC Med Educ 2024;24:1463. https://doi.org/10.1186/s12909-024-06453-4
  • 22. Kron FW, Fetters MD, Scerbo MW, White CB, Lypson ML, Padilla MA, Gliva-McConvey GA, Belfore LA, West T, Wallace AM, Guetterman TC, Schleicher LS, Kennedy RA, Mangrulkar RS, Cleary JF, Marsella SC, Becker DM. Using a computer simulation for teaching communication skills: a blinded multisite mixed methods randomized controlled trial. Patient Educ Couns 2017;100:748-759. https://doi.org/10.1016/j.pec.2016.10.024
  • 23. Mestre A, Muster M, El Adib AR, Osp Egilsdottir H, Byermoen KR, Padilha M, Aguilar T, Tabagari N, Betts L, Sales L, Garcia P, Ling L, Cafe H, Binnie A, Marreiros A. The impact of small-group virtual patient simulator training on perceptions of individual learning process and curricular integration: a multicentre cohort study of nursing and medical students. BMC Med Educ 2022;22:375. https://doi.org/10.1186/s12909-022-03426-3
  • 24. Mukadam A, Suresh S, Jacobs C. Beyond traditional simulation: an exploratory study on the effectiveness and acceptability of ChatGPT 4o advanced voice mode for communication skills practice among medical students. Cureus 2025;17:e84381. https://doi.org/10.7759/cureus.84381
  • 25. Pears M, Wadhwa K, Payne SR, Hanchanale V, Elmamoun MH, Jain S, Konstantinidis ST, Rochester M, Doherty R, Spearpoint K, Ng O, Dick L, Yule S, Biyani CS. Non-technical skills for urology trainees: a double-blinded study of ChatGPT4 AI benchmarking against consultant interaction. J Healthc Inform Res 2025;9:103-118. https://doi.org/10.1007/s41666-024-00180-7
  • 26. Wang M, Sun Z, Jia M, Wang Y, Wang H, Zhu X, Chen L, Ji H. Intelligent virtual case learning system based on real medical records and natural language processing. BMC Med Inform Decis Mak 2022;22:60. https://doi.org/10.1186/s12911-022-01797-7
  • 27. Watari T, Tokuda Y, Owada M, Onigata K. The utility of virtual patient simulations for clinical reasoning education. Int J Environ Res Public Health 2020;17:5325. https://doi.org/10.3390/ijerph17155325
  • 28. Yang YY, Shulruf B. Expert-led and artificial intelligence (AI) system-assisted tutoring course increase confidence of Chinese medical interns on suturing and ligature skills: prospective pilot study. J Educ Eval Health Prof 2019;16:7. https://doi.org/10.3352/jeehp.2019.16.7
  • 29. Zheng K, Shen Z, Chen Z, Che C, Zhu H. Application of AI-empowered scenario-based simulation teaching mode in cardiovascular disease education. BMC Med Educ 2024;24:1003. https://doi.org/10.1186/s12909-024-05977-z
  • 30. Brugge E, Ricchizzi S, Arenbeck M, Keller MN, Schur L, Stummer W, Holling M, Lu MH, Darici D. Large language models improve clinical decision making of medical students through patient simulation and structured feedback: a randomized controlled trial. BMC Med Educ 2024;24:1391. https://doi.org/10.1186/s12909-024-06399-7
  • 31. Wang Z, Fan TT, Li ML, Zhu NJ, Wang XC. Feasibility study of using GPT for history-taking training in medical education: a randomized clinical trial. BMC Med Educ 2025;25:1030. https://doi.org/10.1186/s12909-025-07614-9
  • 32. Yamamoto A, Koda M, Ogawa H, Miyoshi T, Maeda Y, Otsuka F, Ino H. Enhancing medical interview skills through AI-simulated patient interactions: nonrandomized controlled trial. JMIR Med Educ 2024;10:e58753. https://doi.org/10.2196/58753
  • 33. Zidoun Y, El Mardi A. Artificial intelligence (AI)-based simulators versus simulated patients in undergraduate programs: a protocol for a randomized controlled trial. BMC Med Educ 2024;24:1260. https://doi.org/10.1186/s12909-024-06236-x
  • 34. Harder N, Ali F, Turner S, Workum K, Gillman L. Comparing artificial intelligence-enhanced virtual reality and simulated patient simulations in undergraduate nursing education. Clin Simul Nurs 2025;105:101780. https://doi.org/10.1016/j.ecns.2025.101780
  • 35. Garcia-Torres D, Vicente Ripoll MA, Fernandez Peris C, Mira Solves JJ. Enhancing clinical reasoning with virtual patients: a hybrid systematic review combining human reviewers and ChatGPT. Healthcare (Basel) 2024;12:2241. https://doi.org/10.3390/healthcare12222241
  • 36. Liu YM, Chou CC, Jaing TH, Okoli CT. Generative AI for clinical reasoning: a scoping review. Teach Learn Nurs 2025 Sep 10. [Epub]. https://doi.org/10.1016/j.teln.2025.08.008
  • 37. Wei H, Dai Y, Yuan K, Li KY, Hung KF, Hu EM, Lee AHC, Chang JW, Zhang C, Li X. AI-powered problem- and case-based learning in medical and dental education: a systematic review and meta-analysis. Int Dent J 2025;75:100858. https://doi.org/10.1016/j.identj.2025.100858
  • 38. Alnattah A, Jajroudi M, Fadafen SA, Manzari MN, Eslami S. Artificial intelligence in clinical decision-making: a scoping review of rule-based systems and their applications in medicine. Cureus 2025;17:e91333. https://doi.org/10.7759/cureus.91333
  • 39. Elkhalladi J, Sefrioui A, Fahssi ME, Tahiri M. Level of knowledge of nurses and healthcare technicians regarding soft skills: an exploratory study. Rev Esc Enferm USP 2024;58:e20240124. https://doi.org/10.1590/1980-220X-REEUSP-2024-0124en
  • 40. Kiyak YS, Emekli E, Is Kara T, Coskun O, Budakoglu II. AI teaches surgical diagnostic reasoning to medical students: evidence from an experiment using a fully automated, low-cost feedback system. J Surg Educ 2025;82:103639. https://doi.org/10.1016/j.jsurg.2025.103639
  • 41. Loubbairi S, Lahlou L, Amechghal A, Nassik H. The impact of simulation on the development of critical thinking and reflection among nursing and medical students: a systematic review. Korean J Med Educ 2025;37:187-202. https://doi.org/10.3946/kjme.2025.334
  • 42. Feigerlova E, Hani H, Hothersall-Davies E. A systematic review of the impact of artificial intelligence on educational outcomes in health professions education. BMC Med Educ 2025;25:129. https://doi.org/10.1186/s12909-025-06719-5
  • 43. Lepri B, Oliver N, Letouze E, Pentland A, Vinck P. Fair, transparent, and accountable algorithmic decision-making processes: the premise, the proposed solutions, and the open challenges. Philos Technol 2018;31:611-627. https://doi.org/10.1007/s13347-017-0279-x
  • 44. Akyon S, Akyon F. Digital transformation of clinical education through artificial intelligence: a strengths, weaknesses, opportunities, and threats (SWOT) analysis. Ankara Med J 2025;25:96-118. https://doi.org/10.5505/amj.2025.63373
