The impact of artificial intelligence-driven simulation on the development of non-technical skills in medical education: a systematic review
Abstract
Purpose
Artificial intelligence (AI)-driven simulation is an emerging approach in healthcare education that enhances learning effectiveness. This review examined its impact on the development of non-technical skills among medical learners.
Methods
Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, a systematic review was conducted using the following databases: Web of Science, ScienceDirect, Scopus, and PubMed. The quality of the included studies was assessed using the Mixed Methods Appraisal Tool. The protocol was previously registered in PROSPERO (CRD420251038024).
Results
Of the 1,442 studies identified in the initial search, 20 met the inclusion criteria, involving 2,535 participants. The simulators varied considerably, ranging from platforms built on symbolic AI methods to social robots powered by computational AI. Among the 15 AI-driven simulators, 10 used ChatGPT or its variants as virtual patients. Several studies evaluated multiple non-technical skills simultaneously. Communication and clinical reasoning were the most frequently assessed skills, appearing in 12 and 6 studies, respectively, which generally reported positive outcomes. Improvements were also noted in decision-making, empathy, self-confidence, critical thinking, and problem-solving. In contrast, emotional regulation, assessed in a single study, showed no significant difference. Notably, none of the studies examined reflection, reflective practice, teamwork, or leadership.
Conclusion
AI-driven simulation shows substantial potential for enhancing non-technical skills in medical education, particularly communication and clinical reasoning. However, its effects on several other non-technical skills remain unclear. Given heterogeneity in study designs and outcome measures, these findings should be interpreted cautiously. These considerations highlight the need for further research to support integrating this innovative approach into medical curricula.
Introduction
The integration of soft skills into medical education is increasingly recognized as essential for preparing future physicians to navigate the complexities of patient care and teamwork. Soft skills, defined as non-technical interpersonal abilities, enhance healthcare professionals’ capacity to connect with patients, collaborate with colleagues, and perform tasks safely and effectively in challenging environments [1]. These skills are critical not only for improving patient outcomes and satisfaction but also for fostering a culture of safety and professionalism within healthcare settings [2]. In this context, non-technical skills can be broadly categorized into 3 interrelated domains: cognitive, interpersonal, and emotional-social skills [3]. Each domain contributes meaningfully to personal and professional development, supporting individuals’ ability to function effectively in complex clinical settings [4].
Simulation-based education has emerged as a transformative pedagogical method for developing these non-technical skills. It offers experiential learning opportunities in a safe, controlled environment where students can practice clinical and interpersonal skills without jeopardizing patient safety [5].
The integration of artificial intelligence (AI) into simulation environments is revolutionizing medical education by enabling adaptive and personalized learning experiences. AI algorithms can dynamically adjust scenario complexity based on learner performance, provide detailed analytics on decision-making processes, and deliver individualized feedback [6]. AI-powered simulations therefore represent a promising strategy to optimize training outcomes and better prepare medical learners for the complexities of modern healthcare.
Beyond medical education, the incorporation of AI into healthcare simulators significantly enhances training and educational outcomes by providing more realistic, adaptive, and personalized learning experiences [7]. AI-driven simulators can create virtual patients that respond dynamically to learners’ actions, enabling the simulation of complex clinical scenarios, including rare or high-risk cases, in a safe environment conducive to skill refinement. AI algorithms further personalize instruction by analyzing performance, delivering tailored feedback, and identifying areas for improvement, ensuring that each learner progresses at an appropriate pace [8].
Although these substantial benefits exist, challenges persist, including the need for technical expertise and the importance of addressing potential biases in AI systems to ensure effective and equitable implementation [9]. Despite growing recognition of the importance of non-technical skills, their systematic integration and assessment in medical education remain inconsistent. Traditional curricula often emphasize technical competencies, leaving gaps in the development of interpersonal and cognitive skills essential for holistic patient care.
Objectives
This study aimed to examine the impact of integrating AI within simulation environments on the development and reinforcement of non-technical skills—such as communication, critical thinking, and decision-making—among medical learners.
Methods
Ethics statement
This study is based solely on previously published literature; therefore, neither ethical committee approval nor informed consent was required.
Study design
This systematic review was conducted in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [10]. The protocol was registered in PROSPERO (ID: CRD420251038024).
Eligibility criteria
The selection criteria for this review were developed using the population, intervention, comparison, outcome, and study design (PICOS) framework, as shown in Table 1. This review considered eligible studies that had assessed at least one of these non-technical skills, as presented in Table 2.
Information sources
Four databases—PubMed, ScienceDirect, Scopus, and Web of Science—were searched from their inception through July 2025. Only studies published in English were considered eligible. Commentary articles and conference abstracts were excluded. The final search was completed on July 18, 2025.
Search strategy
Two authors (S.L. and L.L.) independently performed the database searches using a strategy developed according to the predefined inclusion criteria. The strategy incorporated relevant keywords and Medical Subject Headings (MeSH), including: “students, medical” [MeSH Term], “medical students,” “medicine students,” “intern,” “resident,” “simulation,” “patient simulation” [MeSH Term], “virtual human,” “simulation training” [MeSH Terms], “computer simulation” [MeSH Term], “virtual patient,” “virtual reality” [MeSH Term], “non-technical skills,” “soft skills,” “mental competency,” “cognitive skills,” “social skills,” “emotional skills,” “interpersonal skills,” “artificial intelligence” [MeSH Term], “AI,” “machine learning” [MeSH Term], “deep learning” [MeSH Term], “natural language processing” [MeSH Term], “intelligent tutoring systems,” “chatbot,” “virtual assistant,” and “ChatGPT.” The detailed search strategy for each database is provided in Supplement 1.
Selection and data collection process
The screening and selection process was performed independently by 2 authors (S.L. and Y.E.). In cases of disagreement, the authors discussed the issue to reach a consensus; if no agreement was reached, a third author (H.N.) mediated the decision. Abstract and title screening, along with duplicate removal, was conducted using Rayyan QCRI (Rayyan) [11]. Full-text articles of potentially eligible studies were then reviewed. References of included articles were hand-searched to identify additional studies. All selected studies were imported into Zotero, a bibliographic management software system.
Data items
The extracted data included author names, year of publication, study location, study design, sample size, learner level, non-technical skills assessed, measurement tools or scales, AI-based simulator used, AI approach, intervention and control group methods (if applicable), and key findings.
Study risk of bias assessment
Two authors (S.L. and I.C.) independently evaluated the risk of bias using the Mixed Methods Appraisal Tool (MMAT) (Supplement 2) [12], which is designed to assess qualitative, randomized controlled, non-randomized, quantitative descriptive, and mixed-methods studies. The tool contains 2 screening questions and 5 core quality criteria tailored to each study design. For each study, the number of criteria met was reported (e.g., 4/5, 3/5), accompanied by a qualitative summary (e.g., “most criteria met,” “some criteria unmet”). Table 3 presents the methodological quality of all included studies.
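As a hypothetical illustration of the scoring scheme described above (the function name and label cut-offs are assumptions for this sketch, not taken from the MMAT manual), a per-study count and qualitative summary could be derived from the 5 core criteria as follows:

```python
def mmat_summary(criteria_met):
    """Summarize the 5 MMAT core criteria for one study.

    criteria_met: list of 5 booleans, one per core quality criterion.
    The qualitative labels and cut-offs here are illustrative assumptions.
    """
    n = sum(criteria_met)
    if n == 5:
        label = "all criteria met"
    elif n >= 3:
        label = "most criteria met"
    else:
        label = "some criteria unmet"
    return f"{n}/5 ({label})"

# A study meeting 4 of the 5 criteria:
print(mmat_summary([True, True, True, True, False]))  # 4/5 (most criteria met)
```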
Effect measures
Data were extracted according to the analytical methods used in each study, including statistical tests such as analysis of variance (ANOVA), the t-test, the Wilcoxon signed-rank test, the Mann-Whitney U test, analysis of covariance, and repeated-measures ANOVA/multivariate analysis of covariance, as well as effect size indicators (partial η², r), to facilitate comparisons across continuous, ordinal, and categorical outcomes.
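The two effect-size indicators mentioned above follow standard formulas: partial η² for ANOVA-family tests and r (computed as z/√N) for Wilcoxon and Mann-Whitney results. A minimal sketch, with invented numbers used purely for illustration:

```python
import math

def partial_eta_squared(ss_effect, ss_error):
    """Partial eta-squared: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)

def effect_size_r(z, n):
    """Effect size r for Wilcoxon / Mann-Whitney results: r = z / sqrt(N)."""
    return z / math.sqrt(n)

# Invented example values, for illustration only:
print(partial_eta_squared(12.0, 36.0))  # 0.25
print(effect_size_r(2.5, 25))           # 0.5
```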
Synthesis methods
To address the aim of this review, a structured synthesis approach was applied. First, a theoretical framework was developed to explore the role of AI-driven simulation in promoting non-technical skills among medical learners. Two authors (S.L. and L.L.) independently performed data extraction and categorization. The extracted data were analyzed using narrative synthesis, beginning with the construction of a framework aligned with the review’s objectives.
Reporting bias assessment
To minimize reporting bias, 2 authors (I.C. and S.B.) reviewed the publication processes and policies of the journals in which the included studies appeared. They then compared the reported results with published protocols and registrations to identify any selective outcome reporting.
Certainty assessment
Not done.
Results
Study selection
The search strategy identified 1,442 articles. After removing duplicates, 1,291 records were screened based on titles and abstracts, leading to the exclusion of 1,231 articles. A total of 60 articles were selected for full-text review, of which 15 met the inclusion criteria and were included in this review. Additionally, 5 studies were added through manual reference searching. Consequently, 20 studies were ultimately included in this systematic review. This selection process is illustrated in Fig. 1.
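The selection counts reported above are internally consistent; as a quick arithmetic check (a sketch, not part of the review itself):

```python
# PRISMA flow counts as reported in the text.
identified = 1442
screened = 1291                          # records remaining after duplicate removal
duplicates_removed = identified - screened
excluded_on_title_abstract = 1231
full_text_assessed = screened - excluded_on_title_abstract
met_inclusion_criteria = 15
added_via_hand_search = 5
total_included = met_inclusion_criteria + added_via_hand_search

print(duplicates_removed, full_text_assessed, total_included)  # 151 60 20
```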
Study characteristics
A total of 20 studies were included in this review [13-32]. Four were conducted in Germany [13,19,20,30], 3 in China [26,29,31], 2 each in Sweden [14,15], the United States [18,22], Iran [16,21], Japan [27,32], and the United Kingdom [24,25], and one each in Canada [17], Taiwan [28], and Portugal [23]. Ten studies used a quasi-experimental design, 5 employed mixed methods, 4 were randomized controlled trials (RCTs), and one was qualitative. The participants consisted of 2,535 medical learners at varying levels of training, ranging from first-year students to specialty interns. The key characteristics of these studies are presented in Table 3.
Risk of bias in studies
The MMAT showed that the majority of studies (12 out of 20) met all the criteria, reflecting high methodological quality [15-18,21,23-25,27-29,31]. Three studies met most of the criteria [14,20,30], whereas 5 met only some, indicating a moderate risk of bias [13,19,22,26,32]. Overall, although most findings were derived from high-quality studies, minor limitations were observed, including incomplete outcome data in 2 randomized controlled trials [18,22], representativeness concerns in 3 non-randomized studies [13,19,26], and occasional difficulties in integrating quantitative and qualitative findings in 2 mixed-method studies [14,20] (see Supplement 3 for more details).
Results of individual studies
Types of AI-based simulations
The analyzed studies included 15 AI-based simulators, 11 of which used computational AI approaches spanning several types. Virtual conversational agents were the most common, powered by advanced language models such as ChatGPT and its variants (ChatGPT, ChatGPT 3.5, ChatGPT-4o), as well as GPT models integrated into simulators (GPT-3.5, GPT-4, GPT-4 Turbo via Miibo) [13,15,19,20,24,25,29-32]. Other modern AI systems included the Virtual Operative Assistant [17], AIteach [26], the AI-assisted procedural method “Waseda-Kyoto-Kagaka Suture No. 2 Refined II” [28], and a social robotic platform combined with large language models [14]. Symbolic AI simulators, comprising 4 systems, included immersive virtual platforms such as Body Interact [23,27], MPathic-VR [18,22], and HAMTA [21], and virtual or humanoid social robots such as Safir and Medobot [16].
Traditional teaching methods
For comparative analysis, various methods were employed. Four studies used traditional or digital teaching approaches such as standard computer-based learning [18,22], theory courses supported by PowerPoint presentations [29], traditional lectures [21], or standard clinical training [28]. Another study used traditional simulation with role-play [31], while others employed virtual simulations without AI [15-17,30]. However, several studies did not report a control group [13,14,19,20,23-27].
Non-technical skills assessment tools
The studies identified several tools for assessing soft skills, with each study using either a single instrument or a combination of complementary approaches. Communication was assessed through tools such as the objective structured clinical examination [18,22,31,32], the mini-clinical evaluation exercise [16,29], question-answer pairs [20], and the Immersive Technology Evaluation Measure [24], generally using 5-point scales in which higher scores reflect better performance. Some tools incorporated AI, including automated assessment by GPT-4 for coding communication skills, and MPathic-VR, which combines immersive simulation with interaction analysis through a reverse scoring system (0 for the optimal response and higher scores for inappropriate responses) [18,22]. Empathic communication was measured by the Empathic Communication Coding System, which codes empathic expression according to 7 levels, as well as by objective structured video examinations, based on audio and video recordings and automatically scored from 0 (unacceptable) to 3 (totally acceptable) on 5 dimensions [16].
Clinical reasoning was assessed through various approaches. AIteach analyzed 5 key indicators—rigor, logic, systematicity, agility, and breadth of knowledge—expressed as percentages from 0 to 100 [26]. Semi-structured interviews were also conducted to evaluate clinical reasoning [15]. A 20-item multiple-choice questionnaire scored as “correct” or “incorrect” produced a total score out of 20, with higher scores indicating stronger reasoning skills [27]. Additionally, the Clinical Reasoning Indicator–History Taking Inventory, a validated 5-point Likert scale assessing focusing questions, creating context, and securing information, was used [30].
Decision-making and conflict management were measured using questionnaires [14,23] or scripted tests combining simulation and targeted assessment [21]. Self-confidence was explored through self-assessment questionnaires using a 5-point Likert scale [24]. Emotional regulation was assessed using the Measurement of Emotions Scale, based on a 5-point scale comprising 22 adjectives grouped into 4 categories (basic, success, epistemic, and social emotions) [17]. Critical thinking was measured using the Clinical Critical Thinking Scale, with a 5-point scoring system for an overall score of 100 [29]. Finally, several cross-sectional questionnaires assessed multiple non-technical skills simultaneously [14,23,25]. These tools demonstrated content validity supported by expert review or theoretical frameworks, and acceptable reliability, measured using Cronbach’s α or test–retest methods when available.
Results of syntheses
The majority of studies reported a positive effect of AI–based simulation on the development of non-technical skills among medical learners. The most frequently observed improvement was in communication, identified in 12 studies [14,16,18-20,22-25,29,31,32]. Regarding clinical reasoning, all studies evaluating this skill reported significant improvement [14,23,26,27,30,31]. Empathy and empathic history-taking also showed significant gains [13,16,25], while 3 studies found significant improvements in self-confidence [23,24,28]. Similarly, both studies evaluating critical thinking reported significant enhancement [25,29]. Additionally, decision-making improved in 3 of the 4 studies assessing this outcome [14,23,30]. Only one study examined conflict management skills, reporting a positive effect [23]. In contrast, emotional regulation, evaluated in a single study, showed no statistically significant difference between the experimental and control groups [17]. None of the included studies examined other soft skills, such as reflection, teamwork, leadership, and reflective practice.
Reporting biases
All included studies were published in peer-reviewed journals; however, it could not be confirmed that every measured outcome was fully reported in the published manuscripts.
Discussion
Interpretation
The cumulative evidence reviewed consistently indicates that AI-driven simulation represents a highly effective pedagogical innovation for fostering non-technical skills—including critical thinking, clinical reasoning, communication, decision-making, self-confidence, conflict management, and empathy. Communication and clinical reasoning were the most frequently assessed competencies, and both demonstrated significant improvement across studies.
ChatGPT and its variants were the predominant computational AI tools used, providing adaptive and interactive virtual patient experiences. Similarly, commercial platforms such as Body Interact and MPathic-VR, representing symbolic AI, offered structured virtual patient scenarios. These systems require learners to apply theoretical knowledge in real time, integrate analytical reasoning, and engage in decision-making under pressure. This active, experiential mode of learning has been shown to reinforce knowledge retention and deepen cognitive engagement when compared with traditional didactic approaches [18,22,23,27].
Comparison with previous studies
Twelve studies assessing communication skills generally reported improvements, highlighting the effectiveness of AI-based simulators. Zidoun and El Mardi [33] likewise found that AI-based simulators and simulated patients (SPs) enhanced history-taking skills, suggesting that conventional approaches may also be beneficial. In contrast, Harder et al. [34] reported that AI-driven virtual reality simulations were perceived as more realistic, interactive, and engaging than SPs. These findings align with the present review, which showed that control groups exposed to theory-based courses, computer-based learning, or standard clinical training demonstrated no significant improvements relative to AI-assisted groups [18,22,28,29].
Regarding clinical reasoning, all 6 studies in this review reported significant improvement. This is consistent with findings by García-Torres et al. [35], who confirmed the effectiveness of virtual patients in enhancing clinical reasoning. However, the present results differ from those of Liu et al. [36], who noted that generative AI did not consistently support this competency.
Critical thinking, a key cognitive ability, also demonstrated consistent improvements, with both studies assessing this outcome reporting positive effects. Problem-solving showed similar benefits in one study. These findings are consistent with a previous study demonstrating that AI-based tools enhanced analytical reasoning, critical thinking, and problem-solving skills in medical students [37].
Beyond critical thinking, decision-making showed improvement in 3 of the 4 studies evaluating this competency. A recent study similarly demonstrated that AI-based rule engines and machine learning tools effectively strengthen clinical decision-making [38]. Improvements in empathy were also consistently reported across 3 studies. Importantly, a study evaluating a GPT-powered conversational agent for empathic history-taking showed strong concordance between AI-generated assessments and expert evaluations, further supporting the pedagogical reliability of these systems [13].
Self-confidence consistently improved across the 3 studies evaluating this skill [23,24,28]. This finding is noteworthy, as confidence is among the most critical soft skills required in healthcare practice [39].
The positive impact of AI on soft skills can be largely attributed to its ability to create a psychologically safe learning environment in which learners can repeatedly engage in complex clinical encounters without fear of adverse consequences. Through this mechanism, AI tools allow learners to experiment with diagnostic reasoning, refine communication strategies, and receive immediate, individualized feedback on their performance [40]. Additionally, the provision of multidimensional feedback—spanning technical accuracy, communication quality, and reasoning processes—offers an advantage over passive instruction, enabling learners to identify deficiencies in both cognitive and behavioral strategies and make targeted improvements [18].
Limitations
This review, the first to examine the impact of AI-based simulation on multiple non-technical skills, nevertheless highlights several methodological limitations that temper the generalizability of its findings. The included studies demonstrated considerable heterogeneity in design, ranging from randomized controlled trials to quasi-experimental and mixed-methods investigations. The absence of control groups in many studies further weakens causal inference, making it difficult to attribute observed improvements solely to AI-driven interventions [13,14,19,20,23-27]. Compounding this limitation is the lack of standardized, validated assessment tools for non-technical skills, which introduces inconsistencies and complicates cross-study comparisons. This underscores the urgent need for consensus-based evaluation frameworks in future research.

Another notable gap is the absence of studies addressing leadership, teamwork, reflection, and reflective practice—skills that are essential in medical education [41]. This omission represents a clear deficiency in the current literature, and future research must investigate how AI-based simulation can influence these domains. Emotional regulation was also underexplored, with only one study assessing this skill and reporting no significant intergroup differences [17]. This suggests that although AI systems are effective in modeling cognitive and interpersonal competencies, they may be less capable of cultivating deeper emotional skills—an area warranting further exploration.

A recent study emphasized the ongoing “dearth of quantitative evidence” regarding the long-term, verifiable impacts of AI models on learner outcomes [42], highlighting the need for longitudinal, rigorously controlled research to validate AI’s contribution to sustained professional development.
Implications
Collectively, these findings position AI-driven simulation as a transformative tool for advancing medical education, particularly in the domain of non-technical skills. To fully leverage its potential, future research must address the methodological limitations identified in the existing literature. Specifically, there is a need for larger, multicenter studies with robust control groups, standardized assessment instruments, and long-term follow-up to establish both the efficacy and durability of outcomes. Research should also explore hybrid instructional models in which AI-based simulation is integrated with human-led debriefing, reflective practice, and emotional skills coaching, thereby targeting complex domains such as empathy, reflective practice, and emotional regulation. Ethical considerations—including algorithmic transparency, equitable access to AI-enhanced learning tools, and safeguards against bias in training content—must also be prioritized [43]. Ultimately, AI-driven simulation should be viewed not as a replacement for human educators but as a powerful complement that offers personalized, adaptive, and scalable learning opportunities [44]. When thoughtfully implemented, these technologies can help cultivate healthcare professionals who are not only clinically competent but also communicatively skilled, emotionally intelligent, and well-equipped to address the humanistic dimensions of patient care.
Conclusion
AI-powered medical simulations are transforming education by offering personalized, adaptive, and immersive learning experiences that enhance non-technical skills, particularly communication and clinical reasoning. These AI-based platforms allow learners to practice complex clinical cases safely, receive immediate feedback, and build confidence and competence more effectively than with traditional methods. They also support self-directed learning by tailoring scenarios to individual needs and enabling flexible, on-demand access. Despite their promise, important challenges remain, including variability in study designs, ethical concerns, and limited exploration of certain cognitive and emotional skills. Future research should prioritize robust, multisite studies and integrate AI tools with human-guided reflection to optimize learning outcomes. Overall, AI-enabled simulation represents a scalable, learner-centered approach that substantially enhances healthcare training and better prepares learners for the complexities of modern clinical practice.
Notes
Authors’ contributions
Conceptualization: SL, YE, LL, HN. Data curation: SL, YE, LL. Methodology/formal analysis/validation: SL, YE, LL, IC, HN. Project administration: SL, LL. Funding acquisition: SL, YE, LL. Writing–original draft: SL, YE, IC. Writing–review & editing: SL, YE, LL, IC, HN.
Conflict of interest
No potential conflict of interest relevant to this article was reported.
Funding
None.
Data availability
Not applicable.
Acknowledgments
None.
Supplementary materials
Supplement files are available from https://doi.org/10.7910/DVN/3ATEYC
Supplement 4. Audio recording of the abstract.
