Evaluation of a novel large language model (LLM)-powered chatbot for oral boards scenarios

Abstract

Purpose: While previous studies have demonstrated that generative artificial intelligence (AI) can pass medical licensing exams, AI's role as an examiner in complex, interactive assessments remains unknown. AI-powered chatbots could serve as educational tools to simulate oral examination dialogues. Here, we present initial validity evidence for an AI-powered chatbot designed to help general surgery residents prepare for the American Board of Surgery (ABS) Certifying Exam (CE).

Methods: We developed a chatbot using GPT-4 to simulate oral board scenarios. Scenarios were completed by general surgery residents from six different institutions. Two experienced surgeons evaluated the chatbot across five domains: inappropriate content, missing content, likelihood of harm, extent of harm, and hallucinations. We measured inter-rater reliability to determine evaluation consistency.

Results: Seventeen residents completed a total of 20 scenarios. Commonly tested topics included small bowel obstruction (30%), diverticulitis (20%), and breast disease (15%). Across the two independent reviewers, 11%–25% of chatbot simulations were rated as having no errors, and an additional 11%–35% contained errors of minimal clinical significance. The chatbot's limitations included incorrect management advice and critical omissions of information.

Conclusions: This study demonstrates the potential of an AI-powered chatbot to enhance surgical education through oral board simulations. Despite challenges in accuracy and safety, the chatbot offers a novel approach to medical education, underscoring the need for further refinement and standardized evaluation frameworks. Incorporating domain-specific knowledge and expert insights is crucial for improving the efficacy of AI tools in medical education.
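
The abstract does not say which statistic was used to measure inter-rater reliability. As an illustration only, the sketch below computes Cohen's kappa, one common choice when two raters assign categorical labels to the same items. The function name, the rating categories, and the ratings themselves are all hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (illustrative sketch)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over all labels.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[label] * count_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # so report perfect agreement by convention.
        return 1.0

    return (observed - expected) / (1 - expected)


# Hypothetical ratings for six simulations in one domain (invented data).
reviewer_1 = ["none", "none", "minor", "major", "none", "minor"]
reviewer_2 = ["none", "minor", "minor", "major", "none", "none"]

kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so two reviewers who agree on four of six items can still score well below 1. For ordered categories such as extent of harm, a weighted kappa may be more appropriate.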
