Abstract: The integration of artificial intelligence (AI) into patient education offers new opportunities to enhance physician-patient communication and patient health outcomes. The conversational abilities of large language models (LLMs) powered by natural language processing (NLP) are uniquely suited to interacting with users in an approachable and accessible way. Applied to the medical realm, LLMs have the potential to reduce gaps in patients' health literacy and medical knowledge and to improve health outcomes. Here, we examine the potential of GPT, an LLM developed by OpenAI, to serve as a patient education tool in the subspecialty of anesthesia. This study evaluates the accuracy, completeness, and readability of responses from three iterations of GPT (versions 3.5, 4, and a prompt-engineered GPT-4, denoted GPT-4P) to common anesthesia questions related to total hip arthroplasty. Accuracy and completeness were assessed by three regional anesthesia experts using 6-point Likert scales; readability and word counts were analyzed with online tools. Comparing the three models, GPT-3.5 showed the highest overall accuracy and completeness, while GPT-4P had the best readability. In pairwise comparisons, no single model consistently outperformed the others across all metrics. All models provided responses that were more accurate than inaccurate and more complete than incomplete. Beyond these findings, the results are heterogeneous and warrant further study: although GPT-3.5 performed best in completeness and accuracy on a Type III F-test, it was superior to GPT-4P only in completeness on pairwise comparison. Studying prompt language and expanding evaluator pools would be natural next steps in examining this rapidly evolving aspect of patient care.
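
As context for the analyses the abstract summarizes, the following is a minimal sketch, not the authors' actual pipeline, of how expert Likert ratings might be compared across models with a Type III F-test and how a standard readability score might be computed. It assumes Python with pandas and statsmodels; the data frame columns, rater labels, and rating values are all hypothetical.

    # Minimal sketch (not the authors' code) of the analyses described above:
    # a Type III F-test for a model effect on expert Likert ratings, and a
    # Flesch Reading Ease score as a readability proxy.
    # Assumes pandas and statsmodels; all data values here are hypothetical.
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    # Hypothetical long-format ratings: one row per (model, rater) pair.
    ratings = pd.DataFrame({
        "model": ["GPT-3.5", "GPT-4", "GPT-4P"] * 3,
        "rater": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
        "accuracy": [5, 4, 4, 6, 5, 4, 5, 5, 4],  # 6-point Likert scores
    })

    # Sum-to-zero contrasts so the Type III sums of squares are meaningful.
    fit = ols("accuracy ~ C(model, Sum) + C(rater, Sum)", data=ratings).fit()
    print(sm.stats.anova_lm(fit, typ=3))

    def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
        """Standard Flesch Reading Ease formula; higher scores read more easily."""
        return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

    # Example: a 120-word, 8-sentence response with 180 syllables scores ~64.7.
    print(flesch_reading_ease(words=120, sentences=8, syllables=180))

A significant model effect in the F-test would then motivate the pairwise comparisons the abstract reports; the Flesch formula stands in for whichever online readability tool the study used.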