GPT-doctor: Customizing Large Language Models for Medical Consultation

The advent of Large Language Models (LLMs) has ushered in a new era for design science in Information Systems, demanding a paradigm shift in how LLM design is tailored to business contexts. This paper proposes a novel framework for customizing LLMs for general business contexts that aims to achieve three fundamental objectives simultaneously: (1) aligning conversational patterns, (2) integrating in-depth domain knowledge, and (3) embodying soft skills and core principles. We design methodologies that combine domain-specific theory with Supervised Fine-Tuning (SFT) of LLMs. We instantiate the proposed framework in the context of medical consultation, creating a GPT-doctor model. Specifically, we construct a comprehensive SFT dataset by collecting a large volume of real doctors' consultation records from a leading online medical consultation platform, together with medical knowledge from professional databases. Additionally, drawing on medical theory, we identify three soft skills and core principles of human doctors (professionalism, explainability, and emotional support) and design approaches to integrate them into LLMs. We demonstrate the feasibility and performance of the proposed framework through online experiments with real patients as well as evaluations by domain experts and real consumers. Results show that the fine-tuned GPT-doctor performs on par with human doctors across multiple metrics, including medical expertise and consumer preference. Finally, we open the black box and examine the sources of the model's performance improvement from two perspectives: horizontal conversation pattern alignment and vertical medical knowledge evolution. The proposed framework offers step-by-step principles and guidance for customizing LLMs for real-world business problems.
