Recent advances in Large Language Models (LLMs) have led to remarkable breakthroughs in understanding and responding to user intents. However, their performance in some specialized domains, such as Chinese medicine, lags behind their performance in general use cases. Existing efforts to incorporate Chinese medicine into LLMs rely on Supervised Fine-Tuning (SFT) with single-turn and distilled dialogue data. The resulting models lack the ability to conduct doctor-like proactive inquiry and multi-turn comprehension, and they cannot align their responses with experts' intentions. In this work, we introduce Zhongjing, the first Chinese medical LLaMA-based LLM that implements an entire training pipeline, from continuous pre-training through SFT to Reinforcement Learning from Human Feedback (RLHF). Additionally, we construct CMtMedQA, a Chinese multi-turn medical dialogue dataset of 70,000 authentic doctor-patient dialogues, which significantly enhances the model's capability for complex dialogue and proactive inquiry. We also define refined annotation rules and evaluation criteria that account for the unique characteristics of the biomedical domain. Extensive experimental results show that Zhongjing outperforms baselines in various capacities and matches the performance of ChatGPT in some abilities, despite having roughly 100 times fewer parameters. Ablation studies also demonstrate the contribution of each component: pre-training enhances medical knowledge, and RLHF further improves instruction-following ability and safety. Our code, datasets, and models are available at https://github.com/SupritYoung/Zhongjing.