Large language models (LLMs) for medical dialogue generation have attracted significant attention, with most work focused on improving response quality and fluency. Previous studies have made progress in optimizing model performance on single-round medical Q&A tasks, but models still need stronger multi-round conversation capabilities to avoid logical inconsistencies. To address this, we propose preference learning from process feedback~(PLPF), an approach that integrates the doctor's diagnostic logic into LLMs. PLPF involves rule modeling, preference data generation, and preference alignment, which together train the model to adhere to the diagnostic process. Experimental results using Standardized Patient Testing show that PLPF improves the diagnostic accuracy of the baseline model in medical conversations by 17.6%, outperforming traditional reinforcement learning from human feedback. PLPF is also effective in both multi-round and single-round dialogue tasks, indicating its potential for improving medical dialogue generation.