Large Language Model (LLM) fine-tuning is underutilized in the field of medicine. Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) are two of the most common fine-tuning methods, yet there is little guidance informing users when to use each technique. In this investigation, we compare the performance of SFT and DPO on five common natural language tasks in medicine: Classification with text data, Classification with numeric data, Clinical Reasoning, Summarization, and Clinical Triage. We find that SFT alone is sufficient for Classification with text data, whereas DPO improves performance on the more complex tasks of Clinical Reasoning, Summarization, and Clinical Triage. Our results establish the role and importance of DPO fine-tuning within medicine, and consequently call attention to current software gaps that prevent widespread deployment of this technique.