As large language models (LLMs) grow more capable and are deployed in real-world settings, it is important to understand their safety. Initial evaluations of general-knowledge LLMs have exposed some weaknesses, but the safety of medical LLMs has not been sufficiently evaluated, despite the high risks they pose to personal health and safety, public health and safety, patient rights, and human rights. To address this gap, we conduct what is, to our knowledge, the first study to evaluate and improve the safety of medical LLMs. We find that 1) current medical LLMs do not meet standards of general or medical safety, as they readily comply with harmful requests, and that 2) fine-tuning medical LLMs on safety demonstrations significantly improves their safety, reducing their tendency to comply with harmful requests. In addition, we present a definition of medical safety for LLMs and develop a benchmark dataset to evaluate and train for medical safety in LLMs. Positioned at the intersection of research on machine learning safety and medical machine learning, this work sheds light on the current state of medical LLM safety and motivates future work in this area, with the aim of mitigating the risk of harm from LLMs in medicine.