As large language models (LLMs) become increasingly capable and are deployed in real-world settings, it is important to understand their safety. While initial steps have been taken to evaluate the safety of general-knowledge LLMs, exposing some weaknesses, the safety of medical LLMs has not been sufficiently evaluated despite their high risks to personal health and safety, public health and safety, patient rights, and human rights. To address this gap, we conduct, to our knowledge, the first study of its kind to evaluate and improve the safety of medical LLMs. We find that 1) current medical LLMs do not meet standards of general or medical safety, as they readily comply with harmful requests, and that 2) fine-tuning medical LLMs on safety demonstrations significantly improves their safety, reducing their tendency to comply with harmful requests. In addition, we present a definition of medical safety for LLMs and develop a benchmark dataset to evaluate and train for medical safety in LLMs. Poised at the intersection of research on machine learning safety and medical machine learning, this work sheds light on the current state of medical LLM safety and motivates future work in this area, helping to mitigate the risks of harm from LLMs in medicine.