Large language models (LLMs) have a transformative impact on a variety of scientific tasks across disciplines, including biology, chemistry, medicine, and physics. However, ensuring the safety alignment of these models in scientific research remains an underexplored area, with existing benchmarks primarily focusing on textual content and overlooking key scientific representations such as molecular, protein, and genomic languages. Moreover, the safety mechanisms of LLMs in scientific tasks are insufficiently studied. To address these limitations, we introduce SciSafeEval, a comprehensive benchmark designed to evaluate the safety alignment of LLMs across a range of scientific tasks. SciSafeEval spans multiple scientific languages (textual, molecular, protein, and genomic) and covers a wide range of scientific domains. We evaluate LLMs in zero-shot, few-shot, and chain-of-thought settings, and introduce a "jailbreak" enhancement feature that challenges LLMs equipped with safety guardrails, rigorously testing their defenses against malicious intent. Our benchmark surpasses existing safety datasets in both scale and scope, providing a robust platform for assessing the safety and performance of LLMs in scientific contexts. This work aims to facilitate the responsible development and deployment of LLMs, promoting alignment with safety and ethical standards in scientific research.