Large language models (LLMs) have a transformative impact on a variety of scientific tasks across disciplines, including biology, chemistry, medicine, and physics. However, ensuring the safety alignment of these models in scientific research remains an underexplored area, with existing benchmarks primarily focusing on textual content and overlooking key scientific representations such as molecular, protein, and genomic languages. Moreover, the safety mechanisms of LLMs in scientific tasks are insufficiently studied. To address these limitations, we introduce SciSafeEval, a comprehensive benchmark designed to evaluate the safety alignment of LLMs across a range of scientific tasks. SciSafeEval spans multiple scientific languages (textual, molecular, protein, and genomic) and covers a wide range of scientific domains. We evaluate LLMs in zero-shot, few-shot, and chain-of-thought settings, and introduce a "jailbreak" enhancement feature that challenges LLMs equipped with safety guardrails, rigorously testing their defenses against malicious intent. Our benchmark surpasses existing safety datasets in both scale and scope, providing a robust platform for assessing the safety and performance of LLMs in scientific contexts. This work aims to facilitate the responsible development and deployment of LLMs, promoting alignment with safety and ethical standards in scientific research.