With the development of large language models (LLMs), the sequence lengths these models support continue to increase, drawing significant attention to long-context language models. However, evaluation of these models has focused primarily on their capabilities, with little research on their safety. Existing work, such as ManyShotJailbreak, has demonstrated to some extent that long-context language models can exhibit safety concerns, but the methods employed are limited and lack comprehensiveness. In response, we introduce \textbf{LongSafetyBench}, the first benchmark designed to objectively and comprehensively evaluate the safety of long-context models. LongSafetyBench consists of 10 task categories, with an average length of 41,889 words. After testing eight long-context language models on LongSafetyBench, we find that existing models generally exhibit insufficient safety capabilities: the proportion of safe responses from most mainstream long-context LLMs falls below 50\%. Moreover, a model's safety performance in long-context scenarios does not always align with its performance in short-context scenarios. Further investigation reveals that long-context models tend to overlook harmful content embedded in lengthy texts. We also propose a simple yet effective solution that allows open-source models to achieve performance comparable to that of top-tier closed-source models. We believe that LongSafetyBench can serve as a valuable benchmark for evaluating the safety capabilities of long-context language models, and we hope that our work will encourage the broader community to attend to the safety of long-context models and contribute to developing solutions that improve the safety of long-context LLMs.