Seeking Human Security Consensus: A Unified Value Scale for Generative AI Value Safety

The rapid development of generative AI has brought value- and ethics-related risks to the forefront, making value safety a critical concern, yet a unified consensus on it is still lacking. In this work, we propose an internationally inclusive and resilient unified value framework, the GenAI Value Safety Scale (GVS-Scale). Grounded in a lifecycle-oriented perspective, we develop a taxonomy of GenAI value safety risks and construct the GenAI Value Safety Incident Repository (GVSIR). We then derive the GVS-Scale through grounded theory and operationalize it via the GenAI Value Safety Benchmark (GVS-Bench). Experiments on mainstream text generation models reveal substantial variation in value safety performance across models and value categories, indicating uneven and fragmented value alignment in current systems. Our findings highlight the importance of establishing shared safety foundations through dialogue and of advancing technical safety mechanisms beyond reactive constraints toward more flexible approaches. Data and evaluation guidelines are available at https://github.com/acl2026/GVS-Bench. This paper includes examples that may be offensive or harmful.

