The rapid development of generative AI has pushed value- and ethics-related risks to the forefront, making value safety a critical concern for which no unified consensus yet exists. In this work, we propose an internationally inclusive and resilient unified value framework, the GenAI Value Safety Scale (GVS-Scale). Grounded in a lifecycle-oriented perspective, we develop a taxonomy of GenAI value safety risks and construct the GenAI Value Safety Incident Repository (GVSIR); from these, we derive the GVS-Scale through grounded theory and operationalize it via the GenAI Value Safety Benchmark (GVS-Bench). Experiments on mainstream text-generation models reveal substantial variation in value safety performance across models and value categories, indicating that value alignment in current systems is uneven and fragmented. Our findings highlight the importance of establishing shared safety foundations through dialogue and of advancing technical safety mechanisms beyond reactive constraints toward more flexible approaches. Data and evaluation guidelines are available at https://github.com/acl2026/GVS-Bench. Warning: this paper includes examples that may be offensive or harmful.