Culturally aware safeguards are crucial for AI alignment in real-world settings, where safety extends beyond common sense to encompass diverse local values, norms, and region-specific regulations. However, building large-scale, culturally grounded datasets is challenging due to limited resources and a scarcity of native annotators. Consequently, many safeguard models rely on machine translation of English datasets, which often misses regional and cultural nuances. We present a novel agentic data-generation framework for creating authentic, region-specific safety datasets for Southeast Asia (SEA) at scale. On this foundation, we introduce the SEA-Guard family, the first multilingual safeguard models grounded in SEA cultural contexts. Evaluated across multiple benchmarks and cultural variants, SEA-Guard consistently outperforms existing safeguards at detecting regionally sensitive or harmful content while maintaining strong general safety performance.