Generative AI models ought to be useful and safe in cross-cultural contexts. A critical step toward this goal is understanding how AI models adhere to sociocultural norms. While this challenge has gained attention in NLP, existing work lacks both the nuance and the coverage needed to understand and evaluate models' norm adherence. We address these gaps by introducing a taxonomy of norms that clarifies their contexts (e.g., distinguishing between human-human norms that models should recognize and human-AI interactional norms that apply to the human-AI interaction itself), specifications (e.g., relevant domains), and mechanisms (e.g., modes of enforcement). We demonstrate how our taxonomy can be operationalized to automatically evaluate models' norm adherence in naturalistic, open-ended settings. Our exploratory analyses suggest that state-of-the-art models frequently violate norms, with violation rates varying by model, interactional context, and country. We further show that violation rates also vary by prompt intent and situational framing. Together, our taxonomy and demonstrative evaluation pipeline enable nuanced, context-sensitive evaluation of cultural norm adherence in realistic settings.