As AI systems become more integrated into society, evaluating their capacity to align with diverse cultural values is crucial for their responsible deployment. Current evaluation methods predominantly rely on multiple-choice question (MCQ) datasets. In this study, we demonstrate that MCQs are insufficient for capturing the complexity of cultural values expressed in open-ended scenarios. Our findings highlight significant discrepancies between MCQ-based assessments and the values conveyed in unconstrained interactions. Based on these findings, we recommend moving beyond MCQs to adopt more open-ended, context-specific assessments that better reflect how AI models engage with cultural values in realistic settings.