Ensuring the safety of Generative AI requires a nuanced understanding of pluralistic viewpoints. In this paper, we introduce a novel data-driven approach for calibrating granular ratings in pluralistic datasets. Specifically, we address the challenge of interpreting how a diverse population expresses its perception of safety via ordinal scales (e.g., Likert scales). We distill non-parametric responsiveness metrics that quantify the consistency of raters in scoring varying levels of severity of safety violations. Using safety evaluation of AI-generated content as a case study, we investigate how raters from different demographic groups (age, gender, ethnicity) use an ordinal scale to express their perception of the severity of violations in a pluralistic safety dataset. We apply our metrics across violation types, demonstrating their utility in extracting fine-grained insights that are crucial for developing reliable AI systems in multi-cultural contexts. We show that our approach improves the prioritization of safety concerns by capturing nuanced viewpoints across demographic groups, thereby improving the reliability of pluralistic data collection and in turn contributing to more robust AI evaluations.
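The abstract does not define the responsiveness metrics themselves, so the following is only a minimal illustrative sketch of the kind of non-parametric statistic it describes: a rank correlation between the designed severity level of a violation and a demographic group's Likert ratings, where values near 1 indicate the group consistently maps higher severity to higher scale points. The function name, data layout, and the choice of Spearman's rho are assumptions for illustration, not the paper's actual metric.

```python
# Hypothetical sketch (not the paper's metric): a rank-based responsiveness
# statistic per demographic group, computed from ordinal (Likert) ratings.
import numpy as np
from scipy.stats import spearmanr

def responsiveness(severity_levels, ratings):
    """Spearman rank correlation between designed severity and observed ratings.

    Values near 1: raters consistently assign higher scale points to more
    severe violations. Values near 0: the scale is used flatly, regardless
    of severity.
    """
    rho, _ = spearmanr(severity_levels, ratings)
    return rho

# Toy example: items designed at three severity levels, rated by two groups.
severity = np.array([1, 1, 2, 2, 3, 3])
group_a = np.array([2, 1, 3, 3, 5, 4])  # spreads ratings across the scale
group_b = np.array([3, 3, 3, 4, 3, 4])  # compresses ratings near the middle

print(f"group A responsiveness: {responsiveness(severity, group_a):.2f}")
print(f"group B responsiveness: {responsiveness(severity, group_b):.2f}")
```

Comparing such per-group statistics across violation types is one way the consistency differences described in the abstract could surface in practice.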