The rapid advancement of language models (LMs) necessitates robust alignment with diverse user values. However, current preference optimization approaches often fail to capture the plurality of user opinions, instead reinforcing majority viewpoints and marginalizing minority perspectives. We introduce PERSONA, a reproducible test bed designed to evaluate and improve the pluralistic alignment of LMs. We procedurally generate diverse user profiles from US census data, yielding 1,586 synthetic personas with varied demographic and idiosyncratic attributes. From these personas, we then build a large-scale evaluation dataset of 3,868 prompts and 317,200 feedback pairs. Using this dataset, we systematically evaluate LM capabilities in role-playing diverse users, verifying the results through human judges, and we establish both PERSONA Bench, a benchmark for pluralistic alignment approaches, and an extensive dataset for creating new and future benchmarks. The full dataset and benchmarks are available at https://www.synthlabs.ai/research/persona.