With the increased power and prevalence of AI systems, it is ever more critical that they are designed to serve everyone, i.e., people with diverse values and perspectives. However, aligning models to serve pluralistic human values remains an open research question. In this piece, we propose a roadmap to pluralistic alignment, using language models as a test bed. We identify and formalize three possible ways to define and operationalize pluralism in AI systems: 1) Overton pluralistic models that present a spectrum of reasonable responses; 2) Steerably pluralistic models that can be steered to reflect certain perspectives; and 3) Distributionally pluralistic models that are well-calibrated to a given population in distribution. We also propose and formalize three possible classes of pluralistic benchmarks: 1) Multi-objective benchmarks; 2) Trade-off steerable benchmarks, which incentivize models to steer to arbitrary trade-offs; and 3) Jury-pluralistic benchmarks, which explicitly model diverse human ratings. We use this framework to argue that current alignment techniques may be fundamentally limited for pluralistic AI; indeed, we highlight empirical evidence, both from our own experiments and from other work, that standard alignment procedures might reduce distributional pluralism in models, motivating the need for further research on pluralistic alignment.
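To make the notion of distributional pluralism concrete, one way to quantify how well a model's response distribution matches a population's is a divergence between the two discrete distributions. The sketch below is illustrative only, not the paper's metric: the survey fractions, the model's answer frequencies, and the choice of Jensen-Shannon divergence are all assumptions for the example.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    Uses base-2 logarithms, so the result lies in [0, 1]:
    0 means identical distributions, 1 means maximally different.
    """
    def kl(a, b):
        # Kullback-Leibler divergence, skipping zero-probability terms.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    # Mixture distribution m = (p + q) / 2.
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical example: a contested question with three answer options.
human_dist = [0.5, 0.3, 0.2]    # assumed survey fractions in the population
model_dist = [0.9, 0.05, 0.05]  # assumed frequencies of the model's sampled answers

# A large divergence indicates the model collapses onto one answer
# rather than reflecting the population's spread of views.
print(round(js_divergence(human_dist, model_dist), 3))
```

Under this toy metric, a distributionally pluralistic model would drive the divergence toward zero, whereas a model sharpened by standard alignment onto a single majority answer would push it up.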