Existing work on the alignment problem has focused mainly on (1) qualitative descriptions of the alignment problem; (2) attempts to align AI actions with human interests through value specification and learning; and/or (3) analyses of a single agent or of humanity as a monolith. Recent sociotechnical approaches highlight the need to understand complex misalignment among multiple human and AI agents. We address this gap by adapting a computational social science model of human contention to the alignment problem. Our model quantifies misalignment in large, diverse agent groups with potentially conflicting goals across various problem areas. Misalignment scores in our framework depend on the observed agent population, the domain in question, and conflict between agents' weighted preferences. Through simulations, we demonstrate how our model captures intuitive aspects of misalignment across different scenarios. We then apply our model to two case studies, including an autonomous vehicle setting, to showcase its practical utility. Our approach offers enhanced explanatory power for complex sociotechnical environments and could inform the design of more aligned AI systems in real-world applications.