Many settings of interest involving humans and machines -- from virtual personal assistants to autonomous vehicles -- can naturally be modelled as principals (humans) delegating to agents (machines), which then interact with each other on their principals' behalf. We refer to these multi-principal, multi-agent scenarios as delegation games. In such games, there are two important failure modes: problems of control (where an agent fails to act in line with their principal's preferences) and problems of cooperation (where the agents fail to work well together). In this paper we formalise and analyse these problems, further breaking them down into issues of alignment (do the players have similar preferences?) and capabilities (how competent are the players at satisfying those preferences?). We show -- theoretically and empirically -- how these measures determine the principals' welfare, how they can be estimated using limited observations, and thus how they might be used to help us design more aligned and cooperative AI systems.