Artificially intelligent agents deployed in the real world will require the ability to reliably \textit{cooperate} with humans (as well as with other, heterogeneous AI agents). To provide formal guarantees of successful cooperation, we must make some assumptions about how partner agents could plausibly behave. Any realistic set of assumptions must account for the fact that other agents may be just as adaptable as our agent is. In this work, we consider the problem of cooperating with a \textit{population} of agents in a finitely repeated, two-player, general-sum matrix game with private utilities. Two natural assumptions in such settings are that: 1) all agents in the population are individually rational learners, and 2) when any two members of the population are paired together, with high probability they will achieve at least the same utility as they would under some Pareto-efficient equilibrium strategy. Our results first show that these assumptions alone are insufficient to ensure \textit{zero-shot} cooperation with members of the target population. We therefore consider the problem of \textit{learning} a strategy for cooperating with such a population using prior observations of its members interacting with one another. We provide upper and lower bounds on the number of samples needed to learn an effective cooperation strategy. Most importantly, we show that these bounds can be much stronger than those arising from a ``naive'' reduction of the problem to one of imitation learning.