Many emerging agentic paradigms require agents to collaborate with one another (or with people) to achieve shared goals. Unfortunately, existing approaches to learning policies for such collaborative problems produce brittle solutions that fail when paired with new partners. We attribute these failures to a combination of free-riding during training and a lack of strategic robustness. To address these problems, we study the concept of strategic risk aversion and interpret it as a principled inductive bias for generalizable cooperation with unseen partners. While strategically risk-averse players are robust to deviations in their partner's behavior by design, we show that, in collaborative games, they also (1) can attain better equilibrium outcomes than those prescribed by classical game-theoretic solution concepts such as Nash equilibrium, and (2) exhibit little or no free-riding. Inspired by these insights, we develop a multi-agent reinforcement learning (MARL) algorithm that integrates strategic risk aversion into standard policy optimization methods. Our empirical results across collaborative benchmarks (including an LLM collaboration task) validate our theory and demonstrate that our approach consistently achieves reliable collaboration with heterogeneous, previously unseen partners.
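To make the core idea concrete, below is a minimal sketch, not the paper's algorithm, of one way strategic risk aversion can be folded into a policy-gradient update: the learner scores its policy by its worst-case shared return over a small set of plausible partner policies and ascends the gradient of that worst-case objective. The stag-hunt-style payoff matrix, the perturbed partner set, and the learning rate are all illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of a risk-averse (max-min) policy-gradient step in a
# one-shot collaborative matrix game. Payoffs, partner set, and hyperparameters
# are illustrative assumptions, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)

# Two-action collaborative game with a shared payoff (stag-hunt style):
# rows = learner's action, cols = partner's action.
R = np.array([[4.0, 0.0],
              [3.0, 2.0]])

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def expected_return(pi_self, pi_partner):
    # Shared payoff under independent mixed strategies.
    return pi_self @ R @ pi_partner

# A small set of partner policies standing in for "deviations in the
# partner's behavior": randomly perturbed mixed strategies.
partner_set = [softmax(rng.normal(size=2)) for _ in range(5)]

theta = np.zeros(2)  # learner's policy logits
lr = 0.5
for step in range(200):
    pi = softmax(theta)
    # Strategic risk aversion: evaluate against the worst-case partner
    # in the set, then follow the gradient of that worst-case return.
    worst = min(partner_set, key=lambda q: expected_return(pi, q))
    grad_pi = R @ worst                   # d(return)/d(pi)
    jac = np.diag(pi) - np.outer(pi, pi)  # d(pi)/d(theta) for softmax
    theta += lr * jac @ grad_pi

print("risk-averse policy:", softmax(theta))
```

Under these assumptions the max-min objective pushes the learner toward the action that remains rewarding even when the partner deviates, which is the robustness property the abstract attributes to strategically risk-averse players; the paper's actual algorithm operates in sequential MARL settings rather than this one-shot toy game.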