As artificial intelligence (AI) advances, a growing number of scenarios require AI agents to work closely with other agents whose goals and strategies may not be known beforehand. However, existing approaches for training collaborative agents often require well-defined, known reward signals and cannot address the problem of teaming with unknown agents, which often have latent objectives/rewards. In response to this challenge, we propose a framework for teaming with unknown agents, which leverages a kernel density Bayesian inverse learning method for active goal deduction and uses pre-trained, goal-conditioned policies to enable zero-shot policy adaptation. We prove that unbiased reward estimates in our framework are sufficient for optimal teaming with unknown agents. We further evaluate the framework on redesigned multi-agent particle and StarCraft II micromanagement environments with diverse unknown agents that exhibit different behaviors/rewards. Empirical results demonstrate that our framework significantly improves the teaming performance of AI and unknown agents across a wide range of collaborative scenarios.