Despite the recent successes of multi-agent reinforcement learning (MARL) algorithms, efficiently adapting to co-players in mixed-motive environments remains a significant challenge. One feasible approach is to model co-players' behavior hierarchically by inferring their characteristics. However, such methods often struggle to reason efficiently about co-players and to make full use of the inferred information. To address these issues, we propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm that enables few-shot adaptation to unseen policies in mixed-motive environments. HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies, and a planning module that employs Monte Carlo Tree Search (MCTS) to identify the best response. Our approach improves efficiency by updating beliefs about others' goals both across and within episodes, and by using information from the opponent modeling module to guide planning. Experimental results demonstrate that in mixed-motive environments, HOP exhibits superior few-shot adaptation when interacting with various unseen agents, and excels in self-play scenarios. Furthermore, the emergence of social intelligence during our experiments underscores the potential of our approach in complex multi-agent environments.
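To make the two-module structure described above concrete, the following is a minimal Python sketch of the decision loop it implies: a belief over a co-player's goal is updated within an episode via Bayes' rule from observed actions, and the resulting goal-conditioned predictions feed a (stubbed) MCTS planner. All names here (`GOALS`, `goal_conditioned_policy`, `mcts_best_response`) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of HOP's decision loop (not the authors' code).
import random

GOALS = ["apple", "banana"]            # assumed discrete goal set
ACTIONS = ["up", "down", "left", "right"]

def goal_conditioned_policy(goal, state):
    """Stub for the learned goal-conditioned policy pi(a | s, g).
    Returns a distribution over the co-player's actions."""
    # Placeholder: each goal biases the co-player toward one action.
    probs = {a: 0.1 for a in ACTIONS}
    probs["up" if goal == "apple" else "down"] = 0.7
    return probs

def update_belief(belief, state, observed_action):
    """Within-episode Bayesian update: P(g | a) ∝ pi(a | s, g) * P(g)."""
    posterior = {g: goal_conditioned_policy(g, state)[observed_action] * p
                 for g, p in belief.items()}
    z = sum(posterior.values())
    return {g: p / z for g, p in posterior.items()}

def mcts_best_response(state, belief):
    """Stub for the planning module: MCTS over a model in which the
    co-player samples a goal from `belief` and follows its policy.
    A real implementation would run tree-search simulations."""
    return random.choice(ACTIONS)

# Toy rollout: start from a uniform prior (HOP also carries beliefs
# across episodes; that outer loop is omitted here).
belief = {g: 1.0 / len(GOALS) for g in GOALS}
state = None
for observed in ["up", "up", "down"]:
    belief = update_belief(belief, state, observed)
    action = mcts_best_response(state, belief)
    print(f"belief={belief}, our action={action}")
```

After a few observed actions, the posterior concentrates on the goal whose conditioned policy best explains the co-player's behavior, which is the mechanism the abstract credits for efficient within-episode adaptation.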