A central challenge in multi-agent reinforcement learning is enabling agents to adapt to previously unseen teammates in a zero-shot fashion. Prior work in zero-shot coordination often follows a two-stage process: first generating a diverse training pool of partner agents, and then training a best-response agent to collaborate effectively with the entire pool. While many previous works have achieved strong performance by devising better ways to diversify the partner pool, less attention has been paid to how this pool is leveraged to build an adaptive agent. One limitation is that the best-response agent may converge to a static, generalist policy that performs reasonably well across diverse teammates, rather than learning adaptive, specialist policies that tailor behavior to individual teammates and achieve higher synergy. To address this, we propose an adaptive ensemble agent that uses Theory-of-Mind-based best-response selection: it first infers its teammate's intentions and then selects the most suitable policy from a policy ensemble. We conduct experiments in the Overcooked environment to evaluate zero-shot coordination performance under both fully and partially observable settings. The empirical results demonstrate the superiority of our method over a single best-response baseline.
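The selection mechanism described above can be illustrated with a minimal sketch. All names here (`AdaptiveEnsembleAgent`, `partner_models`, `best_responses`) are hypothetical and not taken from the paper; the sketch assumes a Bayesian-style belief over which training-pool partner type the current teammate resembles, updated from observed teammate actions, with the matching best-response policy chosen from the ensemble.

```python
# Hypothetical sketch of Theory-of-Mind-based best-response selection:
# maintain a belief over teammate types, update it from observed teammate
# actions, and act with the best-response policy for the most likely type.
import numpy as np

class AdaptiveEnsembleAgent:
    def __init__(self, partner_models, best_responses):
        # partner_models[k](state) -> predicted action distribution for type k
        # best_responses[k](state) -> ego action when paired with type k
        self.partner_models = partner_models
        self.best_responses = best_responses
        self.log_belief = np.zeros(len(partner_models))  # uniform prior

    def observe(self, state, teammate_action):
        # Bayesian update: P(type | action) ∝ P(action | type) * P(type)
        for k, model in enumerate(self.partner_models):
            probs = model(state)
            self.log_belief[k] += np.log(probs[teammate_action] + 1e-8)
        self.log_belief -= self.log_belief.max()  # numerical stability

    def act(self, state):
        # Select the ensemble member matched to the most likely teammate type
        k = int(np.argmax(self.log_belief))
        return self.best_responses[k](state)
```

In practice the belief update would run online during an episode, so the agent can switch ensemble members as evidence about the teammate accumulates; under partial observability, the `state` passed to the partner models would be the ego agent's observation rather than the full environment state.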