Fast Teammate Adaptation in the Presence of Sudden Policy Change

May 10, 2023

Ziqian Zhang, Lei Yuan, Lihe Li, Ke Xue, Chengxing Jia, Cong Guan, Chao Qian, Yang Yu

From arXiv. In: Proceedings of the 39th Conference on Uncertainty in Artificial Intelligence (UAI'23), Pittsburgh, PA, 2023

In cooperative multi-agent reinforcement learning (MARL), where an agent coordinates with one or more teammates toward a shared goal, the agent may suffer from non-stationarity caused by changes in its teammates' policies. Prior work mainly concentrates on policy change during the training phase or on teammates that change across episodes, ignoring the fact that teammates' policies may change suddenly within an episode, which can lead to miscoordination and poor performance. We formulate the problem as an open Dec-POMDP, in which we control some agents that must coordinate with uncontrolled teammates whose policies may change within a single episode. We then develop a new framework, fast teammates adaptation (Fastap), to address the problem. Concretely, we first train versatile teammate policies and assign them to different clusters via the Chinese Restaurant Process (CRP). Then, we train the controlled agent(s) to coordinate with the sampled uncontrolled teammates by capturing their identifications as context for fast adaptation. Finally, each agent uses its local information to anticipate the teammates' context and make decisions accordingly. This process proceeds alternately, leading to a robust policy that can adapt to any teammates during the decentralized execution phase. We show on multiple multi-agent benchmarks that Fastap outperforms multiple baselines in both stationary and non-stationary scenarios.
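
The abstract says teammate policies are assigned to clusters via the Chinese Restaurant Process but does not spell out the assignment rule. As a rough illustration only, the sketch below samples a partition from the plain CRP prior: an item joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to a concentration parameter. The function names, the `alpha` parameter, and the use of the prior alone (with no behavior-based likelihood term) are assumptions made for this example, not details taken from the paper.

```python
import random


def crp_assign(cluster_sizes, alpha, rng):
    """Sample a cluster index for one new item under the CRP prior.

    Returning len(cluster_sizes) means "open a new cluster".
    `alpha` is a hypothetical concentration parameter (must be > 0).
    """
    n = sum(cluster_sizes)
    # Existing cluster k is chosen with probability size_k / (n + alpha).
    weights = [size / (n + alpha) for size in cluster_sizes]
    # A new cluster is opened with probability alpha / (n + alpha).
    weights.append(alpha / (n + alpha))
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def crp_partition(num_items, alpha, seed=0):
    """Assign num_items items (e.g. teammate policies) to clusters one by one."""
    rng = random.Random(seed)
    cluster_sizes = []
    assignments = []
    for _ in range(num_items):
        k = crp_assign(cluster_sizes, alpha, rng)
        if k == len(cluster_sizes):
            cluster_sizes.append(1)  # first member of a new cluster
        else:
            cluster_sizes[k] += 1
        assignments.append(k)
    return assignments, cluster_sizes


if __name__ == "__main__":
    assignments, sizes = crp_partition(num_items=20, alpha=1.0)
    print("assignments:", assignments)
    print("cluster sizes:", sizes)
```

A larger `alpha` tends to produce more, smaller clusters, and the number of clusters does not need to be fixed in advance. A clustering of real policies would presumably also weigh how similar a new policy's behavior is to each cluster, which this sketch leaves out.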

