Satisficing Paths and Independent Multi-Agent Reinforcement Learning in Stochastic Games - 专知论文

会员服务 ·

0

相互独立的 · 学习器 · 学成 · 强化学习 · 近似 ·

2022 年 1 月 21 日

Satisficing Paths and Independent Multi-Agent Reinforcement Learning in Stochastic Games

翻译：粉碎游戏中的满意路径和独立多机构强化学习

Bora Yongacoglu,Gürdal Arslan,Serdar Yüksel

In multi-agent reinforcement learning (MARL), independent learners are those that do not access the action selections of other agents in the system. Due to the decentralization of information, it is generally difficult to design independent learners that drive the system to equilibrium. This paper investigates the feasibility of using \emph{satisficing} dynamics to guide independent learners to approximate equilibrium policies in non-episodic, discounted stochastic games. Satisficing refers to halting search in an optimization problem upon finding a satisfactory but possibly suboptimal input; here, we take it to mean halting search upon finding an input that achieves a cost within $\epsilon$ of the minimum cost. In this paper, we define a useful structural concept for games, to be termed the $\epsilon$-satisficing paths property, and we prove that this property holds for any $\epsilon \geq 0$ in general two-player stochastic games and in $N$-player stochastic games satisfying a symmetry condition. To illustrate the utility of this property, we present an independent learning algorithm that comes with high probability guarantees of approximate equilibrium in $N$-player symmetric games. This guarantee is made assuming symmetry alone, without additional assumptions such as a zero sum, team, or potential game structure. %

翻译：在多试剂强化学习(MARL)中,独立学习者是指那些无法进入系统中其他代理者的行动选择的学习者。由于信息分散化,通常很难设计能够使系统达到平衡的独立学习者。本文调查使用 emph{satisficting} 动态来指导独立学习者在非刺激、折扣的随机游戏中采用近似平衡政策的可行性。满意是指在寻找满意但可能次优的游戏输入时停止搜索; 这里, 我们的意思是, 在找到一个在最低成本中以$\epslon计的成本达到成本的输入时, 停止搜索。在本文中, 我们定义了一个有用的游戏结构概念, 称为 $\ epsilon- satisficational pathfical 属性, 并且我们证明, 在普通的两玩家随机游戏中, 和 $$$nn- player setectritical 游戏中, 我们独立地算出一种高概率的游戏结构的效用, 。

0

相关内容

相互独立的

相互独立的

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

85+阅读 · 2020年2月18日

55页图深度学习导论《A Gentle Introduction to Deep Learning for Graphs》

专知会员服务

104+阅读 · 2020年1月3日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

246+阅读 · 2019年10月21日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

47+阅读 · 2015年12月31日

家蚕丝素酶fibroinase的结构解析及其活性周期性变化的分子机制

国家自然科学基金

0+阅读 · 2015年12月31日

金融市场multi-agent异质信息的风险形成机理及预警研究

国家自然科学基金

4+阅读 · 2013年12月31日

基于原位测定的NiTi记忆合金相变塑性的非均匀多场复杂行为研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于贝叶斯推理的模糊逻辑强化学习模型研究

国家自然科学基金

18+阅读 · 2012年12月31日

超窄滞后Ti-Ni-Cu-X（X=Pd, Pt, Au）记忆合金薄膜的马氏体相变与记忆效应稳定性机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

光基因调控脊髓损伤小鼠步行CPG研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于传感网络的多移动智能体系统协调控制研究

国家自然科学基金

5+阅读 · 2010年12月31日

基于Multi-Agent技术的露天矿山生产调度系统群集拟生态优化研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于全局视野的微操作复杂作业研究

国家自然科学基金

0+阅读 · 2008年12月31日

A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning

A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning

Arxiv

0+阅读 · 2022年4月20日

A sojourn-based approach to semi-Markov Reinforcement Learning

Arxiv

0+阅读 · 2022年4月20日

Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence

Arxiv

0+阅读 · 2022年4月20日

When Is Partially Observable Reinforcement Learning Not Scary?

Arxiv

0+阅读 · 2022年4月19日

COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation

Arxiv

0+阅读 · 2022年4月19日

Spot the Difference: A Novel Task for Embodied Agents in Changing Environments

Arxiv

0+阅读 · 2022年4月18日

Convergence of the Discrete Minimum Energy Path

Convergence of the Discrete Minimum Energy Path

Arxiv

0+阅读 · 2022年4月15日

Approximating Gradients for Differentiable Quality Diversity in Reinforcement Learning

Arxiv

0+阅读 · 2022年4月15日

Methodical Advice Collection and Reuse in Deep Reinforcement Learning

Arxiv

1+阅读 · 2022年4月14日

Testing distributional assumptions of learning algorithms

Arxiv

0+阅读 · 2022年4月14日

VIP会员

文章信息

相关主题

相互独立的

最新内容

乌克兰前线的五项创新

乌克兰前线的五项创新

专知会员服务

2+阅读 · 今天6:14

军事通信系统与设备的技术演进综述

军事通信系统与设备的技术演进综述

专知会员服务

2+阅读 · 今天5:59

《北约 AI手册：作战人员的实用考量》（2026最新64页）

《北约 AI手册：作战人员的实用考量》（2026最新64页）

专知会员服务

4+阅读 · 今天5:54

《北约标准：医疗评估手册》174页

《北约标准：医疗评估手册》174页

专知会员服务

3+阅读 · 今天5:51

《提升生成模型的安全性与保障》博士论文

《提升生成模型的安全性与保障》博士论文

专知会员服务

3+阅读 · 今天5:47

美国当前高超音速导弹发展概述

美国当前高超音速导弹发展概述

专知会员服务

4+阅读 · 4月19日

《高超音速武器：一项再度兴起的技术》120页slides

《高超音速武器：一项再度兴起的技术》120页slides

专知会员服务

10+阅读 · 4月19日

无人机蜂群建模与仿真方法

无人机蜂群建模与仿真方法

专知会员服务

11+阅读 · 4月19日

《重建美国空中力量：为应对同级冲突平衡空军战斗力量》美智库报告

《重建美国空中力量：为应对同级冲突平衡空军战斗力量》美智库报告

专知会员服务

4+阅读 · 4月19日

《量化反无人机系统对抗无人机蜂群效能的创新方法》

《量化反无人机系统对抗无人机蜂群效能的创新方法》

专知会员服务

13+阅读 · 4月19日

澳大利亚发布《国防战略（2026年）》

澳大利亚发布《国防战略（2026年）》

专知会员服务

6+阅读 · 4月19日

【CMU博士论文】迈向基于基础先验的 4D 感知研究

【CMU博士论文】迈向基于基础先验的 4D 感知研究

专知会员服务

8+阅读 · 4月19日

大语言模型智能体中的外显化机制：记忆、技能、协议与评测基准工程综述

大语言模型智能体中的外显化机制：记忆、技能、协议与评测基准工程综述

专知会员服务

18+阅读 · 4月19日

全球高超音速武器最新发展趋势

全球高超音速武器最新发展趋势

专知会员服务

5+阅读 · 4月19日

《利用大语言模型增强多域作战兵棋推演》（报告）

《利用大语言模型增强多域作战兵棋推演》（报告）

专知会员服务

15+阅读 · 4月18日

相关VIP内容

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

85+阅读 · 2020年2月18日

55页图深度学习导论《A Gentle Introduction to Deep Learning for Graphs》

专知会员服务

104+阅读 · 2020年1月3日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

246+阅读 · 2019年10月21日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

军事通信系统与设备的技术演进综述

《北约标准：医疗评估手册》174页

乌克兰前线的五项创新

《北约 AI手册：作战人员的实用考量》（2026最新64页）

相关资讯

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning

A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning

Arxiv

0+阅读 · 2022年4月20日

A sojourn-based approach to semi-Markov Reinforcement Learning

Arxiv

0+阅读 · 2022年4月20日

Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence

Arxiv

0+阅读 · 2022年4月20日

When Is Partially Observable Reinforcement Learning Not Scary?

Arxiv

0+阅读 · 2022年4月19日

COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation

Arxiv

0+阅读 · 2022年4月19日

Spot the Difference: A Novel Task for Embodied Agents in Changing Environments

Arxiv

0+阅读 · 2022年4月18日

Convergence of the Discrete Minimum Energy Path

Convergence of the Discrete Minimum Energy Path

Arxiv

0+阅读 · 2022年4月15日

Approximating Gradients for Differentiable Quality Diversity in Reinforcement Learning

Arxiv

0+阅读 · 2022年4月15日

Methodical Advice Collection and Reuse in Deep Reinforcement Learning

Arxiv

1+阅读 · 2022年4月14日

Testing distributional assumptions of learning algorithms

Arxiv

0+阅读 · 2022年4月14日

相关基金

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

47+阅读 · 2015年12月31日

家蚕丝素酶fibroinase的结构解析及其活性周期性变化的分子机制

国家自然科学基金

0+阅读 · 2015年12月31日

金融市场multi-agent异质信息的风险形成机理及预警研究

国家自然科学基金

4+阅读 · 2013年12月31日

基于原位测定的NiTi记忆合金相变塑性的非均匀多场复杂行为研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于贝叶斯推理的模糊逻辑强化学习模型研究

国家自然科学基金

18+阅读 · 2012年12月31日

超窄滞后Ti-Ni-Cu-X（X=Pd, Pt, Au）记忆合金薄膜的马氏体相变与记忆效应稳定性机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

光基因调控脊髓损伤小鼠步行CPG研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于传感网络的多移动智能体系统协调控制研究

国家自然科学基金

5+阅读 · 2010年12月31日

基于Multi-Agent技术的露天矿山生产调度系统群集拟生态优化研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于全局视野的微操作复杂作业研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员