RoMFAC: A robust mean-field actor-critic reinforcement learning against adversarial perturbations on states - 专知论文

会员服务 ·

0

稳健性 · Learning · 泛函 · 强化学习 · 损失 ·

2023 年 5 月 31 日

RoMFAC: A robust mean-field actor-critic reinforcement learning against adversarial perturbations on states

翻译：RoMFAC：一种针对状态对抗性干扰的鲁棒均值场演员-评论家强化学习

Ziyuan Zhou,Guanjun Liu

Multi-agent deep reinforcement learning makes optimal decisions dependent on system states observed by agents, but any uncertainty on the observations may mislead agents to take wrong actions. The Mean-Field Actor-Critic reinforcement learning (MFAC) is well-known in the multi-agent field since it can effectively handle a scalability problem. However, it is sensitive to state perturbations that can significantly degrade the team rewards. This work proposes a Robust Mean-field Actor-Critic reinforcement learning (RoMFAC) that has two innovations: 1) a new objective function of training actors, composed of a \emph{policy gradient function} that is related to the expected cumulative discount reward on sampled clean states and an \emph{action loss function} that represents the difference between actions taken on clean and adversarial states; and 2) a repetitive regularization of the action loss, ensuring the trained actors to obtain excellent performance. Furthermore, this work proposes a game model named a State-Adversarial Stochastic Game (SASG). Despite the Nash equilibrium of SASG may not exist, adversarial perturbations to states in the RoMFAC are proven to be defensible based on SASG. Experimental results show that RoMFAC is robust against adversarial perturbations while maintaining its competitive performance in environments without perturbations.

翻译：多智能体深度强化学习依赖智能体观测到的系统状态做出最优决策，但观测中的任何不确定性都可能误导智能体采取错误行动。均值场演员-评论家强化学习（MFAC）因其能有效处理可扩展性问题而在多智能体领域广为人知，然而它对状态干扰敏感，可能显著降低团队奖励。本工作提出了一种鲁棒均值场演员-评论家强化学习（RoMFAC），其包含两项创新：1）一种新的演员训练目标函数，由与采样干净状态上的期望累积折扣奖励相关的策略梯度函数和表示干净状态与对抗状态下行动差异的行动损失函数组成；2）对行动损失的重复正则化，确保训练后的演员获得优异性能。此外，本工作提出了一种名为状态对抗随机博弈（SASG）的博弈模型。尽管SASG的纳什均衡可能不存在，但基于SASG可证明RoMFAC中针对状态的对抗性干扰是可防御的。实验结果表明，RoMFAC在无干扰环境中保持竞争性能的同时，对对抗性干扰具有鲁棒性。

0

相关内容

稳健性

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

46+阅读 · 2020年10月31日

【DeepMind】强化学习教程，83页ppt

【DeepMind】强化学习教程，83页ppt

专知会员服务

158+阅读 · 2020年8月7日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

随机偏微分方程及其障碍问题的研究

国家自然科学基金

1+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

基于法向热传导的垂直取向液晶高分子薄膜研究

国家自然科学基金

0+阅读 · 2012年12月31日

低维度纳米银透明导电薄膜的制备及其光学减反射特性研究

国家自然科学基金

0+阅读 · 2012年12月31日

整数值时间序列数据的建模方法研究

国家自然科学基金

1+阅读 · 2012年12月31日

硅基纳米多级复合结构集成气体传感器

国家自然科学基金

0+阅读 · 2011年12月31日

分段光滑机械系统的亚谐振动及分岔机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

旋转障碍下不可压缩粘性流体数值方法的研究

国家自然科学基金

0+阅读 · 2009年12月31日

超高密度热辅助磁记录用FePt/FeRh垂直取向双层薄膜介质的研究

国家自然科学基金

0+阅读 · 2009年12月31日

湍流边界层激励下典型壳体结构的声辐射研究

国家自然科学基金

0+阅读 · 2009年12月31日

Leveraging Offline Data in Online Reinforcement Learning

Arxiv

0+阅读 · 2023年7月20日

Friendly Noise against Adversarial Noise: A Powerful Defense against Data Poisoning Attacks

Arxiv

0+阅读 · 2023年7月20日

Improving the Transferability of Adversarial Attacks on Face Recognition with Beneficial Perturbation Feature Augmentation

Arxiv

0+阅读 · 2023年7月19日

Internally Rewarded Reinforcement Learning

Arxiv

0+阅读 · 2023年7月17日

Explainable Deep Learning: A Field Guide for the Uninitiated

Arxiv

51+阅读 · 2021年9月13日

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

Arxiv

15+阅读 · 2020年12月3日

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Arxiv

20+阅读 · 2020年3月10日

Adversarial Transfer Learning

Adversarial Transfer Learning

Arxiv

12+阅读 · 2018年12月6日

Event Extraction with Generative Adversarial Imitation Learning

Arxiv

13+阅读 · 2018年4月21日

Generating Adversarial Examples with Adversarial Networks

Arxiv

10+阅读 · 2018年1月15日

VIP会员

文章信息

相关主题

最新内容

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

专知会员服务

3+阅读 · 6月18日

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

专知会员服务

4+阅读 · 6月18日

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

专知会员服务

9+阅读 · 6月18日

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

专知会员服务

7+阅读 · 6月18日

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

专知会员服务

4+阅读 · 6月17日

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

专知会员服务

6+阅读 · 6月17日

学习数据的几何：形状空间分析数学综述

学习数据的几何：形状空间分析数学综述

专知会员服务

6+阅读 · 6月17日

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

专知会员服务

8+阅读 · 6月17日

定向能反无人机系统最新发展动态

定向能反无人机系统最新发展动态

专知会员服务

7+阅读 · 6月17日

从燃煤战舰到算法战争：水面指挥的永恒要求

从燃煤战舰到算法战争：水面指挥的永恒要求

专知会员服务

4+阅读 · 6月17日

《短程弹道再入飞行器拦截时间中的一项异常现象》

《短程弹道再入飞行器拦截时间中的一项异常现象》

专知会员服务

6+阅读 · 6月17日

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

专知会员服务

7+阅读 · 6月17日

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

专知会员服务

5+阅读 · 6月17日

《韩国国防政策与军备出口：韩国安全与国防政策如何塑造其国防工业与军备出口格局》最新100页报告

《韩国国防政策与军备出口：韩国安全与国防政策如何塑造其国防工业与军备出口格局》最新100页报告

专知会员服务

5+阅读 · 6月17日

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

专知会员服务

6+阅读 · 6月16日

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

46+阅读 · 2020年10月31日

【DeepMind】强化学习教程，83页ppt

【DeepMind】强化学习教程，83页ppt

专知会员服务

158+阅读 · 2020年8月7日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Leveraging Offline Data in Online Reinforcement Learning

Arxiv

0+阅读 · 2023年7月20日

Friendly Noise against Adversarial Noise: A Powerful Defense against Data Poisoning Attacks

Arxiv

0+阅读 · 2023年7月20日

Improving the Transferability of Adversarial Attacks on Face Recognition with Beneficial Perturbation Feature Augmentation

Arxiv

0+阅读 · 2023年7月19日

Internally Rewarded Reinforcement Learning

Arxiv

0+阅读 · 2023年7月17日

Explainable Deep Learning: A Field Guide for the Uninitiated

Arxiv

51+阅读 · 2021年9月13日

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

Arxiv

15+阅读 · 2020年12月3日

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Arxiv

20+阅读 · 2020年3月10日

Adversarial Transfer Learning

Adversarial Transfer Learning

Arxiv

12+阅读 · 2018年12月6日

Event Extraction with Generative Adversarial Imitation Learning

Arxiv

13+阅读 · 2018年4月21日

Generating Adversarial Examples with Adversarial Networks

Arxiv

10+阅读 · 2018年1月15日

相关基金

随机偏微分方程及其障碍问题的研究

国家自然科学基金

1+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

基于法向热传导的垂直取向液晶高分子薄膜研究

国家自然科学基金

0+阅读 · 2012年12月31日

低维度纳米银透明导电薄膜的制备及其光学减反射特性研究

国家自然科学基金

0+阅读 · 2012年12月31日

整数值时间序列数据的建模方法研究

国家自然科学基金

1+阅读 · 2012年12月31日

硅基纳米多级复合结构集成气体传感器

国家自然科学基金

0+阅读 · 2011年12月31日

分段光滑机械系统的亚谐振动及分岔机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

旋转障碍下不可压缩粘性流体数值方法的研究

国家自然科学基金

0+阅读 · 2009年12月31日

超高密度热辅助磁记录用FePt/FeRh垂直取向双层薄膜介质的研究

国家自然科学基金

0+阅读 · 2009年12月31日

湍流边界层激励下典型壳体结构的声辐射研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员