Balancing Value Underestimation and Overestimation with Realistic Actor-Critic - Zhuanzhi Papers


Underestimation · Overestimation · Learning · Samples · Continuity

October 26, 2022


Sicen Li, Qinyun Tang, Yiming Pang, Xinmeng Ma, Gang Wang

Model-free deep reinforcement learning (RL) has been successfully applied to challenging continuous control domains. However, poor sample efficiency prevents these methods from being widely used in real-world domains. This paper introduces a novel model-free algorithm, Realistic Actor-Critic (RAC), which can be incorporated into any off-policy RL algorithm to improve sample efficiency. RAC employs Universal Value Function Approximators (UVFA) to simultaneously learn, with a single neural network, a family of policies, each with a different trade-off between underestimation and overestimation. To learn such policies, we introduce uncertainty-punished Q-learning, which uses the uncertainty from an ensemble of critics to construct various confidence bounds on the Q-function. We evaluate RAC on the MuJoCo benchmark, achieving 10x sample efficiency and a 25% performance improvement over SAC on the most challenging environment, Humanoid.
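The abstract does not state the exact form of the confidence bounds. As a minimal sketch, assuming the common ensemble form mean(Q) - beta * std(Q) with a scalar trade-off parameter `beta` (our hypothetical name, not necessarily the paper's notation or formula), the Python below shows how one beta-indexed family of TD targets could be built from several critics. It also shows UVFA-style conditioning, where beta is appended to the network input so a single network can represent the whole family. Positive beta is pessimistic (leaning toward underestimation), negative beta is optimistic (leaning toward overestimation).

```python
import statistics
from typing import List, Sequence


def confidence_bound(q_values: Sequence[float], beta: float) -> float:
    """Confidence bound on Q from an ensemble of critic estimates.

    ASSUMPTION (illustrative, not confirmed by the abstract):
    bound = mean(Q) - beta * std(Q), using the population std.
    beta > 0 penalizes critic disagreement (pessimistic);
    beta < 0 rewards it (optimistic); beta = 0 is the plain ensemble mean.
    """
    if len(q_values) < 2:
        raise ValueError("need at least two critics to estimate uncertainty")
    mean = statistics.fmean(q_values)
    std = statistics.pstdev(q_values)
    return mean - beta * std


def td_target(reward: float, done: bool, next_q_values: Sequence[float],
              beta: float, gamma: float = 0.99) -> float:
    """One-step TD target that bootstraps from the beta-specific bound.

    Terminal transitions do not bootstrap, so the target is just the reward.
    """
    bootstrap = 0.0 if done else confidence_bound(next_q_values, beta)
    return reward + gamma * bootstrap


def uvfa_input(state: Sequence[float], beta: float) -> List[float]:
    """UVFA-style conditioning (hypothetical helper): append beta to the
    observation so one network can be queried for any member of the family."""
    return list(state) + [beta]


if __name__ == "__main__":
    # Hypothetical next-state estimates from an ensemble of 3 target critics.
    next_q = [10.0, 12.0, 14.0]
    # Larger beta yields a lower (more conservative) target.
    for beta in (-1.0, 0.0, 1.0):
        print(beta, td_target(1.0, False, next_q, beta, gamma=0.9))
```

In this sketch a single critic and policy would be trained on inputs from `uvfa_input` with beta varied across updates, so that one set of weights covers every trade-off rather than training a separate agent per beta.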

