Truncating Trajectories in Monte Carlo Reinforcement Learning - 专知论文

会员服务 ·

0

蒙特卡罗 · 期望回报 · Learning · Agent · 总回报 ·

2023 年 5 月 7 日

Truncating Trajectories in Monte Carlo Reinforcement Learning

翻译：截断蒙特卡洛强化学习中的轨迹

Riccardo Poiani,Alberto Maria Metelli,Marcello Restelli

In Reinforcement Learning (RL), an agent acts in an unknown environment to maximize the expected cumulative discounted sum of an external reward signal, i.e., the expected return. In practice, in many tasks of interest, such as policy optimization, the agent usually spends its interaction budget by collecting episodes of fixed length within a simulator (i.e., Monte Carlo simulation). However, given the discounted nature of the RL objective, this data collection strategy might not be the best option. Indeed, the rewards taken in early simulation steps weigh exponentially more than future rewards. Taking a cue from this intuition, in this paper, we design an a-priori budget allocation strategy that leads to the collection of trajectories of different lengths, i.e., truncated. The proposed approach provably minimizes the width of the confidence intervals around the empirical estimates of the expected return of a policy. After discussing the theoretical properties of our method, we make use of our trajectory truncation mechanism to extend Policy Optimization via Importance Sampling (POIS, Metelli et al., 2018) algorithm. Finally, we conduct a numerical comparison between our algorithm and POIS: the results are consistent with our theory and show that an appropriate truncation of the trajectories can succeed in improving performance.

翻译：在强化学习中，智能体在未知环境中行动，以最大化外部奖励信号的期望累积折现和（即期望回报）。在实际应用中，在许多感兴趣的任务（如策略优化）中，智能体通常通过在模拟器内收集固定长度的回合（即蒙特卡洛模拟）来使用其交互预算。然而，考虑到强化学习目标的折现特性，这种数据收集策略可能并非最佳选择。事实上，早期模拟步骤中的奖励权重指数级地高于未来奖励。基于这一直觉，本文设计了一种先验预算分配策略，该策略导致收集不同长度的轨迹（即截断轨迹）。所提出的方法可证明地最小化了策略期望回报经验估计周围置信区间的宽度。在讨论了我们方法的理论性质后，我们将轨迹截断机制用于扩展基于重要性采样的策略优化（POIS，Metelli等人，2018）算法。最后，我们将我们的算法与POIS进行了数值比较：结果与我们的理论一致，表明适当的轨迹截断能够成功提升性能。

0

相关内容

蒙特卡罗

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

52+阅读 · 2020年12月14日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【OpenAI】深度强化学习关键论文列表

【OpenAI】深度强化学习关键论文列表

专知

12+阅读 · 2018年11月10日

【推荐】深度学习目标检测概览

【推荐】深度学习目标检测概览

机器学习研究会

10+阅读 · 2017年9月1日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

间接优化的高效Monte Carlo声传播研究

国家自然科学基金

0+阅读 · 2017年12月31日

基于共享孔径MTM的天线宽带RCS减缩技术研究

国家自然科学基金

0+阅读 · 2014年12月31日

三维结构表面上石墨烯的可控制备方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于β/B2相自润滑的TiAl-RE合金优化设计与管材制备基础研究

国家自然科学基金

0+阅读 · 2013年12月31日

剧烈塑性变形条件下金属间化合物相变研究

国家自然科学基金

0+阅读 · 2012年12月31日

金属表面化学气相沉积法生长石墨烯的机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

强电流滑动电接触下的最优载荷研究

国家自然科学基金

0+阅读 · 2009年12月31日

金属催化基于SiC单晶的石墨烯可控制备及其界面反应研究

国家自然科学基金

0+阅读 · 2009年12月31日

刚柔嵌段共聚物自组装行为的快速非格子Monte Carlo模拟研究

国家自然科学基金

0+阅读 · 2009年12月31日

ICF中高能电子和离子输运的Monte-Carlo算法研究和程序研制

国家自然科学基金

0+阅读 · 2009年12月31日

Value Gradient weighted Model-Based Reinforcement Learning

Arxiv

0+阅读 · 2023年6月20日

CAMMARL: Conformal Action Modeling in Multi Agent Reinforcement Learning

Arxiv

0+阅读 · 2023年6月19日

TrajFlow: Learning the Distribution over Trajectories

Arxiv

0+阅读 · 2023年6月19日

Bootstrapped Representations in Reinforcement Learning

Arxiv

0+阅读 · 2023年6月16日

Fairness in Preference-based Reinforcement Learning

Arxiv

0+阅读 · 2023年6月16日

Robotic Packaging Optimization with Reinforcement Learning

Arxiv

0+阅读 · 2023年6月16日

Cooperative Multi-Objective Reinforcement Learning for Traffic Signal Control and Carbon Emission Reduction

Arxiv

0+阅读 · 2023年6月16日

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

Arxiv

34+阅读 · 2022年6月30日

Recent Advances in Reinforcement Learning in Finance

Arxiv

11+阅读 · 2021年12月8日

Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory

Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory

Arxiv

15+阅读 · 2020年12月15日

VIP会员

文章信息

相关主题

最新内容

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

专知会员服务

3+阅读 · 6月18日

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

专知会员服务

4+阅读 · 6月18日

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

专知会员服务

9+阅读 · 6月18日

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

专知会员服务

8+阅读 · 6月18日

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

专知会员服务

4+阅读 · 6月17日

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

专知会员服务

7+阅读 · 6月17日

学习数据的几何：形状空间分析数学综述

学习数据的几何：形状空间分析数学综述

专知会员服务

6+阅读 · 6月17日

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

专知会员服务

9+阅读 · 6月17日

定向能反无人机系统最新发展动态

定向能反无人机系统最新发展动态

专知会员服务

7+阅读 · 6月17日

从燃煤战舰到算法战争：水面指挥的永恒要求

从燃煤战舰到算法战争：水面指挥的永恒要求

专知会员服务

4+阅读 · 6月17日

《短程弹道再入飞行器拦截时间中的一项异常现象》

《短程弹道再入飞行器拦截时间中的一项异常现象》

专知会员服务

6+阅读 · 6月17日

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

专知会员服务

7+阅读 · 6月17日

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

专知会员服务

6+阅读 · 6月17日

《韩国国防政策与军备出口：韩国安全与国防政策如何塑造其国防工业与军备出口格局》最新100页报告

《韩国国防政策与军备出口：韩国安全与国防政策如何塑造其国防工业与军备出口格局》最新100页报告

专知会员服务

5+阅读 · 6月17日

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

专知会员服务

6+阅读 · 6月16日

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

52+阅读 · 2020年12月14日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

相关资讯

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【OpenAI】深度强化学习关键论文列表

【OpenAI】深度强化学习关键论文列表

专知

12+阅读 · 2018年11月10日

【推荐】深度学习目标检测概览

【推荐】深度学习目标检测概览

机器学习研究会

10+阅读 · 2017年9月1日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Value Gradient weighted Model-Based Reinforcement Learning

Arxiv

0+阅读 · 2023年6月20日

CAMMARL: Conformal Action Modeling in Multi Agent Reinforcement Learning

Arxiv

0+阅读 · 2023年6月19日

TrajFlow: Learning the Distribution over Trajectories

Arxiv

0+阅读 · 2023年6月19日

Bootstrapped Representations in Reinforcement Learning

Arxiv

0+阅读 · 2023年6月16日

Fairness in Preference-based Reinforcement Learning

Arxiv

0+阅读 · 2023年6月16日

Robotic Packaging Optimization with Reinforcement Learning

Arxiv

0+阅读 · 2023年6月16日

Cooperative Multi-Objective Reinforcement Learning for Traffic Signal Control and Carbon Emission Reduction

Arxiv

0+阅读 · 2023年6月16日

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

Arxiv

34+阅读 · 2022年6月30日

Recent Advances in Reinforcement Learning in Finance

Arxiv

11+阅读 · 2021年12月8日

Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory

Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory

Arxiv

15+阅读 · 2020年12月15日

相关基金

间接优化的高效Monte Carlo声传播研究

国家自然科学基金

0+阅读 · 2017年12月31日

基于共享孔径MTM的天线宽带RCS减缩技术研究

国家自然科学基金

0+阅读 · 2014年12月31日

三维结构表面上石墨烯的可控制备方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于β/B2相自润滑的TiAl-RE合金优化设计与管材制备基础研究

国家自然科学基金

0+阅读 · 2013年12月31日

剧烈塑性变形条件下金属间化合物相变研究

国家自然科学基金

0+阅读 · 2012年12月31日

金属表面化学气相沉积法生长石墨烯的机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

强电流滑动电接触下的最优载荷研究

国家自然科学基金

0+阅读 · 2009年12月31日

金属催化基于SiC单晶的石墨烯可控制备及其界面反应研究

国家自然科学基金

0+阅读 · 2009年12月31日

刚柔嵌段共聚物自组装行为的快速非格子Monte Carlo模拟研究

国家自然科学基金

0+阅读 · 2009年12月31日

ICF中高能电子和离子输运的Monte-Carlo算法研究和程序研制

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员