Loss Dynamics of Temporal Difference Reinforcement Learning - 专知论文

会员服务 ·

0

Learning · 泛函 · 学习率 · 强化学习 · 衰减系数 ·

2023 年 11 月 7 日

Loss Dynamics of Temporal Difference Reinforcement Learning

翻译：时序差分强化学习的损失动态

Blake Bordelon,Paul Masset,Henry Kuo,Cengiz Pehlevan

from arxiv, Advances in Neural Information Processing Systems 36 (2023) Camera Ready

Reinforcement learning has been successful across several applications in which agents have to learn to act in environments with sparse feedback. However, despite this empirical success there is still a lack of theoretical understanding of how the parameters of reinforcement learning models and the features used to represent states interact to control the dynamics of learning. In this work, we use concepts from statistical physics, to study the typical case learning curves for temporal difference learning of a value function with linear function approximators. Our theory is derived under a Gaussian equivalence hypothesis where averages over the random trajectories are replaced with temporally correlated Gaussian feature averages and we validate our assumptions on small scale Markov Decision Processes. We find that the stochastic semi-gradient noise due to subsampling the space of possible episodes leads to significant plateaus in the value error, unlike in traditional gradient descent dynamics. We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function. We then analyze how strategies like learning rate annealing and reward shaping can favorably alter learning dynamics and plateaus. To conclude, our work introduces new tools to open a new direction towards developing a theory of learning dynamics in reinforcement learning.

翻译：强化学习已在多个需要智能体在稀疏反馈环境中学习行动的领域取得成功。然而，尽管取得了实证成功，目前仍缺乏对强化学习模型参数与状态表征特征如何共同作用以控制学习动态的理论理解。本研究运用统计物理学概念，针对线性函数逼近器的价值函数时序差分学习，分析了典型情况下的学习曲线。我们的理论基于高斯等价假设，将随机轨迹的平均替换为时间相关的高斯特征平均，并在小规模马尔可夫决策过程中验证了假设。研究发现，与传统的梯度下降动态不同，由于对可能轨迹空间进行子采样而产生的随机半梯度噪声会导致价值误差出现显著平台期。我们分析了学习动态与平台期如何依赖特征结构、学习率、折扣因子及奖励函数，进而研究了学习率退火和奖励塑形等策略如何优化学习动态与平台期。本文的工作为发展强化学习动态理论提供了新工具和新方向。

0

相关内容

Learning

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

32+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

STRCF for Visual Object Tracking

STRCF for Visual Object Tracking

统计学习与视觉计算组

15+阅读 · 2018年5月29日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Focal Loss for Dense Object Detection

Focal Loss for Dense Object Detection

统计学习与视觉计算组

12+阅读 · 2018年3月15日

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

KingsGarden

13+阅读 · 2017年7月16日

城市“建成环境——空间行为”的多尺度影响关系与机理研究

国家自然科学基金

13+阅读 · 2017年12月31日

语义Web知识库补全关键技术研究

国家自然科学基金

18+阅读 · 2017年12月31日

“Fishes-in-net” 酵母孢子微胶囊式近平滑假丝酵母SCRII酶有机相高效手性合成机制研究

国家自然科学基金

3+阅读 · 2016年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

47+阅读 · 2015年12月31日

JAK/STAT/SOCS3信号通路介导卒中后认知损害的分子机制及中医“活血通络”法对其的干预研究

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

基于决策模型和预备电位的运动想象BCI研究

国家自然科学基金

3+阅读 · 2015年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

海量Web用户生成内容物化关键技术

国家自然科学基金

2+阅读 · 2014年12月31日

Neolaxiflorin B的全合成研究

国家自然科学基金

0+阅读 · 2014年12月31日

Unsupervised Learning of Phylogenetic Trees via Split-Weight Embedding

Unsupervised Learning of Phylogenetic Trees via Split-Weight Embedding

Arxiv

0+阅读 · 2023年12月26日

Optimal Decision Tree with Noisy Outcomes

Arxiv

0+阅读 · 2023年12月23日

A Comprehensive Survey on Source-free Domain Adaptation

Arxiv

10+阅读 · 2023年2月23日

A Survey of Meta-Reinforcement Learning

Arxiv

12+阅读 · 2023年1月19日

Deep Reinforcement Learning for Multi-Agent Interaction

Arxiv

46+阅读 · 2022年8月2日

Reverse Attention for Salient Object Detection

Arxiv

11+阅读 · 2019年4月15日

An Attentive Survey of Attention Models

Arxiv

19+阅读 · 2019年4月5日

Learning Discrete Structures for Graph Neural Networks

Arxiv

18+阅读 · 2019年3月28日

A Survey on Deep Transfer Learning

A Survey on Deep Transfer Learning

Arxiv

11+阅读 · 2018年8月6日

Deep Active Learning for Named Entity Recognition

Arxiv

15+阅读 · 2018年2月4日

VIP会员

文章信息

相关主题

最新内容

速度优先于谨慎：NSPM-11意味着什么（将人工智能融入美国国防和情报行动最全面的声明）

速度优先于谨慎：NSPM-11意味着什么（将人工智能融入美国国防和情报行动最全面的声明）

专知会员服务

2+阅读 · 今天16:15

《基于深度强化学习的反无人机技术研究》178页

《基于深度强化学习的反无人机技术研究》178页

专知会员服务

1+阅读 · 今天16:06

技术突破与战略优势竞争：美军人工智能技术运用阶段分析

技术突破与战略优势竞争：美军人工智能技术运用阶段分析

专知会员服务

1+阅读 · 今天15:54

“史诗怒火”行动与“AI中心战”模式的浮现

“史诗怒火”行动与“AI中心战”模式的浮现

专知会员服务

1+阅读 · 今天15:31

【CVPR2026教程】扩散模型的解析理解

【CVPR2026教程】扩散模型的解析理解

专知会员服务

0+阅读 · 今天14:49

【CVPR2026教程】从感知到模拟：多模态推理中世界模型的涌现

【CVPR2026教程】从感知到模拟：多模态推理中世界模型的涌现

专知会员服务

0+阅读 · 今天14:40

马赛克战：俄乌战场透析

马赛克战：俄乌战场透析

专知会员服务

13+阅读 · 今天4:12

《利用人工智能增强军事决策》

《利用人工智能增强军事决策》

专知会员服务

4+阅读 · 今天4:09

《自动机器学习在军事数据耕耘法中的应用》

《自动机器学习在军事数据耕耘法中的应用》

专知会员服务

6+阅读 · 今天4:02

为何指挥所生存能力要求范式转变

为何指挥所生存能力要求范式转变

专知会员服务

5+阅读 · 今天3:54

打造“新蛛网”模式与高科技动员

打造“新蛛网”模式与高科技动员

专知会员服务

4+阅读 · 今天3:33

“蛛网”行动一周年：远程无人机战争

“蛛网”行动一周年：远程无人机战争

专知会员服务

3+阅读 · 今天3:23

加沙、乌克兰和伊朗冲突：人工智能如何改变冲突

加沙、乌克兰和伊朗冲突：人工智能如何改变冲突

专知会员服务

3+阅读 · 今天3:15

为何“第一次人工智能战争（美以伊冲突）”仍是人类主导的斗争

为何“第一次人工智能战争（美以伊冲突）”仍是人类主导的斗争

专知会员服务

3+阅读 · 今天3:09

【剑桥博士论文】智能体-环境协同优化

【剑桥博士论文】智能体-环境协同优化

专知会员服务

7+阅读 · 6月9日

相关VIP内容

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

32+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《基于深度强化学习的反无人机技术研究》178页

“史诗怒火”行动与“AI中心战”模式的浮现

速度优先于谨慎：NSPM-11意味着什么（将人工智能融入美国国防和情报行动最全面的声明）

技术突破与战略优势竞争：美军人工智能技术运用阶段分析

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

STRCF for Visual Object Tracking

STRCF for Visual Object Tracking

统计学习与视觉计算组

15+阅读 · 2018年5月29日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Focal Loss for Dense Object Detection

Focal Loss for Dense Object Detection

统计学习与视觉计算组

12+阅读 · 2018年3月15日

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

KingsGarden

13+阅读 · 2017年7月16日

相关论文

Unsupervised Learning of Phylogenetic Trees via Split-Weight Embedding

Unsupervised Learning of Phylogenetic Trees via Split-Weight Embedding

Arxiv

0+阅读 · 2023年12月26日

Optimal Decision Tree with Noisy Outcomes

Arxiv

0+阅读 · 2023年12月23日

A Comprehensive Survey on Source-free Domain Adaptation

Arxiv

10+阅读 · 2023年2月23日

A Survey of Meta-Reinforcement Learning

Arxiv

12+阅读 · 2023年1月19日

Deep Reinforcement Learning for Multi-Agent Interaction

Arxiv

46+阅读 · 2022年8月2日

Reverse Attention for Salient Object Detection

Arxiv

11+阅读 · 2019年4月15日

An Attentive Survey of Attention Models

Arxiv

19+阅读 · 2019年4月5日

Learning Discrete Structures for Graph Neural Networks

Arxiv

18+阅读 · 2019年3月28日

A Survey on Deep Transfer Learning

A Survey on Deep Transfer Learning

Arxiv

11+阅读 · 2018年8月6日

Deep Active Learning for Named Entity Recognition

Arxiv

15+阅读 · 2018年2月4日

相关基金

城市“建成环境——空间行为”的多尺度影响关系与机理研究

国家自然科学基金

13+阅读 · 2017年12月31日

语义Web知识库补全关键技术研究

国家自然科学基金

18+阅读 · 2017年12月31日

“Fishes-in-net” 酵母孢子微胶囊式近平滑假丝酵母SCRII酶有机相高效手性合成机制研究

国家自然科学基金

3+阅读 · 2016年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

47+阅读 · 2015年12月31日

JAK/STAT/SOCS3信号通路介导卒中后认知损害的分子机制及中医“活血通络”法对其的干预研究

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

基于决策模型和预备电位的运动想象BCI研究

国家自然科学基金

3+阅读 · 2015年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

海量Web用户生成内容物化关键技术

国家自然科学基金

2+阅读 · 2014年12月31日

Neolaxiflorin B的全合成研究

国家自然科学基金

0+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员