First-order Sobolev Reinforcement Learning


November 24, 2025


Fabian Schramm, Nicolas Perrin-Gilbert, Justin Carpentier

From arXiv. Workshop paper at Differentiable Systems and Scientific Machine Learning, EurIPS 2025.

We propose a refinement of temporal-difference learning that enforces first-order Bellman consistency: the learned value function is trained to match not only the Bellman targets in value but also their derivatives with respect to states and actions. By differentiating the Bellman backup through differentiable dynamics, we obtain analytically consistent gradient targets. Incorporating these into the critic objective using a Sobolev-type loss encourages the critic to align with both the value and local geometry of the target function. This first-order TD matching principle can be seamlessly integrated into existing algorithms, such as Q-learning or actor-critic methods (e.g., DDPG, SAC), potentially leading to faster critic convergence and more stable policy gradients without altering their overall structure.
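
To make the idea of first-order Bellman consistency concrete, here is a minimal sketch on a toy scalar problem. Everything in it is an assumption for illustration and is not taken from the paper: the linear dynamics `s' = A*s + B*a`, the quadratic reward, the linear policy, the quadratic critic, the weight `lam`, and all function names. The derivatives of the Bellman target are written out by hand with the chain rule; in practice an automatic differentiation framework applied to a differentiable simulator would likely play that role.

```python
"""First-order TD matching on a toy scalar problem (illustrative sketch only)."""

GAMMA = 0.9          # discount factor (assumed)
A, B = 0.8, 0.5      # assumed linear dynamics: s' = A*s + B*a
K = 0.3              # assumed linear policy: pi(s) = -K*s


def q_value(w, s, a):
    """Quadratic critic Q(s, a) = w0*s^2 + w1*s*a + w2*a^2."""
    return w[0] * s * s + w[1] * s * a + w[2] * a * a


def q_grad(w, s, a):
    """Analytic (dQ/ds, dQ/da) of the quadratic critic."""
    return 2 * w[0] * s + w[1] * a, w[1] * s + 2 * w[2] * a


def td_target(w_target, s, a):
    """Bellman target y = r(s, a) + gamma * Q_target(s', pi(s'))."""
    reward = -(s * s + a * a)            # assumed quadratic reward
    s_next = A * s + B * a
    a_next = -K * s_next
    return reward + GAMMA * q_value(w_target, s_next, a_next)


def td_target_grad(w_target, s, a):
    """(dy/ds, dy/da) via the chain rule through dynamics and policy."""
    s_next = A * s + B * a
    a_next = -K * s_next
    dq_ds, dq_da = q_grad(w_target, s_next, a_next)
    # Total derivative of Q_target(s', pi(s')) with respect to s'.
    dq_dsnext = dq_ds + dq_da * (-K)
    dy_ds = -2 * s + GAMMA * dq_dsnext * A   # dr/ds + gamma * dQ/ds' * ds'/ds
    dy_da = -2 * a + GAMMA * dq_dsnext * B   # dr/da + gamma * dQ/ds' * ds'/da
    return dy_ds, dy_da


def sobolev_td_loss(w, w_target, s, a, lam=1.0):
    """Squared value error plus lam times the squared gradient error."""
    value_err = q_value(w, s, a) - td_target(w_target, s, a)
    gs, ga = q_grad(w, s, a)
    ys, ya = td_target_grad(w_target, s, a)
    return value_err ** 2 + lam * ((gs - ys) ** 2 + (ga - ya) ** 2)
```

With `lam=0.0` the loss reduces to the ordinary squared TD error, so the gradient-matching term can be read as an add-on to a standard critic objective. How the paper weights the two terms and how it handles target networks is not stated in the abstract, so those choices here are placeholders.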

