We introduce Diffusion Policy Policy Optimization (DPPO), an algorithmic framework including best practices for fine-tuning diffusion-based policies (e.g., Diffusion Policy) in continuous control and robot learning tasks using policy gradient (PG) methods from reinforcement learning (RL). PG methods are ubiquitous in training RL policies with other policy parameterizations; nevertheless, they have been conjectured to be less efficient for diffusion-based policies. Surprisingly, we show that DPPO achieves the strongest overall performance and efficiency for fine-tuning on common benchmarks, compared both to other RL methods for diffusion-based policies and to PG fine-tuning of other policy parameterizations. Through experimental investigation, we find that DPPO takes advantage of unique synergies between RL fine-tuning and the diffusion parameterization, leading to structured and on-manifold exploration, stable training, and strong policy robustness. We further demonstrate the strengths of DPPO in a range of realistic settings, including simulated robotic tasks with pixel observations, and via zero-shot deployment of simulation-trained policies on robot hardware in a long-horizon, multi-stage manipulation task. Website with code: diffusion-ppo.github.io