Policy Representation via Diffusion Probability Model for Reinforcement Learning

Popular reinforcement learning (RL) algorithms tend to produce a unimodal policy distribution, which weakens the expressiveness of complicated policy and decays the ability of exploration. The diffusion probability model is powerful to learn complicated multimodal distributions, which has shown promising and potential applications to RL. In this paper, we formally build a theoretical foundation of policy representation via the diffusion probability model and provide practical implementations of diffusion policy for online model-free RL. Concretely, we character diffusion policy as a stochastic process, which is a new approach to representing a policy. Then we present a convergence guarantee for diffusion policy, which provides a theory to understand the multimodality of diffusion policy. Furthermore, we propose the DIPO which is an implementation for model-free online RL with DIffusion POlicy. To the best of our knowledge, DIPO is the first algorithm to solve model-free online RL problems with the diffusion model. Finally, extensive empirical results show the effectiveness and superiority of DIPO on the standard continuous control Mujoco benchmark.

翻译：主流的强化学习算法倾向于生成单峰策略分布，这削弱了复杂策略的表达能力并降低了探索性能。扩散概率模型能够有效学习复杂的多峰分布，在强化学习领域展现出广阔的应用前景。本文通过扩散概率模型系统构建了策略表示的理论基础，并提供了扩散策略在在线无模型强化学习中的实践方案。具体而言，我们将扩散策略表征为随机过程，这是一种新颖的策略表示方法；随后给出了扩散策略的收敛性保证，为理解其多峰特性提供了理论支撑。在此基础上，我们提出了基于扩散策略的在线无模型强化学习实现框架DIPO。据我们所知，DIPO是首个利用扩散模型解决在线无模型强化学习问题的算法。最后，大量实验结果表明，DIPO在标准连续控制基准Mujoco上具有显著的有效性和优越性。

相关内容

MoDELS

关注 46

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

105+阅读 · 2022年2月10日

计算机科学课程与视频课件合集，Computer Science courses with video lectures

专知会员服务

37+阅读 · 2022年1月24日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日