Diffusion-based models have achieved notable empirical successes in reinforcement learning (RL) due to their expressiveness in modeling complex distributions. While these methods are promising, the key challenge in extending them to broader real-world applications lies in their computational cost at inference time: sampling from a diffusion model is considerably slow, often requiring tens to hundreds of iterations to generate a single sample. To circumvent this issue, we propose to leverage the flexibility of diffusion models for RL from a representation learning perspective. In particular, by exploiting the connection between diffusion models and energy-based models, we develop Diffusion Spectral Representation (Diff-SR), a coherent algorithmic framework for extracting sufficient representations of value functions in Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). We further demonstrate how Diff-SR enables efficient policy optimization and yields practical algorithms while explicitly bypassing the difficulty and inference cost of sampling from the diffusion model. Finally, we provide comprehensive empirical studies verifying that Diff-SR delivers robust and advantageous performance across various benchmarks in both fully and partially observable settings.
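For concreteness, the connection between diffusion models and energy-based models invoked above can be sketched as follows (standard background; the notation here is illustrative and not taken from the paper's derivations). A diffusion model trained by denoising score matching estimates the score of the noised data distribution, while an energy-based model defines a density through an energy function:
\[
s_\theta(x_t, t) \;\approx\; \nabla_{x_t} \log p_t(x_t),
\qquad
p(x) \propto \exp(-E(x)) \;\Rightarrow\; \nabla_x \log p(x) = -\nabla_x E(x).
\]
Identifying the two gives an implicit energy $E_\theta$ with $\nabla_{x_t} E_\theta(x_t, t) = -s_\theta(x_t, t)$, so a trained diffusion model can be read as an unnormalized energy-based model, and quantities derived from the energy, such as representations, become accessible without running the slow reverse sampling chain.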