We present a maximum entropy inverse reinforcement learning (IRL) approach for improving the sample quality of diffusion generative models, especially when the number of generation time steps is small. Similar to how IRL trains a policy from a reward function learned from expert demonstrations, we train (or fine-tune) a diffusion model using the log probability density estimated from training data. Since we employ an energy-based model (EBM) to represent the log density, our approach boils down to the joint training of a diffusion model and an EBM. Our IRL formulation, named Diffusion by Maximum Entropy IRL (DxMI), is a minimax problem that reaches equilibrium when both models converge to the data distribution. Entropy maximization plays a key role in DxMI, facilitating the exploration of the diffusion model and ensuring the convergence of the EBM. We also propose Diffusion by Dynamic Programming (DxDP), a novel reinforcement learning algorithm for diffusion models, as a subroutine in DxMI. DxDP makes the diffusion model update in DxMI efficient by casting the original problem as an optimal control problem in which value functions replace back-propagation through time. Our empirical studies show that diffusion models fine-tuned with DxMI can generate high-quality samples in as few as 4 and 10 steps. Additionally, DxMI enables the training of an EBM without MCMC, stabilizing EBM training dynamics and enhancing anomaly detection performance.
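Read as a maximum entropy IRL game, the minimax problem described above takes the following schematic form; this is our reconstruction from the abstract, with the reward defined as the negative energy $r_\theta(x) = -E_\theta(x)$, and the paper's exact objective and notation may differ:

\[
\max_{\theta}\;\; \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\big[-E_\theta(x)\big] \;-\; \max_{\pi}\,\Big( \mathbb{E}_{x \sim \pi}\!\big[-E_\theta(x)\big] + \mathcal{H}(\pi) \Big)
\]

The outer maximization fits the EBM to data, while the inner entropy-regularized maximization is the diffusion model's update; writing $\max_\theta [\,\cdot\, - \max_\pi (\,\cdot\,)]$ as $\max_\theta \min_\pi$ recovers the minimax structure. At equilibrium, the sampler $\pi$ and the EBM density $\propto \exp(-E_\theta)$ both match $p_{\mathrm{data}}$. The inner problem is a soft-optimal control problem over the $T$ generation steps, which is what allows DxDP to introduce value functions in place of back-propagation through time.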
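To make the joint training concrete, below is a minimal, self-contained sketch on toy 2-D data, assuming PyTorch. The network sizes, step count, noise scale, energy regularizer, and the use of direct back-propagation through the short sampling chain (where the paper's DxDP subroutine would instead use value functions) are illustrative assumptions of ours, not the authors' implementation.

```python
# Minimal sketch of DxMI-style joint training of a few-step sampler and an
# EBM on 2-D toy data. Hedged: architectures, hyperparameters, and the
# Gaussian-noise entropy surrogate are illustrative choices, not the paper's.
import torch
import torch.nn as nn

torch.manual_seed(0)
T, DIM = 4, 2  # few-step generation regime, toy data dimension

def sample_data(n):
    # Toy target: mixture of two Gaussians.
    centers = torch.tensor([[-2.0, 0.0], [2.0, 0.0]])
    return centers[torch.randint(0, 2, (n,))] + 0.3 * torch.randn(n, DIM)

class EBM(nn.Module):
    # Scalar energy E_theta(x); -E_theta plays the role of the (unnormalized)
    # log density, i.e., the learned reward in the IRL view.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(DIM, 128), nn.SiLU(),
                                 nn.Linear(128, 128), nn.SiLU(),
                                 nn.Linear(128, 1))
    def forward(self, x):
        return self.net(x).squeeze(-1)

class Sampler(nn.Module):
    # T-step stochastic sampler ("policy"): x_{t+1} = x_t + f(x_t, t) + sigma*eps.
    # With fixed sigma, the per-step Gaussian entropies are constant in the
    # parameters; a learned sigma would make the entropy term of the
    # objective active, which is what drives exploration in DxMI.
    def __init__(self, sigma=0.2):
        super().__init__()
        self.sigma = sigma
        self.net = nn.Sequential(nn.Linear(DIM + 1, 128), nn.SiLU(),
                                 nn.Linear(128, 128), nn.SiLU(),
                                 nn.Linear(128, DIM))
    def forward(self, n):
        x = torch.randn(n, DIM)  # x_0 ~ N(0, I)
        for t in range(T):
            tt = torch.full((n, 1), t / T)
            x = x + self.net(torch.cat([x, tt], 1)) + self.sigma * torch.randn(n, DIM)
        return x

ebm, sampler = EBM(), Sampler()
opt_e = torch.optim.Adam(ebm.parameters(), lr=1e-4)
opt_s = torch.optim.Adam(sampler.parameters(), lr=1e-4)

for step in range(2000):
    # EBM update: lower energy on data, raise it on sampler outputs. The
    # sampler provides the negative samples, so no MCMC chain is needed.
    x_data, x_gen = sample_data(256), sampler(256).detach()
    e_data, e_gen = ebm(x_data), ebm(x_gen)
    loss_e = e_data.mean() - e_gen.mean() \
        + 0.1 * (e_data ** 2 + e_gen ** 2).mean()  # energy-magnitude stabilizer
    opt_e.zero_grad(); loss_e.backward(); opt_e.step()

    # Sampler update: minimize the energy of generated samples. For brevity
    # we back-propagate through the short T-step chain; DxDP avoids this
    # back-propagation in time by introducing value functions per step.
    loss_s = ebm(sampler(256)).mean()
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()
```

Note how the two halves of the loop mirror the minimax objective: the sampler's own outputs serve as the EBM's negative samples (which is why MCMC is unnecessary), and the sampler in turn descends the current energy, so both models are pushed toward the data distribution.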