Deep Generative Models (DGMs), including Energy-Based Models (EBMs) and Score-based Generative Models (SGMs), have advanced high-fidelity data generation and the approximation of complex continuous distributions. However, their application in Markov Decision Processes (MDPs), particularly in distributional Reinforcement Learning (RL), remains underexplored, and conventional histogram-based methods continue to dominate the field. This paper rigorously shows that this application gap stems from the nonlinearity of modern DGMs, which conflicts with the linearity required by the Bellman equation in MDPs. For instance, EBMs involve nonlinear operations such as exponentiating the energy function and dividing by a normalization constant. To address this, we introduce Bellman Diffusion, a novel DGM framework that maintains linearity in MDPs by modeling gradient and scalar fields. With divergence-based training techniques to optimize neural network field proxies and a new type of stochastic differential equation (SDE) for sampling, Bellman Diffusion is guaranteed to converge to the target distribution. Our empirical results show that Bellman Diffusion achieves accurate field estimation, serves as a capable image generator, and converges 1.5x faster than the traditional histogram-based baseline on distributional RL tasks. This work enables the effective integration of DGMs into MDP applications, unlocking new avenues for advanced decision-making frameworks.
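To make the linearity conflict concrete, consider a minimal sketch (notation ours, not taken from the paper). The distributional Bellman equation expresses a state's return density as a convex, hence linear, combination of rescaled successor densities: for returns satisfying \(Z(s) \stackrel{d}{=} R + \gamma Z(S')\),

\[
p^\pi(x \mid s) \;=\; \sum_{s',\, r} P(s', r \mid s, \pi(s)) \, \frac{1}{\gamma}\, p^\pi\!\Big(\frac{x - r}{\gamma} \,\Big|\, s'\Big),
\]

which is linear in the density \(p^\pi\). An EBM, by contrast, parameterizes the density nonlinearly through its energy, \(p_\theta(x) = e^{-E_\theta(x)} / Z_\theta\), and a linear mixture of such densities is in general not an EBM whose energy is the corresponding mixture of energies:

\[
\sum_{i} w_i \, \frac{e^{-E_i(x)}}{Z_i} \;\neq\; \frac{e^{-\sum_i w_i E_i(x)}}{Z},
\]

so Bellman backups cannot be carried out directly on the energy parameterization.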