In distributional reinforcement learning, one considers not only the expected return of a policy but its complete return distribution. The return distribution for a fixed policy is given as the solution of an associated distributional Bellman equation. In this note, we consider general distributional Bellman equations and study existence and uniqueness of their solutions, as well as tail properties of return distributions. We give necessary and sufficient conditions for existence and uniqueness of return distributions and identify cases in which their tails are regularly varying. We link distributional Bellman equations to multivariate affine distributional equations by showing that any solution of a distributional Bellman equation can be obtained as the vector of marginal laws of a solution to a multivariate affine distributional equation. This makes the general theory of such equations applicable to the distributional reinforcement learning setting.
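The link to affine equations can be illustrated in the familiar special case of a finite state space and a constant discount factor. The notation below is an assumption made for illustration and is not taken from the note: states $i = 1, \dots, d$, discount $\gamma$, a reward $R_i$ and next state $J_i$ when starting in state $i$, and a return $G_i$ with law $\eta_i$, where $(R_i, J_i)$ is independent of $(G_1, \dots, G_d)$. The general equations treated in the note may be broader than this sketch.

```latex
% Illustrative sketch under assumed notation (not the note's own):
% states i = 1..d, constant discount \gamma, reward R_i, next state J_i.
\begin{align*}
  % Distributional Bellman equation, one line per state:
  G_i &\overset{d}{=} R_i + \gamma\, G_{J_i}, \qquad i = 1, \dots, d, \\
  % Multivariate affine distributional equation for a random vector X in R^d:
  X &\overset{d}{=} A X + B, \qquad (A, B) \text{ independent of } X, \\
  % Choice of random matrix A and random vector B encoding the dynamics:
  A_{ij} &= \gamma\, \mathbf{1}\{J_i = j\}, \qquad B_i = R_i, \\
  % Row i of the affine equation then reads:
  (A X + B)_i &= \gamma\, X_{J_i} + R_i .
\end{align*}
```

With this choice, each row of $A$ has a single nonzero entry, and the marginal laws $\eta_i = \mathcal{L}(X_i)$ of a solution $X$ satisfy the distributional Bellman equation, because only the marginal law of $X_{J_i}$ given $J_i$ enters the $i$-th component.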