Deep latent variable models have achieved significant empirical success in model-based reinforcement learning (RL) owing to their expressiveness in modeling complex transition dynamics. However, it remains unclear, both theoretically and empirically, how latent variable models can facilitate learning, planning, and exploration to improve the sample efficiency of RL. In this paper, we provide a representation view of latent variable models for state-action value functions, which admits both a tractable variational learning algorithm and an effective implementation of the principle of optimism/pessimism in the face of uncertainty for exploration. In particular, we propose a computationally efficient planning algorithm with UCB exploration that incorporates kernel embeddings of latent variable models. Theoretically, we establish the sample complexity of the proposed approach in both the online and offline settings. Empirically, we demonstrate superior performance over current state-of-the-art algorithms across various benchmarks.
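To illustrate how a representation of the state-action value function can support UCB-style exploration, here is a minimal sketch. It assumes that the latent variable model yields a finite-dimensional feature vector φ(s, a). The sketch uses the standard elliptical bonus β·sqrt(φᵀ Λ⁻¹ φ) from linear-MDP UCB methods, where Λ = λI + Σ φ φᵀ. This is a swapped-in, generic technique for illustration only. It does not reproduce the paper's kernel-embedding construction. The names `EllipticalBonus`, `lam`, and `beta` are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def identity(d: int, scale: float = 1.0) -> Matrix:
    # d x d diagonal matrix with `scale` on the diagonal.
    return [[scale if i == j else 0.0 for j in range(d)] for i in range(d)]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    # Matrix-vector product m @ v.
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


class EllipticalBonus:
    """Hypothetical helper: maintains Lambda^{-1}, where
    Lambda = lam * I + sum_t phi_t phi_t^T, over observed features phi_t
    (assumed to come from a learned latent-variable representation)."""

    def __init__(self, dim: int, lam: float = 1.0, beta: float = 1.0):
        self.inv = identity(dim, 1.0 / lam)  # (lam * I)^{-1}
        self.beta = beta  # exploration scale (assumed hyperparameter)

    def update(self, phi: Vector) -> None:
        # Sherman-Morrison rank-one update:
        # (A + x x^T)^{-1} = A^{-1} - (A^{-1} x)(A^{-1} x)^T / (1 + x^T A^{-1} x),
        # valid here because A^{-1} is symmetric.
        u = mat_vec(self.inv, phi)
        denom = 1.0 + dot(phi, u)
        d = len(phi)
        self.inv = [
            [self.inv[i][j] - u[i] * u[j] / denom for j in range(d)]
            for i in range(d)
        ]

    def bonus(self, phi: Vector) -> float:
        # beta * sqrt(phi^T Lambda^{-1} phi): large for rarely seen directions.
        return self.beta * dot(phi, mat_vec(self.inv, phi)) ** 0.5


if __name__ == "__main__":
    b = EllipticalBonus(dim=2, lam=1.0, beta=1.0)
    print(b.bonus([1.0, 0.0]))  # direction not yet explored
    b.update([1.0, 0.0])        # observe a transition with this feature
    print(b.bonus([1.0, 0.0]))  # bonus shrinks along the visited direction
    print(b.bonus([0.0, 1.0]))  # unvisited direction keeps its bonus
```

In an online setting, the bonus would be added to an estimated value φ(s, a)ᵀw to form an optimistic estimate. Offline pessimistic methods typically subtract a bonus of the same form instead. Both uses are assumptions of this sketch, not statements about the paper's exact algorithm.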