Deep generative models (DGMs) have demonstrated great success across various domains, particularly in generating texts, images, and videos from models trained on offline data. Similarly, data-driven decision-making and robotic control require learning a generator function from offline data to serve as the strategy or policy. In this context, applying deep generative models to offline policy learning exhibits great potential, and numerous studies have explored this direction. However, the field still lacks a comprehensive review, so developments in its different branches remain relatively independent. Thus, we provide the first systematic review of the applications of deep generative models to offline policy learning. In particular, we cover five mainstream deep generative models, namely Variational Auto-Encoders, Generative Adversarial Networks, Normalizing Flows, Transformers, and Diffusion Models, and their applications in both offline reinforcement learning (offline RL) and imitation learning (IL). Offline RL and IL are the two main branches of offline policy learning and are widely adopted techniques for sequential decision-making. Specifically, for each type of DGM-based offline policy learning, we distill its fundamental scheme, categorize related works by how the DGM is used, and trace the development of algorithms in that area. Following the main content, we provide in-depth discussions on deep generative models and offline policy learning as a summary, based on which we present our perspectives on future research directions. This work offers a hands-on reference for research progress in deep generative models for offline policy learning, and aims to inspire improved DGM-based offline RL and IL algorithms.