With the widespread adoption of deep learning, reinforcement learning (RL) has experienced a dramatic increase in popularity, scaling to previously intractable problems such as playing complex games from pixel observations, sustaining conversations with humans, and controlling robotic agents. However, a wide range of domains remains inaccessible to RL because interacting with the environment is costly or dangerous. Offline RL is a paradigm that learns exclusively from static datasets of previously collected interactions, which makes it feasible to extract policies from large and diverse training datasets. Effective offline RL algorithms have a much wider range of applications than online RL and are particularly appealing for real-world domains such as education, healthcare, and robotics. In this work, we contribute a unifying taxonomy to classify offline RL methods. Furthermore, we provide a comprehensive review of the latest algorithmic breakthroughs in the field using a unified notation, as well as a review of the properties and shortcomings of existing benchmarks. Additionally, we provide a figure that summarizes the performance of each method and class of methods on different dataset properties. This figure helps researchers decide which type of algorithm is best suited to the problem at hand and identify the most promising classes of algorithms. Finally, we offer our perspective on open problems and propose future research directions for this rapidly growing field.