Visual Reinforcement Learning (RL) methods often require large amounts of data. In contrast to model-free RL, model-based RL (MBRL) offers a potential path to data-efficient learning through planning. In addition, RL often generalizes poorly to real-world tasks. Prior work has shown that incorporating pre-trained visual representations (PVRs) enhances sample efficiency and generalization. While PVRs have been extensively studied in model-free RL, their potential in MBRL remains largely unexplored. In this paper, we benchmark a set of PVRs on challenging control tasks in a model-based RL setting. We investigate data efficiency, generalization capabilities, and the impact of different PVR properties on the performance of model-based agents. Perhaps surprisingly, our results reveal that for MBRL, current PVRs are not more sample-efficient than representations learned from scratch, nor do they generalize better to out-of-distribution (OOD) settings. To explain this, we analyze the quality of the learned dynamics model. Furthermore, we show that data diversity and network architecture are the most important contributors to OOD generalization performance.
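The setting benchmarked above pairs a frozen pre-trained encoder with a learned dynamics model. As a minimal, self-contained sketch of that pipeline (purely illustrative, not the paper's implementation): a fixed random projection stands in for a frozen PVR, a linear system stands in for the visual control task, and a one-step latent dynamics model is fit by least squares on the frozen features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: "observations", PVR features, and actions.
obs_dim, feat_dim, act_dim = 64, 16, 4

# Stand-in for a frozen PVR: a fixed random projection, never updated.
W_enc = rng.normal(size=(feat_dim, obs_dim)) / np.sqrt(obs_dim)

def encode(obs):
    """Frozen encoder: maps an observation to a feature vector."""
    return np.tanh(W_enc @ obs)

# Toy linear environment s' = A s + B a (placeholder for a real task).
A = rng.normal(size=(obs_dim, obs_dim)) * 0.1
B = rng.normal(size=(obs_dim, act_dim)) * 0.1

# Collect transitions under a random policy.
obs = rng.normal(size=obs_dim)
feats, acts, next_feats = [], [], []
for _ in range(500):
    act = rng.normal(size=act_dim)
    nxt = A @ obs + B @ act
    feats.append(encode(obs))
    acts.append(act)
    next_feats.append(encode(nxt))
    obs = nxt

# Fit a one-step latent dynamics model f(z, a) -> z' by least squares,
# operating entirely in the frozen encoder's feature space.
X = np.hstack([np.array(feats), np.array(acts)])  # (500, feat_dim + act_dim)
Y = np.array(next_feats)                          # (500, feat_dim)
M, *_ = np.linalg.lstsq(X, Y, rcond=None)

mse = np.mean((X @ M - Y) ** 2)
baseline = np.mean(Y**2)  # error of always predicting zero features
print(f"dynamics MSE {mse:.4f} vs zero-baseline {baseline:.4f}")
```

The key property of the setup is that only the dynamics model is trained; the encoder stays fixed, so the quality of the learned dynamics directly reflects how well the pre-trained features support prediction, which is the quantity the paper's analysis examines.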