Unsupervised object-centric representation (OCR) learning has recently drawn attention as a new paradigm of visual representation, owing to its potential as an effective pre-training technique for various downstream tasks in terms of sample efficiency, systematic generalization, and reasoning. Although image-based reinforcement learning (RL) is one of the most important and most frequently mentioned of these downstream tasks, the benefit of OCR in RL has, surprisingly, not been investigated systematically thus far. Instead, most evaluations have focused on rather indirect metrics such as segmentation quality and object property prediction accuracy. In this paper, we empirically investigate the effectiveness of OCR pre-training for image-based RL. For systematic evaluation, we introduce a simple object-centric visual RL benchmark and conduct experiments to answer questions such as ``Does OCR pre-training improve performance on object-centric tasks?'' and ``Can OCR pre-training help with out-of-distribution generalization?''. Our results provide empirical evidence on the effectiveness of OCR pre-training for RL and on the potential limitations of its use in certain scenarios. We also examine critical aspects of incorporating OCR pre-training in RL, including performance in a visually complex environment and the appropriate pooling layer for aggregating the object representations.