Unsupervised object-centric representation (OCR) learning has recently drawn attention as a new paradigm of visual representation, owing to its potential as an effective pre-training technique for various downstream tasks in terms of sample efficiency, systematic generalization, and reasoning. Although image-based reinforcement learning (RL) is one of the most important and most frequently mentioned of these downstream tasks, the benefit of OCR in RL has, surprisingly, not been investigated systematically thus far. Instead, most evaluations have focused on rather indirect metrics such as segmentation quality and object property prediction accuracy. In this paper, we investigate the effectiveness of OCR pre-training for image-based RL through empirical experiments. For systematic evaluation, we introduce a simple object-centric visual RL benchmark and conduct experiments to answer questions such as ``Does OCR pre-training improve performance on object-centric tasks?'' and ``Can OCR pre-training help with out-of-distribution generalization?''. Our results provide empirical evidence on the effectiveness of OCR pre-training for RL and on the potential limitations of its use in certain scenarios. This study also examines critical aspects of incorporating OCR pre-training in RL, including performance in a visually complex environment and the appropriate pooling layer for aggregating the object representations.