In recent years, increasing attention has been directed to leveraging pre-trained vision models for motor control. While existing works mainly emphasize the importance of the pre-training phase, the arguably equally important role played by downstream policy learning during control-specific fine-tuning is often neglected. It thus remains unclear whether pre-trained vision models are consistently effective under different control policies. To bridge this gap in understanding, we conduct a comprehensive study of 14 pre-trained vision models using 3 distinct classes of policy learning methods: reinforcement learning (RL), imitation learning through behavior cloning (BC), and imitation learning with a visual reward function (VRF). Our study yields a series of intriguing results, including the discovery that the effectiveness of pre-training is highly dependent on the choice of the downstream policy learning algorithm. We show that the conventionally accepted evaluation based on RL methods is highly variable and therefore unreliable, and we advocate for using more robust methods such as VRF and BC. To facilitate more universal evaluations of pre-trained models and their policy learning methods in the future, we also release a benchmark of 21 tasks across 3 different environments alongside our work.