Visual reinforcement learning (RL) has made significant progress in recent years, but the choice of visual feature extractor remains a crucial design decision. This paper compares RL algorithms that train a convolutional neural network (CNN) from scratch with those that use pre-trained visual representations (PVRs). We evaluate the Dormant Ratio Minimization (DRM) algorithm, a state-of-the-art visual RL method, against three PVRs: ResNet18, DINOv2, and Visual Cortex (VC). The comparison uses the Metaworld Push-v2 and Drawer-Open-v2 tasks. Our results show that whether training from scratch or using PVRs yields the best performance is task-dependent, but that PVRs offer advantages in the form of a smaller replay buffer and faster training. We also identify a strong correlation between the dormant ratio and model performance, highlighting the importance of exploration in visual RL. Our study provides insights into the trade-offs between training from scratch and using PVRs, informing the design of future visual RL algorithms.