Learning policies from high-dimensional visual inputs, such as pixels and point clouds, is essential in many applications. Visual reinforcement learning (visual RL) is a promising approach that trains policies directly from visual observations, although it faces challenges in sample efficiency and computational cost. This study presents an empirical comparison between visual RL and State-to-Visual DAgger, a two-stage framework that first trains a state policy and then uses online imitation to learn a visual policy. We evaluate both methods on 16 tasks drawn from three benchmarks, focusing on their asymptotic performance, sample efficiency, and computational cost. Surprisingly, our findings reveal that State-to-Visual DAgger does not universally outperform visual RL, but it shows significant advantages on challenging tasks, where it offers more consistent performance. In contrast, its benefits in sample efficiency are less pronounced, although it often reduces the overall wall-clock time required for training. Based on these findings, we provide recommendations for practitioners and hope that our results contribute valuable perspectives to future research on visual policy learning.
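To make the two-stage recipe concrete, the following is a minimal, hypothetical Python sketch of State-to-Visual DAgger as summarized above: Stage 1 trains a state-based teacher with an RL algorithm (the abstract leaves the choice open), and Stage 2 lets the visual student drive rollouts while the teacher relabels each visited state for supervised imitation. All identifiers here (`train_state_policy_rl`, `VisualPolicy`, the `env` interface) are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn
import random

def minibatches(data, size):
    # Simple shuffled minibatch iterator over (visual_obs, action) tensor pairs.
    random.shuffle(data)
    for i in range(0, len(data), size):
        batch = data[i:i + size]
        yield (torch.stack([b[0] for b in batch]),
               torch.stack([b[1] for b in batch]))

def state_to_visual_dagger(env, num_dagger_iters=50, batch_size=256):
    # Stage 1: train a teacher policy on low-dimensional state observations
    # with any standard RL algorithm. `train_state_policy_rl` is an assumed
    # helper standing in for, e.g., a PPO or SAC training loop.
    teacher = train_state_policy_rl(env)

    # Stage 2: online imitation (DAgger). The visual student chooses actions,
    # and the teacher labels every visited state with its own action.
    student = VisualPolicy()  # assumed encoder (CNN / point-cloud) + MLP head
    optimizer = torch.optim.Adam(student.parameters(), lr=3e-4)
    dataset = []  # aggregated (visual_obs, teacher_action) pairs

    for _ in range(num_dagger_iters):
        # Assumed env interface exposing both state and visual observations.
        (state, visual_obs), done = env.reset(), False
        while not done:
            with torch.no_grad():
                action = student(visual_obs)  # student drives the rollout
                label = teacher(state)        # teacher relabels the state
            dataset.append((visual_obs, label))
            (state, visual_obs), done = env.step(action)

        # Supervised regression of the student onto the aggregated labels.
        for obs_batch, act_batch in minibatches(dataset, batch_size):
            loss = nn.functional.mse_loss(student(obs_batch), act_batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    return student
```

Because the student collects its own rollouts under the teacher's supervision, this avoids the covariate-shift problem of behavior cloning; the sample-efficiency and wall-clock trade-offs reported above come from comparing this pipeline against training the visual policy end to end with RL.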