In imitation learning for robotics, cotraining with demonstration data generated both in simulation and on real hardware has emerged as a powerful recipe for overcoming the sim2real gap. This work seeks to elucidate basic principles of sim-and-real cotraining to help inform simulation design, sim-and-real dataset creation, and policy training. Focusing narrowly on the canonical task of planar pushing from camera inputs enabled us to be thorough in our study. Our experiments confirm that cotraining with simulated data \emph{can} dramatically improve real-world performance, especially when real data is limited. Performance gains scale with the amount of simulated data but eventually plateau; adding real-world data raises this performance ceiling. The results also suggest that reducing the domain gap in physics may be more important than visual fidelity for non-prehensile manipulation tasks. Perhaps surprisingly, having some visual domain gap actually helps the cotrained policy -- binary probes reveal that high-performing policies learn to distinguish simulated domains from real ones. We conclude by investigating this nuance and the mechanisms that facilitate positive transfer between sim and real. In total, our experiments span over 40 real-world policies (evaluated on 800+ trials) and 200 simulated policies (evaluated on 40,000+ trials).
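The binary probes mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a probe is a logistic-regression classifier fit on frozen policy embeddings to predict whether an observation came from simulation (label 0) or from real hardware (label 1). The function names, the embedding dimension, and the synthetic Gaussian "embeddings" are all hypothetical stand-ins for features that would, in practice, be extracted from a trained policy.

```python
import numpy as np

def fit_binary_probe(features, labels, lr=0.1, steps=1000):
    """Fit a logistic-regression probe by gradient descent.

    features: (n, d) array of frozen embeddings (hypothetical policy features).
    labels:   (n,) array with 0 = simulated, 1 = real.
    """
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        # Clip logits to keep the sigmoid numerically stable.
        logits = np.clip(features @ w + b, -30.0, 30.0)
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels  # gradient of the mean log-loss w.r.t. the logits
        w -= lr * (features.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(w, b, features, labels):
    """Fraction of samples whose domain the probe predicts correctly."""
    preds = (features @ w + b) > 0
    return float((preds == labels.astype(bool)).mean())

def make_synthetic_embeddings(rng, n_per_domain=200, dim=16, shift=1.0):
    """Assumption: stand-in data. Sim and real embeddings differ by a mean shift."""
    sim = rng.normal(0.0, 1.0, size=(n_per_domain, dim))
    real = rng.normal(shift, 1.0, size=(n_per_domain, dim))
    features = np.vstack([sim, real])
    labels = np.concatenate([np.zeros(n_per_domain), np.ones(n_per_domain)])
    return features, labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_x, train_y = make_synthetic_embeddings(rng)
    test_x, test_y = make_synthetic_embeddings(rng)
    w, b = fit_binary_probe(train_x, train_y)
    print("held-out probe accuracy:", probe_accuracy(w, b, test_x, test_y))
```

Under these assumptions, held-out probe accuracy well above chance would indicate that the embeddings encode the domain of origin, which is the kind of evidence the abstract attributes to high-performing policies. Accuracy near 0.5 would indicate that the two domains are not linearly separable in that feature space.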