Learning visuomotor policies in simulation is much safer and cheaper than in the real world. However, due to discrepancies between simulated and real data, simulator-trained policies often fail when transferred to real robots. One common approach to bridging the visual sim-to-real domain gap is domain randomization (DR). While previous work mainly evaluates DR for disembodied tasks, such as pose estimation and object detection, here we systematically explore visual domain randomization methods and benchmark them on a rich set of challenging robotic manipulation tasks. In particular, we propose an off-line proxy task of cube localization to select DR parameters for texture randomization, lighting randomization, variations of object colors, and camera parameters. Notably, we demonstrate that DR parameters have a similar impact on our off-line proxy task and on on-line policies. We therefore use off-line optimized DR parameters to train visuomotor policies in simulation and directly apply such policies to a real robot. Our approach achieves a 93% success rate on average when tested on a diverse set of challenging manipulation tasks. Moreover, we evaluate the robustness of policies to visual variations in real scenes and show that our simulator-trained policies outperform policies learned using real but limited data. Code, simulation environment, real robot datasets and trained models are available at https://www.di.ens.fr/willow/research/robust_s2r/.
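To make the four randomization families named in the abstract concrete, the following is a minimal sketch of how one randomized scene might be sampled. It is not the authors' implementation: the `SceneParams` class, the `sample_scene` function, and every range and default value are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneParams:
    """Hypothetical visual parameters of one simulated scene."""
    texture_id: int          # index into an assumed bank of textures
    light_intensity: float   # relative brightness, 1.0 = nominal
    object_rgb: tuple        # object color, each channel in [0, 1]
    camera_offset_cm: tuple  # xyz perturbation of the camera position


def sample_scene(rng, num_textures=100, light_range=(0.5, 1.5), cam_jitter_cm=2.0):
    """Draw one randomized scene; all ranges are illustrative assumptions."""
    return SceneParams(
        texture_id=rng.randrange(num_textures),
        light_intensity=rng.uniform(*light_range),
        object_rgb=tuple(rng.random() for _ in range(3)),
        camera_offset_cm=tuple(
            rng.uniform(-cam_jitter_cm, cam_jitter_cm) for _ in range(3)
        ),
    )


# A seeded generator keeps the sampled scenes reproducible.
rng = random.Random(0)
scenes = [sample_scene(rng) for _ in range(5)]
```

In a setup like the one the abstract describes, ranges such as `light_range` would be the quantities chosen via the off-line proxy task rather than fixed by hand as they are here.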