Imitation learning methods need significant human supervision to learn policies robust to changes in object poses, physical disturbances, and visual distractors. Reinforcement learning, on the other hand, can explore the environment autonomously to learn robust behaviors, but may require impractical amounts of unsafe real-world data collection. To learn performant, robust policies without the burden of unsafe real-world data collection or extensive human supervision, we propose RialTo, a system for robustifying real-world imitation learning policies via reinforcement learning in "digital twin" simulation environments constructed on the fly from small amounts of real-world data. To enable this real-to-sim-to-real pipeline, RialTo provides an easy-to-use interface for quickly scanning and constructing digital twins of real-world environments. We also introduce a novel "inverse distillation" procedure for bringing real-world demonstrations into simulated environments for efficient fine-tuning, with minimal human intervention and engineering required. We evaluate RialTo on a variety of real-world robotic manipulation problems, including robustly stacking dishes on a rack, placing books on a shelf, and six other tasks. RialTo improves policy robustness by over 67% without requiring extensive human data collection. Project website and videos at https://real-to-sim-to-real.github.io/RialTo/