Training control policies in simulation is more appealing than training directly on real robots, as it allows diverse states to be explored safely and efficiently. Yet robot simulators inevitably differ from the real world, and the resulting inaccuracies manifest as the simulation-to-real (sim-to-real) gap. Existing work closes this gap by actively modifying specific simulator parameters to align simulated data with real-world observations. However, the set of tunable parameters is usually selected by hand, case by case, to reduce the search space; this requires extensive domain knowledge and is hard to scale to complex systems. To address the scalability issue and automate parameter tuning, we introduce an approach that aligns the simulator with the real world by discovering the causal relationship between the environment parameters and the sim-to-real gap. Concretely, our method learns a differentiable mapping from the environment parameters to the differences between simulated and real-world robot-object trajectories. This mapping is governed by a simultaneously learned causal graph, which helps prune the parameter search space, provides better interpretability, and improves generalization. We conduct both sim-to-sim and sim-to-real transfer experiments and show that our method significantly improves trajectory alignment and task success rate over strong baselines in a challenging manipulation task.
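To make the idea in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a toy setting with three environment parameters and a two-dimensional trajectory gap, and it swaps in a much simpler technique: a linear map fitted with an L1 penalty (proximal gradient descent), whose nonzero weights stand in for the learned causal graph. The names `toy_gap`, `fit_sparse_map`, and `tune_params`, as well as all numeric values, are hypothetical.

```python
import random

def toy_gap(p):
    # Hypothetical ground truth: only p[0] and p[2] influence the gap;
    # p[1] is an irrelevant parameter that the graph should prune away.
    return [2.0 * (p[0] - 0.5), -1.5 * (p[2] - 0.3)]

def soft_threshold(x, t):
    # Proximal operator of the L1 penalty: shrinks x toward zero by t.
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def fit_sparse_map(samples, gaps, lam=0.005, lr=0.4, steps=1000):
    # Fit gap_j ~ sum_i W[j][i] * p_i + b[j] with an L1 penalty on W.
    n, n_in, n_out = len(samples), len(samples[0]), len(gaps[0])
    W = [[0.0] * n_in for _ in range(n_out)]
    b = [0.0] * n_out
    for _ in range(steps):
        gW = [[0.0] * n_in for _ in range(n_out)]
        gb = [0.0] * n_out
        for p, y in zip(samples, gaps):
            for j in range(n_out):
                err = sum(W[j][i] * p[i] for i in range(n_in)) + b[j] - y[j]
                gb[j] += err / n
                for i in range(n_in):
                    gW[j][i] += err * p[i] / n
        for j in range(n_out):
            b[j] -= lr * gb[j]
            for i in range(n_in):
                W[j][i] = soft_threshold(W[j][i] - lr * gW[j][i], lr * lam)
    return W, b

def tune_params(p_init, W, b, edges, lr=0.1, steps=200):
    # Gradient descent on 0.5 * ||predicted gap||^2, updating only the
    # parameters that have at least one edge in the estimated graph.
    p = list(p_init)
    active = sorted({i for (_, i) in edges})
    for _ in range(steps):
        pred = [sum(W[j][i] * p[i] for i in range(len(p))) + b[j]
                for j in range(len(b))]
        for i in active:
            p[i] -= lr * sum(pred[j] * W[j][i] for j in range(len(b)))
    return p

rng = random.Random(0)
samples = [[rng.random() for _ in range(3)] for _ in range(100)]
gaps = [toy_gap(p) for p in samples]

W, b = fit_sparse_map(samples, gaps)
# An edge (j, i) means parameter i is estimated to affect gap dimension j.
edges = {(j, i) for j in range(2) for i in range(3) if abs(W[j][i]) > 0.1}
tuned = tune_params([0.9, 0.9, 0.9], W, b, edges)

print("estimated edges:", sorted(edges))
print("tuned parameters:", [round(v, 3) for v in tuned])
```

In this toy case the sparse weights restrict the search to the two parameters that actually drive the gap, and the third parameter is left at its initial value. The method described in the abstract learns the graph jointly with a differentiable, generally nonlinear mapping; the linear L1 surrogate here is only meant to show why pruning shrinks the search space.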