In recent years, reinforcement learning (RL) has shown remarkable success in robotics when a fast and accurate simulator is available for a given task. When using RL and simulation, more simulator realism is generally beneficial, but it becomes harder to obtain as robots are deployed in increasingly complex and wide-scale domains. In such settings, simulators will likely fail to model all relevant details of a given target task, which motivates the study of sim2real with simulators that leave out key task details. In this paper, we formalize and study the abstract sim2real problem: given an abstract simulator that models a target task at a coarse level of abstraction, how can we train a policy with RL in the abstract simulator and successfully transfer it to the real world? Our first contribution is to formalize this problem using the language of state abstraction from the RL literature. This framing shows that an abstract simulator can be grounded to match the target task if the grounded abstract dynamics take the history of states into account. Based on this formalism, we then introduce a method that uses real-world task data to correct the dynamics of the abstract simulator. Finally, we show that this method enables successful policy transfer in both sim2sim and sim2real evaluations.
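The abstract does not specify the correction procedure. The toy sketch below only illustrates the claim that grounded abstract dynamics can match the target task once they condition on state history. Everything in it is a hypothetical assumption and is not the paper's method: a 1-D "real" system with position and velocity, an abstraction that keeps only position, an abstract simulator that predicts `p + a`, and a linear residual correction fit by least squares on real transitions.

```python
import random


def rollout_real(p0, v0, actions):
    """Hypothetical 'real' system: p_{t+1} = p_t + v_t, v_{t+1} = v_t + a_t.
    Returns the positions p_0..p_T (the abstract states; velocity is hidden)."""
    p, v = p0, v0
    positions = [p]
    for a in actions:
        p, v = p + v, v + a
        positions.append(p)
    return positions


def abstract_sim_step(p, a):
    """Hypothetical abstract simulator: sees only position and ignores velocity."""
    return p + a


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def least_squares(X, y):
    """Fit weights w minimizing ||X w - y||^2 via the normal equations."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * target for row, target in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def mse(X, y, w):
    """Mean squared error of the linear predictor X w against targets y."""
    errs = [sum(wi * xi for wi, xi in zip(w, row)) - t for row, t in zip(X, y)]
    return sum(e * e for e in errs) / len(errs)


def collect(num_episodes=200, horizon=10, seed=0):
    """Gather real transitions and the residual the abstract simulator misses."""
    rng = random.Random(seed)
    hist_X, markov_X, y = [], [], []
    for _ in range(num_episodes):
        actions = [rng.uniform(-1, 1) for _ in range(horizon)]
        ps = rollout_real(rng.uniform(-1, 1), rng.uniform(-1, 1), actions)
        for t in range(1, horizon):
            residual = ps[t + 1] - abstract_sim_step(ps[t], actions[t])
            # History features: current and previous abstract state, last two actions.
            hist_X.append([ps[t], ps[t - 1], actions[t - 1], actions[t]])
            # Markov features: current abstract state and action only.
            markov_X.append([ps[t], actions[t]])
            y.append(residual)
    return hist_X, markov_X, y


hist_X, markov_X, y = collect()
w_hist = least_squares(hist_X, y)
w_markov = least_squares(markov_X, y)
print("history-conditioned correction weights:", [round(w, 3) for w in w_hist])
print("MSE with history:", mse(hist_X, y, w_hist))
print("MSE without history:", mse(markov_X, y, w_markov))
```

In this toy system the hidden velocity satisfies `v_t = (p_t - p_{t-1}) + a_{t-1}`, so the residual equals `p_t - p_{t-1} + a_{t-1} - a_t`. The history-conditioned fit should therefore recover weights close to (1, -1, 1, -1) with near-zero error, whereas a correction that sees only the current abstract state and action cannot predict the residual.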