Imitation learning is a promising paradigm for training robot control policies, but these policies can suffer from distribution shift, where the conditions at evaluation time differ from those in the training data. A popular approach for increasing policy robustness to distribution shift is interactive imitation learning (i.e., DAgger and variants), where a human operator provides corrective interventions during policy rollouts. However, collecting a sufficient number of interventions to cover the distribution of policy mistakes can be burdensome for human operators. We propose IntervenGen (I-Gen), a novel data generation system that can autonomously produce a large set of corrective interventions with rich coverage of the state space from a small number of human interventions. We apply I-Gen to 4 simulated environments and 1 physical environment with object pose estimation error and show that it can increase policy robustness by up to 39x with only 10 human interventions. Videos and more results are available at https://sites.google.com/view/intervengen2024.