We present a new technique to enhance the robustness of imitation learning methods by generating corrective data to account for compounding errors and disturbances. While existing methods rely on interactive expert labeling, additional offline datasets, or domain-specific invariances, our approach requires minimal additional assumptions beyond access to expert data. The key insight is to leverage local continuity in the environment dynamics to generate corrective labels. Our method first constructs a dynamics model from the expert demonstrations, encouraging local Lipschitz continuity in the learned model. In locally continuous regions, this model allows us to generate corrective labels within the neighborhood of the demonstrations but beyond the actual set of states and actions in the dataset. Training on this augmented data enhances the agent's ability to recover from perturbations and handle compounding errors. We demonstrate the effectiveness of our generated labels through experiments in a variety of simulated robotics domains that have distinct forms of continuity and discontinuity, including classic control problems, drone flying, navigation with high-dimensional sensor observations, legged locomotion, and tabletop manipulation.
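To make the pipeline described above concrete, the following is a minimal sketch on a toy problem, not the authors' implementation. Everything in it is an assumption introduced for illustration: the 1-D environment `true_step`, the saturating controller `expert_action`, the linear model fitted by least squares, and the parameters `radius` and `per_sample`. A linear model is trivially Lipschitz, so in this sketch the restriction to locally continuous regions reduces to bounding the perturbation radius around each demonstrated state.

```python
import random

def true_step(s, a, dt=0.1):
    # Hypothetical 1-D environment, used only to create toy demonstrations.
    return s + dt * a

def expert_action(s):
    # Hypothetical expert: saturating proportional controller driving s to 0.
    return max(-1.0, min(1.0, -2.0 * s))

def collect_demos(starts, horizon=30):
    # Roll out the expert and record (state, action, next_state) transitions.
    data = []
    for s in starts:
        for _ in range(horizon):
            a = expert_action(s)
            s_next = true_step(s, a)
            data.append((s, a, s_next))
            s = s_next
    return data

def fit_linear_dynamics(data):
    # Least-squares fit of s_next ~ w_s * s + w_a * a, solved with the
    # 2x2 normal equations (Cramer's rule). Assumed model class, not the
    # paper's; it needs states and actions that are not collinear.
    sss = sum(s * s for s, a, n in data)
    ssa = sum(s * a for s, a, n in data)
    saa = sum(a * a for s, a, n in data)
    bs = sum(s * n for s, a, n in data)
    ba = sum(a * n for s, a, n in data)
    det = sss * saa - ssa * ssa
    w_s = (bs * saa - ssa * ba) / det
    w_a = (sss * ba - ssa * bs) / det
    return w_s, w_a

def corrective_labels(data, model, radius=0.05, per_sample=2, seed=0):
    # For each expert transition, sample states near the demonstrated state
    # and solve the learned model for the action that reaches the expert's
    # next state. Returns (perturbed_state, corrective_action, target_state).
    w_s, w_a = model
    rng = random.Random(seed)
    labels = []
    for s, a, s_next in data:
        for _ in range(per_sample):
            s_pert = s + rng.uniform(-radius, radius)
            a_corr = (s_next - w_s * s_pert) / w_a
            labels.append((s_pert, a_corr, s_next))
    return labels

demos = collect_demos([2.0, -1.5])
model = fit_linear_dynamics(demos)
labels = corrective_labels(demos, model)
# Augmented dataset for behavior cloning: expert pairs plus corrective pairs.
augmented = [(s, a) for s, a, _ in demos] + [(s, a) for s, a, _ in labels]
print(len(demos), len(labels), len(augmented))
```

The corrective pairs cover states the expert never visited, each labeled with an action that steers back onto the demonstrated trajectory under the learned model. In a realistic setting the linear fit would be replaced by a learned model with a continuity-encouraging regularizer, and labels would be kept only where that model is judged reliable.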