We present a new technique to enhance the robustness of imitation learning methods by generating corrective data to account for compounding errors and disturbances. While existing methods rely on interactive expert labeling, additional offline datasets, or domain-specific invariances, our approach requires minimal additional assumptions beyond access to expert data. The key insight is to leverage local continuity in the environment dynamics to generate corrective labels. Our method first constructs a dynamics model from the expert demonstrations, encouraging local Lipschitz continuity in the learned model. In locally continuous regions, this model allows us to generate corrective labels within the neighborhood of the demonstrations but beyond the actual set of states and actions in the dataset. Training on this augmented data enhances the agent's ability to recover from perturbations and handle compounding errors. We demonstrate the effectiveness of our generated labels through experiments in a variety of simulated robotics domains that have distinct forms of continuity and discontinuity, including classic control problems, drone flying, navigation with high-dimensional sensor observations, legged locomotion, and tabletop manipulation.
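The idea of generating corrective labels from a learned dynamics model can be illustrated with a minimal sketch. It is not the authors' implementation: the one-dimensional system, the linear residual model, and every name in it (`DT`, `expert_action`, `fit_linear_dynamics`, `corrective_labels`, `radius`, `max_lipschitz`) are assumptions made for illustration. The sketch fits a model to expert transitions, samples states near each demonstrated state, and labels each sampled state with the action that the model predicts will reach the expert's next state. Because the toy model is linear, its Lipschitz constant with respect to the state is simply `|1 + w_s|`, so the continuity check here is a single global test rather than a local one.

```python
import random

DT = 0.1  # hypothetical integration step of the toy system


def expert_action(s):
    """Toy expert: saturated proportional controller driving s toward 0."""
    return max(-1.0, min(1.0, -2.0 * s))


def true_step(s, a):
    """Ground-truth toy dynamics (unknown to the learner): s' = s + a * DT."""
    return s + a * DT


def collect_demo(s0, horizon):
    """Roll out the expert and return a list of (s, a, s_next) transitions."""
    demo, s = [], s0
    for _ in range(horizon):
        a = expert_action(s)
        s_next = true_step(s, a)
        demo.append((s, a, s_next))
        s = s_next
    return demo


def fit_linear_dynamics(transitions):
    """Least-squares fit of the residual model s' - s ~= w_s * s + w_a * a."""
    sss = sum(s * s for s, _, _ in transitions)
    ssa = sum(s * a for s, a, _ in transitions)
    saa = sum(a * a for _, a, _ in transitions)
    sds = sum(s * (n - s) for s, _, n in transitions)
    sda = sum(a * (n - s) for s, a, n in transitions)
    det = sss * saa - ssa * ssa  # nonzero when s and a are not collinear
    w_s = (sds * saa - sda * ssa) / det
    w_a = (sss * sda - ssa * sds) / det
    return w_s, w_a


def corrective_labels(transitions, w_s, w_a, radius, per_step, rng, max_lipschitz):
    """Label states near each expert state with the action that the learned
    model predicts will bring the agent back to the expert's next state."""
    # Assumed trust test: a linear model has state-Lipschitz constant |1 + w_s|.
    if abs(1.0 + w_s) > max_lipschitz:
        return []  # model too steep in s to trust labels off the data
    labels = []
    for s, _, s_next in transitions:
        for _ in range(per_step):
            # Sample a perturbed state inside the neighborhood of the demo.
            s_tilde = s + rng.uniform(-radius, radius)
            # Solve s_tilde + w_s * s_tilde + w_a * a = s_next for a.
            a_tilde = (s_next - s_tilde - w_s * s_tilde) / w_a
            labels.append((s_tilde, a_tilde))
    return labels
```

In this sketch the augmented pairs `(s_tilde, a_tilde)` would be appended to the expert's state-action pairs before behavior cloning. The toy system has no action limits, so a generated action may fall outside the range the expert used; a real system would need to account for that.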