Imitation learning (IL) has proven effective for enabling robots to acquire visuomotor skills from expert demonstrations. However, traditional IL methods are limited by their reliance on high-quality, often scarce, expert data, and suffer from covariate shift. To address these challenges, recent advances in offline IL have incorporated suboptimal, unlabeled datasets into training. In this paper, we propose a novel approach that enhances policy learning from mixed-quality offline datasets by leveraging task-relevant trajectory fragments and rich environmental dynamics. Specifically, we introduce a state-based search framework that stitches state-action pairs from imperfect demonstrations, generating more diverse and informative training trajectories. Experimental results on standard IL benchmarks and real-world robotic tasks show that our proposed method significantly improves both generalization and performance.
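The core idea of state-based trajectory stitching can be illustrated with a minimal sketch. The snippet below is purely illustrative and is not the paper's actual algorithm: it assumes demonstrations are lists of (state, action) pairs with vector-valued states, and stitches fragments greedily by jumping to whichever transition in the pooled dataset starts closest (in Euclidean distance) to the current state. The function name, the distance criterion, and the `threshold` parameter are all assumptions made for this sketch.

```python
import numpy as np

def stitch_trajectories(trajectories, threshold=0.1):
    """Greedily stitch state-action fragments from imperfect demonstrations.

    Illustrative sketch only: whenever the current state is close to a
    state observed in some (possibly different) trajectory, continue
    along that trajectory's transition, producing a new composite
    trajectory that no single demonstration contains.
    """
    # Pool all demonstrations into (state, action, next_state) transitions.
    transitions = []
    for traj in trajectories:
        for (s, a), (s_next, _) in zip(traj, traj[1:]):
            transitions.append((np.asarray(s, float), a, np.asarray(s_next, float)))

    def nearest(state):
        # Index and distance of the transition whose start state is
        # closest to `state`.
        dists = [np.linalg.norm(state - s) for s, _, _ in transitions]
        i = int(np.argmin(dists))
        return i, dists[i]

    # Start from the first demonstration's initial state and repeatedly
    # jump to the closest matching transition anywhere in the dataset.
    state = np.asarray(trajectories[0][0][0], float)
    stitched, visited = [], set()
    for _ in range(len(transitions)):
        i, d = nearest(state)
        # Stop when no unvisited transition starts near the current state
        # (a real search would backtrack instead of breaking greedily).
        if d > threshold or i in visited:
            break
        visited.add(i)
        s, a, s_next = transitions[i]
        stitched.append((s, a))
        state = s_next
    return stitched
```

For example, two 1-D demonstrations that pass through the same intermediate state can be stitched into a single longer trajectory that begins in the first demonstration and ends in the second, which is the kind of recombined training data the abstract describes.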