Despite its promise, imitation learning often fails in long-horizon environments, where perfect replication of demonstrations is unrealistic and small errors can accumulate catastrophically. We introduce Cago (Capability-Aware Goal Sampling), a novel learning-from-demonstrations method that mitigates the brittle dependence on expert trajectories for direct imitation. Unlike prior methods that rely on demonstrations only for policy initialization or reward shaping, Cago dynamically tracks the agent's competence along expert trajectories and uses this signal to select intermediate steps, goals that lie just beyond the agent's current reach, to guide learning. This yields an adaptive curriculum that enables steady progress toward solving the full task. Empirical results demonstrate that Cago significantly improves sample efficiency and final performance across a range of sparse-reward, goal-conditioned tasks, consistently outperforming existing learning-from-demonstrations baselines.