Learning robot manipulation policies with deep neural networks from a single demonstration remains highly challenging: even small deviations from the demonstrated trajectory can quickly compound into failure, and collecting substantial online interaction data is costly. We propose ReGIL, a retrieval-guided imitation learning framework that treats a single demonstration as an external memory. ReGIL repeatedly queries this static memory throughout training to simultaneously guide exploration, generate the regularization buffer, and construct rewards. Specifically, it computes rewards through local temporal alignment between the current trajectory and the retrieved segment, providing informative step-wise feedback for policy improvement. We evaluate ReGIL on robotic manipulation tasks from the LIBERO and Meta-World benchmarks in the single-demonstration setting, where it outperforms prior baselines in both success rate and training efficiency. In real-robot experiments, using only one demonstration and less than one hour of online training, ReGIL achieves a success rate above 75% across three manipulation tasks with randomized initial robot poses and target positions. These results demonstrate that treating the single demonstration as reusable memory can provide more than static supervision for efficient robot learning. More details can be found on our website: https://regil2026.github.io/
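As a rough illustration of the reward idea described in the abstract (retrieve a segment of the demonstration, then score the current trajectory by local temporal alignment with it), the following is a minimal sketch and not the authors' implementation. The abstract does not specify the retrieval metric or the alignment algorithm, so this sketch assumes nearest-neighbor retrieval under Euclidean distance in state space and uses dynamic time warping (DTW) as a stand-in alignment measure. The function names `retrieve_segment`, `dtw_cost`, and `alignment_reward`, and the window parameter `length`, are hypothetical.

```python
import numpy as np


def retrieve_segment(demo, query, length):
    """Return the demo window of up to `length` steps that ends at the
    demo state nearest to `query` (assumed Euclidean nearest neighbor)."""
    dists = np.linalg.norm(demo - query, axis=1)
    idx = int(np.argmin(dists))
    start = max(0, idx - length + 1)
    return demo[start:idx + 1]


def dtw_cost(a, b):
    """Length-normalized dynamic time warping cost between two sequences
    of state vectors (a stand-in for 'local temporal alignment')."""
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            acc[i, j] = d + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m] / (n + m)


def alignment_reward(demo, recent, length=5):
    """Step-wise reward: negative alignment cost between the last `length`
    policy states and the demo segment retrieved for the current state."""
    segment = retrieve_segment(demo, recent[-1], length)
    return -dtw_cost(recent[-length:], segment)


# Toy usage: a straight-line "demonstration" in a 2-D state space.
demo = np.stack([np.linspace(0.0, 1.0, 11), np.zeros(11)], axis=1)
on_track = demo[2:7]                           # follows the demo exactly
off_track = on_track + np.array([0.0, 0.3])    # drifts away from the demo
print(alignment_reward(demo, on_track), alignment_reward(demo, off_track))
```

In this toy setup, a trajectory that stays on the demonstration receives a higher reward than one that drifts away from it, which is the kind of dense, step-wise signal the abstract describes. A real system would likely retrieve and align in a learned feature space rather than on raw states; that choice is left open here.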