Bimanual manipulation tasks typically involve multiple stages that require efficient interaction between two arms, posing both step-wise and stage-wise challenges for imitation learning systems. Specifically, a failure or delay at one step propagates through time, reducing the success and efficiency of each sub-stage task and thereby degrading overall task performance. Although recent works have made strides in addressing some of these challenges, few approaches explicitly consider the multi-stage nature of bimanual tasks while also emphasizing inference speed. In this paper, we introduce a novel keypose-conditioned consistency policy tailored for bimanual manipulation. It is a hierarchical imitation learning framework consisting of a high-level keypose predictor and a low-level trajectory generator. The predicted keyposes guide trajectory generation and also mark the completion of each sub-stage task. The trajectory generator is a consistency model trained from scratch without distillation, which generates action sequences conditioned on current observations and predicted keyposes with fast inference. Simulated and real-world experimental results demonstrate that the proposed approach surpasses baseline methods in both success rate and operational efficiency.
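To make the hierarchy concrete, the following is a minimal toy sketch of the control loop the abstract describes, not the authors' implementation. All names (`predict_keypose`, `generate_actions`, `run_episode`), the 14-dimensional pose (assumed here to stand for two 7-DoF arms), the tolerance, and the chunk horizon are hypothetical. The learned keypose predictor is replaced by a per-stage lookup, and the trained consistency model is replaced by a hand-written single-step map from noise to an action chunk.

```python
import numpy as np


def predict_keypose(obs, stage, keyposes):
    """Hypothetical high-level predictor: a lookup of the stage's target pose.
    A learned network conditioned on observations would replace this."""
    return keyposes[stage]


def generate_actions(obs, keypose, rng, horizon=4):
    """Stand-in for the low-level generator: one forward pass maps noise to an
    action chunk conditioned on the observation and the keypose.
    A trained consistency model would replace this hand-written map."""
    noise = rng.standard_normal((horizon, obs.shape[0]))
    fractions = np.linspace(1.0 / horizon, 1.0, horizon)[:, None]
    # Interpolate toward the keypose; the noise only adds a small residual.
    return obs + fractions * (keypose - obs) + 1e-3 * noise


def run_episode(start, keyposes, tol=0.05, max_steps=50):
    """Run the two-level loop until every sub-stage keypose has been reached."""
    rng = np.random.default_rng(0)
    obs, stage, steps = start.copy(), 0, 0
    while stage < len(keyposes) and steps < max_steps:
        keypose = predict_keypose(obs, stage, keyposes)
        for action in generate_actions(obs, keypose, rng):
            obs = action  # toy dynamics (assumption): commanded pose is reached exactly
            steps += 1
        # Reaching the predicted keypose marks completion of the sub-stage.
        if np.linalg.norm(obs - keypose) < tol:
            stage += 1
    return stage, steps


if __name__ == "__main__":
    start = np.zeros(14)
    keyposes = [np.full(14, 0.5), np.full(14, 1.0)]
    print(run_episode(start, keyposes))
```

In this sketch the keypose plays both roles named in the abstract: it conditions the action chunk and serves as the stage-completion test. Each chunk comes from a single call to the generator, which is the property that would give a consistency model its inference-speed advantage over iterative sampling.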