Diffusion policies have proven highly effective at learning complex, multi-modal behaviors for robotic manipulation. However, errors in generated action sequences can compound over time, potentially leading to task failure. Existing approaches mitigate this by augmenting datasets with additional expert demonstrations or by learning predictive world models, both of which can be computationally expensive. We introduce Performance Predictive Guidance (PPGuide), a lightweight, classifier-based framework that steers a pre-trained diffusion policy away from failure modes at inference time. PPGuide relies on a novel self-supervised process: attention-based multiple instance learning automatically estimates which observation-action chunks from the policy's rollouts are relevant to success or failure, and a performance predictor is then trained on this self-labeled data. During inference, the predictor provides a real-time gradient that guides the policy toward more robust actions. We validate PPGuide on a diverse set of tasks from the Robomimic and MimicGen benchmarks, demonstrating consistent performance improvements.
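The attention-based multiple instance learning step can be illustrated with a minimal sketch: a rollout is treated as a "bag" of observation-action chunk embeddings with only a bag-level success/failure label, and a gated attention pooling produces both a bag prediction and per-chunk relevance weights. All dimensions, the toy parameters, and the function names below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a rollout is a bag of T chunk embeddings of size D;
# H is the hidden size of the attention scorer.
T, D, H = 8, 16, 32

# Toy parameters for tanh-attention MIL pooling (illustrative only).
V = rng.normal(scale=0.1, size=(H, D))   # attention projection
w = rng.normal(scale=0.1, size=H)        # attention scoring vector
c = rng.normal(scale=0.1, size=D)        # bag-level linear classifier

def mil_forward(chunks):
    """Score a rollout from its chunk embeddings.

    The attention weights a_t indicate which chunks the bag-level
    success/failure prediction relies on; in a PPGuide-style pipeline
    these weights would serve as per-chunk relevance pseudo-labels.
    """
    scores = w @ np.tanh(V @ chunks.T)   # (T,) unnormalized chunk scores
    a = np.exp(scores - scores.max())
    a /= a.sum()                         # softmax over chunks in the bag
    bag = a @ chunks                     # (D,) attention-pooled embedding
    p_success = 1.0 / (1.0 + np.exp(-(c @ bag)))
    return p_success, a

chunks = rng.normal(size=(T, D))
p, attn = mil_forward(chunks)
```

Training would fit `V`, `w`, and `c` against binary rollout outcomes; the learned `attn` then self-labels chunks for the performance predictor.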
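The inference-time guidance itself follows the standard classifier-guidance pattern: at each denoising step, the gradient of the performance predictor with respect to the action chunk is added to the reverse-diffusion mean. The sketch below uses a toy linear predictor (so its gradient is a constant vector) and a simplified single-step DDPM update; the shapes, schedule values, and guidance scale are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 7                      # action-chunk dimension (hypothetical)
g = rng.normal(size=D)     # gradient of a toy linear stand-in predictor

def predictor_logit(a):
    """Stand-in for the learned performance predictor's success logit."""
    return g @ a

def guided_step(a_t, eps_hat, alpha, alpha_bar, beta, scale):
    """One DDPM-style reverse step with classifier guidance on the mean."""
    # Posterior mean of the reverse diffusion step (DDPM form).
    mean = (a_t - beta / np.sqrt(1.0 - alpha_bar) * eps_hat) / np.sqrt(alpha)
    # Guidance: shift the mean along the predictor's gradient, scaled by
    # the step variance (approximated here by beta). scale=0 disables it.
    return mean + scale * beta * g

# Toy single step: compare the unguided and guided means.
a_t = rng.normal(size=D)
eps_hat = rng.normal(size=D)
alpha, alpha_bar, beta = 0.99, 0.9, 0.01
mean = guided_step(a_t, eps_hat, alpha, alpha_bar, beta, scale=0.0)
guided = guided_step(a_t, eps_hat, alpha, alpha_bar, beta, scale=0.5)
```

By construction, the guided sample scores strictly higher under the predictor than the unguided mean, which is the mechanism by which the policy is steered toward actions predicted to succeed.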