Acting in cluttered environments requires predicting and avoiding collisions while still achieving precise control. Conventional optimization-based controllers can enforce physical constraints, but they struggle to find feasible solutions quickly when many obstacles are present. Diffusion models can generate diverse trajectories around obstacles, yet prior approaches lack a general and efficient way to condition them on scene structure. In this paper, we show that combining diffusion-based warm-starting, conditioned on a latent object-centric representation of the scene, with a collision-aware model predictive controller (MPC) yields reliable and efficient motion generation under strict time limits. Our approach conditions a diffusion transformer on the system state, the task, and the surroundings, using object-centric slot attention to obtain a compact obstacle representation suitable for control. The sampled trajectories are then refined by solving an optimal control problem that enforces rigid-body dynamics and signed-distance collision constraints, producing feasible motions in real time. On benchmark tasks, this hybrid method achieves markedly higher success rates and lower latency than sampling-based planners or either component alone. Real-robot experiments on a torque-controlled Panda robot confirm reliable and safe execution with MPC.
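The warm-start-then-refine idea can be illustrated with a toy 2D sketch. This is not the authors' method, and every simplification is an assumption: a 2D point robot replaces the Panda's rigid-body dynamics, circular obstacles with analytic signed distances replace the scene geometry, a quadratic clearance penalty minimized by plain gradient descent replaces the constrained optimal control problem, and a hypothetical random sampler (`sample_warm_starts`) stands in for the slot-attention-conditioned diffusion transformer. All function names and parameter values (`MARGIN`, `WEIGHT`, `lr`, `iters`) are illustrative.

```python
from __future__ import annotations

import math
import random

Point = tuple[float, float]
Circle = tuple[float, float, float]  # (center_x, center_y, radius)

MARGIN = 0.1   # desired waypoint clearance (assumed value)
WEIGHT = 20.0  # penalty weight on clearance violations (assumed value)


def point_sd(p: Point, c: Circle) -> float:
    """Signed distance from point p to a circular obstacle (negative inside)."""
    return math.hypot(p[0] - c[0], p[1] - c[1]) - c[2]


def segment_sd(a: Point, b: Point, c: Circle) -> float:
    """Signed distance from segment a-b to a circle, via the closest point on the segment."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    t = 0.0
    if seg_len2 > 0.0:
        t = ((c[0] - a[0]) * dx + (c[1] - a[1]) * dy) / seg_len2
        t = max(0.0, min(1.0, t))
    return point_sd((a[0] + t * dx, a[1] + t * dy), c)


def min_clearance(traj: list[Point], obstacles: list[Circle]) -> float:
    """Smallest signed distance between any trajectory segment and any obstacle."""
    return min(segment_sd(traj[i], traj[i + 1], c)
               for i in range(len(traj) - 1) for c in obstacles)


def cost(traj: list[Point], obstacles: list[Circle]) -> float:
    """Smoothness term (sum of squared steps) plus a quadratic clearance penalty on waypoints."""
    smooth = sum((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2 for p, q in zip(traj, traj[1:]))
    penalty = sum(max(0.0, MARGIN - point_sd(p, c)) ** 2 for p in traj for c in obstacles)
    return smooth + WEIGHT * penalty


def refine(traj: list[Point], obstacles: list[Circle], lr: float = 0.02, iters: int = 1000) -> list[Point]:
    """Gradient descent on `cost` over interior waypoints; endpoints stay fixed."""
    x = [list(p) for p in traj]
    for _ in range(iters):
        grads = []
        for i in range(1, len(x) - 1):
            # Gradient of the smoothness term with respect to waypoint i.
            gx = 2.0 * (2.0 * x[i][0] - x[i - 1][0] - x[i + 1][0])
            gy = 2.0 * (2.0 * x[i][1] - x[i - 1][1] - x[i + 1][1])
            for c in obstacles:
                dist = max(math.hypot(x[i][0] - c[0], x[i][1] - c[1]), 1e-9)
                viol = MARGIN - (dist - c[2])
                if viol > 0.0:
                    # d/dp of WEIGHT * viol^2 = -2 * WEIGHT * viol * (p - center) / dist
                    gx -= 2.0 * WEIGHT * viol * (x[i][0] - c[0]) / dist
                    gy -= 2.0 * WEIGHT * viol * (x[i][1] - c[1]) / dist
            grads.append((gx, gy))
        for i, (gx, gy) in enumerate(grads, start=1):
            x[i][0] -= lr * gx
            x[i][1] -= lr * gy
    return [(p[0], p[1]) for p in x]


def sample_warm_starts(start: Point, goal: Point, n_samples: int, n_points: int,
                       rng: random.Random) -> list[list[Point]]:
    """Hypothetical stand-in for the diffusion model: a straight line plus a random
    parabolic bump perpendicular to it. Not a learned, scene-conditioned sampler."""
    dx, dy = goal[0] - start[0], goal[1] - start[1]
    length = math.hypot(dx, dy)
    nx, ny = -dy / length, dx / length  # unit normal to the start-goal line
    samples = []
    for _ in range(n_samples):
        amp = rng.uniform(-1.5, 1.5)
        traj = []
        for k in range(n_points):
            t = k / (n_points - 1)
            bump = 4.0 * amp * t * (1.0 - t)  # zero at both endpoints
            traj.append((start[0] + t * dx + bump * nx, start[1] + t * dy + bump * ny))
        samples.append(traj)
    return samples


def plan(start: Point, goal: Point, obstacles: list[Circle],
         n_samples: int = 8, n_points: int = 20, seed: int = 0) -> list[Point] | None:
    """Refine every warm start and return the cheapest collision-free result (or None)."""
    rng = random.Random(seed)
    best, best_cost = None, math.inf
    for init in sample_warm_starts(start, goal, n_samples, n_points, rng):
        traj = refine(init, obstacles)
        if min_clearance(traj, obstacles) < 0.0:
            continue  # refinement from this warm start did not become collision-free
        c = cost(traj, obstacles)
        if c < best_cost:
            best, best_cost = traj, c
    return best
```

For example, `plan((0.0, 0.0), (4.0, 0.0), [(2.0, 0.0, 0.5)])` returns a 20-waypoint path that bends around the obstacle, whereas the straight start-goal segment passes through it. The feasibility check uses exact segment-to-circle distances because the penalty only acts on waypoints; a trajectory whose waypoints are clear but whose segments cut through an obstacle is rejected rather than returned.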