Acting in cluttered environments requires predicting and avoiding collisions while still achieving precise control. Conventional optimization-based controllers can enforce physical constraints, but they struggle to find feasible solutions quickly when many obstacles are present. Diffusion models can generate diverse trajectories around obstacles, yet prior approaches lack a general and efficient way to condition them on scene structure. In this paper, we show that combining a diffusion-based warm start, conditioned on a latent object-centric representation of the scene, with a collision-aware model predictive controller (MPC) yields reliable and efficient motion generation under strict time limits. Our approach conditions a diffusion transformer on the system state, the task, and the surroundings, using an object-centric slot attention mechanism to provide a compact obstacle representation suitable for control. The sampled trajectories are then refined by solving an optimal control problem that enforces rigid-body dynamics and signed-distance collision constraints, producing feasible motions in real time. On benchmark tasks, this hybrid method achieves markedly higher success rates and lower latency than sampling-based planners or either component alone. Real-robot experiments with a torque-controlled Panda confirm reliable and safe execution with MPC.
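To make the idea of a signed-distance collision constraint concrete, the following is a minimal sketch and not the paper's implementation. It assumes 2D waypoints, circular obstacles, and a hypothetical safety margin; the function names are illustrative only. It checks sampled candidate trajectories against the constraint and picks the one with the largest clearance as a warm start, whereas the method described above enforces such constraints inside the optimal control problem itself.

```python
import math

def signed_distance(point, center, radius):
    """Signed distance from a point to a circle: negative inside, positive outside."""
    return math.dist(point, center) - radius

def min_clearance(trajectory, obstacles):
    """Smallest signed distance over all waypoints and all obstacles.

    Returns +inf when there is nothing to collide with.
    """
    return min(
        (signed_distance(p, c, r) for p in trajectory for (c, r) in obstacles),
        default=math.inf,
    )

def pick_warm_start(candidates, obstacles, margin=0.05):
    """Pick the candidate with the largest clearance (assumed selection rule).

    Also reports whether it satisfies d >= margin at every waypoint.
    """
    best = max(candidates, key=lambda traj: min_clearance(traj, obstacles))
    return best, min_clearance(best, obstacles) >= margin

# Hypothetical scene: one circular obstacle between start and goal.
obstacles = [((0.5, 0.0), 0.2)]
straight = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]   # passes through the obstacle
detour = [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0)]     # goes around it

best, feasible = pick_warm_start([straight, detour], obstacles)
```

In this toy scene the straight path has negative clearance at its middle waypoint, so the detour is selected. A real controller would additionally check the segments between waypoints and the full robot geometry rather than single points.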