Diffusion models have emerged as a powerful tool in robotics owing to their flexibility and multi-modality. While some of these methods effectively address complex problems, they often depend heavily on inference-time obstacle detection and require additional equipment. To address these limitations, we present a method that, at inference time, simultaneously generates only reachable goals and plans obstacle-avoiding motions, all from a single visual input. Central to our approach is the novel use of a collision-avoiding diffusion kernel for training. In evaluations against behavior-cloning and classical diffusion models, our framework proves robust. It is particularly effective in multi-modal environments, where it navigates toward reachable goals and away from goals blocked by obstacles, while avoiding collisions.