Diffusion models have emerged as a powerful tool in robotics owing to their flexibility and multi-modality. Although some of these methods effectively address complex problems, they often depend heavily on inference-time obstacle detection and require additional equipment. To address these limitations, we present a method that, at inference time, simultaneously generates only reachable goals and plans obstacle-avoiding motions, all from a single visual input. Central to our approach is the novel use of a collision-avoiding diffusion kernel for training. In evaluations against behavior-cloning and classical diffusion models, our framework proves robust. It is particularly effective in multi-modal environments, where it navigates toward reachable goals and steers away from goals blocked by obstacles, while avoiding collisions.