Visual servoing, the method of controlling robot motion through feedback from visual sensors, has seen significant advancements with the integration of optical flow-based methods. However, its application remains limited by inherent challenges, such as the necessity for a target image at test time, the requirement of substantial overlap between initial and target images, and the reliance on feedback from a single camera. This paper introduces Imagine2Servo, an innovative approach leveraging diffusion-based image editing techniques to enhance visual servoing algorithms by generating intermediate goal images. This methodology allows for the extension of visual servoing applications beyond traditional constraints, enabling tasks like long-range navigation and manipulation without predefined goal images. We propose a pipeline that synthesizes subgoal images grounded in the task at hand, facilitating servoing in scenarios with minimal initial and target image overlap and integrating multi-camera feedback for comprehensive task execution. Our contributions demonstrate a novel application of image generation to robotic control, significantly broadening the capabilities of visual servoing systems. Real-world experiments validate the effectiveness and versatility of the Imagine2Servo framework in accomplishing a variety of tasks, marking a notable advancement in the field of visual servoing.