Learning-based manipulation policies that operate on image inputs often transfer poorly across tasks. In contrast, visual servoing methods transfer efficiently to new tasks in high-precision scenarios while requiring only a few demonstrations. In this work, we present a framework that formulates the visual servoing task as graph traversal. Our method not only improves the robustness of visual servoing but also enables multitask capability from a few task-specific demonstrations. We construct demonstration graphs by splitting existing demonstrations and recombining them. To traverse the demonstration graph at inference time, we use a similarity function that helps select the best demonstration for a specific task, which in turn lets us compute the shortest path through the graph. Ultimately, we show that recombining demonstrations leads to higher per-task success rates. We present extensive simulation and real-world experimental results that demonstrate the efficacy of our approach.
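To make the graph-traversal formulation concrete, the following is a minimal sketch and not the implementation used in this work. It assumes each demonstration has already been split into segments, that every segment is summarized by a start and an end feature vector, that cosine similarity stands in for the similarity function, and that a recombination edge is added whenever the end of one segment resembles the start of a segment in another demonstration. The names `build_demo_graph` and `shortest_path`, the threshold of 0.9, and the edge costs are all hypothetical choices made for illustration.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Stand-in similarity function (assumption: the abstract does not specify one)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_demo_graph(demos, threshold=0.9):
    """Build a directed graph whose nodes are (demo_id, segment_index).

    demos maps demo_id -> list of (start_features, end_features) per segment.
    """
    graph = {(d, i): {} for d, segs in demos.items() for i in range(len(segs))}
    for d, segs in demos.items():
        # Consecutive segments of the same demonstration are always connected.
        for i in range(len(segs) - 1):
            graph[(d, i)][(d, i + 1)] = 1.0
    for d_u, segs_u in demos.items():
        for i, (_, end_u) in enumerate(segs_u):
            for d_v, segs_v in demos.items():
                if d_v == d_u:
                    continue
                for j, (start_v, _) in enumerate(segs_v):
                    sim = cosine_similarity(end_u, start_v)
                    if sim >= threshold:
                        # Recombination edge: slightly more costly when less similar.
                        graph[(d_u, i)][(d_v, j)] = 1.0 + (1.0 - sim)
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the node list or None if unreachable."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Two toy demonstrations with made-up 2-D features, two segments each.
demos = {
    "A": [([1.0, 0.0], [0.0, 1.0]), ([0.0, 1.0], [1.0, 1.0])],
    "B": [([0.0, -1.0], [0.1, 1.0]), ([0.1, 1.0], [-1.0, 0.0])],
}
graph = build_demo_graph(demos)
path = shortest_path(graph, ("A", 0), ("B", 1))
# path == [("A", 0), ("B", 1)]: the first part of A is recombined with the second part of B.
```

In this toy setup neither demonstration alone leads from the start of A to the goal of B; the path exists only because a recombination edge joins the two, which is the intuition behind recombining demonstrations.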