Visual imitation learning has achieved impressive progress in learning unimanual manipulation tasks from a small set of visual observations, thanks to the latest advances in computer vision. However, learning bimanual coordination strategies and complex object relations from bimanual visual demonstrations, as well as generalizing them to categorical objects in novel cluttered scenes, remain unsolved challenges. In this paper, we extend our previous work on keypoints-based visual imitation learning (\mbox{K-VIL})~\cite{gao_kvil_2023} to bimanual manipulation tasks. The proposed Bi-KVIL jointly extracts so-called \emph{Hybrid Master-Slave Relationships} (HMSR) among objects and hands, bimanual coordination strategies, and sub-symbolic task representations. Our bimanual task representation is object-centric, embodiment-independent, and viewpoint-invariant, and thus generalizes well to categorical objects in novel scenes. We evaluate our approach in various real-world applications, showcasing its ability to learn fine-grained bimanual manipulation tasks from a small number of human demonstration videos. Videos and source code are available at https://sites.google.com/view/bi-kvil.