Generating hand grasps from language instructions is a widely studied topic with applications in embodied AI and VR/AR. When extended to hand-articulated object interaction (HAOI), hand grasp synthesis must account not only for object functionality but also for a long-term manipulation sequence that follows the object's deformation. This paper proposes SynHLMA, a novel HAOI sequence generation framework that synthesizes hand language manipulation for articulated objects. Given a complete point cloud of an articulated object, we use a discrete HAOI representation to model each hand-object interaction frame. Together with natural language embeddings, these representations are used to train an HAOI manipulation language model that aligns the grasping process with its language description in a shared representation space. A joint-aware loss ensures that hand grasps follow the dynamic variations of the articulated object's joints. In this way, SynHLMA supports three typical hand manipulation tasks for articulated objects: HAOI generation, HAOI prediction, and HAOI interpolation. We evaluate SynHLMA on our self-built HAOI-lang dataset, and the experimental results demonstrate superior hand grasp sequence generation performance compared with state-of-the-art methods. We also present a robotic grasping application in which dexterous grasps are executed through imitation learning on the manipulation sequences provided by SynHLMA. Our code and datasets will be made publicly available.