Pushing is an essential non-prehensile manipulation skill used for tasks ranging from pre-grasp manipulation to scene rearrangement and reasoning about object relations in the scene; pushing actions have therefore been widely studied in robotics. Using pushing actions effectively often requires understanding the dynamics of the manipulated objects and adapting to discrepancies between prediction and reality. For this reason, effect prediction and parameter estimation with pushing actions have been heavily investigated in the literature. However, current approaches are limited because they either model systems with a fixed number of objects or use image-based representations whose outputs are hard to interpret and quickly accumulate errors. In this paper, we propose a graph neural network based framework for effect prediction and parameter estimation of pushing actions that models object relations based on contacts or articulations. We validate the framework in both real and simulated environments containing multi-part objects of different shapes connected via different types of joints, as well as objects with different masses, and it outperforms image-based representations on physics prediction. Our approach enables the robot to predict the effect of a pushing action and adapt as it observes the scene. It can also be used for tool manipulation with never-seen tools. Further, we demonstrate 6D effect prediction for the lever-up action in the context of robot-based hard-disk disassembly.