Learning object affordances is an effective tool in the field of robot learning. While existing data-driven models investigate the affordances of single or paired objects, the affordances of compound objects composed of an arbitrary number of objects remain largely unexplored. We propose the Multi-Object Graph Affordance Network, which models complex compound-object affordances by learning the outcomes of robot actions that bring an object into interaction with a compound. Given depth images of the objects, object features are extracted via convolution operations and encoded in the nodes of a graph neural network. Graph convolution operations encode the state of each compound, and these encodings are fed to decoders that predict the outcomes of object-compound interactions. After learning the compound-object affordances, given different tasks, the learned outcome predictors are used to plan sequences of stack actions that involve stacking objects on top of each other, inserting smaller objects into larger containers, and passing ring-like objects through poles. We showed that our system successfully models the affordances of compound objects that include concave and convex objects, in both simulated and real-world environments. We benchmarked our system against a baseline model to highlight its advantages.
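The pipeline described above (depth images → convolutional features → graph nodes → graph convolutions → outcome decoder) can be sketched in code. The following is a minimal, hypothetical PyTorch sketch, not the authors' implementation: all layer sizes, the mean-aggregation graph convolution, and the fully connected compound graph are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CompoundAffordanceNet(nn.Module):
    """Hypothetical sketch: CNN encodes per-object depth images, a graph
    convolution encodes the compound, an MLP decoder predicts the outcome
    of bringing a new object into interaction with the compound."""

    def __init__(self, feat_dim=32, out_dim=4):
        super().__init__()
        # CNN feature extractor for single-channel depth images
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, feat_dim),
        )
        # One graph-convolution layer: mix each node with its neighbors
        self.gc = nn.Linear(feat_dim, feat_dim)
        # Decoder maps [compound encoding, new-object encoding] -> outcome
        self.decoder = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(), nn.Linear(64, out_dim),
        )

    def forward(self, compound_depths, new_depth, adj):
        # compound_depths: (N, 1, H, W); new_depth: (1, 1, H, W)
        # adj: (N, N) adjacency matrix with self-loops
        h = self.cnn(compound_depths)                # (N, feat_dim) node features
        deg = adj.sum(1, keepdim=True).clamp(min=1)  # node degrees for normalization
        h = torch.relu(self.gc(adj @ h / deg))       # mean aggregation + linear map
        compound_code = h.mean(0)                    # pool nodes -> compound state
        obj_code = self.cnn(new_depth).squeeze(0)    # encode the candidate object
        return self.decoder(torch.cat([compound_code, obj_code]))

net = CompoundAffordanceNet()
depths = torch.randn(3, 1, 32, 32)   # three objects already in the compound
new_obj = torch.randn(1, 1, 32, 32)  # candidate object to stack/insert
adj = torch.ones(3, 3)               # assumed fully connected compound graph
outcome = net(depths, new_obj, adj)  # predicted interaction outcome vector
```

A planner could then call such a predictor repeatedly, scoring candidate stack actions by their predicted outcomes and chaining the best ones into a sequence.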