Learning object affordances is an effective tool in robot learning. While data-driven models have investigated the affordances of single or paired objects, the affordances of compound objects composed of an arbitrary number of objects remain largely unexplored. We propose the Multi-Object Graph Affordance Network, which models complex compound object affordances by learning the outcomes of robot actions that facilitate interactions between an object and a compound. Given depth images of the objects, object features are extracted via convolution operations and encoded in the nodes of graph neural networks. Graph convolution operations then encode the state of each compound, and the encoded states are passed to decoders that predict the outcome of the object-compound interactions. Once the compound object affordances are learned, the outcome predictors are used to plan, for a given task, sequences of stack actions that involve stacking objects on top of each other, inserting smaller objects into larger containers, and passing ring-like objects through poles. We show that our system successfully models the affordances of compound objects that include concave and convex objects, in both simulated and real-world environments. We benchmark our system against a baseline model to highlight its advantages.
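The encode-then-decode pipeline described above can be illustrated with a minimal sketch. It is not the authors' architecture: the mean-aggregation graph convolution, the sum-pool readout, the chain-shaped edge topology, the feature sizes, and every function name (`graph_conv`, `encode_compound`, `predict_outcome`, `chain_edges`, `demo`) are assumptions chosen for illustration, and random vectors stand in for both the convolutional features of the depth images and the trained weights. The point it shows is that a graph encoder yields a fixed-size compound state regardless of how many objects the compound contains.

```python
import random

def make_matrix(rows, cols, rng):
    """Random weight matrix; a stand-in for trained parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def linear(x, weight):
    """Matrix-vector product: weight @ x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def graph_conv(nodes, edges, w_self, w_nbr):
    """One graph convolution: mix each node with the mean of its in-neighbours, then ReLU."""
    dim = len(nodes[0])
    updated = []
    for i, feat in enumerate(nodes):
        nbrs = [nodes[src] for src, dst in edges if dst == i]
        if nbrs:
            mean = [sum(n[k] for n in nbrs) / len(nbrs) for k in range(dim)]
        else:
            mean = [0.0] * dim  # isolated node: no neighbour message
        combined = [a + b for a, b in zip(linear(feat, w_self), linear(mean, w_nbr))]
        updated.append([max(0.0, v) for v in combined])
    return updated

def encode_compound(nodes, edges, layers):
    """Apply the graph convolutions, then sum-pool node states into one vector."""
    for w_self, w_nbr in layers:
        nodes = graph_conv(nodes, edges, w_self, w_nbr)
    return [sum(col) for col in zip(*nodes)]

def predict_outcome(compound_state, new_object, w_dec):
    """Decode the compound state plus the new object's features into an outcome vector."""
    return linear(compound_state + new_object, w_dec)

def chain_edges(n):
    """Bidirectional edges between consecutively placed objects (assumed topology)."""
    edges = []
    for i in range(n - 1):
        edges += [(i, i + 1), (i + 1, i)]
    return edges

rng = random.Random(0)
FEAT, OUT = 4, 3  # hypothetical feature and outcome sizes
layers = [(make_matrix(FEAT, FEAT, rng), make_matrix(FEAT, FEAT, rng)) for _ in range(2)]
w_dec = make_matrix(OUT, 2 * FEAT, rng)

def demo(num_objects):
    """Encode a compound of `num_objects` objects and predict the outcome of adding one more."""
    # Random vectors stand in for CNN features extracted from depth images.
    nodes = [[rng.random() for _ in range(FEAT)] for _ in range(num_objects)]
    new_object = [rng.random() for _ in range(FEAT)]
    state = encode_compound(nodes, chain_edges(num_objects), layers)
    return state, predict_outcome(state, new_object, w_dec)

for n in (1, 3, 6):
    state, outcome = demo(n)
    print(n, len(state), len(outcome))
```

Because the readout pools over nodes, the same decoder can be applied to compounds of any size, which is the property a planner would rely on when it scores candidate stack actions one step at a time.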