Discovering Predictive Relational Object Symbols with Symbolic Attentive Layers

In this paper, we propose and realize a new deep learning architecture for discovering symbolic representations of objects and their relations, based on the self-supervised continuous interaction of a manipulator robot with multiple objects in a tabletop environment. The key feature of the model is that it naturally handles a varying number of objects and explicitly maps object-object relations into the symbolic domain. In the model, we employ a self-attention layer that computes discrete attention weights from object features; these weights are treated as relational symbols between objects. The relational symbols are then used to aggregate the learned object symbols and predict the effects of executed actions on each object. The result is a pipeline that forms object symbols and relational symbols from a dataset of object features, actions, and effects in an end-to-end manner. We compare the performance of the proposed architecture with state-of-the-art symbol discovery methods in a simulated tabletop environment, where the robot needs to discover symbols related to the relative positions of objects in order to predict the observed effects successfully. Our experiments show that the proposed architecture outperforms the baselines in effect prediction while forming not only object symbols but also relational symbols. Furthermore, we analyze the learned symbols and the relational patterns between objects to understand how the model interprets the environment. Our analysis shows that the learned symbols relate to the relative positions of objects, object types, and their horizontal alignment on the table, which reflect the regularities in the environment.
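The data flow described above (object features to object symbols, discrete attention weights as relational symbols, aggregation, and per-object effect prediction) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the dimensions, the random untrained weights, the hard threshold used to discretize values, and the linear output layer. It only shows how one set of weights can serve scenes with different numbers of objects.

```python
import random


def binarize(x):
    """Hard threshold (an illustrative stand-in for the model's discretization)."""
    return 1 if x > 0.0 else 0


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def random_matrix(rows, cols, rng):
    """Random weights; a real model would learn these end to end."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def object_symbols(features, We):
    """Map each object's feature vector to a binary object symbol."""
    return [[binarize(z) for z in matvec(We, f)] for f in features]


def relational_symbols(features, Wq, Wk):
    """Binary attention matrix R, where R[i][j] = 1 means object i attends to object j."""
    queries = [matvec(Wq, f) for f in features]
    keys = [matvec(Wk, f) for f in features]
    return [
        [binarize(sum(q * k for q, k in zip(qi, kj))) for kj in keys]
        for qi in queries
    ]


def aggregate(symbols, relations):
    """For each object i, sum the symbols of the objects j selected by R[i][j]."""
    n, d = len(symbols), len(symbols[0])
    return [
        [sum(relations[i][j] * symbols[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]


def predict_effects(symbols, relations, action, Wout):
    """Predict one effect vector per object from its symbol, its neighbours, and the action."""
    agg = aggregate(symbols, relations)
    return [matvec(Wout, s + a + action) for s, a in zip(symbols, agg)]


# Hypothetical dimensions, chosen only for this sketch.
FEAT, SYM, KEY, ACT, EFF = 3, 2, 4, 2, 3
rng = random.Random(0)
We = random_matrix(SYM, FEAT, rng)
Wq = random_matrix(KEY, FEAT, rng)
Wk = random_matrix(KEY, FEAT, rng)
Wout = random_matrix(EFF, 2 * SYM + ACT, rng)

# The same weights are applied to scenes with different numbers of objects.
for n_objects in (2, 4):
    feats = [[rng.uniform(-1.0, 1.0) for _ in range(FEAT)] for _ in range(n_objects)]
    syms = object_symbols(feats, We)
    rels = relational_symbols(feats, Wq, Wk)
    effects = predict_effects(syms, rels, [1.0, 0.0], Wout)
    print(n_objects, "objects:", syms, rels, len(effects), "effect vectors")
```

A hard threshold is not differentiable, so an end-to-end trainable version would need a relaxation or a gradient estimator in its place; the sketch leaves that choice open.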
