Arranging objects correctly is a key capability for robots, one that unlocks a wide range of useful tasks. A prerequisite for creating successful arrangements is the ability to evaluate how desirable a given arrangement is. Our method, "SceneScore", learns a cost function for arrangements such that desirable, human-like arrangements have a low cost. We learn the distribution of training arrangements offline using an energy-based model, solely from example images and without requiring environment interaction or human supervision. Our model is represented by a graph neural network which learns object-object relations, using graphs constructed from images. Experiments demonstrate that the learned cost function can be used to predict poses for missing objects, can generalise to novel objects using semantic features, and can be composed with other cost functions to satisfy constraints at inference time.
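To illustrate how a cost function over arrangements might be composed with a constraint and then minimised to place a missing object, the following is a minimal sketch and not the SceneScore implementation. The quadratic `relation_cost` is a hypothetical stand-in for the learned energy (the actual model is a graph neural network over a scene graph), and `keep_out_cost`, the grid search in `argmin_pose`, the object names, and all numeric values are assumptions made for this example.

```python
import numpy as np

# Hypothetical stand-in for the learned energy. A real model would score the
# whole scene graph with a graph neural network; here the cost is simply low
# when the candidate sits at an assumed preferred offset from each neighbour.
def relation_cost(pose, neighbours, preferred_offsets):
    residuals = pose[None, :] - neighbours - preferred_offsets
    return float(np.sum(residuals ** 2))

# Hand-written constraint (an assumption for this sketch): penalise poses
# that fall inside a disc of the given radius around `centre`.
def keep_out_cost(pose, centre, radius, weight=100.0):
    penetration = max(0.0, radius - float(np.linalg.norm(pose - centre)))
    return weight * penetration ** 2

# Pick the candidate pose with the lowest cost (simple grid search).
def argmin_pose(cost_fn, candidates):
    costs = np.array([cost_fn(p) for p in candidates])
    return candidates[int(np.argmin(costs))]

# Toy scene: two placed objects (say, a plate and a knife) and the offsets a
# missing third object (say, a fork) is assumed to prefer relative to each.
neighbours = np.array([[0.0, 0.0], [0.2, 0.0]])
preferred_offsets = np.array([[-0.2, 0.0], [-0.4, 0.0]])

# Candidate 2D positions on a regular grid.
axis = np.linspace(-0.5, 0.5, 21)
candidates = np.array([[x, y] for x in axis for y in axis])

# Cost functions compose by addition, so a constraint can be added at
# inference time without changing the stand-in "learned" term.
def learned(pose):
    return relation_cost(pose, neighbours, preferred_offsets)

blocked_centre = np.array([-0.2, 0.0])

def composed(pose):
    return learned(pose) + keep_out_cost(pose, blocked_centre, radius=0.1)

best_free = argmin_pose(learned, candidates)
best_constrained = argmin_pose(composed, candidates)
print("unconstrained pose:", best_free)
print("constrained pose:  ", best_constrained)
```

Because the two terms are summed, the constraint moves the minimiser out of the blocked region while the relation term keeps it as close as possible to the preferred placement. A gradient-based or sampling-based optimiser could replace the grid search; that choice is likewise an assumption of this sketch.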