Embodied AI agents that search for objects in large environments such as households often need to make efficient decisions by predicting object locations from partial information. We pose this as a new type of link prediction problem: link prediction on partially observable dynamic graphs. Our graph is a representation of a scene in which rooms and objects are nodes and their relationships are encoded in the edges; only parts of the changing graph are known to the agent at each timestep. This partial observability poses a challenge to existing link prediction approaches, which we address. We propose a novel state representation, Scene Graph Memory (SGM), which captures the agent's accumulated set of observations, as well as a neural network architecture called a Node Edge Predictor (NEP) that extracts information from the SGM to search efficiently. We evaluate our method in the Dynamic House Simulator, a new benchmark that creates diverse dynamic graphs following the semantic patterns typically seen in homes, and show that NEP can be trained to predict the locations of objects in a variety of environments with diverse object movement dynamics, outperforming baselines in both adaptability to new scenes and overall accuracy. The codebase and more can be found at https://www.scenegraphmemory.com.
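To make the setting concrete, the following is a minimal sketch of how partial observations of object-location edges might be accumulated over timesteps and then queried. It is not the authors' SGM or NEP: the class `SceneGraphMemorySketch`, its methods, and the count-and-recency ranking rule are hypothetical stand-ins chosen for illustration, with a simple heuristic where the paper uses a learned predictor.

```python
from collections import defaultdict


class SceneGraphMemorySketch:
    """Hypothetical stand-in: accumulates partial observations of a scene graph."""

    def __init__(self):
        # counts[(obj, loc)] = number of times obj was observed at loc
        self.counts = defaultdict(int)
        # last_seen[(obj, loc)] = most recent timestep of that observation
        self.last_seen = {}

    def observe(self, timestep, edges):
        """Record the (object, location) edges visible at this timestep."""
        for obj, loc in edges:
            self.counts[(obj, loc)] += 1
            self.last_seen[(obj, loc)] = timestep

    def rank_locations(self, obj):
        """Rank candidate locations by observation count, then by recency.

        This heuristic is an assumption for illustration only; it stands in
        for a learned link predictor.
        """
        candidates = [(loc, n) for (o, loc), n in self.counts.items() if o == obj]
        candidates.sort(
            key=lambda c: (c[1], self.last_seen[(obj, c[0])]),
            reverse=True,
        )
        return [loc for loc, _ in candidates]


# Each call to observe() supplies only the part of the graph seen at that step.
memory = SceneGraphMemorySketch()
memory.observe(0, [("mug", "kitchen"), ("keys", "hallway")])
memory.observe(1, [("mug", "kitchen")])
memory.observe(2, [("mug", "living_room")])
ranking = memory.rank_locations("mug")
```

Here the agent would search the kitchen first for the mug, since it was seen there most often, and fall back to the living room. A learned model could instead weigh such evidence against semantic priors about where objects tend to be.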