Embodied AI agents that search for objects in large environments such as households often need to make efficient decisions by predicting object locations based on partial information. We pose this as a new type of link prediction problem: link prediction on partially observable dynamic graphs. Our graph is a representation of a scene in which rooms and objects are nodes, and their relationships are encoded in the edges; only parts of the changing graph are known to the agent at each timestep. This partial observability poses a challenge to existing link prediction approaches, which we address. We propose a novel state representation -- Scene Graph Memory (SGM) -- which captures the agent's accumulated set of observations, as well as a neural net architecture called a Node Edge Predictor (NEP) that extracts information from the SGM to search efficiently. We evaluate our method in the Dynamic House Simulator, a new benchmark that creates diverse dynamic graphs following the semantic patterns typically seen in homes, and show that NEP can be trained to predict the locations of objects in a variety of environments with diverse object movement dynamics, outperforming baselines in terms of both new scene adaptability and overall accuracy. The codebase and more can be found at https://www.scenegraphmemory.com.
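To make the setting concrete, the following is a minimal sketch of a memory that accumulates partial observations of object-location edges over time. It is not the SGM or NEP implementation: the class name `SceneGraphMemorySketch`, its methods, and the frequency-plus-recency ranking are all hypothetical stand-ins for illustration, with a simple counting heuristic in place of the learned predictor.

```python
from collections import defaultdict


class SceneGraphMemorySketch:
    """Toy memory of observed (object, location) edges.

    Hypothetical illustration only; not the authors' SGM or NEP.
    """

    def __init__(self):
        # counts[obj][loc] = number of timesteps obj was observed at loc
        self.counts = defaultdict(lambda: defaultdict(int))
        # last_seen[obj] = (timestep, loc) of the most recent sighting
        self.last_seen = {}

    def observe(self, timestep, location, visible_objects):
        """Record a partial observation: the objects seen at one location."""
        for obj in visible_objects:
            self.counts[obj][location] += 1
            self.last_seen[obj] = (timestep, location)

    def rank_locations(self, obj, candidates):
        """Rank candidates by observed frequency, ties broken by recency.

        A counting heuristic standing in for a learned link predictor.
        """
        seen = self.counts.get(obj, {})
        _, last_loc = self.last_seen.get(obj, (None, None))
        return sorted(
            candidates,
            key=lambda loc: (seen.get(loc, 0), loc == last_loc),
            reverse=True,
        )


# The agent only sees the location it visits at each timestep.
memory = SceneGraphMemorySketch()
memory.observe(0, "kitchen_counter", ["mug", "kettle"])
memory.observe(1, "desk", ["mug"])
memory.observe(2, "kitchen_counter", ["mug"])
ranking = memory.rank_locations("mug", ["desk", "sofa", "kitchen_counter"])
```

The ranking orders the candidate locations to search for an object using only what the agent has seen so far, which is the role a learned predictor would play over a richer memory.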