A fundamental aspect of building intelligent autonomous robots that can assist humans in their daily lives is the construction of rich environmental representations. While advances in semantic scene representations have enriched robotic scene understanding, current approaches lack a connection between spatial features and dynamic events; e.g., connecting "the blue mug" to the event "washing a mug". In this work, we introduce the event-grounding graph (EGG), a framework that grounds event interactions in the spatial features of a scene. This representation allows robots to perceive, reason about, and respond to complex spatio-temporal queries. Experiments on real robotic data demonstrate EGG's capability to retrieve relevant information and respond accurately to human inquiries concerning the environment and the events within it. Furthermore, the EGG framework's source code and evaluation dataset are released as open source at: https://github.com/aalto-intelligent-robotics/EGG.