RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation

Robots need to explore their surroundings to adapt to and tackle tasks in unknown environments. Prior work has proposed building scene graphs of the environment but typically assumes that the environment is static, omitting regions that require active interaction. This severely limits the ability of such methods to handle more complex tasks in household and office environments: before setting a table, for example, a robot must explore drawers and cabinets to locate all utensils and condiments. In this work, we introduce the novel task of interactive scene exploration, wherein robots autonomously explore environments and produce an action-conditioned scene graph (ACSG) that captures the structure of the underlying environment. The ACSG accounts for both low-level information, such as geometry and semantics, and high-level information, such as the action-conditioned relationships between different entities in the scene. To this end, we present the Robotic Exploration (RoboEXP) system, which incorporates a Large Multimodal Model (LMM) and an explicit memory design to enhance its capabilities. The robot reasons about what to explore and how to explore it, accumulating new information through the interaction process and incrementally constructing the ACSG. We apply our system across various real-world settings in a zero-shot manner, demonstrating its effectiveness in exploring and modeling environments it has never seen before. Leveraging the constructed ACSG, we illustrate the effectiveness and efficiency of our RoboEXP system in facilitating a wide range of real-world manipulation tasks involving rigid objects, articulated objects, nested objects such as Matryoshka dolls, and deformable objects such as cloth.
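To make the idea of an action-conditioned scene graph more concrete, the following is a minimal illustrative sketch in Python. It is not the RoboEXP implementation: the class names (`ObjectNode`, `ActionNode`, `ActionConditionedSceneGraph`), the `revealed_by` mapping, and the `actions_to_reach` helper are all hypothetical, chosen only to show how object nodes carrying low-level information could be linked through actions that must be executed before a hidden object becomes accessible.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectNode:
    label: str          # semantic label, e.g. "cabinet"
    bbox: tuple = ()    # placeholder for low-level geometry (assumed: a 3D box)


@dataclass
class ActionNode:
    verb: str           # e.g. "open"
    target: str         # label of the object the action is applied to


@dataclass
class ActionConditionedSceneGraph:
    # All fields are assumptions made for illustration.
    objects: dict = field(default_factory=dict)      # label -> ObjectNode
    relations: list = field(default_factory=list)    # (parent, relation, child)
    revealed_by: dict = field(default_factory=dict)  # label -> ActionNode

    def add_object(self, label, bbox=(), parent=None, relation=None, via=None):
        """Add an object, optionally related to a parent and gated by an action."""
        self.objects[label] = ObjectNode(label, bbox)
        if parent is not None:
            self.relations.append((parent, relation, label))
        if via is not None:
            self.revealed_by[label] = via

    def actions_to_reach(self, label):
        """Return the ordered (verb, target) actions needed to expose `label`."""
        plan = []
        current = label
        # Walk upward: each gating action targets an object that may itself
        # be hidden behind an earlier action.
        while current in self.revealed_by:
            action = self.revealed_by[current]
            plan.append(action)
            current = action.target
        plan.reverse()  # outermost action first
        return [(a.verb, a.target) for a in plan]


# Incremental construction, as if each object were discovered by interaction.
graph = ActionConditionedSceneGraph()
graph.add_object("cabinet")
graph.add_object("box", parent="cabinet", relation="inside",
                 via=ActionNode("open", "cabinet"))
graph.add_object("fork", parent="box", relation="inside",
                 via=ActionNode("open", "box"))

print(graph.actions_to_reach("fork"))
```

In this toy example, retrieving the fork requires opening the cabinet and then the box, which is the kind of dependency a static scene graph cannot express. A real system would additionally store perception outputs in each node and decide which action to try next, which this sketch leaves out.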
