To perform a wide range of daily tasks, robots need a 3D scene representation that is semantically rich, physically grounded, and structured enough to support task planning and affordance prediction. Existing approaches, however, focus primarily on semantic retrieval and often overlook physical and kinematic factors. Methods that do model physical properties typically rely on narrow training sets or single-object modeling, which limits their scalability and generalization across diverse object types. To address these challenges, we present PhysGraph, a framework that unifies symbolic reasoning with structured 3D geometry to model kinematic and physical properties in cluttered scenes. Given RGB-D observations, PhysGraph reconstructs object-centric 3D geometry and associates object instances across views. It then decomposes objects into functional parts and infers their materials and articulations through visual reasoning. Evaluated on both synthetic and real-world datasets, PhysGraph achieves state-of-the-art results in semantic segmentation, multi-object mass estimation, and articulation prediction. With its simple yet effective design, PhysGraph produces physically consistent and semantically structured scene graphs that serve as a 3D representation for downstream tasks such as constraint-aware 3D affordance prediction and real-to-sim transfer, both of which we demonstrate in our experiments.
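To make the idea of a physically grounded scene graph more concrete, the following is a minimal illustrative sketch, not PhysGraph's actual implementation. It assumes a scene graph whose object nodes hold functional parts (each with a material and a volume), articulation joints between parts, and spatial relations between objects. All class names, field names, the density table, and the example values are hypothetical, and mass is approximated here simply as density times volume.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Rough illustrative densities in kg/m^3 (assumed values, not taken from the paper).
DENSITY_KG_M3 = {"wood": 700.0, "steel": 7850.0, "ceramic": 2400.0}


@dataclass
class Part:
    """A functional part of an object with an inferred material and volume."""
    name: str
    material: str
    volume_m3: float

    @property
    def mass_kg(self) -> float:
        # Simplifying assumption: mass = density * volume.
        return DENSITY_KG_M3[self.material] * self.volume_m3


@dataclass
class Joint:
    """An articulation connecting a child part to a parent part."""
    parent: str
    child: str
    kind: str  # e.g. "revolute" or "prismatic"
    axis: tuple[float, float, float]


@dataclass
class ObjectNode:
    """An object instance made of parts and the joints between them."""
    label: str
    parts: list[Part] = field(default_factory=list)
    joints: list[Joint] = field(default_factory=list)

    @property
    def mass_kg(self) -> float:
        return sum(part.mass_kg for part in self.parts)

    def movable_parts(self) -> list[str]:
        # Parts that appear as the child of some joint can move.
        return [joint.child for joint in self.joints]


@dataclass
class SceneGraph:
    """Objects plus (subject, relation, object) triples between them."""
    objects: dict[str, ObjectNode] = field(default_factory=dict)
    relations: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, node: ObjectNode) -> None:
        self.objects[node.label] = node

    def resting_on(self, label: str) -> list[str]:
        # Objects that would be disturbed if `label` were moved.
        return [s for s, rel, o in self.relations if rel == "on" and o == label]


# Hypothetical example scene: a mug resting on a cabinet with a hinged door.
cabinet = ObjectNode(
    label="cabinet",
    parts=[
        Part("body", "wood", 0.02),
        Part("door", "wood", 0.005),
        Part("handle", "steel", 0.0001),
    ],
    joints=[Joint("body", "door", "revolute", (0.0, 0.0, 1.0))],
)
mug = ObjectNode(label="mug", parts=[Part("cup", "ceramic", 0.0001)])

graph = SceneGraph()
graph.add(cabinet)
graph.add(mug)
graph.relations.append(("mug", "on", "cabinet"))

print(f"cabinet mass: {cabinet.mass_kg:.3f} kg")
print("movable parts:", cabinet.movable_parts())
print("resting on cabinet:", graph.resting_on("cabinet"))
```

A structure along these lines suggests how a downstream planner could query constraints before acting, for example checking which objects rest on a cabinet before moving it, or which parts are articulated before predicting an affordance.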