Reconstructing physically valid 3D scenes from single-view observations is a prerequisite for bridging the gap between visual perception and robotic control. However, in scenarios requiring precise contact reasoning, such as robotic manipulation in highly cluttered environments, geometric fidelity alone is insufficient. Standard perception pipelines often neglect physical constraints, resulting in invalid states, e.g., floating objects or severe interpenetration, which render downstream simulation unreliable. To address these limitations, we propose a novel physics-constrained Real-to-Sim pipeline that reconstructs physically consistent 3D scenes from single-view RGB-D data. Central to our approach is a differentiable optimization framework that explicitly models spatial dependencies via a contact graph, jointly refining object poses and physical properties through differentiable rigid-body simulation. Extensive evaluations in both simulated and real-world settings demonstrate that our reconstructed scenes achieve high physical fidelity and faithfully replicate real-world contact dynamics, enabling stable and reliable contact-rich manipulation.
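The core idea of contact-graph-based pose refinement can be illustrated with a deliberately simplified sketch. This is a hypothetical 1-D toy (object names, the height-only parameterization, and the quadratic contact loss are all illustrative assumptions), not the paper's implementation, which jointly optimizes full 6-DoF poses and physical properties through differentiable rigid-body simulation:

```python
def refine_heights(z, contacts, heights, fixed, lr=0.1, steps=500):
    """Toy contact-graph refinement (illustrative only): each edge
    (parent, child) asserts the child's base rests on the parent's top
    surface, i.e. z[child] == z[parent] + heights[parent]. Gradient
    descent on the squared residuals removes floating (residual > 0)
    and penetration (residual < 0) along the graph."""
    z = dict(z)
    for _ in range(steps):
        grad = {k: 0.0 for k in z}
        for p, c in contacts:
            r = z[c] - (z[p] + heights[p])  # >0: floating, <0: penetration
            grad[c] += 2.0 * r
            grad[p] -= 2.0 * r
        for k in z:
            if k not in fixed:  # anchor objects (e.g., the table) stay put
                z[k] -= lr * grad[k]
    return z

# Hypothetical scene: a box floats 20 cm above a table; a mug floats
# above the box. The contact graph is table -> box -> mug.
z0 = {"table": 0.0, "box": 0.9, "mug": 1.6}
heights = {"table": 0.7, "box": 0.3, "mug": 0.1}
contacts = [("table", "box"), ("box", "mug")]
z = refine_heights(z0, contacts, heights, fixed={"table"})
# box settles onto the table top (0.7), mug onto the box top (1.0)
```

In the actual pipeline the analytic gradients of this toy loss are replaced by gradients backpropagated through a differentiable rigid-body simulator, so the same joint refinement extends to full poses and physical parameters.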