Estimating simulation-ready scenes from real-world observations is crucial for downstream planning and policy learning tasks. However, existing methods struggle in cluttered environments: when scaled to multiple interacting objects, they often incur prohibitive computational cost and show poor robustness and limited generality. We propose a unified optimization-based formulation for real-to-sim scene estimation that jointly recovers the shapes and poses of multiple rigid objects under physical constraints. Our method is built on two key technical innovations. First, we leverage a recently introduced shape-differentiable contact model, whose global differentiability permits joint optimization over object geometry and pose while modeling inter-object contacts. Second, we exploit the structured sparsity of the augmented Lagrangian Hessian to derive an efficient linear system solver whose computational cost scales favorably with scene complexity. Building on this formulation, we develop an end-to-end Simulation-ready Physics-Aware Reconstruction for Cluttered Scenes (SPARCS) pipeline, which integrates learning-based object initialization, physics-constrained joint shape-pose optimization, and differentiable texture refinement. Experiments on cluttered scenes with up to 5 objects and 22 convex hulls demonstrate that our approach robustly reconstructs physically valid, simulation-ready object shapes and poses. Project webpage: https://rory-weicheng.github.io/SPARCS/.
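To illustrate how structured sparsity in an augmented Lagrangian Hessian can be turned into a cheaper linear solve, the following is a minimal sketch under stated assumptions, not the SPARCS solver itself. It assumes a Gauss-Newton style Hessian of the form H = D + rho * J^T J, where D is block-diagonal with one block per object and J is a constraint Jacobian with few rows; the abstract does not specify this structure. The function names `solve_block_diag` and `al_newton_step` are hypothetical, and the solve uses the standard Woodbury identity.

```python
import numpy as np


def solve_block_diag(blocks, b):
    """Solve D x = b, where D is block-diagonal with the given square blocks."""
    x = np.empty_like(b, dtype=float)
    start = 0
    for block in blocks:
        k = block.shape[0]
        # Each object block is solved independently of the others.
        x[start:start + k] = np.linalg.solve(block, b[start:start + k])
        start += k
    return x


def al_newton_step(blocks, J, rho, g):
    """Solve (D + rho * J^T J) dx = -g using the Woodbury identity.

    Assumed structure (hypothetical, for illustration only):
      blocks : list of SPD per-object Hessian blocks forming D
      J      : (m, n) constraint Jacobian with m much smaller than n
      rho    : augmented Lagrangian penalty weight (> 0)
      g      : (n,) gradient of the augmented Lagrangian
    """
    m = J.shape[0]
    Dinv_g = solve_block_diag(blocks, g)
    # Columns of D^{-1} J^T, one block-diagonal solve per constraint row.
    Dinv_Jt = np.column_stack([solve_block_diag(blocks, J[i]) for i in range(m)])
    # Small m x m system that couples the objects through the constraints.
    S = np.eye(m) / rho + J @ Dinv_Jt
    y = np.linalg.solve(S, J @ Dinv_g)
    return -(Dinv_g - Dinv_Jt @ y)
```

With this structure, the only dense factorization that couples objects has the size of the constraint count m rather than the full variable count n, which is one way a solver's cost can grow slowly as objects are added.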