Reconstructing interactive, simulation-ready 3D scenes from a single image remains a critical bottleneck for robotic manipulation. While recent single-image lifters recover plausible per-object shapes, composing these shapes yields scenes that collapse under physical simulation because objects interpenetrate, hover, or sink. Existing physics-aware methods treat this problem strictly as post-hoc layout correction, which leaves the underlying geometric errors unresolved. To address this, we introduce SimuScene, a compositional 3D reconstruction pipeline that places physics in the loop of shape and layout estimation. Rather than using physics merely for layout cleanup, we use the physics engine as a diagnostic measurement tool within the generative process itself. By simulating the reconstructed objects under gravity, we convert penetration and support failures into quantitative correction signals that drive gravity-axis stretching and amodal shape resampling. This physics-informed feedback loop mitigates accumulated reconstruction errors and produces a stable, simulation-ready compositional 3D scene. Extensive experiments show that SimuScene achieves state-of-the-art performance on physical-stability and geometric-alignment benchmarks. We further demonstrate its utility by deploying the reconstructed environments in humanoid control and robot-arm manipulation tasks.