In the presence of occlusions and measurement noise, geometrically accurate scene reconstructions -- which fit the sensor data -- can still be physically incorrect. For instance, when the poses and shapes of objects in a scene are estimated and the resulting estimates are imported into a simulator, small errors can translate into implausible configurations, such as interpenetrating objects or unstable equilibria. This makes it difficult to predict the dynamic behavior of the scene using a digital twin, an important step in simulation-based planning and control of contact-rich behaviors. In this paper, we posit that object pose and shape estimation requires reasoning holistically over the scene (instead of reasoning about each object in isolation), accounting for object interactions and physical plausibility. Towards this goal, our first contribution is Picasso, a physics-constrained reconstruction pipeline that builds multi-object scene reconstructions by considering geometry, non-penetration, and physics. Picasso relies on a fast rejection sampling method that reasons over multi-object interactions, leveraging an inferred object contact graph to guide samples. Second, we propose the Picasso dataset, a collection of 10 contact-rich real-world scenes with ground-truth annotations, as well as a metric to quantify physical plausibility, which we open-source as part of our benchmark. Finally, we provide an extensive evaluation of Picasso on our newly introduced dataset and on the YCB-V dataset, and show that it outperforms the state of the art by a large margin while providing reconstructions that are both physically plausible and more aligned with human intuition.
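To make the idea of contact-graph-guided rejection sampling concrete, the following is a minimal sketch under strong simplifying assumptions, not Picasso's actual implementation. Objects are approximated as spheres, only the non-penetration constraint is checked (stability and other physics are ignored), and every name and parameter (`contact_graph`, `sample_plausible`, `sigma`, `margin`, `tol`) is hypothetical and introduced only for illustration.

```python
import itertools
import random


def _dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def contact_graph(centers, radii, margin):
    """Return index pairs whose sphere surfaces are within `margin` (or overlap).

    Assumption: this stands in for an inferred object contact graph.
    """
    edges = []
    for i, j in itertools.combinations(range(len(centers)), 2):
        gap = _dist(centers[i], centers[j]) - (radii[i] + radii[j])
        if gap <= margin:
            edges.append((i, j))
    return edges


def penetration(centers, radii, i, j):
    """Overlap depth between spheres i and j; 0.0 if they are separated."""
    return max(0.0, radii[i] + radii[j] - _dist(centers[i], centers[j]))


def sample_plausible(centers, radii, sigma=0.01, margin=0.05, tol=1e-3,
                     max_tries=10000, seed=0):
    """Perturb the estimated centers until no contact-graph pair interpenetrates.

    Only pairs in the contact graph are checked, so `margin` should be large
    relative to `sigma`; otherwise a distant pair could collide unnoticed.
    Returns the first accepted sample, or None if every try is rejected.
    """
    rng = random.Random(seed)
    edges = contact_graph(centers, radii, margin)
    for _ in range(max_tries):
        candidate = [tuple(c + rng.gauss(0.0, sigma) for c in center)
                     for center in centers]
        if all(penetration(candidate, radii, i, j) <= tol for i, j in edges):
            return candidate
    return None


# Hypothetical estimates: spheres 0 and 1 overlap by 1 cm, sphere 2 is far away.
est_centers = [(0.0, 0.0, 0.0), (0.09, 0.0, 0.0), (1.0, 0.0, 0.0)]
est_radii = [0.05, 0.05, 0.05]
plausible = sample_plausible(est_centers, est_radii)
```

Restricting the check to pairs in the contact graph is what keeps each rejection test cheap in this sketch: the number of pairwise tests scales with the number of likely contacts rather than with all object pairs.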