Monocular 3D scene reconstruction has recently seen significant progress. Powered by modern neural architectures and large-scale data, recent methods achieve high performance in depth estimation from a single image. Meanwhile, reconstructing and decomposing common scenes into individual 3D objects remains a hard challenge due to the large variety of objects, frequent occlusions and complex object relations. Notably, beyond shape and pose estimation of individual objects, applications in robotics and animation require physically plausible scene reconstruction in which objects obey the physical principles of non-penetration and realistic contact. In this work, we advance object-level scene reconstruction along two directions. First, we introduce MessyKitchens, a new dataset of real-world cluttered scenes that provides high-fidelity object-level ground truth in terms of 3D object shapes, poses and accurate object contacts. Second, we build on the recent SAM 3D approach for single-object reconstruction and extend it with a Multi-Object Decoder (MOD) for joint object-level scene reconstruction. To validate our contributions, we show that MessyKitchens significantly improves on previous datasets in registration accuracy and inter-object penetration. We also evaluate our multi-object reconstruction approach on three datasets and demonstrate consistent and significant improvements of MOD over the state of the art. Our new benchmark, code and pre-trained models will be made publicly available on our project website: https://messykitchens.github.io/.