Existing methods for reconstructing objects and humans from a monocular image suffer from severe mesh collisions and degraded performance when the subjects interact and occlude one another. This paper introduces a method to obtain a globally consistent 3D reconstruction of interacting objects and people from a single image. Our contributions include: 1) an optimization framework, featuring a collision loss, tailored to handle human-object and human-human interactions, ensuring spatially coherent scene reconstruction; and 2) a novel technique to robustly estimate six-degree-of-freedom (6-DoF) poses, specifically for heavily occluded objects, by exploiting image inpainting. Notably, our proposed method operates effectively on images from real-world scenarios, without requiring scene- or object-level 3D supervision. Extensive qualitative and quantitative evaluation against existing methods demonstrates significantly fewer collisions and more coherent reconstructions for scenes with multiple interacting humans and objects.
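As a rough illustration of the collision-loss idea, the sketch below penalizes vertices of one mesh that fall inside another object, here represented by a signed distance function (negative values indicate penetration). This is a minimal sketch under that assumption; `collision_loss` and `sphere_sdf` are illustrative names, not the paper's actual implementation.

```python
import numpy as np

def collision_loss(vertices, sdf):
    """Penalize mesh vertices that penetrate another object.

    vertices: (N, 3) array of one mesh's vertex positions.
    sdf: callable mapping (N, 3) points to signed distances to the
         other object's surface (negative = inside the object).
    """
    d = sdf(vertices)
    # Only penetrating vertices (negative signed distance) contribute;
    # the squared penetration depth gives a smooth, differentiable penalty.
    return np.sum(np.maximum(-d, 0.0) ** 2)

# Toy example: the "other object" is a unit sphere at the origin.
sphere_sdf = lambda p: np.linalg.norm(p, axis=1) - 1.0
pts = np.array([[0.0, 0.0, 0.5],   # inside the sphere -> penalized
                [0.0, 0.0, 2.0]])  # outside the sphere -> no penalty
loss = collision_loss(pts, sphere_sdf)
```

In an optimization framework such a term would be summed over all human-object and human-human pairs and minimized jointly with the data terms, pushing interpenetrating surfaces apart while the image evidence keeps the bodies and objects in place.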