Given a set of calibrated images of a scene, we present an approach that produces a simple, compact, and actionable 3D world representation by means of 3D primitives. While many approaches focus on recovering high-fidelity 3D scenes, we focus on parsing a scene into mid-level 3D representations made of a small set of textured primitives. Such representations are interpretable, easy to manipulate, and suited for physics-based simulations. Moreover, unlike existing primitive decomposition methods that rely on 3D input data, our approach operates directly on images through differentiable rendering. Specifically, we model primitives as textured superquadric meshes and optimize their parameters from scratch with an image rendering loss. We highlight the importance of modeling transparency for each primitive, which is critical for optimization and also enables handling varying numbers of primitives. We show that the resulting textured primitives faithfully reconstruct the input images and accurately model the visible 3D points, while providing amodal shape completions of unseen object regions. We compare our approach to the state of the art on diverse scenes from DTU, and demonstrate its robustness on real-life captures from BlendedMVS and Nerfstudio. We also showcase how our results can be used to effortlessly edit a scene or perform physical simulations. Code and video results are available at https://www.tmonnier.com/DBW.
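Each primitive above is described as a superquadric mesh. As a rough illustration only, the sketch below samples mesh vertices from the standard parametric superquadric surface, controlled by a per-axis scale and two shape exponents `eps1`, `eps2`, and checks them against the matching implicit inside-outside function. This is not the authors' code: the abstract does not say which base mesh is deformed or how textures and transparency are attached, and the helper names (`signed_pow`, `superquadric_point`, `inside_outside`, `superquadric_vertices`) are hypothetical.

```python
import math


def signed_pow(x: float, eps: float) -> float:
    """Sign-preserving power sign(x) * |x|**eps used by superquadrics."""
    return math.copysign(abs(x) ** eps, x)


def superquadric_point(eta, omega, scale, eps1, eps2):
    """Surface point at latitude eta in [-pi/2, pi/2], longitude omega in [-pi, pi]."""
    a1, a2, a3 = scale
    ce = signed_pow(math.cos(eta), eps1)
    return (
        a1 * ce * signed_pow(math.cos(omega), eps2),
        a2 * ce * signed_pow(math.sin(omega), eps2),
        a3 * signed_pow(math.sin(eta), eps1),
    )


def inside_outside(p, scale, eps1, eps2):
    """Implicit function F(p): equals 1 on the surface, < 1 inside, > 1 outside."""
    x, y, z = (abs(c) / a for c, a in zip(p, scale))
    xy = (x ** (2 / eps2) + y ** (2 / eps2)) ** (eps2 / eps1)
    return xy + z ** (2 / eps1)


def superquadric_vertices(scale, eps1, eps2, n_lat=16, n_lon=32):
    """Hypothetical helper: vertices on a latitude/longitude grid.

    Pole rows repeat the same point n_lon times; a real mesh would merge them
    and add faces, which this sketch omits.
    """
    verts = []
    for i in range(n_lat + 1):
        eta = -math.pi / 2 + math.pi * i / n_lat
        for j in range(n_lon):
            omega = -math.pi + 2 * math.pi * j / n_lon
            verts.append(superquadric_point(eta, omega, scale, eps1, eps2))
    return verts


if __name__ == "__main__":
    scale, eps1, eps2 = (1.0, 2.0, 0.5), 0.4, 1.6
    verts = superquadric_vertices(scale, eps1, eps2)
    worst = max(abs(inside_outside(v, scale, eps1, eps2) - 1.0) for v in verts)
    print(len(verts), worst)
```

With `eps1 = eps2 = 1` the surface is an ellipsoid, and exponents approaching 0 give box-like shapes. A handful of continuous parameters therefore spans both rounded and blocky primitives, which is what makes superquadrics convenient to optimize with a rendering loss.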