In this paper, we study the problem of 3D scene geometry decomposition and manipulation from 2D views. Building on recent implicit neural representation techniques, in particular neural radiance fields, we introduce an object field component that learns a unique code for each individual object in 3D space from 2D supervision alone. The key to this component is a series of carefully designed loss functions that enable every 3D point, especially those in non-occupied space, to be effectively optimized even without 3D labels. In addition, we introduce an inverse query algorithm to freely manipulate any specified 3D object shape in the learned scene representation. Notably, our manipulation algorithm can explicitly tackle key issues such as object collisions and visual occlusions. Our method, called DM-NeRF, is among the first to simultaneously reconstruct, decompose, manipulate, and render complex 3D scenes in a single pipeline. Extensive experiments on three datasets show that our method can accurately decompose all 3D objects from 2D views, allowing any object of interest to be freely manipulated in 3D space through operations such as translation, rotation, size adjustment, and deformation.