Point clouds offer an attractive source of information to complement images in neural scene representations, especially when few images are available. Neural rendering methods based on point clouds do exist, but they do not perform well when the point cloud quality is low -- e.g., sparse or incomplete, which is often the case with real-world data. We overcome these problems with a simple representation that aggregates point clouds at multiple scale levels with sparse voxel grids at different resolutions. To deal with point cloud sparsity, we average across multiple scale levels -- but only among those that are valid, i.e., that have enough neighboring points in proximity to the ray of a pixel. To help model areas without points, we add a global voxel at the coarsest scale, thus unifying ``classical'' and point-based NeRF formulations. We validate our method on the NeRF Synthetic, ScanNet, and KITTI-360 datasets, outperforming the state of the art, with a significant gap compared to other NeRF-based methods, especially on more challenging scenes.
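The central aggregation idea above can be sketched in a few lines: average per-scale features only over the levels deemed valid, i.e., those with enough points near the pixel's ray, and fall back to the coarsest (global) level when no level qualifies. This is an illustrative simplification, not the paper's implementation; the function and parameter names (`aggregate_features`, `min_points`) are hypothetical.

```python
import numpy as np

def aggregate_features(level_features, level_counts, min_points=3):
    """Average features across scale levels, keeping only 'valid' levels.

    level_features: (L, D) array, one feature vector per scale level.
    level_counts:   (L,) array, number of points near the ray at each level
                    (a simplified proxy for the paper's validity criterion).
    min_points:     threshold below which a level is considered invalid.
    """
    level_features = np.asarray(level_features, dtype=float)
    level_counts = np.asarray(level_counts)
    valid = level_counts >= min_points
    if not valid.any():
        # No level has enough nearby points: fall back to the coarsest
        # (global) level, which by construction always exists.
        return level_features[-1]
    # Average only among the valid levels.
    return level_features[valid].mean(axis=0)
```

For example, with three levels where only the two finest have enough points, `aggregate_features([[1.0, 0.0], [3.0, 0.0], [5.0, 0.0]], [5, 4, 0])` averages the first two feature vectors and ignores the empty coarse level.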