Existing learning-based methods for point cloud rendering adopt various 3D representations and feature-querying mechanisms to alleviate the sparsity of point clouds. However, artifacts still appear in the rendered images because it is difficult to extract continuous and discriminative 3D features from point clouds. In this paper, we present a dense yet lightweight 3D representation, named TriVol, that can be combined with NeRF to render photo-realistic images from point clouds. TriVol consists of three slim volumes, each encoded from the point cloud. TriVol has two advantages. First, it fuses receptive fields at different scales and thus extracts both local and non-local features for a discriminative representation. Second, because the volume size is greatly reduced, our 3D decoder can be inferred efficiently, which allows us to increase the resolution of the 3D space and render finer point details. Extensive experiments on different benchmarks covering various kinds of scenes and objects demonstrate the effectiveness of our framework compared with current approaches. Moreover, our framework has excellent generalization ability: it can render a category of scenes/objects without fine-tuning.
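To make the "three slim volumes" idea concrete, the sketch below scatters per-point features into three axis-aligned volumes and queries a 3D location by concatenating its features from all three. It is a minimal illustration under loudly stated assumptions, not the TriVol method: the volume layout (full resolution `res` on two axes, reduced resolution `slim` on the third), the mean-pooling scatter in place of a learned encoder, the nearest-voxel lookup, and the helper names `voxel_index`, `build_slim_volumes`, and `query` are all hypothetical, and the 3D decoder and NeRF rendering are omitted. Points are assumed to be normalized to the unit cube.

```python
import numpy as np


def voxel_index(points, shape):
    """Map points in [0, 1)^3 to integer voxel indices for a grid of `shape`."""
    shape = np.asarray(shape)
    idx = np.floor(points * shape).astype(int)
    return np.clip(idx, 0, shape - 1)  # keep boundary points inside the grid


def build_slim_volumes(points, feats, res=64, slim=4):
    """Average per-point features into three axis-aligned slim volumes.

    Hypothetical layout (an assumption, not taken from the paper): each volume
    keeps full resolution `res` on two axes and reduced resolution `slim` on
    the remaining axis. Mean pooling stands in for a learned encoder.
    """
    shapes = [(slim, res, res), (res, slim, res), (res, res, slim)]
    c = feats.shape[1]
    volumes = []
    for shape in shapes:
        idx = voxel_index(points, shape)
        flat = np.ravel_multi_index(tuple(idx.T), shape)  # one cell id per point
        n_cells = int(np.prod(shape))
        sums = np.zeros((n_cells, c))
        counts = np.zeros(n_cells)
        np.add.at(sums, flat, feats)  # unbuffered scatter-add handles repeated ids
        np.add.at(counts, flat, 1)
        mean = sums / np.maximum(counts, 1)[:, None]  # empty cells stay zero
        volumes.append(mean.reshape(*shape, c))
    return volumes


def query(volumes, xyz):
    """Look up the nearest voxel in each volume and concatenate the features."""
    out = []
    for vol in volumes:
        idx = voxel_index(xyz, vol.shape[:3])
        out.append(vol[idx[:, 0], idx[:, 1], idx[:, 2]])
    return np.concatenate(out, axis=1)  # shape (num_queries, 3 * channels)
```

Under these assumed sizes, the saving is easy to quantify: with `res=64` and `slim=4`, the three slim volumes hold 3 × 64 × 64 × 4 = 49,152 cells per channel, whereas a single dense 64³ volume holds 262,144. That reduction is the kind of headroom the abstract refers to when it says a 3D decoder can run efficiently at higher spatial resolution.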