We propose a differentiable rendering algorithm for efficient novel view synthesis. By departing from volume-based representations in favor of a learned point representation, we improve on existing methods by more than an order of magnitude in memory and runtime, both in training and inference. The method begins with a uniformly sampled random point cloud and learns per-point position and view-dependent appearance, using a differentiable splat-based renderer to evolve the model to match a set of input images. Our method is up to 300x faster than NeRF in both training and inference, with only a marginal sacrifice in quality, while using less than 10 MB of memory for a static scene. For dynamic scenes, our method trains two orders of magnitude faster than STNeRF and renders at near-interactive rates, while maintaining high image quality and temporal coherence even without imposing any temporal-coherence regularizers.
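To make the pipeline described in the abstract concrete, the following is a minimal sketch of its three ingredients: a uniformly sampled random point cloud, a projection to the image plane, and a soft splatting step whose output varies smoothly with point positions and colors. It is not the authors' renderer. The function names (`init_point_cloud`, `project`, `splat`, `photometric_loss`), the Gaussian kernel, the camera placement, and all parameter values are assumptions chosen for illustration, and view-dependent appearance and occlusion handling are omitted entirely.

```python
import numpy as np

def init_point_cloud(n_points, seed=0):
    """Uniformly sample positions in [-1, 1]^3 and colors in [0, 1]^3 (illustrative)."""
    rng = np.random.default_rng(seed)
    positions = rng.uniform(-1.0, 1.0, size=(n_points, 3))
    colors = rng.uniform(0.0, 1.0, size=(n_points, 3))
    return positions, colors

def project(positions, focal=1.0, camera_z=3.0):
    """Pinhole projection for a hypothetical camera on the +z axis facing the origin."""
    depth = camera_z - positions[:, 2]              # distance along the viewing axis
    xy = focal * positions[:, :2] / depth[:, None]  # perspective divide
    return xy, depth

def splat(xy, colors, height=32, width=32, sigma=0.05, eps=1e-8):
    """Soft splat: each pixel is a Gaussian-weighted average of the point colors."""
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    grid = np.stack(np.meshgrid(xs, ys, indexing="xy"), axis=-1)   # (H, W, 2)
    diff = grid[:, :, None, :] - xy[None, None, :, :]              # (H, W, N, 2)
    weights = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))  # (H, W, N)
    image = weights @ colors                                       # (H, W, 3)
    return image / (weights.sum(axis=-1, keepdims=True) + eps)

def photometric_loss(positions, colors, target):
    """Mean squared error between the splatted image and a target image."""
    xy, _ = project(positions)
    rendered = splat(xy, colors, *target.shape[:2])
    return float(np.mean((rendered - target) ** 2))

# Render a random cloud, then measure the loss after perturbing the positions.
positions, colors = init_point_cloud(64)
xy, depth = project(positions)
target = splat(xy, colors)
shifted = positions.copy()
shifted[:, 0] += 0.2
print(photometric_loss(positions, colors, target))  # loss against its own rendering
print(photometric_loss(shifted, colors, target))    # loss after moving the points
```

Because the Gaussian weights are smooth in `xy`, the loss is differentiable with respect to both positions and colors. In practice one would write the same computation in an automatic-differentiation framework and optimize the point cloud by gradient descent against the input images; NumPy is used here only to keep the sketch dependency-free.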