Most Neural Radiance Fields (NeRFs) generalize poorly, which limits their use when a single model must represent multiple scenes. To mitigate this problem, existing methods simply condition NeRF models on image features and therefore lack a global understanding and modeling of the entire 3D scene. Inspired by the success of mask-based modeling in other research fields, we propose a masked ray and view modeling method for generalizable NeRF (MRVM-NeRF), the first attempt to incorporate mask-based pretraining into 3D implicit representations. Specifically, considering that the core of NeRFs lies in modeling 3D representations along the rays and across the views, we randomly mask a proportion of the points sampled along each ray at the fine stage by discarding part of the information obtained from multiple viewpoints, and train the model to predict the corresponding features produced in the coarse branch. In this way, the prior knowledge of 3D scenes learned during pretraining helps the model generalize better to novel scenarios after finetuning. Extensive experiments demonstrate the superiority of MRVM-NeRF under various synthetic and real-world settings, both qualitatively and quantitatively. Our empirical studies further confirm the effectiveness of MRVM, which is specifically designed for NeRF models.
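The masking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the zero-valued mask token, the mask ratio of 0.5, and the mean squared error objective are all assumptions made for illustration, and the fine-branch network is replaced by an identity placeholder.

```python
import random


def sample_mask(num_points, mask_ratio, rng):
    """Choose which sampled points along one ray are masked (hypothetical helper)."""
    num_masked = int(round(num_points * mask_ratio))
    masked = set(rng.sample(range(num_points), num_masked))
    return [i in masked for i in range(num_points)]


def apply_mask(point_features, mask, mask_token):
    """Replace the multi-view feature of each masked point with a placeholder token."""
    return [list(mask_token) if m else list(f)
            for f, m in zip(point_features, mask)]


def masked_feature_loss(predicted, target, mask):
    """Mean squared error over masked points only (assumed objective)."""
    errors = [
        sum((p - t) ** 2 for p, t in zip(pred, tgt)) / len(pred)
        for pred, tgt, m in zip(predicted, target, mask)
        if m
    ]
    return sum(errors) / len(errors) if errors else 0.0


rng = random.Random(0)
num_points, dim = 8, 4

# Stand-ins for per-point features aggregated from multiple views.
fine_features = [[rng.random() for _ in range(dim)] for _ in range(num_points)]
# Stand-ins for the coarse-branch features used as prediction targets.
coarse_features = [[rng.random() for _ in range(dim)] for _ in range(num_points)]

mask = sample_mask(num_points, mask_ratio=0.5, rng=rng)
masked_input = apply_mask(fine_features, mask, mask_token=[0.0] * dim)

# A real model would pass masked_input through the fine-branch network;
# the identity is used here only to keep the sketch self-contained.
predicted = masked_input
loss = masked_feature_loss(predicted, coarse_features, mask)
```

Restricting the loss to masked positions mirrors common practice in masked modeling; whether MRVM-NeRF does the same is not stated in the abstract.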