Neural Radiance Fields (NeRF) have become an increasingly popular representation for capturing the high-quality appearance and shape of scenes and objects. However, learning generalizable NeRF priors over categories of scenes or objects has been challenging due to the high dimensionality of the network weight space. To address the limitations of existing work in generalization and multi-view consistency, and to improve quality, we propose HyP-NeRF, a latent conditioning method for learning generalizable category-level NeRF priors using hypernetworks. Rather than using hypernetworks to estimate only the weights of a NeRF, we estimate both the weights and the multi-resolution hash encodings, resulting in significant quality gains. To improve quality even further, we incorporate a denoise-and-finetune strategy that denoises images rendered from NeRFs estimated by the hypernetwork and finetunes the estimated NeRF while retaining multi-view consistency. These improvements enable us to use HyP-NeRF as a generalizable prior for multiple downstream tasks, including NeRF reconstruction from single-view or cluttered scenes and text-to-NeRF. We provide qualitative comparisons and evaluate HyP-NeRF on three tasks (generalization, compression, and retrieval), demonstrating state-of-the-art results.
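The idea of predicting both the NeRF weights and the hash encodings from a latent code can be illustrated with a toy sketch. This is not the HyP-NeRF implementation: the single linear hypernetwork, the single-level hash table (the method uses multi-resolution encodings), the floor-vertex lookup without interpolation, and every name and size below are assumptions chosen to keep the example small and dependency-free.

```python
import math
import random

# All sizes and names are illustrative assumptions, not values from the paper.
LATENT_DIM = 4      # size of the per-instance latent code
TABLE_SIZE = 8      # entries in one hash table (single level for brevity)
FEAT_DIM = 2        # features stored per hash-table entry
HIDDEN = 3          # hidden width of the tiny NeRF-style MLP
N_TABLE = TABLE_SIZE * FEAT_DIM
N_W1 = FEAT_DIM * HIDDEN
N_W2 = HIDDEN * 4   # outputs: (r, g, b, density)
N_PARAMS = N_TABLE + N_W1 + N_W2


def make_hypernetwork(seed=0):
    """Hypothetical hypernetwork: one linear layer from latent code to all parameters."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.5) for _ in range(LATENT_DIM)] for _ in range(N_PARAMS)]


def predict_params(hyper, z):
    """Predict both the hash-table features and the MLP weights from latent z."""
    flat = [sum(w * zi for w, zi in zip(row, z)) for row in hyper]
    table = [flat[i * FEAT_DIM:(i + 1) * FEAT_DIM] for i in range(TABLE_SIZE)]
    w1 = flat[N_TABLE:N_TABLE + N_W1]
    w2 = flat[N_TABLE + N_W1:]
    return table, w1, w2


def hash_index(ix, iy, iz):
    """Spatial hash of integer grid coordinates (XOR of prime multiples)."""
    return (ix ^ (iy * 2654435761) ^ (iz * 805459861)) % TABLE_SIZE


def query(params, point, resolution=4):
    """Fetch the feature at the floor grid vertex and decode it with the MLP."""
    table, w1, w2 = params
    ix, iy, iz = (int(c * resolution) for c in point)
    feat = table[hash_index(ix, iy, iz)]
    hidden = [max(0.0, sum(feat[i] * w1[i * HIDDEN + j] for i in range(FEAT_DIM)))
              for j in range(HIDDEN)]
    out = [sum(hidden[j] * w2[j * 4 + k] for j in range(HIDDEN)) for k in range(4)]
    rgb = [1.0 / (1.0 + math.exp(-v)) for v in out[:3]]  # sigmoid keeps color in [0, 1]
    density = math.log1p(math.exp(out[3]))               # softplus keeps density >= 0
    return rgb, density


# Usage: one latent code yields one complete (hash table + MLP) radiance field.
hyper = make_hypernetwork()
params = predict_params(hyper, [0.1, -0.3, 0.7, 0.2])
rgb, density = query(params, (0.2, 0.5, 0.9))
```

Because the hash table is an output of the hypernetwork rather than a separately optimized buffer, changing the latent code changes the encoding and the decoder together, which is the property the abstract credits for the quality gains.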