Neural Radiance Field (NeRF) has shown impressive performance in novel view synthesis via implicit scene representation. However, it usually scales poorly because it requires densely sampled images for each new scene. Several studies have attempted to mitigate this problem by integrating Multi-View Stereo (MVS) techniques into NeRF, but they still entail a cumbersome fine-tuning process for new scenes. Notably, rendering quality drops severely without this fine-tuning, and the errors appear mainly around high-frequency features. In light of this observation, we design WaveNeRF, which integrates wavelet frequency decomposition into MVS and NeRF to achieve generalizable yet high-quality synthesis without any per-scene optimization. To preserve high-frequency information when generating 3D feature volumes, WaveNeRF builds Multi-View Stereo in the wavelet domain by integrating the discrete wavelet transform into the classical cascade MVS, which explicitly disentangles high-frequency information. The disentangled frequency features can then be injected into classic NeRF via a novel hybrid neural renderer to yield faithful high-frequency details, and an intuitive frequency-guided sampling strategy can be designed to suppress artifacts around high-frequency regions. Extensive experiments on three widely studied benchmarks show that WaveNeRF achieves superior generalizable radiance field modeling when given only three images as input.
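To make concrete the idea that a discrete wavelet transform separates high-frequency information explicitly, the following is a minimal sketch of a one-level 2D Haar transform in NumPy. It is not the WaveNeRF implementation: the abstract does not say which wavelet or which library is used, so the Haar wavelet and the function names `haar_dwt2` and `haar_idwt2` are assumptions chosen for illustration. Sub-band naming conventions (LH versus HL) also vary between libraries.

```python
import numpy as np


def haar_dwt2(image):
    """One-level 2D Haar DWT (illustrative helper, not from the paper).

    Returns four sub-bands at half the input resolution: one low-frequency
    approximation (ll) and three high-frequency detail bands (lh, hl, hh).
    """
    x = np.asarray(image, dtype=np.float64)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")
    # The four pixels of every non-overlapping 2x2 block.
    a = x[0::2, 0::2]  # top-left
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # scaled block average: low-frequency content
    lh = (a - b + c - d) / 2.0  # responds to change across columns
    hl = (a + b - c - d) / 2.0  # responds to change across rows
    hh = (a - b - c + d) / 2.0  # responds to diagonal change
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Invert haar_dwt2, reassembling the full-resolution array."""
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w), dtype=np.float64)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((8, 8))
    ll, lh, hl, hh = haar_dwt2(img)
    # The transform is lossless: the four sub-bands rebuild the input.
    print(np.allclose(haar_idwt2(ll, lh, hl, hh), img))
```

On a flat region the three detail bands are zero, while edges and fine texture produce non-zero detail coefficients. In this sketch that is the sense in which high-frequency content is kept in separate channels rather than being averaged away by downsampling.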