Neural Radiance Fields (NeRF) have recently shown great success in rendering novel-view images of a given scene by learning an implicit representation from posed RGB images alone. NeRF and related neural field methods (e.g., neural surface representation) typically optimize a point-wise loss and make point-wise predictions, where one data point corresponds to one pixel. Unfortunately, this line of research has not exploited the collective supervision of distant pixels, even though groups of pixels in an image or scene are known to carry rich structural information. To the best of our knowledge, we are the first to design a nonlocal multiplex training paradigm for NeRF and related neural field methods, via a novel Stochastic Structural SIMilarity (S3IM) loss that processes multiple data points as a whole set rather than processing multiple inputs independently. Our extensive experiments demonstrate the unreasonable effectiveness of S3IM in improving NeRF and neural surface representation at almost no extra cost. The improvements in quality metrics can be particularly significant for relatively difficult tasks: e.g., the test MSE loss drops, unexpectedly, by more than 90% for TensoRF and DVGO over eight novel view synthesis tasks, and NeuS obtains a 198% F-score gain and a 64% reduction in Chamfer $L_{1}$ distance over eight surface reconstruction tasks. Moreover, S3IM is consistently robust even with sparse inputs, corrupted images, and dynamic scenes.
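To make the contrast between a point-wise loss and a set-wise loss concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give the S3IM formula, so the sketch assumes a plain, windowless SSIM computed over randomly formed groups of pixels from one training batch; the function names (`ssim`, `mse_loss`, `stochastic_ssim_loss`) and the parameters `group_size`, `repeats`, and `seed` are hypothetical and chosen only for illustration.

```python
import random

# Standard SSIM stabilizing constants for intensities in [0, 1].
C1 = 0.01 ** 2
C2 = 0.03 ** 2


def ssim(x, y):
    """Windowless SSIM between two equal-length lists of intensities."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    num = (2 * mx * my + C1) * (2 * cov + C2)
    den = (mx ** 2 + my ** 2 + C1) * (vx + vy + C2)
    return num / den


def mse_loss(pred, target):
    """Point-wise loss: every pixel contributes independently."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def stochastic_ssim_loss(pred, target, group_size=16, repeats=4, seed=0):
    """Set-wise loss (assumed form): 1 - mean SSIM over random pixel groups.

    Each repeat shuffles the batch indices, so pixels that are far apart
    in the image can land in the same group and be scored jointly.
    """
    if group_size > len(pred):
        raise ValueError("group_size must not exceed the batch size")
    rng = random.Random(seed)
    idx = list(range(len(pred)))
    scores = []
    for _ in range(repeats):
        rng.shuffle(idx)
        # Any leftover indices that do not fill a full group are skipped.
        for start in range(0, len(idx) - group_size + 1, group_size):
            group = idx[start:start + group_size]
            scores.append(ssim([pred[i] for i in group],
                               [target[i] for i in group]))
    return 1.0 - sum(scores) / len(scores)


if __name__ == "__main__":
    target = [i / 63 for i in range(64)]   # toy batch of 64 pixel intensities
    pred = [1.0 - t for t in target]       # a deliberately poor prediction
    print("point-wise MSE:", mse_loss(pred, target))
    print("set-wise loss :", stochastic_ssim_loss(pred, target))
```

In this sketch the point-wise term scores each pixel in isolation, while the set-wise term depends on the joint statistics (means, variances, covariance) of each random group, which is one way a loss can draw on structural information shared across distant pixels.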