Although many recent works have investigated generalizable NeRF-based novel view synthesis for unseen scenes, they seldom consider synthetic-to-real generalization, which many practical applications require. In this work, we first investigate the effects of synthetic data in synthetic-to-real novel view synthesis and, surprisingly, observe that models trained on synthetic data tend to produce sharper but less accurate volume densities. For pixels where the volume densities are correct, fine-grained details are obtained; otherwise, severe artifacts are produced. To retain the advantages of synthetic data while avoiding its negative effects, we introduce geometry-aware contrastive learning to learn multi-view consistent features under geometric constraints. We also adopt cross-view attention to further enhance the geometry perception of features by querying features across input views. Experiments demonstrate that, in the synthetic-to-real setting, our method renders images with higher quality and better fine-grained details, outperforming existing generalizable novel view synthesis methods in terms of PSNR, SSIM, and LPIPS. When trained on real data, our method also achieves state-of-the-art results.
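To make the idea of contrastive learning over multi-view features more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the loss, so a standard InfoNCE objective is assumed here. The function name `info_nce`, the `temperature` value, and the convention that row `i` of both inputs holds features of the same 3D point seen from two different views (the geometric correspondence) are all illustrative assumptions.

```python
import numpy as np


def info_nce(anchor, positive, temperature=0.1):
    """InfoNCE loss over N feature pairs (hypothetical helper).

    Assumption: anchor[i] and positive[i] are features of the same 3D point
    observed in two different views; all other rows act as negatives.
    """
    # L2-normalize so that dot products are cosine similarities.
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)

    # Pairwise similarity matrix of shape (N, N), scaled by the temperature.
    logits = a @ p.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The diagonal holds the log-probability of each true correspondence.
    return float(-np.mean(np.diag(log_prob)))
```

Under these assumptions, the loss is small when features of corresponding points agree across views and large when they do not, which is one way to encourage multi-view consistent features.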