Although many recent works have investigated generalizable NeRF-based novel view synthesis for unseen scenes, they seldom consider synthetic-to-real generalization, which is desirable in many practical applications. In this work, we first investigate the effects of synthetic data on synthetic-to-real novel view synthesis and, surprisingly, observe that models trained with synthetic data tend to produce sharper but less accurate volume densities. Where the predicted volume densities are correct, fine-grained details are recovered; where they are wrong, severe artifacts appear. To keep the advantages of synthetic data while avoiding its negative effects, we introduce geometry-aware contrastive learning, which learns multi-view consistent features under geometric constraints. In addition, we adopt cross-view attention, which queries features across input views to further strengthen the geometry awareness of the features. Experiments demonstrate that under the synthetic-to-real setting, our method renders images of higher quality with better fine-grained details, outperforming existing generalizable novel view synthesis methods in terms of PSNR, SSIM, and LPIPS. When trained on real data, our method also achieves state-of-the-art results.
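The abstract does not specify the loss, so the following is only a minimal sketch of what geometry-aware contrastive learning can look like, assuming a standard InfoNCE objective (a generic, swapped-in choice, not necessarily the paper's formulation). The geometric constraint is modeled by projecting the same 3D points into two input views with assumed pinhole cameras (`K`, `R`, `t`) and treating the features sampled at the corresponding pixels as positive pairs, with all other points in the batch acting as negatives. The helper names `project`, `sample_nearest`, and `info_nce`, the nearest-neighbour sampling, and the temperature of 0.1 are all hypothetical.

```python
import numpy as np


def project(points, K, R, t):
    """Hypothetical helper: project world-space points (N, 3) to pixel
    coordinates (N, 2) with a pinhole camera, x ~ K (R X + t)."""
    cam = points @ R.T + t            # world -> camera coordinates
    uvw = cam @ K.T                   # camera -> homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide


def sample_nearest(feature_map, uv):
    """Hypothetical helper: nearest-neighbour lookup of an (H, W, C) feature
    map at pixel coordinates (N, 2). Occlusion is ignored."""
    h, w, _ = feature_map.shape
    u = np.clip(np.rint(uv[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.rint(uv[:, 1]).astype(int), 0, h - 1)
    return feature_map[v, u]


def info_nce(feats_a, feats_b, temperature=0.1):
    """InfoNCE loss (assumed objective): row i of feats_a and feats_b describe
    the same 3D point seen from two views (positive pair); every other row in
    the batch serves as a negative."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature               # (N, N) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # positives lie on the diagonal


def geometry_aware_contrastive_loss(points, fmap_a, cam_a, fmap_b, cam_b):
    """Hypothetical glue: features sampled where the same 3D points project
    in view A and view B are pulled together by the InfoNCE loss."""
    feats_a = sample_nearest(fmap_a, project(points, *cam_a))
    feats_b = sample_nearest(fmap_b, project(points, *cam_b))
    return info_nce(feats_a, feats_b)
```

Minimizing this loss pulls together features that the camera geometry says depict the same surface point and pushes apart features of different points, which is one way to obtain the multi-view consistent features the abstract describes. A practical implementation would more likely use bilinear sampling and account for occlusion, both of which this sketch omits.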