Recent work has shown that neural networks can perform 3D tasks such as Novel View Synthesis (NVS) without explicit 3D reconstruction. Even so, we argue that strong 3D inductive biases are still helpful in the design of such networks. We demonstrate this by introducing LagerNVS, an encoder-decoder neural network for NVS that builds on "3D-aware" latent features. The encoder is initialized from a 3D reconstruction network pre-trained with explicit 3D supervision. It is paired with a lightweight decoder, and the full model is trained end-to-end with photometric losses. LagerNVS achieves state-of-the-art deterministic feed-forward NVS (including 31.4 dB PSNR on Re10k), both with and without known cameras. It also renders in real time, generalizes to in-the-wild data, and can be paired with a diffusion decoder for generative extrapolation.
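For readers unfamiliar with the reported metric, the following is a minimal sketch of how PSNR relates to a pixel-wise photometric error. It assumes images are flattened to lists of intensities in [0, 1] and uses mean squared error as the photometric term. The abstract does not specify the exact loss, so that choice and the helper names `mse` and `psnr` are illustrative assumptions, not the authors' implementation.

```python
import math


def mse(pred, target):
    """Mean squared error between two equally sized flat sequences of pixel values.

    Assumption: MSE stands in for the photometric loss, which the abstract
    does not define.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    err = mse(pred, target)
    if err == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / err)


if __name__ == "__main__":
    rendered = [0.20, 0.40, 0.60, 0.80]
    reference = [0.25, 0.35, 0.60, 0.85]
    print(f"MSE:  {mse(rendered, reference):.6f}")
    print(f"PSNR: {psnr(rendered, reference):.2f} dB")
```

Under this definition, with intensities in [0, 1], a PSNR of 31.4 dB corresponds to a mean squared error of about 7e-4.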