While pre-trained image autoencoders are increasingly used in computer vision, the application of inverse graphics in 2D latent spaces remains under-explored. Yet, besides reducing training and rendering complexity, applying inverse graphics in the latent space enables valuable interoperability with other latent-based 2D methods. The major challenge is that inverse graphics cannot be applied directly to such image latent spaces because they lack an underlying 3D geometry. In this paper, we propose an Inverse Graphics Autoencoder (IG-AE) that specifically addresses this issue. To this end, we regularize an image autoencoder with 3D geometry by aligning its latent space with jointly trained latent 3D scenes. We use the trained IG-AE to bring NeRFs to the latent space with a latent NeRF training pipeline, which we implement in an open-source extension of the Nerfstudio framework, thereby unlocking latent scene learning for its supported methods. We experimentally confirm that Latent NeRFs trained with IG-AE achieve improved quality compared to those trained with a standard autoencoder, while training and rendering faster than NeRFs trained in the image space. Our project page can be found at https://ig-ae.github.io.
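The idea of aligning an autoencoder's latent space with jointly trained latent 3D scenes can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function name `alignment_loss`, the callables `encode`, `decode`, and `render_latent`, the weight `weight_rgb`, and the choice of a mean squared error for both terms. The sketch only shows one plausible shape for such an objective: the encoded image should match the latent rendered from the 3D scene at the same camera pose, and decoding that rendered latent should recover the image.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equally sized flat vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def alignment_loss(
    image: Vector,
    pose: object,
    encode: Callable[[Vector], Vector],
    decode: Callable[[Vector], Vector],
    render_latent: Callable[[object], Vector],
    weight_rgb: float = 1.0,
) -> float:
    """Hypothetical alignment objective (illustrative, not the paper's loss).

    latent term: encoded image vs. latent rendered from the 3D scene
    rgb term:    decoded rendered latent vs. the original image
    """
    z_encoded = encode(image)          # latent produced by the autoencoder
    z_rendered = render_latent(pose)   # latent rendered from the latent scene
    latent_term = mse(z_encoded, z_rendered)
    rgb_term = mse(decode(z_rendered), image)
    return latent_term + weight_rgb * rgb_term


if __name__ == "__main__":
    # Toy stand-ins: the "encoder" halves values, the "decoder" doubles them.
    toy_encode = lambda x: [v / 2.0 for v in x]
    toy_decode = lambda z: [v * 2.0 for v in z]
    toy_image = [2.0, 4.0]

    # A scene whose rendered latent already matches the encoder output.
    print(alignment_loss(toy_image, None, toy_encode, toy_decode,
                         lambda pose: [1.0, 2.0]))
    # A scene whose rendered latent disagrees in its first component.
    print(alignment_loss(toy_image, None, toy_encode, toy_decode,
                         lambda pose: [0.0, 2.0]))
```

In a real training loop, the encoder, decoder, and latent scenes would be differentiable modules optimized jointly against such an objective. The plain-Python callables here only make the structure of the two terms explicit.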