While pre-trained image autoencoders are increasingly used in computer vision, applying inverse graphics in their 2D latent spaces remains under-explored. Yet, besides reducing training and rendering complexity, performing inverse graphics in the latent space enables valuable interoperability with other latent-based 2D methods. The major challenge is that inverse graphics cannot be applied directly to such image latent spaces, because they lack an underlying 3D geometry. In this paper, we propose an Inverse Graphics Autoencoder (IG-AE) that specifically addresses this issue. To this end, we regularize an image autoencoder with 3D geometry by aligning its latent space with jointly trained latent 3D scenes. We use the trained IG-AE to bring NeRFs to the latent space through a latent NeRF training pipeline, which we implement in an open-source extension of the Nerfstudio framework, thereby enabling latent scene learning for the methods it supports. We experimentally confirm that Latent NeRFs trained with IG-AE achieve higher quality than those trained with a standard autoencoder, while also training and rendering faster than NeRFs trained in image space. Our project page is available at https://ig-ae.github.io.