We propose a 3D latent representation that jointly models object geometry and view-dependent appearance. Most prior work focuses on either reconstructing 3D geometry or predicting view-independent diffuse appearance, and thus struggles to capture realistic view-dependent effects. Our approach leverages the observation that RGB-depth images provide samples of a surface light field. By encoding random subsamples of this surface light field into a compact set of latent vectors, our model learns to represent both geometry and appearance within a unified 3D latent space. This representation reproduces view-dependent effects such as specular highlights and Fresnel reflections under complex lighting. We further train a latent flow-matching model on this representation to learn its distribution conditioned on a single input image, enabling the generation of 3D objects whose appearance is consistent with the lighting and materials in the input. Experiments show that our approach achieves higher visual quality and better input fidelity than existing methods.
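For concreteness, a standard conditional flow-matching objective of the kind described above can be written as follows; this is an illustrative sketch assuming linear interpolation paths, and the notation (velocity field $v_\theta$, latent $z$, image conditioning $c$) is ours rather than the paper's:

\[
\mathcal{L}_{\mathrm{CFM}}(\theta) \;=\; \mathbb{E}_{\,t \sim \mathcal{U}[0,1],\; z_0 \sim \mathcal{N}(0, I),\; (z_1, c)}\,\big\| v_\theta(z_t,\, t,\, c) - (z_1 - z_0) \big\|^2, \qquad z_t = (1 - t)\, z_0 + t\, z_1,
\]

where $z_1$ is the set of latent vectors encoding a training object's surface light field, $z_0$ is Gaussian noise, and $c$ is an embedding of the single conditioning image. At inference time, new latents are sampled by integrating the learned velocity field $v_\theta$ from noise and then decoding to geometry and view-dependent appearance.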