We introduce Multi-Object Generative Perception (MultiGP), a generative inverse rendering method that stochastically samples all radiometric constituents -- reflectance, texture, and illumination -- underlying object appearance from a single image. Our key idea for resolving this inherently ambiguous radiometric disentanglement is that objects in the same scene, although they may differ in texture and reflectance, are all lit by the same illumination. MultiGP exploits this consensus to produce samples of reflectance, texture, and illumination from a single image of objects with known shapes, building on four key technical contributions: a cascaded end-to-end architecture that combines image-space and angular-space disentanglement; Coordinated Guidance, which drives the diffusion process to converge to a single consistent illumination estimate; Axial Attention, which facilitates ``cross-talk'' between objects of different reflectance; and a Texture Extraction ControlNet, which preserves high-frequency texture details while keeping them decoupled from the estimated lighting. Experimental results demonstrate that MultiGP effectively leverages the complementary spatial and frequency characteristics of multiple object appearances to recover each object's texture and reflectance as well as the common illumination.