Generative models excel at creating images that closely mimic real scenes, suggesting they inherently encode scene representations. We introduce Intrinsic LoRA (I-LoRA), a general approach that uses Low-Rank Adaptation (LoRA) to discover scene intrinsics such as normals, depth, albedo, and shading from a wide array of generative models. I-LoRA is lightweight, adding minimally to the model's parameters and requiring only very small datasets for this knowledge discovery. Our approach, applicable to Diffusion models, GANs, and Autoregressive models alike, generates intrinsics through the same output head used for the original images. Through control experiments, we establish a correlation between the generative model's quality and the accuracy of the extracted intrinsics. Finally, scene intrinsics obtained by our method with just hundreds to thousands of labeled images perform on par with those from supervised methods trained on millions of labeled examples.
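The lightweight adaptation the abstract describes rests on the standard LoRA construction: a frozen pretrained weight is augmented with a trainable low-rank residual, so only a small number of extra parameters are learned. The sketch below illustrates that construction in isolation; the layer sizes, rank, and function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Minimal LoRA sketch (hypothetical shapes, not the paper's actual code).
# A frozen weight W gets a trainable low-rank update B @ A, adding only
# r * (d_in + d_out) parameters instead of d_in * d_out.
rng = np.random.default_rng(0)

d_in, d_out, r = 64, 64, 4                  # rank r << d_in, d_out
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection; zero-init
                                            # so the adapted model starts out
                                            # identical to the pretrained one

def lora_forward(x):
    """Forward pass: frozen path plus low-rank residual."""
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapter contributes nothing yet:
assert np.allclose(lora_forward(x), W @ x)

print(f"trainable params: {r * (d_in + d_out)} vs full fine-tune: {d_in * d_out}")
```

Because only `A` and `B` are updated during fine-tuning, the same output head can be reused for intrinsics while the pretrained generator stays frozen, which is what keeps the parameter and data requirements small.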