Intrinsic Image Decomposition (IID) is a challenging inverse problem that seeks to decompose a natural image into its underlying intrinsic components, such as albedo and shading. While recent decomposition methods rely on learned priors over these components, they often suffer either from component cross-contamination, owing to joint training of the priors, or from a Sim-to-Real gap, since priors trained on synthetic data are kept frozen during inference on real images. In this work, we propose to solve the intrinsic image decomposition problem using a bank of Generative Adversarial Networks (GANs) as priors, where each GAN is trained independently on a single intrinsic component, providing stronger and better-disentangled priors. At the core of our approach is the idea that the latent space of a GAN is a well-suited optimization domain for solving inverse problems. Given an input image, we jointly invert the latent codes of a set of GANs and combine their outputs to reproduce the input. In contrast to existing GAN inversion methods, which are limited to inverting a single GAN, our proposed approach, JoIN, jointly inverts multiple GANs using only a single image as supervision, while still maintaining the distribution prior of each intrinsic component. We show that our approach is modular, accommodating various forward imaging models, and that it successfully decomposes both synthetic and real images. Further, taking inspiration from existing GAN inversion approaches, we carefully fine-tune the generator priors during inference on real images. In this way, our method generalizes well to real images even though the GAN priors are trained only on synthetic data. We demonstrate the effectiveness of our approach through extensive qualitative and quantitative evaluations and ablation studies on various datasets.