Domain Agnostic Image-to-image Translation using Low-Resolution Conditioning

From arXiv. 19 pages, 23 figures. arXiv admin note: substantial text overlap with arXiv:2107.11262. Under consideration at Computer Vision and Image Understanding.

Image-to-image translation (i2i) methods generally aim to learn mappings across domains, under the assumption that the images used for translation share content (e.g., pose) but carry their own domain-specific information (a.k.a. style). Conditioned on a target image, such methods extract the target style and combine it with the source image content while keeping the domains coherent. In our proposal, we depart from this traditional view and instead consider the scenario where the target domain is represented by a very low-resolution (LR) image, proposing a domain-agnostic i2i method for fine-grained problems in which the domains are related. More specifically, our domain-agnostic approach aims to generate an image that combines visual features from the source image with low-frequency information (e.g., pose, color) of the LR target image. To do so, we present a novel approach that trains the generative model to produce images that both share distinctive information with the associated source image and correctly match the LR target image when downscaled. We validate our method on the CelebA-HQ and AFHQ datasets, demonstrating improvements in visual quality. Qualitative and quantitative results show that, for intra-domain image translation, our method generates realistic samples that compare favorably with state-of-the-art methods such as StarGAN v2. Ablation studies also reveal that our method is robust to changes in color, can be applied to out-of-distribution images, and allows manual control over the final results.
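The LR-matching constraint mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which downscaling operator or distance is used, so average pooling and a mean absolute error are assumptions here, and the names `downscale` and `lr_consistency_loss` are hypothetical. The second requirement, sharing distinctive information with the source image, is not modeled.

```python
import numpy as np


def downscale(image: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool an (H, W, C) image by an integer factor (assumed operator)."""
    h, w, c = image.shape
    if h % factor or w % factor:
        raise ValueError("image height and width must be divisible by factor")
    # Split each spatial axis into (blocks, factor) and average inside each block.
    blocks = image.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))


def lr_consistency_loss(generated: np.ndarray, lr_target: np.ndarray) -> float:
    """Mean absolute error between the downscaled output and the LR target."""
    factor = generated.shape[0] // lr_target.shape[0]
    return float(np.abs(downscale(generated, factor) - lr_target).mean())


# Toy data: random arrays stand in for a generator output and an LR target.
rng = np.random.default_rng(0)
lr_target = rng.random((8, 8, 3))

# Nearest-neighbour upsampling of the target satisfies the constraint.
matching = np.repeat(np.repeat(lr_target, 16, axis=0), 16, axis=1)
# An unrelated image of the same size does not.
unrelated = rng.random((128, 128, 3))

print(lr_consistency_loss(matching, lr_target))   # near zero, up to rounding
print(lr_consistency_loss(unrelated, lr_target))  # clearly larger
```

In a training setup a term of this kind would be computed on the generator output with differentiable operations; the NumPy version above only shows what "matching the LR target when downscaled" measures.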
