This paper studies a novel energy-based cooperative learning framework for multi-domain image-to-image translation. The framework consists of four components: a descriptor, a translator, a style encoder, and a style generator. The descriptor is a multi-head energy-based model that represents a multi-domain image distribution. The translator, style encoder, and style generator together constitute a diversified image generator. Specifically, given an input image from a source domain, the translator turns it into a stylised output image in the target domain according to a style code, which can either be inferred by the style encoder from a reference image or be produced by the style generator from random noise. Since the style generator represents a domain-specific distribution of style codes, the translator can provide a one-to-many transformation (i.e., diversified generation) between the source domain and the target domain. To train our framework, we propose a likelihood-based multi-domain cooperative learning algorithm that jointly trains the multi-domain descriptor and the diversified image generator (comprising the translator, style encoder, and style generator modules) via multi-domain MCMC teaching. In this scheme, the descriptor guides the diversified image generator to shift its probability density toward the data distribution, while the diversified image generator uses its randomly translated images to initialize the descriptor's Langevin dynamics process for efficient sampling.
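To make the sampling half of MCMC teaching concrete, the following is a minimal one-dimensional sketch, not the implementation used in this work. It assumes a toy quadratic energy per domain in place of the multi-head descriptor and a biased random guess in place of the translator output; the names `DOMAIN_MEANS`, `energy_grad`, `toy_generator`, `langevin_refine`, and `refine_batch` are all hypothetical. It only illustrates how generator outputs can serve as starting points for Langevin dynamics; the generator update that would follow from the refined samples is omitted.

```python
import random

# Hypothetical stand-in for a multi-head descriptor: one quadratic energy
# E_d(x) = 0.5 * (x - mean_d)^2 per domain d (an assumption for illustration).
DOMAIN_MEANS = {"cat": -2.0, "dog": 3.0}


def energy_grad(x, domain):
    """Gradient of the toy energy for the selected domain head."""
    return x - DOMAIN_MEANS[domain]


def toy_generator(domain, rng):
    """Hypothetical stand-in for a translated image: a rough, biased guess."""
    return DOMAIN_MEANS[domain] + 1.5 + rng.gauss(0.0, 0.5)


def langevin_refine(x_init, domain, n_steps=100, step_size=0.3, rng=None):
    """Run Langevin dynamics on the toy energy, starting from x_init."""
    if rng is None:
        rng = random.Random(0)
    x = x_init
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Langevin update: gradient descent on the energy plus Gaussian noise.
        x = x - 0.5 * step_size ** 2 * energy_grad(x, domain) + step_size * noise
    return x


def refine_batch(domain, n_samples=200, seed=0):
    """Initialize chains from generator outputs, then refine them."""
    rng = random.Random(seed)
    initial = [toy_generator(domain, rng) for _ in range(n_samples)]
    refined = [langevin_refine(x, domain, rng=rng) for x in initial]
    return initial, refined


if __name__ == "__main__":
    initial, refined = refine_batch("dog")
    print("mean before refinement:", sum(initial) / len(initial))
    print("mean after refinement: ", sum(refined) / len(refined))
```

In this toy setting the refined samples concentrate around the mode of the selected domain head, whereas the initial guesses are systematically offset from it. Starting the chains from generator outputs rather than from noise is what the abstract refers to as efficient sampling; the step count and step size here are arbitrary choices for the example.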