This paper studies a novel energy-based cooperative learning framework for multi-domain image-to-image translation. The framework consists of four components: a descriptor, a translator, a style encoder, and a style generator. The descriptor is a multi-head energy-based model that represents a multi-domain image distribution. The translator, style encoder, and style generator together constitute a diversified image generator. Specifically, given an input image from a source domain, the translator turns it into a stylised output image in the target domain according to a style code, which can either be inferred by the style encoder from a reference image or produced by the style generator from random noise. Since the style generator represents a domain-specific distribution of style codes, the translator can provide a one-to-many transformation (i.e., diversified generation) between the source domain and the target domain. To train the framework, we propose a likelihood-based multi-domain cooperative learning algorithm that jointly trains the multi-domain descriptor and the diversified image generator (comprising the translator, style encoder, and style generator modules) via multi-domain MCMC teaching. In this scheme, the descriptor guides the diversified image generator to shift its probability density toward the data distribution, while the diversified image generator uses its randomly translated images to initialize the descriptor's Langevin dynamics process for efficient sampling.
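
As an illustration of the sampling step described in the abstract, the following minimal Python sketch shows Langevin dynamics started from a translated sample rather than from noise. It is a toy under stated assumptions, not the paper's implementation: images are replaced by scalars, each head of the descriptor is a hypothetical quadratic energy, and `DOMAIN_MEANS`, `toy_translate`, `energy_grad`, and `langevin_refine` are names invented for this example.

```python
import random

# Assumption: one toy energy head per domain, E_d(x) = 0.5 * (x - mu_d)^2.
# A real descriptor would be a neural network with one output head per domain.
DOMAIN_MEANS = {"cat": -2.0, "dog": 3.0}


def toy_translate(x_source, style_code):
    """Hypothetical stand-in for the translator: shift the input by a style code."""
    return x_source + style_code


def energy_grad(x, domain):
    """Gradient dE_d/dx of the toy quadratic energy for the chosen domain head."""
    return x - DOMAIN_MEANS[domain]


def langevin_refine(x_init, domain, n_steps=100, step_size=0.3,
                    noise_scale=1.0, rng=None):
    """Run Langevin updates x <- x - (s^2 / 2) * dE/dx + noise_scale * s * eps.

    x_init is the generator's output, used as the starting point of the chain.
    """
    rng = rng or random.Random(0)
    x = x_init
    for _ in range(n_steps):
        eps = rng.gauss(0.0, 1.0)
        x = (x
             - 0.5 * step_size ** 2 * energy_grad(x, domain)
             + noise_scale * step_size * eps)
    return x


if __name__ == "__main__":
    rng = random.Random(42)
    style_code = rng.gauss(0.0, 1.0)         # stands in for the style generator
    x_translated = toy_translate(0.5, style_code)
    x_refined = langevin_refine(x_translated, "dog", rng=rng)
    print("translated:", x_translated, "refined:", x_refined)
```

In a cooperative learning setup of this kind, the refined sample would then serve as a training target for the generator, while the descriptor is updated by contrasting refined samples with observed data. Those update steps are omitted here because the abstract does not specify them.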