We propose TR0N, a highly general framework for turning pre-trained unconditional generative models, such as GANs and VAEs, into conditional models. The conditioning can be largely arbitrary and requires only a pre-trained auxiliary model. For example, we show how to turn unconditional models into class-conditional ones with the help of a classifier, and into text-to-image models by leveraging CLIP. TR0N learns a lightweight stochastic mapping that "translates" between the space of conditions and the latent space of the generative model, so that each generated latent corresponds to a data sample satisfying the desired condition. The translated latent samples are then refined with Langevin dynamics, which yields higher-quality data samples. TR0N requires neither training data nor fine-tuning, yet achieves a zero-shot FID of 10.9 on MS-COCO, outperforming competing alternatives not only on this metric but also in sampling speed, all while retaining a much higher level of generality. Our code is available at https://github.com/layer6ai-labs/tr0n.
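The two-stage sampler described above (a stochastic translator proposes latents for a condition, then Langevin dynamics refines them) can be illustrated on a toy problem. The NumPy sketch below is an assumption-laden illustration, not the TR0N implementation: the linear "generator" and "auxiliary model", the linear-Gaussian translator fitted by least squares (a simple stand-in for a learned stochastic mapping), and the squared-error energy are all hypothetical choices. It also assumes one way training pairs could be formed without any dataset: sample latents from the prior and label them with the auxiliary model applied to the generated samples.

```python
import numpy as np

# Hypothetical toy stand-ins: a linear "generator" G(z) = A z and a linear
# "auxiliary model" f(x) = B x, so that f(G(z)) = M z with M = B A.
rng = np.random.default_rng(0)
D_Z, D_X, D_C = 4, 8, 3
A = rng.standard_normal((D_X, D_Z))
B = rng.standard_normal((D_C, D_X))
M = B @ A


def make_synthetic_pairs(n, rng):
    """Sample latents from the prior and label them with f(G(z)); no real data is used."""
    Z = rng.standard_normal((n, D_Z))
    C = Z @ M.T
    return Z, C


def fit_translator(Z, C):
    """Fit a linear-Gaussian translator z | c ~ N(W c + b, sigma^2 I) by least squares.

    A simplified, hypothetical stand-in for a learned stochastic mapping.
    """
    X = np.hstack([C, np.ones((C.shape[0], 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(X, Z, rcond=None)  # shape (D_C + 1, D_Z)
    W, b = coef[:-1].T, coef[-1]
    sigma = float(np.sqrt(np.mean((Z - X @ coef) ** 2)))  # pooled residual std
    return W, b, sigma


def translate(c, W, b, sigma, rng):
    """Draw one latent for condition c from the fitted Gaussian."""
    return W @ c + b + sigma * rng.standard_normal(W.shape[0])


def energy_and_grad(z, c):
    """Hypothetical energy E(z) = 0.5 * ||f(G(z)) - c||^2 and its gradient M^T r."""
    r = M @ z - c
    return 0.5 * float(r @ r), M.T @ r


def langevin_refine(z, c, step=1e-3, n_steps=200, noise_scale=1.0, rng=None):
    """Unadjusted Langevin: z <- z - (step/2) grad E(z) + noise_scale * sqrt(step) * eps.

    noise_scale = 1 targets exp(-E); smaller values target exp(-E / noise_scale**2),
    and noise_scale = 0 reduces to plain gradient descent on E.
    """
    rng = rng if rng is not None else np.random.default_rng()
    for _ in range(n_steps):
        _, g = energy_and_grad(z, c)
        z = z - 0.5 * step * g + noise_scale * np.sqrt(step) * rng.standard_normal(z.shape)
    return z


# Usage: fit the translator on synthetic pairs, translate a reachable condition,
# then refine the proposal with a few low-temperature Langevin steps.
Z, C = make_synthetic_pairs(10_000, rng)
W, b, sigma = fit_translator(Z, C)
c_target = M @ rng.standard_normal(D_Z)
z_proposal = translate(c_target, W, b, sigma, rng)
z_refined = langevin_refine(z_proposal, c_target, noise_scale=0.05, rng=rng)
print(energy_and_grad(z_proposal, c_target)[0], energy_and_grad(z_refined, c_target)[0])
```

In this toy setting the translator's mean already satisfies the condition exactly, so the energy of a proposal comes only from its sampled noise; the Langevin steps then pull the proposal toward latents whose outputs match the condition. A real translator would be a neural network over real model embeddings, and the energy would be chosen to suit the auxiliary model.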