This paper presents $\textbf{R}$epresentation-$\textbf{C}$onditioned image $\textbf{G}$eneration (RCG), a simple yet effective image generation framework that sets a new benchmark in class-unconditional image generation. RCG does not condition on any human annotations. Instead, it conditions on a self-supervised representation distribution, which is mapped from the image distribution by a pre-trained encoder. During generation, RCG samples from this representation distribution using a representation diffusion model (RDM) and then uses a pixel generator to produce image pixels conditioned on the sampled representation. This design provides substantial guidance during the generative process, resulting in high-quality image generation. On ImageNet 256$\times$256, RCG achieves a Fréchet Inception Distance (FID) of 3.31 and an Inception Score (IS) of 253.4. These results not only significantly improve the state of the art in class-unconditional image generation but also rival the current leading methods in class-conditional image generation, bridging the long-standing performance gap between the two tasks. Code is available at https://github.com/LTH14/rcg.
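The two-stage sampling structure described above can be illustrated with a minimal toy sketch. It is not the RCG implementation: the encoder is a fixed random projection, the representation sampler is a Gaussian fitted to encoder outputs (a deliberately simple stand-in for the RDM, not a diffusion model), and the pixel generator is a linear map. All names, shapes, and constants below (`REP_DIM`, `IMG_SHAPE`, `toy_encoder`, `fit_representation_sampler`, `toy_pixel_generator`, `generate`) are hypothetical and chosen only for illustration.

```python
import numpy as np

REP_DIM = 8          # hypothetical representation size
IMG_SHAPE = (4, 4)   # hypothetical tiny "image" shape


def toy_encoder(images, proj):
    """Stand-in for the pre-trained encoder: a fixed linear projection."""
    return images.reshape(len(images), -1) @ proj


def fit_representation_sampler(reps):
    """Stand-in for the RDM: fit a Gaussian to the encoder outputs.

    The real RDM is a diffusion model; a Gaussian is used here only to
    show where "sample a representation" sits in the pipeline.
    """
    mean = reps.mean(axis=0)
    cov = np.cov(reps, rowvar=False) + 1e-6 * np.eye(reps.shape[1])

    def sample(rng, n):
        return rng.multivariate_normal(mean, cov, size=n)

    return sample


def toy_pixel_generator(reps, decode):
    """Stand-in for the pixel generator: map representations to pixels."""
    return (reps @ decode).reshape(len(reps), *IMG_SHAPE)


def generate(rng, n, sample_rep, decode):
    """Stage 1: sample representations. Stage 2: generate pixels from them."""
    reps = sample_rep(rng, n)
    return toy_pixel_generator(reps, decode), reps


rng = np.random.default_rng(0)
n_pixels = IMG_SHAPE[0] * IMG_SHAPE[1]
train_images = rng.normal(size=(256, *IMG_SHAPE))    # unlabeled toy data
proj = rng.normal(size=(n_pixels, REP_DIM))          # toy encoder weights
decode = rng.normal(size=(REP_DIM, n_pixels))        # toy generator weights

sample_rep = fit_representation_sampler(toy_encoder(train_images, proj))
images, reps = generate(rng, 5, sample_rep, decode)
```

No labels appear anywhere in the sketch: the only conditioning signal for the second stage is the sampled representation, which is the structural point the abstract makes.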