This paper presents $\textbf{R}$epresentation-$\textbf{C}$onditioned image $\textbf{G}$eneration (RCG), a simple yet effective image generation framework that sets a new state of the art in class-unconditional image generation. RCG does not condition on any human annotations. Instead, it conditions on a self-supervised representation distribution, obtained by mapping the image distribution through a pre-trained encoder. During generation, RCG samples from this representation distribution using a representation diffusion model (RDM), and then uses a pixel generator to produce image pixels conditioned on the sampled representation. This design provides substantial guidance during the generative process, resulting in high-quality image generation. On ImageNet 256$\times$256, RCG achieves a Fréchet Inception Distance (FID) of 3.31 and an Inception Score (IS) of 253.4. These results not only significantly improve the state of the art in class-unconditional image generation but also rival the current leading methods in class-conditional image generation, bridging the long-standing performance gap between these two tasks. Code is available at https://github.com/LTH14/rcg.
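The two-stage sampling described in the abstract (first sample a representation, then generate pixels conditioned on it) can be sketched as follows. This is a minimal, stdlib-only illustration under loudly stated assumptions. The RDM is stood in for by a generic DDPM ancestral sampler with a linear noise schedule. `point_mass_denoiser` is a hypothetical oracle that predicts the exact noise when the representation distribution is a single point. `toy_pixel_generator` is a hypothetical placeholder for the pixel generator. None of these are the networks, schedules, or APIs of the RCG code release.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]
Denoiser = Callable[[Vector, int], Vector]  # predicts the noise eps added at step t


def linear_schedule(steps: int) -> Tuple[List[float], List[float], List[float]]:
    """Linear beta schedule (assumed, DDPM-style); returns betas, alphas, cumulative alpha products."""
    betas = [1e-4 + (0.02 - 1e-4) * t / (steps - 1) for t in range(steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def sample_representation(denoise: Denoiser, dim: int, steps: int, rng: random.Random) -> Vector:
    """Ancestral DDPM sampling in representation space (a stand-in for the RDM)."""
    betas, alphas, alpha_bars = linear_schedule(steps)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from pure Gaussian noise
    for t in reversed(range(steps)):
        eps = denoise(x, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # variance choice sigma_t^2 = beta_t
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the final step
    return x


def point_mass_denoiser(target: Vector, steps: int) -> Denoiser:
    """Hypothetical oracle: exact noise prediction when all data sits at `target`."""
    _, _, alpha_bars = linear_schedule(steps)

    def denoise(x: Vector, t: int) -> Vector:
        a = math.sqrt(alpha_bars[t])
        s = math.sqrt(1.0 - alpha_bars[t])
        # Invert x_t = sqrt(alpha_bar_t) * target + sqrt(1 - alpha_bar_t) * eps for eps.
        return [(xi - a * ci) / s for xi, ci in zip(x, target)]

    return denoise


def toy_pixel_generator(rep: Vector, size: int) -> List[List[float]]:
    """Hypothetical placeholder for the representation-conditioned pixel generator."""
    d = len(rep)
    return [[math.tanh(rep[i % d] + rep[j % d]) for j in range(size)] for i in range(size)]


def rcg_sample(denoise: Denoiser, dim: int, steps: int, size: int,
               rng: random.Random) -> List[List[float]]:
    """Two-stage generation: sample a representation, then generate pixels conditioned on it."""
    rep = sample_representation(denoise, dim, steps, rng)
    return toy_pixel_generator(rep, size)
```

With the oracle denoiser, the sampler's final step recovers the target representation up to floating-point error, so, for example, `sample_representation(point_mass_denoiser([0.5, -1.0, 2.0], 50), 3, 50, random.Random(0))` returns a vector close to `[0.5, -1.0, 2.0]`. In a real system both the denoiser and the pixel generator would be trained networks; the point of the sketch is only the separation of the two sampling stages.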