Unconditional generation -- the problem of modeling a data distribution without relying on human-annotated labels -- is a long-standing and fundamental challenge in generative modeling, and it opens up the potential of learning from large-scale unlabeled data. In the literature, the generation quality of unconditional methods has been much worse than that of their conditional counterparts. This gap can be attributed to the lack of semantic information provided by labels. In this work, we show that one can close this gap by generating semantic representations in the representation space produced by a self-supervised encoder. These representations can then be used to condition the image generator. This framework, called Representation-Conditioned Generation (RCG), provides an effective solution to the unconditional generation problem without using labels. Through comprehensive experiments, we observe that RCG significantly improves unconditional generation quality: e.g., it achieves a new state-of-the-art FID of 2.15 on ImageNet 256x256, substantially improving on the previous best of 5.91 by a relative 64%. Our unconditional results are on par with the leading class-conditional ones. We hope these encouraging observations will attract the community's attention to the fundamental problem of unconditional generation. Code is available at https://github.com/LTH14/rcg.
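The two-stage pipeline the abstract describes can be sketched conceptually as follows. This is a toy illustration, not the paper's actual models: `encoder`, `fit_rep_generator`, and `image_generator` below are hypothetical stand-ins for a frozen self-supervised encoder, the representation generator (a diffusion model in RCG), and the representation-conditioned pixel generator, respectively.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(images):
    # Stand-in for a frozen self-supervised encoder: maps images to
    # low-dimensional semantic representations (here: random projection).
    proj = rng.standard_normal((images.shape[1], 16))
    return images @ proj

def fit_rep_generator(reps):
    # Stand-in for the representation generator: RCG uses a diffusion
    # model over representations; here we just fit a per-dim Gaussian.
    return reps.mean(axis=0), reps.std(axis=0)

def sample_reps(mu, sigma, n):
    # Sample new representations from the fitted representation model.
    return mu + sigma * rng.standard_normal((n, mu.shape[0]))

def image_generator(reps, image_dim=64):
    # Stand-in for the pixel generator, conditioned on representations
    # instead of class labels (here: another random projection).
    proj = rng.standard_normal((reps.shape[1], image_dim))
    return reps @ proj

# Unconditional generation without any labels:
train_images = rng.standard_normal((100, 64))  # toy "dataset"
reps = encoder(train_images)            # 1) encode unlabeled data
mu, sigma = fit_rep_generator(reps)     # 2) model the representation space
new_reps = sample_reps(mu, sigma, 8)    # 3) sample semantic representations
samples = image_generator(new_reps)     # 4) condition image generation on them
print(samples.shape)                    # (8, 64)
```

The key design choice this sketch mirrors is that no human-annotated labels appear anywhere: the conditioning signal is itself generated, by modeling the distribution of self-supervised representations.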