Text-to-image generative models have enabled high-resolution image synthesis across different domains, but require users to specify the content they wish to generate. In this paper, we consider the inverse problem: given a collection of different images, can we discover the generative concepts that represent each image? We present an unsupervised approach to discover generative concepts from a collection of images: it disentangles art styles from paintings, objects and lighting from kitchen scenes, and image classes from ImageNet images. We show that such generative concepts can accurately represent the content of images, can be recombined and composed to generate new artistic and hybrid images, and can serve as a representation for downstream classification tasks.