While personalized text-to-image generation has enabled learning a single concept from multiple images, a more practical yet challenging scenario involves learning multiple concepts within a single image. However, existing works tackling this scenario rely heavily on extensive human annotations. In this paper, we introduce a novel task named Unsupervised Concept Extraction (UCE), which considers an unsupervised setting without any human knowledge of the concepts. Given an image that contains multiple concepts, the task aims to extract and recreate individual concepts relying solely on the existing knowledge of pretrained diffusion models. To achieve this, we present ConceptExpress, which tackles UCE by unleashing the inherent capabilities of pretrained diffusion models in two aspects. Specifically, a concept localization approach automatically locates and disentangles salient concepts by leveraging spatial correspondence from diffusion self-attention; and, based on the lookup association between a concept and a conceptual token, a concept-wise optimization process learns discriminative tokens that represent each individual concept. Finally, we establish an evaluation protocol tailored to the UCE task. Extensive experiments demonstrate that ConceptExpress is a promising solution to the UCE task. Our code and data are available at: https://github.com/haoosz/ConceptExpress
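The abstract only states that concept localization leverages spatial correspondence from diffusion self-attention. The sketch below illustrates that general idea and is not the paper's actual procedure. It assumes a self-attention matrix of shape (h*w, h*w) taken from one diffusion layer, where row i describes how pixel i attends to every pixel. Pixels whose attention rows look alike are grouped with plain k-means using a farthest-point initialisation. The function name `localize_concepts`, the choice of k-means, and the fixed number of concepts `k` are all hypothetical. The paper targets an unsupervised setting where the number of concepts is not given in advance, and this toy does not address that.

```python
import numpy as np


def localize_concepts(attn: np.ndarray, h: int, w: int, k: int, iters: int = 50) -> np.ndarray:
    """Hypothetical sketch: cluster rows of an (h*w, h*w) self-attention map
    into k spatial groups and return an (h, w) integer label map.

    Assumption: pixels belonging to the same concept attend to the image in a
    similar way, so their attention rows lie close together in Euclidean space.
    """
    n = h * w
    if attn.shape != (n, n):
        raise ValueError(f"expected attention of shape {(n, n)}, got {attn.shape}")
    x = attn.astype(np.float64)

    # Farthest-point initialisation: start from pixel 0, then repeatedly pick
    # the pixel whose attention row is farthest from all chosen centers.
    centers = [x[0]]
    for _ in range(1, k):
        dist_to_nearest = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[dist_to_nearest.argmax()])
    centers = np.stack(centers)

    labels = np.zeros(n, dtype=int)
    for _ in range(iters):
        # Assign each pixel to the nearest center (squared Euclidean distance).
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        new_centers = np.stack([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels.reshape(h, w)
```

As a usage example, consider a toy 4x4 "image" whose left and right halves each attend uniformly within themselves. Calling `localize_concepts(attn, 4, 4, k=2)` recovers the two halves as separate label regions. Each binary mask `labels == j` could then serve as a per-concept spatial region in later steps.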