A creative idea is often born from transforming, combining, and modifying ideas from existing visual examples that capture various concepts. However, one cannot simply copy a concept as a whole; inspiration comes from examining particular aspects of it. Hence, it is often necessary to separate a concept into different aspects to provide new perspectives. In this paper, we propose a method to decompose a visual concept, represented as a set of images, into different visual aspects encoded in a hierarchical tree structure. We utilize large vision-language models and their rich latent space for concept decomposition and generation. Each node in the tree represents a sub-concept using a learned vector embedding injected into the latent space of a pretrained text-to-image model. We use a set of regularizations to guide the optimization of the embedding vectors encoded in the nodes so that they follow the hierarchical structure of the tree. Our method allows users to explore and discover new concepts derived from the original one. The tree enables endless visual sampling at each node, allowing the user to explore the hidden sub-concepts of the object of interest. The learned aspects in each node can be combined within and across trees to create new visual ideas, and can be used in natural language sentences to apply such aspects to new designs.
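To make the tree structure concrete, the following is a minimal sketch of the data structure only, not of the proposed optimization. It assumes a fixed branching factor, placeholder tokens of the form `<v0>`, `<v1>`, and randomly initialized vectors standing in for the learned embeddings; the names `ConceptNode`, `build_tree`, and `compose_prompt` are hypothetical and introduced purely for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class ConceptNode:
    """One tree node: a placeholder token plus a stand-in embedding vector."""
    token: str                      # hypothetical pseudo-word, e.g. "<v1>"
    embedding: List[float]          # random stand-in for a learned embedding
    parent: Optional["ConceptNode"] = field(default=None, repr=False)
    children: List["ConceptNode"] = field(default_factory=list)

    def path_from_root(self) -> List[str]:
        """Return the tokens from the root down to this node."""
        node: Optional["ConceptNode"] = self
        path: List[str] = []
        while node is not None:
            path.append(node.token)
            node = node.parent
        return path[::-1]

    def walk(self) -> Iterator["ConceptNode"]:
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def build_tree(depth: int, branching: int = 2, dim: int = 4,
               prefix: str = "v", seed: int = 0) -> ConceptNode:
    """Build a full tree level by level (assumed shape, for illustration)."""
    rng = random.Random(seed)
    count = 0

    def new_node(parent: Optional[ConceptNode]) -> ConceptNode:
        nonlocal count
        node = ConceptNode(
            token=f"<{prefix}{count}>",
            embedding=[rng.gauss(0.0, 1.0) for _ in range(dim)],
            parent=parent,
        )
        count += 1
        if parent is not None:
            parent.children.append(node)
        return node

    root = new_node(None)
    frontier = [root]
    for _level in range(depth):
        frontier = [new_node(p) for p in frontier for _k in range(branching)]
    return root


def compose_prompt(template: str, *nodes: ConceptNode) -> str:
    """Insert node tokens into a natural-language prompt template."""
    return template.format(*(n.token for n in nodes))


if __name__ == "__main__":
    tree = build_tree(depth=2)
    other = build_tree(depth=1, prefix="w")   # a second, independent tree
    leaf = list(tree.walk())[-1]
    print(leaf.path_from_root())
    print(compose_prompt("a chair in the style of {}", leaf))
    print(compose_prompt("a {} combined with {}", leaf, other.children[0]))
```

In an actual system, each token's embedding would be optimized against the input images and injected into the text encoder of the pretrained text-to-image model; here the vectors only mark where that learned state would live.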