Despite recent advances, text-to-image synthesis still suffers from a lack of fine-grained control. With text alone, issues such as concept coherence and concept contamination remain difficult to address. We propose a method that enhances control by generating specific concepts that can be reused across multiple images, effectively extending natural language with new words that can be combined much like the colors on a painter's palette. Unlike previous contributions, our method does not copy visuals from input data and can generate concepts from text alone. We perform a set of comparisons and find that our method is a significant improvement over text-only prompts.