Recent advances in text-to-image generation have enabled the creation of high-quality images with diverse applications. However, accurately describing desired visual attributes can be challenging, especially for non-experts in art and photography. An intuitive solution is to adopt favorable attributes from source images. Current methods attempt to distill identity and style from source images. However, "style" is a broad concept that includes texture, color, and artistic elements, but does not cover other important attributes such as lighting and dynamics. Additionally, a simplified "style" adaptation prevents combining multiple attributes from different sources into one generated image. In this work, we formulate a more effective approach that decomposes the aesthetics of a picture into specific visual attributes, allowing users to apply characteristics such as lighting, texture, and dynamics from different images. To achieve this goal, we constructed FiVA, to the best of our knowledge the first fine-grained visual attribute dataset. FiVA features a well-organized taxonomy of visual attributes and includes around 1M high-quality generated images with visual attribute annotations. Leveraging this dataset, we propose a fine-grained visual attribute adaptation framework (FiVA-Adapter), which decouples visual attributes from one or more source images and adapts them into the generated image. This approach enhances user-friendly customization, allowing users to selectively apply desired attributes to create images that meet their unique preferences and specific content requirements.
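The core interface the abstract describes, taking different visual attributes (e.g., lighting from one image, texture from another) and conditioning generation on them jointly, can be sketched as follows. This is a hypothetical toy illustration, not the paper's actual implementation: the attribute names follow the text, but the extractors here are stand-in random projections rather than a learned, attribute-conditioned image encoder.

```python
import numpy as np

# Hypothetical sketch of an attribute-adaptation interface: each source
# image contributes only one named attribute, and the decoupled attribute
# embeddings condition the generator alongside the text prompt.
# Shapes and extractor internals are illustrative assumptions.

rng = np.random.default_rng(0)
EMB_DIM = 8

# Toy per-attribute "extractors": fixed random projections standing in
# for a learned image encoder conditioned on the attribute name.
ATTRIBUTES = ["lighting", "texture", "dynamics", "color"]
extractors = {name: rng.standard_normal((EMB_DIM, EMB_DIM)) for name in ATTRIBUTES}

def extract_attribute(image_feat: np.ndarray, attribute: str) -> np.ndarray:
    """Project an image feature into the subspace of one visual attribute."""
    return extractors[attribute] @ image_feat

def build_condition(prompt_emb: np.ndarray, attribute_sources: dict) -> np.ndarray:
    """Stack the text embedding with one embedding per (attribute, image) pair."""
    attr_embs = [extract_attribute(feat, name) for name, feat in attribute_sources.items()]
    return np.stack([prompt_emb, *attr_embs])

# Usage: take lighting from one image and texture from another.
prompt = rng.standard_normal(EMB_DIM)
img_a, img_b = rng.standard_normal(EMB_DIM), rng.standard_normal(EMB_DIM)
cond = build_condition(prompt, {"lighting": img_a, "texture": img_b})
print(cond.shape)  # (3, 8): prompt embedding plus two attribute embeddings
```

The design point the sketch captures is that attributes from different sources are decoupled into separate conditioning tokens, so users can mix, say, one image's lighting with another's texture without inheriting either image's full "style".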