Text-to-image generative models often exhibit fairness issues with respect to sensitive attributes such as gender or skin tone. This study aims to reproduce the results presented in "ITI-GEN: Inclusive Text-to-Image Generation" by Zhang et al. (2023a), which introduces a model to improve inclusiveness in such generative models. We show that most of the authors' claims about ITI-GEN hold: it improves the diversity and quality of generated images, scales to different domains, offers plug-and-play capabilities, and is computationally efficient. However, ITI-GEN sometimes uses undesired attributes as proxy features, and it cannot disentangle some pairs of correlated attributes, such as gender and baldness. In addition, as the number of considered attributes increases, the training time grows exponentially and ITI-GEN struggles to generate inclusive images for every element of the joint distribution. To address these issues, we propose using Hard Prompt Search with negative prompting, a method that requires no training and handles negation better than vanilla Hard Prompt Search. Nonetheless, Hard Prompt Search (with or without negative prompting) cannot be applied to continuous attributes that are hard to express in natural language, a setting where ITI-GEN excels because it is guided by images during training. Finally, we propose combining ITI-GEN with Hard Prompt Search with negative prompting.