Prompt-based "diversity interventions" are commonly adopted to improve the diversity of Text-to-Image (T2I) models when they depict individuals with various racial or gender traits. However, does this strategy produce nonfactual demographic distributions, especially when generating real historical figures? In this work, we propose DemOgraphic FActualIty Representation (DoFaiR), a benchmark that systematically quantifies the trade-off between using diversity interventions and preserving demographic factuality in T2I models. DoFaiR consists of 756 meticulously fact-checked test instances and reveals the factuality tax of various diversity prompts through an automated, evidence-supported evaluation pipeline. Experiments on DoFaiR show that diversity-oriented instructions increase the number of different gender and racial groups in DALL-E 3's generations at the cost of historically inaccurate demographic distributions. To resolve this issue, we propose Fact-Augmented Intervention (FAI), which instructs a Large Language Model (LLM) to reflect on verbalized or retrieved factual information about the historical gender and racial compositions of the generation subjects, and to incorporate it into the generation context of T2I models. By orienting model generations with the reflected historical truths, FAI significantly improves demographic factuality under diversity interventions while preserving diversity.
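The trade-off described above can be illustrated with a toy calculation. The sketch below is not the DoFaiR evaluation pipeline or its metric definitions, which the abstract does not specify. It is a minimal illustration under assumed definitions: diversity is counted as the number of distinct groups depicted, and factuality as the share of depicted groups that appear in a fact-checked reference set. The function names and the placeholder labels (`group_a`, `group_b`, `group_c`) are hypothetical.

```python
from typing import Iterable, Set


def distinct_groups(depicted: Iterable[str]) -> int:
    """Assumed diversity proxy: number of different groups depicted."""
    return len(set(depicted))


def factual_share(depicted: Iterable[str], factual: Set[str]) -> float:
    """Assumed factuality proxy: share of depicted groups that are in the
    fact-checked reference set. Returns 1.0 when nothing is depicted."""
    shown = set(depicted)
    if not shown:
        return 1.0
    return len(shown & factual) / len(shown)


# Hypothetical test instance: the fact-checked record lists only group_a.
factual_reference = {"group_a"}

# Hypothetical per-face labels detected in two generated images.
without_intervention = ["group_a", "group_a", "group_a"]
with_intervention = ["group_a", "group_b", "group_c"]

for name, faces in [("baseline", without_intervention),
                    ("diversity prompt", with_intervention)]:
    print(name, distinct_groups(faces), factual_share(faces, factual_reference))
```

In this made-up instance the diversity prompt raises the distinct-group count from 1 to 3 while the factual share drops from 1.0 to one third, which is the kind of "factuality tax" the benchmark is designed to measure.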