On the Cultural Gap in Text-to-Image Generation

from arxiv, Equal contribution: Bingshuai Liu and Longyue Wang. Work done while Bingshuai Liu and Chengyang Lyu were interning at Tencent AI Lab. Zhaopeng Tu is the corresponding author

One challenge in text-to-image (T2I) generation is the inadvertent reflection of culture gaps present in the training data, which signifies the disparity in generated image quality when the cultural elements of the input text are rarely collected in the training set. Although various T2I models have shown impressive but arbitrary examples, there is no benchmark to systematically evaluate a T2I model's ability to generate cross-cultural images. To bridge the gap, we propose a Challenging Cross-Cultural (C3) benchmark with comprehensive evaluation criteria, which can assess how well-suited a model is to a target culture. By analyzing the flawed images generated by the Stable Diffusion model on the C3 benchmark, we find that the model often fails to generate certain cultural objects. Accordingly, we propose a novel multi-modal metric that considers object-text alignment to filter the fine-tuning data in the target culture, which is used to fine-tune a T2I model to improve cross-cultural generation. Experimental results show that our multi-modal metric provides stronger data selection performance on the C3 benchmark than existing metrics, in which the object-text alignment is crucial. We release the benchmark, data, code, and generated images to facilitate future research on culturally diverse T2I generation (https://github.com/longyuewangdcu/C3-Bench).

翻译：文本到图像生成（T2I）面临的一个挑战是训练数据中无意反映出的文化差距，即当输入文本的文化元素在训练集中鲜有出现时，生成的图像质量会出现差异。尽管各类T2I模型已展现出令人印象深刻但任意的生成案例，目前仍缺乏系统性评估模型跨文化图像生成能力的基准。为弥补这一空白，我们提出具有综合评价标准的挑战性跨文化（C3）基准，可评估模型对目标文化的适配程度。通过分析Stable Diffusion模型在C3基准上生成的缺陷图像，我们发现该模型常无法生成特定文化物品。据此，我们提出一种考虑对象-文本对齐的新型多模态指标，用于筛选目标文化中的微调数据，进而对T2I模型进行微调以提升跨文化生成能力。实验结果表明，在C3基准上，我们的多模态指标相比现有指标展现出更强的数据筛选性能，其中对象-文本对齐至关重要。为促进文化多样性T2I生成研究，我们公开了基准、数据、代码及生成图像（https://github.com/longyuewangdcu/C3-Bench）。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

O’Reilly报告：知识图谱崛起——面向现代数据集成和数据结构体系，“The Rise of the Knowledge Graph——Toward Modern Data Integration and the Data Fabric Architecture”

专知会员服务

49+阅读 · 2022年2月18日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日