Synthetic face generation has advanced rapidly with the emergence of text-to-image (T2I) models and multimodal large language models, enabling high-fidelity image production from natural-language prompts. Despite the widespread adoption of these tools, their biases, representational quality, and cross-cultural consistency remain poorly understood. Prior research on bias in synthetic face generation has examined demographic biases, yet little work has studied how emotional prompts influence demographic representation or how models trained in different cultural and linguistic contexts differ in their output distributions. We present a systematic audit of eight state-of-the-art T2I models, four developed by Western organizations and four by Chinese institutions, all prompted identically. Using state-of-the-art facial analysis algorithms, we estimate the gender, race, age, and perceived attractiveness of the generated faces. To measure deviations from global population statistics, we apply information-theoretic bias metrics, including the Kullback-Leibler and Jensen-Shannon divergences. Our findings reveal persistent demographic and emotion-conditioned biases in all models, regardless of country of origin. We discuss implications for fairness, socio-technical harms, governance, and the development of transparent generative systems.
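The divergence-based bias measurement mentioned above can be sketched as follows. This is an illustrative example, not the paper's actual pipeline: the demographic category proportions below are hypothetical placeholders, standing in for an observed distribution of generated faces and a global reference distribution.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(P || Q) in bits.

    Assumes p and q are probability vectors over the same categories
    and that q[i] > 0 wherever p[i] > 0 (otherwise KL is infinite).
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # midpoint distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical example: proportions of three demographic categories
# in a model's generated faces vs. a global reference population.
generated = [0.70, 0.20, 0.10]
reference = [0.50, 0.30, 0.20]

print(f"KL = {kl_divergence(generated, reference):.4f} bits")
print(f"JS = {js_divergence(generated, reference):.4f} bits")
```

Unlike KL divergence, the Jensen-Shannon divergence is symmetric and always finite, which makes it convenient when a model assigns zero probability to a category present in the reference distribution.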