This study quantifies gender and skin-tone bias in two widely deployed commercial image generators, Gemini Flash 2.5 Image (NanoBanana) and GPT Image 1.5, to test the assumption that neutral prompts yield demographically neutral outputs. We generated 3,200 photorealistic images from four semantically neutral prompts and analyzed them with a rigorous pipeline combining hybrid color normalization, facial landmark masking, and perceptually uniform skin-tone quantification on the Monk Skin Tone (MST), PERLA, and Fitzpatrick scales. Neutral prompts produced highly polarized defaults: both models exhibited a strong "default white" bias (>96% of outputs), yet they diverged sharply on gender, with Gemini favoring female-presenting subjects and GPT favoring male-presenting subjects with lighter skin tones. This research provides a large-scale comparative audit of state-of-the-art models using an illumination-aware colorimetric methodology that distinguishes aesthetic rendering from underlying pigmentation in synthetic imagery. The study demonstrates that neutral prompts function as diagnostic probes rather than neutral instructions, offers a robust framework for auditing algorithmic visual culture, and challenges the sociolinguistic assumption that unmarked language results in inclusive representation.
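The illumination-aware, perceptually uniform skin-tone quantification mentioned above can be illustrated with a standard colorimetric technique: converting sampled skin pixels from sRGB to CIELAB and computing the Individual Typology Angle (ITA), a widely used proxy for constitutive pigmentation that is relatively robust to lighting. The abstract does not name the exact metric or thresholds used in the study, so the sketch below is an assumption for illustration (the ITA bands follow the commonly cited Chardon classification), not the paper's actual pipeline.

```python
import math

def srgb_to_lab(r, g, b):
    """Convert sRGB components in [0, 1] to CIELAB (D65 white point)."""
    def lin(c):  # inverse sRGB gamma
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = lin(r), lin(g), lin(b)
    # linear RGB -> XYZ (sRGB primaries, D65)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white
    def f(t):
        return t ** (1 / 3) if t > 0.008856 else 7.787 * t + 16 / 116
    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)  # L*, a*, b*

def ita_degrees(L, b_star):
    """Individual Typology Angle: arctan((L* - 50) / b*) in degrees."""
    return math.degrees(math.atan2(L - 50, b_star))

def ita_category(ita):
    """Map an ITA value to the conventional six-band classification."""
    if ita > 55:
        return "very light"
    if ita > 41:
        return "light"
    if ita > 28:
        return "intermediate"
    if ita > 10:
        return "tan"
    if ita > -30:
        return "brown"
    return "dark"

# Example: a lighter and a darker skin-pixel sample (hypothetical values)
L1, _, b1 = srgb_to_lab(0.92, 0.76, 0.65)
L2, _, b2 = srgb_to_lab(0.35, 0.22, 0.15)
print(ita_category(ita_degrees(L1, b1)), ita_category(ita_degrees(L2, b2)))
```

Because ITA is computed in CIELAB, it separates lightness (L*) from yellow-blue chroma (b*), which is one reasonable way to distinguish rendered lighting from underlying pigmentation before mapping measurements onto ordinal scales such as MST or Fitzpatrick.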