This study quantifies gender and skin-tone bias in two widely deployed commercial image generators, Gemini Flash 2.5 Image (NanoBanana) and GPT Image 1.5, to test the assumption that neutral prompts yield demographically neutral outputs. We generated 3,200 photorealistic images from four semantically neutral prompts and analyzed them with a rigorous pipeline combining hybrid color normalization, facial landmark masking, and perceptually uniform skin-tone quantification on the Monk Skin Tone (MST), PERLA, and Fitzpatrick scales. Neutral prompts produced highly polarized defaults: both models exhibited a strong "default white" bias (>96% of outputs), yet they diverged sharply on gender, with Gemini favoring female-presenting subjects and GPT favoring male-presenting subjects with lighter skin tones. This research provides a large-scale comparative audit of state-of-the-art models using an illumination-aware colorimetric methodology that distinguishes aesthetic rendering from underlying pigmentation in synthetic imagery. The study demonstrates that neutral prompts function as diagnostic probes rather than neutral instructions, offers a robust framework for auditing algorithmic visual culture, and challenges the sociolinguistic assumption that unmarked language results in inclusive representation.
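The illumination-aware, perceptually uniform skin-tone quantification mentioned above can be illustrated with a standard colorimetric technique: converting sampled skin pixels from sRGB to CIELAB and computing the Individual Typology Angle (ITA), a widely used proxy for constitutive pigmentation that is relatively robust to lighting. The abstract does not name the exact metric or thresholds used in the study, so the sketch below is an assumption for illustration (the ITA bands follow the commonly cited Chardon classification), not the paper's actual pipeline.

```python
import math

def srgb_to_lab(r, g, b):
    """Convert sRGB components in [0, 1] to CIELAB (D65 white point)."""
    def lin(c):  # inverse sRGB gamma
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = lin(r), lin(g), lin(b)
    # linear RGB -> XYZ (sRGB primaries, D65)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white
    def f(t):
        return t ** (1 / 3) if t > 0.008856 else 7.787 * t + 16 / 116
    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)  # L*, a*, b*

def ita_degrees(L, b_star):
    """Individual Typology Angle: arctan((L* - 50) / b*) in degrees."""
    return math.degrees(math.atan2(L - 50, b_star))

def ita_category(ita):
    """Map an ITA value to the conventional six-band classification."""
    if ita > 55:
        return "very light"
    if ita > 41:
        return "light"
    if ita > 28:
        return "intermediate"
    if ita > 10:
        return "tan"
    if ita > -30:
        return "brown"
    return "dark"

# Example: a lighter and a darker skin-pixel sample (hypothetical values)
L1, _, b1 = srgb_to_lab(0.92, 0.76, 0.65)
L2, _, b2 = srgb_to_lab(0.35, 0.22, 0.15)
print(ita_category(ita_degrees(L1, b1)), ita_category(ita_degrees(L2, b2)))
```

Because ITA is computed in CIELAB, it separates lightness (L*) from yellow-blue chroma (b*), which is one reasonable way to distinguish rendered lighting from underlying pigmentation before mapping measurements onto ordinal scales such as MST or Fitzpatrick.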