We ask whether demographic identity, signaled by a name alone, systematically reshapes the generative distribution of a language model. Measuring full-vocabulary Shannon entropy at temperature zero across six open-weight base models and 5,760 implicit sentence-completion prompts (e.g., "Tanisha walked into the office on a Monday morning and"), we find that Black-associated names produce higher first-token entropy than White-associated names across all six architectures, the opposite of the output-level homogeneity bias documented under explicit demographic prompting (Lee et al., 2024). Black-associated names also raise entropy further above identity-neutral baselines than White-associated names do ($\Delta\Delta > 0$ in all six models). Women-associated names co-occur with lower first-token entropy (DL-pooled $\hat{\beta} = -0.041$, $p = .019$) and more homogeneous outputs ($\hat{\alpha} = +0.024$, $p < .001$) than men-associated names, a pattern convergent with homogeneity bias; race and gender effects are additive. Instruction tuning does not attenuate the race gap (matched-format DL-pooled $\hat{\beta} = +0.153$). Running the same templates with explicit group labels instead of names yields null race effects in 10 of 12 models where implicit probing is significant, establishing that probing methodology is a primary determinant of which distributional structure is recovered.
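As a rough illustration of the central quantity, the sketch below computes the Shannon entropy of a next-token distribution from a vector of raw logits, together with a baseline-adjusted contrast. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `first_token_entropy` and `delta_delta` are hypothetical, the entropy is assumed to be taken over the softmax of the unscaled logits, and $\Delta\Delta$ is assumed to mean the difference between the two groups' entropy gaps relative to an identity-neutral baseline.

```python
import math


def first_token_entropy(logits, base=2.0):
    """Shannon entropy of softmax(logits) over the full vocabulary.

    `logits` is a sequence of finite floats, one per vocabulary entry.
    Assumption: the distribution is the softmax of the raw logits.
    """
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))

    entropy_nats = 0.0
    for x in logits:
        log_p = x - log_z                      # log-probability of this token
        entropy_nats -= math.exp(log_p) * log_p
    return entropy_nats / math.log(base)       # convert nats to the chosen base


def delta_delta(h_black, h_white, h_neutral):
    """Assumed contrast: (H_black - H_neutral) - (H_white - H_neutral)."""
    return (h_black - h_neutral) - (h_white - h_neutral)
```

In practice the logits would come from a model's output at the position immediately after the prompt; a flat distribution over four tokens, for example, gives 2 bits, while a sharply peaked one approaches 0.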