Large language models (LLMs) are known to exhibit demographic biases, yet few studies systematically evaluate these biases across multiple datasets or account for confounding factors. In this work, we examine LLM alignment with human annotations in five offensive language datasets, comprising approximately 220K annotations. Our findings reveal that while demographic traits, particularly race, influence alignment, these effects are inconsistent across datasets and often entangled with other factors. Confounders -- such as document difficulty, annotator sensitivity, and within-group agreement -- account for more variation in alignment patterns than demographic traits alone. Specifically, alignment increases with higher annotator sensitivity and group agreement, while greater document difficulty corresponds to reduced alignment. Our results underscore the importance of multi-dataset analyses and confounder-aware methodologies in developing robust measures of demographic bias in LLMs.