Research on Large Language Models (LLMs) has often neglected subtle biases that, although less apparent, can significantly steer model outputs toward particular social narratives. This study addresses two such biases: representative bias, the tendency of LLMs to generate outputs that mirror the experiences of certain identity groups, and affinity bias, the models' evaluative preference for specific narratives or viewpoints. We introduce two novel metrics to measure these biases, the Representative Bias Score (RBS) and the Affinity Bias Score (ABS), and present the Creativity-Oriented Generation Suite (CoGS), a collection of open-ended tasks, such as short story writing and poetry composition, with customized rubrics designed to detect these subtle biases. Our analysis uncovers marked representative biases in prominent LLMs, which favor identities associated with being white, straight, and male. Furthermore, our investigation of affinity bias reveals distinctive evaluative patterns within each model, akin to "bias fingerprints". Human evaluators show similarly distinctive patterns, highlighting a complex interplay between human and machine bias perceptions.