Research on Large Language Models (LLMs) has often neglected subtle biases that, although less apparent, can significantly steer model outputs toward particular social narratives. This study addresses two such biases: \textit{representative bias}, the tendency of LLMs to generate outputs that mirror the experiences of certain identity groups, and \textit{affinity bias}, the models' evaluative preference for specific narratives or viewpoints. We introduce two novel metrics to measure these biases, the Representative Bias Score (RBS) and the Affinity Bias Score (ABS), and present the Creativity-Oriented Generation Suite (CoGS), a collection of open-ended tasks such as short story writing and poetry composition, designed with customized rubrics to detect these subtle biases. Our analysis uncovers marked representative biases in prominent LLMs, which favor identities associated with being white, straight, and male. Our investigation of affinity bias further reveals distinctive evaluative patterns within each model, akin to `bias fingerprints'. A similar trend appears among human evaluators, highlighting a complex interplay between human and machine perceptions of bias.