Large Language Models (LLMs) have recently emerged as an effective tool for helping individuals write various types of content, including professional documents such as recommendation letters. Despite its convenience, this application also introduces unprecedented fairness concerns. Users might use model-generated reference letters directly in professional scenarios. If these letters contain underlying biases, using them without scrutiny could lead to direct societal harms, such as harming application success rates for female applicants. Given this pressing issue, it is urgent and necessary to comprehensively study fairness issues and associated harms in this real-world use case. In this paper, we critically examine gender biases in LLM-generated reference letters. Drawing inspiration from social science findings, we design evaluation methods that surface biases along two dimensions: (1) biases in language style and (2) biases in lexical content. We further investigate the extent of bias propagation by analyzing the hallucination bias of models, a term we define as bias exacerbation in model-hallucinated content. Through benchmarking evaluation on two popular LLMs, ChatGPT and Alpaca, we reveal significant gender biases in LLM-generated recommendation letters. Our findings not only warn against using LLMs for this application without scrutiny, but also highlight the importance of thoroughly studying hidden biases and harms in LLM-generated professional documents.