As libraries explore large language models (LLMs) as a scalable layer for reference services, a core fairness question follows: can LLM-based services support all patrons fairly, regardless of demographic identity? While LLMs offer substantial potential for broadening access to information assistance, they may also reproduce societal biases embedded in their training data, potentially undermining libraries' commitment to impartial service. In this chapter, we apply a systematic evaluation approach that combines diagnostic classification, which detects systematic differences in responses, with linguistic analysis, which interprets their sources. Across three widely used open models (Llama-3.1 8B, Gemma-2 9B, and Ministral 8B), we find no compelling evidence of systematic differentiation by race/ethnicity and only minor evidence of sex-linked differentiation in one model. We discuss implications for responsible AI adoption in libraries and the importance of ongoing monitoring in aligning LLM-based services with core professional values.
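To make the diagnostic-classification step concrete, the sketch below shows one common way such a test can be implemented with scikit-learn; the data, group labels, and feature choices are illustrative assumptions, not the chapter's actual materials. A classifier is trained to predict which demographic group a response was generated for, and its cross-validated accuracy is compared against a label-shuffled null: chance-level accuracy is consistent with the "no systematic differentiation" finding reported above, while above-chance accuracy with a small permutation p-value would flag group-linked differences.

```python
# A minimal, hypothetical sketch of the diagnostic-classification idea,
# using scikit-learn; not necessarily the chapter's actual pipeline.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import permutation_test_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-ins: model responses to identical reference questions,
# labeled by the demographic group signaled in the prompt.
responses = [
    "You can request that title through our interlibrary loan service.",
    "That title is available via interlibrary loan; ask at the desk.",
    "Our databases cover that topic; start with the subject guides.",
    "Try the subject guides first, then search the article databases.",
] * 25  # repeated to give the sketch a workable sample size
rng = np.random.default_rng(0)
labels = rng.choice(["group_a", "group_b"], size=len(responses))  # no real signal

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # lexical features of each response
    LogisticRegression(max_iter=1000),    # predicts the demographic label
)

# Compare cross-validated accuracy against a null distribution built by
# repeatedly shuffling the group labels and re-scoring the classifier.
score, _, p_value = permutation_test_score(
    clf, responses, labels, cv=5, n_permutations=200, scoring="accuracy"
)
print(f"accuracy={score:.2f} (chance ~0.50), permutation p={p_value:.3f}")
```

The choice of classifier matters less than the permutation test, which guards against reading overfitting on a finite sample as evidence of differentiation.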