An essential aspect of evaluating Large Language Models (LLMs) is identifying potential biases. This is especially relevant given the substantial evidence that LLMs can replicate human social biases in their text outputs and further influence stakeholders, potentially amplifying harm to already marginalized individuals and communities. Consequently, recent efforts in bias detection have invested in automated benchmarks and objective metrics such as accuracy (i.e., an LLM's output is compared against a predefined ground truth). Nonetheless, social biases can be nuanced, and are often subjective and context-dependent: a situation may be open to interpretation, with no single ground truth. While such situations can be difficult for automated evaluation systems to identify, human evaluators could potentially pick up on these nuances. In this paper, we discuss the role of human evaluation and subjective interpretation in augmenting automated processes when identifying biases in LLMs, as part of a human-centred approach to evaluating these models.