Allocational harms occur when resources or opportunities are unfairly withheld from specific groups. Many proposed bias measures consider only a model's predictions, ignoring the discrepancy between those predictions and the decisions ultimately made from them. Our work examines the reliability of current bias metrics in assessing allocational harms arising from predictions of large language models (LLMs). We evaluate their predictive validity and their utility for model selection across ten LLMs and two allocation tasks. Our results reveal that commonly used bias metrics based on average performance gaps and distribution distances fail to reliably capture group disparities in allocation outcomes. These findings highlight the need to account for how model predictions are used in decision-making, particularly in contexts where decisions are constrained by limited resources.
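To make the prediction/decision distinction concrete, the minimal Python sketch below (not from the paper; the score distributions and the top-k selection rule are illustrative assumptions) contrasts two prediction-level metrics of the kind named above, an average performance gap and a Wasserstein distribution distance, with a decision-level disparity that emerges once scores are used to allocate a limited number of slots:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

# Hypothetical scores a model assigns to candidates from two groups;
# the groups share the same mean but differ in spread.
scores_a = rng.normal(loc=0.60, scale=0.05, size=1000)
scores_b = rng.normal(loc=0.60, scale=0.15, size=1000)

# Prediction-level bias metrics of the kind the abstract names.
mean_gap = scores_a.mean() - scores_b.mean()       # average performance gap
w_dist = wasserstein_distance(scores_a, scores_b)  # distribution distance

# Decision-level outcome: allocate k slots to the top-scoring candidates.
k = 100
cutoff = np.sort(np.concatenate([scores_a, scores_b]))[-k]
rate_a = (scores_a >= cutoff).mean()  # share of group A selected
rate_b = (scores_b >= cutoff).mean()  # share of group B selected

print(f"mean gap: {mean_gap:+.4f}   Wasserstein: {w_dist:.4f}")
print(f"selection rate A: {rate_a:.1%}   selection rate B: {rate_b:.1%}")
```

In this toy setup the mean gap is near zero, yet the group with the wider score distribution captures a disproportionate share of the k slots, illustrating how a prediction-level summary can diverge from allocation outcomes under resource constraints.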