Spurious correlations (e.g., gender or racial artifacts) have been found to be an important factor in explaining model performance across various NLP tasks, and are often considered ``shortcuts'' to the actual task. However, humans similarly tend to make quick (and sometimes wrong) predictions based on societal and cognitive presuppositions. In this work, we address the question: can we quantify the extent to which model biases reflect human behaviour? Answering this question will help shed light on model performance and provide meaningful comparisons against humans. We approach this question through the lens of the dual-process theory of human decision-making. This theory differentiates between an automatic, unconscious (and sometimes biased) ``fast system'' and a ``slow system'', which, when triggered, may revisit earlier automatic reactions. We make several observations from two crowdsourcing experiments on gender bias in coreference resolution, using self-paced reading to study the ``fast'' system, and question answering to study the ``slow'' system under a constrained time setting. On real-world data, humans make $\sim$3\% more gender-biased decisions than models, while on synthetic data, models are $\sim$12\% more biased.