Language models are the current state of the art in natural language processing (NLP) and are increasingly used across NLP tasks. Although there is evidence that language models are biased, the impact of that bias on the fairness of downstream NLP tasks remains understudied. Likewise, although numerous debiasing methods have been proposed in the literature, the impact of bias removal on the fairness of NLP tasks is also understudied. In this work, we investigate three sources of bias in NLP models, namely representation bias, selection bias, and overamplification bias, and examine how they affect the fairness of the downstream task of toxicity detection. We also investigate how removing these biases with different bias removal techniques affects the fairness of toxicity detection. Our results provide strong evidence that downstream sources of bias, especially overamplification bias, have the greatest impact on the fairness of toxicity detection. We also find strong evidence that removing overamplification bias, by fine-tuning the language models on a dataset with balanced contextual representations and balanced ratios of positive examples across identity groups, can improve the fairness of toxicity detection. Finally, we build on these findings to introduce a list of guidelines for ensuring the fairness of toxicity detection.