Language Representation Models (LRMs) trained on real-world data may capture and exacerbate undesired bias, causing unfair treatment of people in various demographic groups. Several techniques have been investigated for applying interventions to LRMs to remove bias as measured by benchmark evaluations on, for example, word embeddings. However, the negative side effects of debiasing interventions are usually not revealed in downstream tasks. We propose xGAP-DEBIAS, a set of evaluations for assessing the fairness of debiasing. In this work, we examine four debiasing techniques on a real-world text classification task and show that reducing bias comes at the cost of degrading performance for all demographic groups, including those the debiasing techniques aim to protect. We advocate that a debiasing technique should achieve good downstream performance under the constraint of doing no harm to the protected group.
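The "no harm" constraint stated in the abstract can be illustrated with a minimal sketch. This is not the xGAP-DEBIAS metric itself, whose definition the abstract does not give. It assumes that downstream performance is measured as per-group accuracy and that "harm" means a drop in accuracy for a protected group relative to the model before debiasing. The function names (`group_accuracy`, `performance_gap`, `no_harm`), the group labels, and the toy predictions are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def group_accuracy(y_true, y_pred, groups):
    """Return a dict mapping each demographic group to its accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def performance_gap(acc_by_group):
    """Largest difference in accuracy between any two groups."""
    values = list(acc_by_group.values())
    return max(values) - min(values)


def no_harm(baseline_acc, debiased_acc, protected, tolerance=0.0):
    """True if no protected group loses more than `tolerance` accuracy."""
    return all(
        debiased_acc[g] >= baseline_acc[g] - tolerance for g in protected
    )


# Hypothetical toy data: group "b" plays the role of the protected group.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
baseline_pred = [1, 0, 1, 0, 1, 0, 0, 1]  # predictions before debiasing
debiased_pred = [1, 0, 0, 1, 1, 1, 0, 1]  # predictions after debiasing

baseline_acc = group_accuracy(y_true, baseline_pred, groups)
debiased_acc = group_accuracy(y_true, debiased_pred, groups)

print("baseline:", baseline_acc, "gap:", performance_gap(baseline_acc))
print("debiased:", debiased_acc, "gap:", performance_gap(debiased_acc))
print("no harm to protected group:", no_harm(baseline_acc, debiased_acc, ["b"]))
```

In this toy case the gap between groups shrinks after debiasing, yet the protected group's accuracy also drops, so the no-harm check fails. Reporting the gap alone would hide that outcome, which is the kind of side effect the abstract argues should be evaluated.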