Language Representation Models (LRMs) trained with real-world data may capture and exacerbate undesired bias and cause unfair treatment of people in various demographic groups. Several techniques have been investigated for applying interventions to LRMs to remove bias as measured by benchmark evaluations on, for example, word embeddings. However, the negative side effects of debiasing interventions are usually not revealed in downstream tasks. We propose xGAP-DEBIAS, a set of evaluations for assessing the fairness of debiasing. In this work, we examine four debiasing techniques on a real-world text classification task and show that reducing bias comes at the cost of degrading performance for all demographic groups, including those the debiasing techniques aim to protect. We advocate that a debiasing technique should achieve good downstream performance under the constraint of doing no harm to the protected group.
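The "no harm" constraint in the last sentence can be illustrated with a minimal sketch. This is not the xGAP-DEBIAS definition, which the abstract does not spell out; it only assumes that performance is measured as per-group accuracy before and after debiasing. The function names, the `tolerance` parameter, and the toy labels are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def per_group_accuracy(y_true, y_pred, groups):
    """Return {group: accuracy} for parallel sequences of labels, predictions, groups."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def harmed_groups(acc_before, acc_after, tolerance=0.0):
    """List groups whose accuracy dropped by more than `tolerance` after debiasing.

    `tolerance` is a hypothetical knob, not something taken from the paper.
    """
    return sorted(
        group
        for group in acc_before
        if acc_after.get(group, 0.0) < acc_before[group] - tolerance
    )


# Toy data, invented for illustration only (not results from the paper).
y_true = [1, 0, 1, 1, 0, 1]
groups = ["a", "a", "a", "b", "b", "b"]
pred_baseline = [1, 0, 1, 1, 0, 0]   # baseline model predictions
pred_debiased = [1, 0, 0, 1, 0, 0]   # predictions after a debiasing intervention

acc_before = per_group_accuracy(y_true, pred_baseline, groups)
acc_after = per_group_accuracy(y_true, pred_debiased, groups)

gap_before = max(acc_before.values()) - min(acc_before.values())
gap_after = max(acc_after.values()) - min(acc_after.values())

print("gap before:", gap_before, "gap after:", gap_after)
print("harmed groups:", harmed_groups(acc_before, acc_after))
```

In this toy case the accuracy gap between groups shrinks, yet it shrinks only because one group's accuracy falls. A gap metric alone would count that as an improvement, which is why a separate per-group check against the pre-debiasing baseline is useful.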