Deep neural networks often learn unintended biases during training, which may cause harm when the models are deployed in real-world settings. This work surveys 214 papers on sociodemographic bias in natural language processing (NLP), with the aim of providing a more comprehensive understanding of the similarities and differences among approaches to the problem. To better distinguish bias from real-world harm, we draw on ideas from psychology and behavioral economics to propose a definition of sociodemographic bias. We identify three main categories of NLP bias research: types of bias, quantifying bias, and debiasing techniques. We highlight current trends in quantifying bias and in debiasing techniques, and discuss their strengths and weaknesses. We conclude that current approaches to quantifying bias face reliability issues, that many bias metrics do not relate to real-world bias, and that debiasing techniques need to focus more on training methods. Finally, we provide recommendations for future work.