As language models are increasingly embedded in human-facing machine learning tools, their bias against demographic subgroups has gained attention. We propose FineDeb, a two-phase debiasing framework for language models: the first phase contextually debiases the embeddings learned by a pretrained language model, and the second fine-tunes the model on a language modeling objective. Our results show that FineDeb debiases more strongly than other methods, which often yield models as biased as the original language model. The framework generalizes to demographics with multiple classes, and we demonstrate its effectiveness through extensive experiments and comparisons with state-of-the-art techniques. We release our code and data on GitHub.