Counterfactual Data Augmentation (CDA) has been one of the preferred techniques for mitigating gender bias in natural language models. Most CDA techniques rely on dictionary-based word substitution. Although such techniques have been shown to contribute significantly to mitigating gender bias, in this paper we highlight some of their limitations, such as susceptibility to ungrammatical compositions and a lack of generalization beyond the set of predefined dictionary words. Model-based solutions can alleviate these problems, yet the lack of high-quality parallel training data hinders development in this direction. Therefore, we propose a combination of data processing techniques and a bi-objective training regime to develop a model-based solution for generating counterfactuals to mitigate gender bias. We implemented our proposed solution and performed an empirical evaluation, which shows how our model alleviates the shortcomings of dictionary-based solutions.
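To make the two limitations concrete, the following is a minimal sketch of dictionary-based CDA, not the implementation evaluated in the paper. The word list `GENDER_PAIRS` and the function names `swap_word` and `dictionary_cda` are hypothetical and chosen only for illustration; real dictionaries are much larger, but they share the same failure modes.

```python
import re

# Hypothetical, deliberately tiny gender-pair dictionary (an assumption for
# illustration only). Note that "her" can be an object pronoun ("him") or a
# possessive ("his"), but a one-to-one dictionary must pick a single target.
GENDER_PAIRS = {
    "he": "she", "she": "he",
    "him": "her", "her": "him",
    "his": "her",
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
}

_TOKEN = re.compile(r"[A-Za-z]+")


def swap_word(match):
    """Replace one word with its dictionary counterpart, keeping capitalization."""
    word = match.group(0)
    target = GENDER_PAIRS.get(word.lower())
    if target is None:
        # Out-of-dictionary words are left untouched (no generalization).
        return word
    return target.capitalize() if word[0].isupper() else target


def dictionary_cda(sentence):
    """Produce a counterfactual by context-free word substitution."""
    return _TOKEN.sub(swap_word, sentence)


print(dictionary_cda("He thanked his father."))  # She thanked her mother.
print(dictionary_cda("She lost her keys."))      # He lost him keys.  (ungrammatical)
print(dictionary_cda("The actress waved."))      # The actress waved. (unchanged)
```

The second call shows an ungrammatical composition caused by context-free substitution, and the third shows a gendered word outside the dictionary passing through unchanged. These are the kinds of cases a model-based generator is meant to handle.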