Existing studies addressing gender bias in pre-trained language models usually build a small gender-neutral data set and conduct a second phase of pre-training on the model with that data. However, given the limited size and narrow focus of the gender-neutral data, catastrophic forgetting occurs during second-phase pre-training. Forgetting information from the original training data may damage the model's downstream performance by a large margin. In this work, we empirically show that catastrophic forgetting occurs in such methods by evaluating them on the general NLP tasks in GLUE. We then propose a new method, GEnder Equality Prompt (GEEP), to improve the gender fairness of pre-trained models with less forgetting. GEEP freezes the pre-trained model and learns gender-related prompts with gender-neutral data. Empirical results show that GEEP not only achieves state-of-the-art performance on gender fairness tasks, but also forgets less and performs better on GLUE by a large margin.
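The idea of freezing the model and training only a prompt can be illustrated with a toy sketch. This is not the GEEP implementation: the linear "model", the additive prompt vector, and all names (`frozen_w`, `train_prompt`, the sample data) are hypothetical and chosen only to show that the original parameters stay untouched while the prompt alone is optimized.

```python
# Toy illustration only: a frozen "model" (fixed weights) plus a trainable prompt.
# All names and numbers are hypothetical and are not taken from the paper.

def predict(frozen_w, x, prompt):
    # The prompt is added to the input representation; the weights are only read.
    return sum(w * (xi + pi) for w, xi, pi in zip(frozen_w, x, prompt))

def loss(frozen_w, data, prompt):
    # Mean squared error over the (input, target) pairs.
    return sum((predict(frozen_w, x, prompt) - y) ** 2 for x, y in data) / len(data)

def train_prompt(frozen_w, data, prompt, lr=0.05, steps=200):
    # Gradient descent on the prompt only; frozen_w is never updated.
    prompt = list(prompt)
    for _ in range(steps):
        grad = [0.0] * len(prompt)
        for x, y in data:
            err = predict(frozen_w, x, prompt) - y
            for i, w in enumerate(frozen_w):
                # d/d prompt[i] of (prediction - y)^2 is 2 * err * w_i
                grad[i] += 2 * err * w / len(data)
        prompt = [p - lr * g for p, g in zip(prompt, grad)]
    return prompt

# Stand-in for pre-trained weights and a small "second-phase" data set.
frozen_w = [0.5, -1.0, 2.0]
data = [([1.0, 0.0, 1.0], 3.0), ([0.0, 1.0, 1.0], 1.5)]

learned_prompt = train_prompt(frozen_w, data, [0.0, 0.0, 0.0])
```

Because the weights are never written to, whatever the frozen model computed before training is preserved exactly; only the added prompt adapts to the new data, which is the intuition behind reduced forgetting.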