The increasing size of modern pre-trained language models not only leads them to inherit more human-like biases from their training corpora, but also makes mitigating such biases computationally expensive. In this paper, we investigate recent parameter-efficient methods in combination with counterfactual data augmentation (CDA) for bias mitigation. We conduct extensive experiments with prefix tuning, prompt tuning, and adapter tuning on different language models and bias types to evaluate their debiasing performance and their ability to preserve the internal knowledge of a pre-trained model. We find that the parameter-efficient methods (i) are effective in mitigating gender bias, with adapter tuning consistently the most effective and prompt tuning better suited to GPT-2 than to BERT; (ii) are less effective for racial and religious bias, which may be attributable to the limitations of CDA; and (iii) can perform similarly to, and sometimes better than, full fine-tuning while improving time and memory efficiency, and they maintain the internal knowledge in BERT and GPT-2, as evaluated via fact retrieval and downstream fine-tuning.
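To make the CDA step concrete, the following is a minimal sketch of a two-sided variant that keeps each original sentence and adds a copy with the gendered terms swapped. It is an illustration under stated assumptions, not necessarily the exact procedure used in the experiments: the word-pair list `GENDER_PAIRS` and the helper names `build_swap_table`, `counterfactual`, and `augment` are hypothetical. The list is deliberately tiny and omits ambiguous terms such as "her", which can map to either "him" or "his".

```python
import re

# Hypothetical, deliberately small word-pair list (an assumption for illustration);
# practical CDA setups rely on much larger curated lists.
GENDER_PAIRS = [("he", "she"), ("man", "woman"), ("father", "mother"), ("boy", "girl")]


def build_swap_table(pairs):
    """Map each term to its counterpart in both directions."""
    table = {}
    for a, b in pairs:
        table[a] = b
        table[b] = a
    return table


def counterfactual(sentence, swap_table):
    """Swap every listed term in the sentence, preserving initial capitalization."""
    def replace(match):
        word = match.group(0)
        swapped = swap_table.get(word.lower())
        if swapped is None:
            return word  # not a listed term: leave unchanged
        return swapped.capitalize() if word[0].isupper() else swapped

    # Match alphabetic tokens only, so punctuation and spacing are untouched.
    return re.sub(r"[A-Za-z]+", replace, sentence)


def augment(corpus, pairs=GENDER_PAIRS):
    """Two-sided CDA: keep each original and add its counterfactual if it differs."""
    table = build_swap_table(pairs)
    augmented = []
    for sentence in corpus:
        augmented.append(sentence)
        swapped = counterfactual(sentence, table)
        if swapped != sentence:
            augmented.append(swapped)
    return augmented


if __name__ == "__main__":
    for line in augment(["He is a good father.", "The sky is blue."]):
        print(line)
```

A sketch like this also hints at why CDA may transfer less well to racial and religious bias: it depends on clean one-to-one term pairs, which are easier to enumerate for gender than for those categories.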