Large pre-trained language models contain societal biases and carry these biases into downstream tasks. Current in-processing bias mitigation approaches (such as adversarial training) impose debiasing by updating a model's parameters, effectively moving the model to a new, irreversible debiased state. In this work, we propose a novel approach that develops stand-alone debiasing functionalities separate from the model, which can be integrated into the model on demand while keeping the core model untouched. Drawing on the concept of AdapterFusion in multi-task learning, we introduce DAM (Debiasing with Adapter Modules), a debiasing approach that first encapsulates arbitrary bias mitigation functionalities in separate adapters and then adds them to the model on demand to deliver fairness qualities. We conduct a large set of experiments on three classification tasks with gender, race, and age as protected attributes. Our results show that DAM improves or maintains the effectiveness of bias mitigation, avoids catastrophic forgetting in a multi-attribute scenario, and maintains on-par task performance, while offering parameter efficiency and easy switching between the original and debiased models.
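The on-demand integration described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class names (`Adapter`, `FusedLayer`), the adapter names (`"gender"`, `"age"`), the `tanh` nonlinearity, the random untrained weights, and the dot-product attention are all assumptions made for illustration. AdapterFusion itself uses learned query, key, and value projections, which are omitted here. The sketch only shows the structural idea: a frozen base layer, separately stored adapters, and a switch that turns them on or off without modifying the base weights.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class Adapter:
    """Hypothetical bottleneck adapter: h + W_up * tanh(W_down * h)."""

    def __init__(self, dim, bottleneck, rng):
        self.down = rand_matrix(bottleneck, dim, rng)
        self.up = rand_matrix(dim, bottleneck, rng)

    def __call__(self, h):
        z = [math.tanh(v) for v in matvec(self.down, h)]
        delta = matvec(self.up, z)
        return [a + b for a, b in zip(h, delta)]  # residual connection


class FusedLayer:
    """Frozen base layer with optional adapters combined by attention."""

    def __init__(self, dim, rng):
        self.dim = dim
        self.base = rand_matrix(dim, dim, rng, scale=0.5)  # never updated
        self.adapters = {}
        self.enabled = set()

    def add_adapter(self, name, adapter):
        self.adapters[name] = adapter

    def enable(self, *names):
        for name in names:
            if name not in self.adapters:
                raise KeyError(name)
            self.enabled.add(name)

    def disable_all(self):
        self.enabled.clear()

    def forward(self, x):
        h = matvec(self.base, x)
        active = [self.adapters[n] for n in sorted(self.enabled)]
        if not active:
            return h  # original model behaviour
        outs = [adapter(h) for adapter in active]
        # Simplified fusion (assumption): score each adapter output against h.
        scores = [sum(a * b for a, b in zip(h, o)) / math.sqrt(self.dim) for o in outs]
        weights = softmax(scores)
        return [sum(w * o[i] for w, o in zip(weights, outs)) for i in range(self.dim)]


rng = random.Random(0)
layer = FusedLayer(dim=4, rng=rng)
layer.add_adapter("gender", Adapter(4, 2, rng))
layer.add_adapter("age", Adapter(4, 2, rng))
base_snapshot = [row[:] for row in layer.base]

x = [1.0, -0.5, 0.25, 2.0]
original = layer.forward(x)    # no adapters active
layer.enable("gender", "age")
debiased = layer.forward(x)    # adapters fused in
layer.disable_all()
restored = layer.forward(x)    # back to the original output
```

Because the base weights are never written to, disabling the adapters recovers the original output exactly, which is the reversibility property the abstract contrasts with parameter-updating approaches. In a trained system each adapter would be optimised for its own objective before fusion; here the weights are random and serve only to show the data flow.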