Large pre-trained language models contain societal biases and carry these biases into downstream tasks. Current in-processing bias mitigation approaches (such as adversarial training) impose debiasing by updating a model's parameters, effectively moving the model to a new, irreversible debiased state. In this work, we propose a novel approach for developing stand-alone debiasing functionalities that are separate from the model and can be integrated into it on demand, while the core model stays untouched. Drawing on the concept of AdapterFusion in multi-task learning, we introduce DAM (Debiasing with Adapter Modules), a debiasing approach that first encapsulates arbitrary bias mitigation functionalities into separate adapters and then adds them to the model on demand to deliver fairness qualities. We conduct an extensive set of experiments on three classification tasks with gender, race, and age as protected attributes. Our results show that DAM improves or maintains the effectiveness of bias mitigation, avoids catastrophic forgetting in a multi-attribute scenario, and keeps task performance on par, while providing parameter efficiency and easy switching between the original and debiased models.
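To make the "on-demand" idea concrete, here is a minimal sketch, not the authors' implementation, of how separately stored adapters can be switched on top of a frozen hidden state. It assumes bottleneck adapters (down-projection, ReLU, up-projection) and a simplified AdapterFusion-style attention in which the frozen hidden state queries the adapter outputs; the function names, sizes, and random weights are hypothetical, and training of the adapters (e.g., with an adversarial debiasing objective) is not shown.

```python
import numpy as np

def bottleneck_adapter(h, w_down, w_up):
    # Hypothetical adapter: down-project to a small bottleneck, ReLU, project back up.
    return np.maximum(h @ w_down, 0.0) @ w_up

def fusion_attention(h, adapter_outs, w_q, w_k, w_v):
    # Simplified AdapterFusion-style attention (an assumption, not the exact DAM layer):
    # the frozen hidden state h queries the adapter outputs and mixes them.
    q = h @ w_q                                          # shape (d,)
    keys = np.stack([z @ w_k for z in adapter_outs])     # shape (n, d)
    values = np.stack([z @ w_v for z in adapter_outs])   # shape (n, d)
    scores = keys @ q                                    # shape (n,)
    weights = np.exp(scores - scores.max())              # numerically stable softmax
    weights /= weights.sum()
    return weights @ values, weights

def layer_forward(h, adapters, fusion, debias=True):
    # debias=False returns the frozen model's hidden state unchanged,
    # so the original model is recovered without touching its parameters.
    if not debias:
        return h
    outs = [bottleneck_adapter(h, w_down, w_up) for w_down, w_up in adapters]
    mixed, _ = fusion_attention(h, outs, *fusion)
    return h + mixed  # residual connection around the fused adapters

rng = np.random.default_rng(0)
d, r = 8, 2                      # illustrative hidden and bottleneck sizes
h = rng.normal(size=d)           # stands in for one frozen transformer hidden state
# Hypothetical: one adapter per protected attribute (gender, race, age).
adapters = [(rng.normal(size=(d, r)), rng.normal(size=(r, d))) for _ in range(3)]
fusion = tuple(rng.normal(size=(d, d)) for _ in range(3))  # w_q, w_k, w_v

original = layer_forward(h, adapters, fusion, debias=False)
debiased = layer_forward(h, adapters, fusion, debias=True)
print(np.array_equal(original, h))  # True
print(debiased.shape)               # (8,)
```

The design point the sketch illustrates is that the debiasing parameters live entirely in the adapter and fusion weights: the same frozen `h` serves both the original and the debiased path, and switching is a flag rather than a retraining step.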