Effective Controllable Bias Mitigation for Classification and Retrieval using Gate Adapters

Bias mitigation in language models has been the topic of many studies, with a recent focus on learning separate modules such as adapters for on-demand debiasing. Besides optimizing for a modularized debiased model, it is often critical in practice to control the degree of bias reduction at inference time, e.g., to tune for a desired performance-fairness trade-off in search results or to control the strength of debiasing in classification tasks. In this paper, we introduce the Controllable Gate Adapter (ConGater), a novel modular gating mechanism with adjustable sensitivity parameters that allows a gradual transition from the biased state of the model to the fully debiased version at inference time. We demonstrate ConGater's performance by (1) conducting adversarial debiasing experiments with three different models on three classification tasks with four protected attributes, and (2) reducing the bias of search results through fairness list-wise regularization, which enables adjusting the trade-off between performance and fairness metrics. Our experiments on the classification tasks show that, compared to comparable baselines, ConGater maintains higher task performance while retaining less information about the protected attributes. Our results on the retrieval task show that the fully debiased ConGater achieves the same fairness performance as recent strong baselines while maintaining more than twice their task performance. Overall, besides strong performance, ConGater enables continuous transitioning between the biased and debiased states of a model, enhancing personalization of use and interpretability through controllability.
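
To make the idea of an inference-time sensitivity parameter concrete, the following is a minimal sketch of a gate whose strength is set by a single value `omega` in [0, 1]. It is an illustration under stated assumptions, not the gating function defined in the paper: the interpolation formula, the element-wise gate weights, and the names `gate_values` and `apply_gate` are all hypothetical. The sketch only reproduces the behaviour the abstract describes: at `omega = 0` the hidden representation passes through unchanged (the original, biased state), at `omega = 1` the learned gate is applied in full, and values in between give a gradual transition.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def gate_values(hidden: List[float], weights: List[float],
                bias: List[float], omega: float) -> List[float]:
    """Hypothetical element-wise gate (an assumption, not the paper's formula).

    Each gate value interpolates linearly between 1.0 (omega = 0, gate open)
    and sigmoid(w * h + b) (omega = 1, learned gate fully applied).
    """
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    return [
        1.0 - omega * (1.0 - sigmoid(w * h + b))
        for h, w, b in zip(hidden, weights, bias)
    ]


def apply_gate(hidden: List[float], weights: List[float],
               bias: List[float], omega: float) -> List[float]:
    """Scale each hidden unit by its gate value."""
    gates = gate_values(hidden, weights, bias, omega)
    return [h * g for h, g in zip(hidden, gates)]


if __name__ == "__main__":
    # Toy values chosen for illustration only; in practice the gate
    # parameters would be learned with a debiasing objective.
    hidden = [1.0, -2.0, 0.5]
    weights = [0.0, 0.0, 0.0]
    bias = [0.0, 0.0, 0.0]
    for omega in (0.0, 0.5, 1.0):
        print(omega, apply_gate(hidden, weights, bias, omega))
```

Because `omega` is only read at inference time in this sketch, the same trained gate parameters can serve every point on the trade-off curve; a caller picks the debiasing strength per request without retraining.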
