Previous studies have established that language models manifest stereotyped biases. Existing debiasing strategies, such as retraining a model on counterfactual data, representation projection, and prompting, often fail to efficiently eliminate bias or to directly alter the models' biased internal representations. To address these issues, we propose BiasEdit, an efficient model editing method that removes stereotypical bias from language models through lightweight networks acting as editors to generate parameter updates. BiasEdit employs a debiasing loss that guides the editor networks to perform local edits on a subset of a language model's parameters, while a retention loss preserves the model's language modeling abilities during editing. Experiments on StereoSet and CrowS-Pairs demonstrate the effectiveness, efficiency, and robustness of BiasEdit in eliminating bias compared to tangential debiasing baselines, with little to no impact on the language models' general capabilities. In addition, we conduct bias tracing to probe bias in various modules and explore the impact of bias editing on different components of language models.
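The abstract describes an editing objective that combines a debiasing loss with a retention loss. The following is a minimal, illustrative sketch of how such a combined objective could be structured; the function names, the squared-difference equalizing term, the KL-based retention term, and the weighting parameter `lam` are all assumptions for illustration, not the paper's actual definitions:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def debias_loss(stereo_logit, anti_logit):
    # Push the stereotyped and anti-stereotyped continuations toward
    # equal probability (an equalizing objective; illustrative form).
    p = softmax([stereo_logit, anti_logit])
    return (p[0] - p[1]) ** 2

def retention_loss(pre_logits, post_logits):
    # KL(pre || post) on unrelated text, keeping the edited model's
    # predictions close to the original model's (illustrative form).
    p = softmax(pre_logits)
    q = softmax(post_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def total_loss(stereo_logit, anti_logit, pre_logits, post_logits, lam=1.0):
    # Combined editing objective: debias while retaining behavior.
    return debias_loss(stereo_logit, anti_logit) + lam * retention_loss(
        pre_logits, post_logits
    )
```

In this sketch, the debiasing term is zero exactly when the two continuations are equally likely, and the retention term is zero when the edited model's distribution on unrelated text matches the original's, so minimizing the sum trades off the two goals.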