Current natural language processing (NLP) research tends to focus on only one or, less frequently, two dimensions at a time (e.g., performance, privacy, fairness, or efficiency), which may lead to suboptimal conclusions and often overlooks the broader goal of achieving trustworthy NLP. Work on adapter modules (Houlsby et al., 2019; Hu et al., 2021) focuses on improving performance and efficiency, with no investigation of unintended consequences on other aspects such as fairness. To address this gap, we conduct experiments on three text classification datasets by either (1) fine-tuning all parameters or (2) using adapter modules. Regarding performance and efficiency, we confirm prior findings that the accuracy of adapter-enhanced models is roughly on par with that of fully fine-tuned models, while training time is substantially reduced. Regarding fairness, we show that adapter modules yield mixed fairness outcomes across sensitive groups. Further investigation reveals that, when the standard fine-tuned model exhibits limited bias, adapter modules typically do not introduce extra bias. On the other hand, when the fine-tuned model exhibits increased bias, the impact of adapter modules on bias becomes more unpredictable, introducing the risk of significantly magnifying these biases for certain groups. Our findings highlight the need for a case-by-case evaluation rather than a one-size-fits-all judgment.
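To make the adapter approach concrete, the following is a minimal numpy sketch of a Houlsby-style bottleneck adapter: a small down-projection, a nonlinearity, an up-projection, and a residual connection, inserted so that only the two projection matrices need training. All dimensions and the zero initialization of the up-projection are illustrative assumptions, not details taken from the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def adapter_forward(h, W_down, W_up):
    """Bottleneck adapter: down-project, ReLU, up-project, residual add."""
    z = np.maximum(0.0, h @ W_down)  # down-projection + nonlinearity
    return h + z @ W_up              # up-projection + residual connection

d_model, d_bottleneck = 768, 64      # illustrative sizes (hypothetical)
W_down = rng.normal(scale=0.02, size=(d_model, d_bottleneck))
W_up = np.zeros((d_bottleneck, d_model))  # zero init: adapter starts as identity

h = rng.normal(size=(4, d_model))    # a batch of transformer hidden states
out = adapter_forward(h, W_down, W_up)

# Parameter-efficiency intuition: only the adapter weights are trained,
# a small fraction of the full layer's parameters.
adapter_params = W_down.size + W_up.size
```

With the up-projection initialized to zero, the adapter is an identity function at the start of training, so inserting it does not perturb the pretrained model's behavior; the bottleneck keeps the trainable parameter count far below that of full fine-tuning.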