Pre-trained foundation models can be adapted to specific tasks with Low-Rank Adaptation (LoRA), but the fairness properties of the resulting adapted classifiers remain underexplored. Existing fairness-aware fine-tuning methods rely on direct access to sensitive attributes or their predictors; in practice, however, sensitive attributes are often held under strict consumer privacy controls, and neither the attributes nor their predictors are available to model developers, hampering the development of fair models. To address this issue, we introduce a set of LoRA-based fine-tuning methods that can be trained in a distributed fashion, in which model developers and fairness auditors collaborate without sharing sensitive attributes or predictors. We evaluate three such methods (sensitive unlearning, adversarial training, and orthogonality loss) against a fairness-unaware baseline in experiments on the CelebA and UTK-Face datasets with an ImageNet pre-trained ViT-Base model. We find that orthogonality loss consistently reduces bias while maintaining or improving utility, whereas adversarial training improves False Positive Rate Parity and Demographic Parity in some cases, and sensitive unlearning provides no clear benefit. On tasks where significant bias is present, distributed fairness-aware fine-tuning methods can effectively eliminate that bias without compromising consumer privacy and, in most cases, improve model utility.
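The abstract names the methods but not their mechanics. As a rough illustration only, the sketch below shows one way an orthogonality loss over LoRA updates could be realized in PyTorch: a frozen base layer with a trainable low-rank update, plus a penalty on the alignment between the task update and a sensitive-attribute update. The class `LoRALinear`, the function `orthogonality_loss`, the hyperparameters `r` and `alpha`, and the specific penalty are illustrative assumptions, not the paper's implementation, and the distributed exchange between developer and auditor is omitted.

```python
# Hypothetical sketch of LoRA with an orthogonality penalty; not the
# paper's verified method. All names and the penalty form are assumptions.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep pre-trained weights fixed
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Standard LoRA: base output plus scaled low-rank correction.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling


def orthogonality_loss(task: LoRALinear, sensitive: LoRALinear) -> torch.Tensor:
    """Penalize alignment between the task LoRA update and a (frozen)
    sensitive-attribute LoRA update, pushing them toward orthogonal subspaces."""
    task_update = task.B @ task.A            # (out_features, in_features)
    sens_update = sensitive.B @ sensitive.A
    # Squared cosine similarity between flattened updates; zero when orthogonal.
    inner = (task_update * sens_update).sum()
    return inner.pow(2) / (task_update.norm() * sens_update.norm() + 1e-8).pow(2)
```

In a training loop, this penalty would be added to the task loss with a weighting coefficient; in the distributed setting the abstract describes, the sensitive-attribute update would presumably be held by the fairness auditor, with only gradients or penalty values exchanged, so that neither the sensitive attributes nor their predictor reach the model developer.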