Training large language models (LLMs) is costly in both time and computational resources. The volume of data used during unsupervised pre-training makes it difficult to verify every example, so undesirable data may be ingested during training. Retraining from scratch is impractical, which has led to the emergence of 'unlearning', a discipline in which models are modified to "unlearn" undesirable information without full retraining. However, any such modification can alter the behaviour of LLMs, especially on key dimensions such as fairness. This is the first work to examine the interplay between unlearning and fairness for LLMs. In particular, we focus on a popular unlearning framework known as SISA [Bourtoule et al., 2021], which creates an ensemble of models trained on disjoint shards. We evaluate the performance-fairness trade-off for SISA and empirically demonstrate that SISA can indeed reduce fairness in LLMs. To remedy this, we propose post-processing bias mitigation techniques for the ensemble models produced by SISA. We adapt the post-processing fairness improvement technique of [Hardt et al., 2016] to design three methods that can handle model ensembles, and we prove that one of these methods is an optimal fair predictor for an ensemble of models. Through experimental results, we demonstrate the efficacy of our post-processing framework, called 'FairSISA'.
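The sharding idea behind SISA, as summarised above, can be illustrated with a minimal sketch. It is not the implementation used in this work: the class `ShardedEnsemble`, the helper `train_toy_model`, the round-robin shard assignment, and the majority-vote aggregation are all assumptions made for illustration, and the slicing component of SISA is omitted. Each "model" is a toy majority-label predictor standing in for an LLM fine-tuned on one shard.

```python
from collections import Counter


def train_toy_model(shard):
    """Hypothetical stand-in for training on one shard.

    Returns a predictor that always outputs the shard's majority label
    (or 0 if the shard is empty).
    """
    labels = [label for _, label in shard]
    majority = Counter(labels).most_common(1)[0][0] if labels else 0
    return lambda x: majority


class ShardedEnsemble:
    """Ensemble of models, each trained on a disjoint shard of the data."""

    def __init__(self, data, num_shards):
        # Round-robin assignment gives disjoint shards (an assumed choice).
        self.shards = [list(data[i::num_shards]) for i in range(num_shards)]
        self.models = [train_toy_model(shard) for shard in self.shards]

    def predict(self, x):
        # Aggregate the constituent models by majority vote (assumed rule).
        votes = Counter(model(x) for model in self.models)
        return votes.most_common(1)[0][0]

    def unlearn(self, sample):
        """Remove a sample and retrain only the shard(s) that contained it."""
        retrained = []
        for i, shard in enumerate(self.shards):
            if sample in shard:
                shard.remove(sample)
                self.models[i] = train_toy_model(shard)
                retrained.append(i)
        return retrained


if __name__ == "__main__":
    data = [(0.1, 1), (0.2, 1), (0.3, 0), (0.4, 1), (0.5, 0), (0.6, 1)]
    ensemble = ShardedEnsemble(data, num_shards=3)
    print("retrained shards:", ensemble.unlearn((0.4, 1)))
    print("prediction:", ensemble.predict(0.25))
```

Because the shards are disjoint, removing one sample requires retraining only the model whose shard contained it. A post-processing fairness step in the spirit of [Hardt et al., 2016] would then act on the outputs of such an ensemble rather than on the training procedure.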