Transfer learning allows the expressive features of models pretrained on large-scale datasets to be finetuned for target tasks on smaller, more domain-specific datasets. However, there is a concern that these pretrained models may carry their own biases, which would propagate into the finetuned model. In this work, we investigate bias conceptualized both as spurious correlations between the target task and a sensitive attribute and as underrepresentation of a particular group in the dataset. Under both notions of bias, we find that (1) models finetuned on top of pretrained models can indeed inherit their biases, but (2) this bias can be corrected through relatively minor interventions to the finetuning dataset, often with a negligible impact on performance. Our findings imply that careful curation of the finetuning dataset is important for reducing biases on a downstream task, and that doing so can even compensate for bias in the pretrained model.