Pretrained machine learning models are known to perpetuate and even amplify existing biases in data, which can result in unfair outcomes that ultimately impact user experience. Therefore, it is crucial to understand the mechanisms behind those prejudicial biases to ensure that model performance does not result in discriminatory behaviour toward certain groups or populations. In this work, we take gender bias as our case study. We quantify bias amplification in pretraining and after fine-tuning on three families of vision-and-language models. We investigate the connection, if any, between the two learning stages, and evaluate how bias amplification is reflected in model performance. Overall, we find that bias amplification in pretraining and bias amplification after fine-tuning are independent. We then examine the effect of continued pretraining on gender-neutral data, finding that this reduces group disparities, i.e., promotes fairness, on VQAv2 and retrieval tasks without significantly compromising task performance.