Machine learning models have been shown to inherit biases from their training datasets, which is particularly problematic for vision-language foundation models trained on uncurated datasets scraped from the internet. These biases can be amplified and propagated to downstream applications such as zero-shot classifiers and text-to-image generative models. In this study, we propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding. In particular, we show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models. Because the calibrated projection has a closed-form solution, it integrates easily into large-scale pipelines, and empirical results demonstrate that our approach effectively reduces social bias and spurious correlation in both discriminative and generative vision-language models without the need for additional data or training.
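The core operation, projecting biased directions out of text embeddings, can be illustrated with a minimal NumPy sketch. It assumes the biased directions are already available as embedding vectors (for example, text embeddings of prompts that describe only the spurious attribute) and that embeddings are compared by cosine similarity. The function names are hypothetical, and the calibration term uses an assumed objective, ||P - P0||_F^2 + lam * sum_k ||P d_k||^2, chosen because it yields a closed form; it is not necessarily the exact loss used in this work.

```python
import numpy as np


def orthogonal_projection(bias_dirs: np.ndarray) -> np.ndarray:
    """Return P0 = I - V (V^T V)^{-1} V^T, which projects out span(V).

    bias_dirs: (d, k) array whose k columns are biased directions in the
    text-embedding space (hypothetical input, e.g. embeddings of prompts
    describing only the spurious attribute).
    """
    d = bias_dirs.shape[0]
    # V @ pinv(V) is the orthogonal projector onto span(V); pinv is used
    # instead of an explicit inverse for numerical robustness.
    return np.eye(d) - bias_dirs @ np.linalg.pinv(bias_dirs)


def calibrated_projection(p0: np.ndarray, pair_diffs: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form minimizer of ||P - P0||_F^2 + lam * sum_k ||P d_k||^2.

    ASSUMED objective for illustration: the columns d_k of pair_diffs (d, m)
    are differences between embeddings that should coincide after debiasing.
    Setting the gradient to zero gives P (I + lam * D) = P0, D = sum_k d_k d_k^T.
    """
    d = p0.shape[0]
    # A = I + lam * D is symmetric positive definite for lam >= 0.
    a = np.eye(d) + lam * (pair_diffs @ pair_diffs.T)
    # P = P0 A^{-1}; since A is symmetric, solve A X = P0^T and transpose.
    return np.linalg.solve(a, p0.T).T


def debias_text_embeddings(text_emb: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Apply proj to each row of text_emb (n, d), then re-normalize.

    Re-normalization assumes the downstream model scores with cosine
    similarity, as CLIP-style zero-shot classifiers do.
    """
    z = text_emb @ proj.T
    return z / np.linalg.norm(z, axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 16
    bias = rng.normal(size=(dim, 2))      # stand-in for real bias-prompt embeddings
    prompts = rng.normal(size=(5, dim))   # stand-in for class-prompt embeddings
    p0 = orthogonal_projection(bias)
    p = calibrated_projection(p0, rng.normal(size=(dim, 4)), lam=10.0)
    clean = debias_text_embeddings(prompts, p0)
    print(np.abs(clean @ bias).max())     # close to zero: bias directions removed
    print(debias_text_embeddings(prompts, p).shape)  # (5, 16)
```

Only the text side is modified; image embeddings pass through unchanged, in line with the claim that debiasing the text embedding alone suffices. The plain projection P0 removes the bias span exactly, while the calibrated variant (under the assumed objective) trades some of that exactness, controlled by `lam`, for keeping paired embeddings close together.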