Machine learning models have been shown to inherit biases from their training datasets. This is particularly problematic for vision-language foundation models trained on uncurated datasets scraped from the internet, because the biases can be amplified and propagated to downstream applications such as zero-shot classifiers and text-to-image generative models. In this study, we propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding. In particular, we show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models. The proposed closed-form solution enables easy integration into large-scale pipelines, and empirical results demonstrate that our approach effectively reduces social bias and spurious correlations in both discriminative and generative vision-language models without the need for additional data or training.
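To make the idea of projecting out biased directions concrete, the following is a minimal sketch of a plain orthogonal projection applied to text embeddings. It is an illustration under stated assumptions, not the method of this study: the calibration of the projection matrix mentioned above is not reproduced, the function names `projection_matrix` and `debias` are hypothetical, and random vectors stand in for the outputs of a real text encoder.

```python
import numpy as np


def projection_matrix(bias_directions: np.ndarray) -> np.ndarray:
    """Return P = I - V V^+, which removes the span of the columns of V.

    bias_directions: array of shape (dim, k), one bias direction per column.
    """
    V = np.asarray(bias_directions, dtype=float)
    dim = V.shape[0]
    # V @ pinv(V) is the orthogonal projector onto the column space of V;
    # using the pseudo-inverse tolerates linearly dependent directions.
    return np.eye(dim) - V @ np.linalg.pinv(V)


def debias(text_embeddings: np.ndarray, P: np.ndarray) -> np.ndarray:
    """Project each row of text_embeddings with P, then re-normalize to unit length."""
    projected = text_embeddings @ P.T
    norms = np.linalg.norm(projected, axis=1, keepdims=True)
    return projected / np.clip(norms, 1e-12, None)


# Toy data (assumption): random vectors in place of real text-encoder outputs.
rng = np.random.default_rng(0)
dim = 8
emb_group_a = rng.normal(size=dim)  # stand-in for a prompt naming one group
emb_group_b = rng.normal(size=dim)  # stand-in for a prompt naming another group
direction = emb_group_a - emb_group_b  # one candidate bias direction

P = projection_matrix(direction[:, None])

class_prompts = rng.normal(size=(3, dim))  # stand-ins for class-prompt embeddings
class_prompts_debiased = debias(class_prompts, P)

# After projection, the embeddings have no component along the bias direction.
residual = class_prompts_debiased @ direction
```

Because only the text embeddings are modified, the debiased rows could then be used wherever the original prompt embeddings were used, for example as zero-shot classifier weights compared against image embeddings by cosine similarity. No image encoder, extra data, or training step appears in this sketch.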