As Large Language Models (LLMs) have grown increasingly capable, a number of Large Vision-Language Models (LVLMs) have been proposed to augment LLMs with visual inputs. Such models condition generated text on both an input image and a text prompt, enabling use cases such as visual question answering and multimodal chat. While prior studies have examined the social biases in text generated by LLMs, this topic remains relatively unexplored in LVLMs. Examining social biases in LVLMs is particularly challenging because bias can be induced by information in both the text and visual modalities, and these contributions confound one another. To address this problem, we conduct a large-scale study of text generated by different LVLMs under counterfactual changes to input images. Specifically, we present LVLMs with identical open-ended text prompts while conditioning on images from different counterfactual sets, where each set contains images that are largely identical in their depiction of a common subject (e.g., a doctor) and differ only in intersectional social attributes (e.g., race and gender). We comprehensively evaluate the text produced by different models under this counterfactual generation setting, producing over 57 million responses from popular LVLMs. Our multi-dimensional analysis reveals that social attributes depicted in input images, such as race, gender, and physical characteristics, can significantly influence the generation of toxic content, competency-associated words, harmful stereotypes, and numerical ratings of depicted individuals. We additionally explore the relationship between social bias in LVLMs and their corresponding LLMs, as well as inference-time strategies to mitigate bias.