Investigating Associational Biases in Inter-Model Communication of Large Generative Models

Social bias in generative AI can manifest not only as performance disparities but also as associational bias, whereby models learn and reproduce stereotypical associations between concepts and demographic groups, even in the absence of explicit demographic information (e.g., associating doctors with men). These associations can persist, propagate, and potentially amplify across repeated exchanges in inter-model communication pipelines, where one generative model's output becomes another's input. This is especially salient for human-centred perception tasks, such as human activity recognition and affect prediction, where inferences about behaviour and internal states can produce errors or stereotypical associations that propagate into unequal treatment. In this work, focusing on human activity and affective expression, we study how such associations evolve within an inter-model communication pipeline that alternates between image generation and image description. Using the RAF-DB and PHASE datasets, we quantify the demographic distribution drift induced by model-to-model information exchange and use an explainability pipeline to assess whether these drifts are systematic. Our results reveal demographic drifts toward younger representations for both actions and emotions, and toward more female-presenting representations, primarily for emotions. We further find evidence that some predictions are supported by spurious visual regions (e.g., background or hair) rather than concept-relevant cues (e.g., body or face). We also examine whether these demographic drifts translate into measurable differences in downstream behaviour, namely in the prediction of activity and emotion labels. Finally, we outline mitigation strategies spanning data-centric, training and deployment interventions, and emphasise the need for careful safeguards when deploying interconnected models in human-centred AI systems.
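To make "demographic distribution drift" concrete, the following is a minimal sketch under stated assumptions, not the method used in this work. It assumes that each round of the generation/description loop yields one perceived demographic label per image (for example, from some attribute classifier, which is not specified here). It also assumes that drift is measured against round 0, the original dataset images, using total variation distance together with a signed share shift that captures direction (e.g., toward younger representations). The function names, label set, and choice of metric are all hypothetical.

```python
from collections import Counter


def distribution(labels, categories):
    """Proportion of each demographic category among one round's labels.

    Assumes a non-empty round whose labels all belong to `categories`.
    """
    counts = Counter(labels)
    total = len(labels)
    return {c: counts.get(c, 0) / total for c in categories}


def total_variation(p, q):
    """Total variation distance between two categorical distributions on the same support."""
    return 0.5 * sum(abs(p[c] - q[c]) for c in p)


def drift_trajectory(rounds, categories):
    """Drift magnitude of every round relative to round 0 (the source images)."""
    base = distribution(rounds[0], categories)
    return [total_variation(base, distribution(r, categories)) for r in rounds]


def share_shift(rounds, categories, target):
    """Signed change in one category's share relative to round 0 (positive = gained share)."""
    base = distribution(rounds[0], categories)[target]
    return [distribution(r, categories)[target] - base for r in rounds]


# Hypothetical perceived-age labels for four images over three rounds of the loop.
rounds = [
    ["young", "old", "old", "young"],      # round 0: original dataset images
    ["young", "young", "old", "young"],    # round 1: after one generate/describe cycle
    ["young", "young", "young", "young"],  # round 2
]
print(drift_trajectory(rounds, ["young", "old"]))      # [0.0, 0.25, 0.5]
print(share_shift(rounds, ["young", "old"], "young"))  # [0.0, 0.25, 0.5]
```

Total variation is used here only because it is bounded in [0, 1] and easy to read. A divergence such as Jensen-Shannon would work equally well as a drift magnitude, while the signed share shift is what indicates the direction of a drift.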

翻译：生成式人工智能中的社会偏见不仅表现为性能差异，还可体现为关联性偏见——即使在没有明确人口统计信息的情况下，模型仍能学习并再现概念与人口群体之间的刻板关联（例如将医生与男性关联）。在模型间通信管道中，当某个生成模型的输出成为另一模型的输入时，这些关联可能在反复交换过程中持续存在、传播甚至放大。这对于以人为中心的感知任务（如人类活动识别与情感预测）尤为显著，因为对行为及内在状态的推断可能导致错误或刻板关联，进而引发不平等待遇。本研究聚焦人类活动与情感表达，探究此类关联在图像生成与图像描述交替进行的模型间通信管道中如何演变。基于RAF-DB和PHASE数据集，我们量化了模型间信息交换引发的人口统计分布漂移，并通过可解释性管道评估这些漂移是否具有系统性。研究结果显示：在动作与情感识别中均出现向年轻化表征的人口统计漂移；情感识别中还呈现向女性化表征的漂移。进一步研究发现，部分预测依赖于虚假视觉区域（如背景或头发）而非概念相关线索（如身体或面部）。我们还检验了这些人口统计漂移是否转化为下游行为中的可测量差异（即预测活动与情感标签时）。最后，我们提出涵盖数据层面、训练阶段与部署阶段的缓解策略，并强调在以人为中心的人工智能系统中部署互联模型时需建立审慎的防护机制。