While stereotypes are well-documented in human social interactions, AI systems are often presumed to be less susceptible to such biases. Previous studies have focused on biases inherited from training data, but whether stereotypes can emerge spontaneously in AI agent interactions merits further exploration. Through a novel experimental framework simulating workplace interactions with neutral initial conditions, we investigate the emergence and evolution of stereotypes in LLM-based multi-agent systems. Our findings reveal that (1) LLM-based AI agents develop stereotype-driven biases in their interactions despite beginning without predefined biases; (2) stereotype effects intensify with increased interaction rounds and decision-making power, particularly after hierarchical structures are introduced; (3) these systems exhibit group effects analogous to human social behavior, including halo effects, confirmation bias, and role congruity; and (4) these stereotype patterns manifest consistently across different LLM architectures. Comprehensive quantitative analysis suggests that stereotype formation in AI systems may arise as an emergent property of multi-agent interactions, rather than merely from training data biases. Our work underscores the need for future research to explore the underlying mechanisms of this phenomenon and to develop strategies for mitigating its ethical impacts.