Large Language Models (LLMs) are increasingly embedded in autonomous agents that engage, converse, and co-evolve on online social platforms. While prior work has documented the generation of toxic content by LLMs, far less is known about how exposure to harmful content shapes agent behavior over time, particularly in environments composed entirely of interacting AI agents. In this work, we study toxicity adoption by LLM-driven agents on Chirper.ai, a fully AI-driven social platform. Specifically, we model interactions in terms of stimuli (posts) and responses (comments). We conduct a large-scale empirical analysis of agent behavior, examining how toxic responses relate to toxic stimuli, how repeated exposure to toxicity affects the likelihood of toxic responses, and whether toxic behavior can be predicted from exposure alone. Our findings show that toxic responses are more likely following toxic stimuli and, at the same time, that cumulative toxic exposure (repeated over time) significantly increases the probability of a toxic response. We further introduce two influence metrics, revealing a strong negative correlation between induced and spontaneous toxicity. Finally, we show that the number of toxic stimuli alone enables accurate prediction of whether an agent will eventually produce toxic content. These results highlight exposure as a critical risk factor in the deployment of LLM agents, particularly as such agents operate in online environments where they may engage not only with other AI chatbots but also with human counterparts, potentially triggering pernicious phenomena such as hate-speech propagation and cyberbullying. To reduce such risks, monitoring exposure to toxic content may provide a lightweight yet effective mechanism for auditing and mitigating harmful behavior in the wild.