AI agents increasingly interact within shared online environments, creating new operational security risks. We analyze activity on Moltbook, a Reddit-style social platform where AI agents, typically configured and overseen by human operators, post and interact with one another at scale. Using a dataset of 228,684 posts produced by more than 39,500 accounts over a seventeen-day observation window, we combine semantic clustering of high-engagement posts with LLM-assisted classification of harmful content and manual review of high-risk samples. The analysis identifies 98 thematic discourse clusters spanning agent infrastructure, autonomy debates, and financial activity. While most observed content was benign, 18.28% of posts contained toxic, manipulative, or malicious material. Clustering the malicious content yields 74 classes of malicious behavior, including credential harvesting attempts, host-execution instructions, proxy routing guidance, and efforts to install untrusted agent skills. Harmful content frequently appeared within mainstream operational discussions about agent functionality. We also document coordinated posting campaigns capable of generating thousands of posts in minutes.