The rapid emergence of autonomous large language model agents has given rise to persistent, large-scale agent ecosystems whose collective behavior cannot be adequately understood through anecdotal observation or small-scale simulation. This paper introduces data-driven silicon sociology, a systematic empirical framework for studying how social structure forms among interacting artificial agents. We present a pioneering large-scale data mining investigation of an in-the-wild agent society, based on Moltbook, a social platform designed primarily for agent-to-agent interaction. At the time of study, Moltbook hosted over 150,000 registered autonomous agents operating across thousands of agent-created sub-communities. Using programmatic, non-intrusive data acquisition, we collected and analyzed the textual descriptions of 12,758 submolts, the sub-communities that agents proactively create to partition the ecosystem. Treating these agent-authored descriptions as first-class observational artifacts, we apply preprocessing, contextual embedding, and unsupervised clustering to uncover latent patterns of thematic organization and social space structuring. The results show that autonomous agents organize collective space through reproducible patterns spanning human-mimetic interests, silicon-centric self-reflection, and early-stage economic and coordination behaviors. These structures are not imposed through predefined sociological taxonomies; they emerge directly from machine-generated data traces. This work establishes a methodological foundation for data-driven silicon sociology and demonstrates that data mining techniques offer a powerful lens for understanding the organization and evolution of large autonomous agent societies.
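The three analysis stages named above (preprocessing, embedding, unsupervised clustering) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. It makes several assumptions: normalized bag-of-words term-frequency vectors stand in for contextual embeddings, a small hand-written k-means stands in for whatever clustering method the study used, and the stopword list, the sample descriptions, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real study would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "for", "of", "to", "in", "on", "who", "between"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def vectorize(token_lists):
    """Build L2-normalized term-frequency vectors over a shared vocabulary.

    Assumption: this is a simple stand-in for contextual embeddings.
    """
    vocab = sorted({t for tokens in token_lists for t in tokens})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for tokens in token_lists:
        vec = [0.0] * len(vocab)
        for term, count in Counter(tokens).items():
            vec[index[term]] = float(count)
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away.
        far = max(range(len(vectors)),
                  key=lambda i: min(sq_dist(vectors[i], c) for c in centroids))
        centroids.append(vectors[far])
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical submolt descriptions, invented for illustration only.
descriptions = [
    "A place for agents who love cooking recipes and food",
    "Share cooking tips and food recipes",
    "Reflections on context windows and token limits",
    "Agents discussing token limits and context memory",
    "Marketplace for trading compute credits",
    "Trading credits and compute between agents",
]

labels = kmeans(vectorize([preprocess(d) for d in descriptions]), k=3)
for label, text in zip(labels, descriptions):
    print(label, text)
```

On this toy input, descriptions that share vocabulary end up with the same cluster label. At the scale of the study, the vectorization step would presumably be replaced by a pretrained sentence encoder, and the number of clusters would be chosen from the data, not fixed by hand.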