Dynamics of Human-AI Collective Knowledge on the Web: A Scalable Model and Insights for Sustainable Growth

Humans and large language models (LLMs) now co-produce and co-consume the web's shared knowledge archives. These human-AI collective knowledge ecosystems contain feedback loops that bring both benefits (e.g., faster growth, easier learning) and systemic risks (e.g., quality dilution, skill reduction, model collapse). To understand these phenomena, we propose a minimal, interpretable dynamical model of the co-evolution of five quantities: archive size, archive quality, model (LLM) skill, aggregate human skill, and query volume. The model captures two content inflows (human and LLM), with a gate controlling the admission of LLM content; two learning pathways for humans (archive study vs. LLM assistance); and two LLM-training modalities (corpus-driven scaling vs. learning from human feedback). Through numerical experiments, we identify distinct growth regimes (e.g., healthy growth, inverted flow, inverted learning, oscillations) and show how platform and policy levers (gate strictness, LLM training, human learning pathways) shift the system across regime boundaries. Two domain configurations (PubMed; GitHub and Copilot) illustrate contrasting steady states under different growth rates and moderation norms. We also fit the model separately to Wikipedia's knowledge flow in the pre-ChatGPT and post-ChatGPT eras. We find a rise in LLM additions with a concurrent decline in human inflow, consistent with a regime identified by the model. Our model and analysis yield actionable insights for sustainable growth of human-AI collective knowledge on the Web.
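To make the structure described in the abstract concrete, the following is a toy sketch of a five-variable system with a gate, two human learning pathways, and two LLM-training terms. The abstract does not state the model's equations, so every functional form, parameter name, and default value here is an illustrative assumption and not the authors' formulation; the integrator is plain forward Euler.

```python
from dataclasses import dataclass


@dataclass
class State:
    A: float  # archive size
    Q: float  # archive quality, kept in [0, 1]
    M: float  # LLM skill, kept in [0, 1]
    H: float  # aggregate human skill, kept in [0, 1]
    V: float  # query volume


@dataclass
class Params:
    # All names and values below are hypothetical, chosen only for illustration.
    gate: float = 0.5        # fraction of LLM content admitted (0 = strict, 1 = open)
    assist: float = 0.5      # share of human learning done via LLM assistance
    r_h: float = 1.0         # human contribution rate
    r_l: float = 0.5         # LLM contribution rate per unit of query volume
    c_corpus: float = 0.3    # LLM gain from training on the archive
    c_feedback: float = 0.2  # LLM gain from human feedback
    s_archive: float = 0.3   # human gain from studying the archive
    s_llm: float = 0.2       # human gain from LLM assistance
    decay_m: float = 0.05    # LLM skill decay
    decay_h: float = 0.05    # human skill decay
    g_v: float = 0.5         # query growth rate
    k_v: float = 10.0        # query carrying capacity at full LLM skill


def step(s: State, p: Params, dt: float) -> State:
    """Advance the toy system by one forward-Euler step."""
    human_in = p.r_h * s.H
    llm_in = p.gate * p.r_l * s.M * s.V
    d_a = human_in + llm_in
    # Quality relaxes toward the skill of whoever is contributing content.
    d_q = (human_in * (s.H - s.Q) + llm_in * (s.M - s.Q)) / s.A
    d_m = (p.c_corpus * s.Q + p.c_feedback * s.H) * (1 - s.M) - p.decay_m * s.M
    learn = p.s_archive * (1 - p.assist) * s.Q + p.s_llm * p.assist * s.M
    d_h = learn * (1 - s.H) - p.decay_h * s.H
    # Logistic query growth whose ceiling scales with LLM skill.
    d_v = p.g_v * s.V * (s.M - s.V / p.k_v)
    return State(
        A=s.A + dt * d_a,
        Q=s.Q + dt * d_q,
        M=s.M + dt * d_m,
        H=s.H + dt * d_h,
        V=s.V + dt * d_v,
    )


def simulate(p: Params, steps: int = 2000, dt: float = 0.05) -> list:
    """Return the trajectory from a fixed, arbitrary initial state."""
    s = State(A=100.0, Q=0.7, M=0.3, H=0.6, V=1.0)
    trajectory = [s]
    for _ in range(steps):
        s = step(s, p, dt)
        trajectory.append(s)
    return trajectory
```

Comparing `simulate(Params(gate=0.0))` with `simulate(Params(gate=1.0))` shows how, in this toy setting, gate strictness changes archive size and quality over time; it says nothing quantitative about the regimes reported in the paper.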
