Building Digital Societies as Ecosystems: How Recognition and Repeat Relationships Sustain Cross-Community Work in Open Source

From arXiv. 52 pages (main text plus supplementary material), 5 main figures, 13 supplementary figures, 2 main tables. Submitted to EPJ Data Science. Data and code: https://doi.org/10.17605/OSF.IO/5RWEK

We measure cross-boundary collaboration in an open-source software (OSS) ecosystem by reconstructing the bipartite contributor-repository graph of 464 cybersecurity projects and 11,372 contributors active over October 2001-May 2022 (Rawsec Cybersecurity Inventory). Louvain community detection identifies 163 non-singleton communities; per-community contributor count scales superlinearly with repository count (n_contributors ~ n_repos^1.4), and community formation follows a logistic trajectory saturating around 2018. Three patterns support a recognition/repeat-relationship account of cross-boundary work. First, cross-community work concentrates in a thin carrier layer: only nine canonical humans span seven or more communities at the commit level, authoring 14% of 4,015 inter-community merged pull requests; the top 50 cross-community contributors produce 54%. Second, boundary friction is a recognition cost, not a fixed boundary property: inter-community pull-request acceptance rises from 42% at breadth k=1 to 87% at k=5-9, with median latency compressing from 147 h to 49 h. Third, community survival is cohort-structured: per-cohort residualisation hazard rises an order of magnitude between pre-2010 and 2018 cohorts, and external community reach predicts survival mainly through size, leaving late cohorts under-served despite a stable carrier layer. The corpus predates mainstream LLM coding assistants; this baseline of carrier-layer thinness, friction gradient, and cohort hazard informs debates on social coding as a template for digital societies and on what AI-mediated OSS ecosystems should not optimise away.
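
As an illustration of the scaling claim (n_contributors ~ n_repos^1.4), the sketch below shows one common way such an exponent can be estimated: ordinary least squares on log-transformed counts. The abstract does not state which fitting procedure the authors used, so this is an assumption. The function name `fit_scaling_exponent` is hypothetical, and the data are synthetic values generated from an exact power law, not figures from the paper.

```python
import math


def fit_scaling_exponent(n_repos, n_contributors):
    """Fit log(n_contributors) = log(c) + beta * log(n_repos) by least squares.

    Returns (beta, c). Hypothetical helper; the paper's own fitting
    procedure is not given in the abstract.
    """
    # Keep only strictly positive pairs, since the logarithm requires them.
    pairs = [
        (math.log(r), math.log(c))
        for r, c in zip(n_repos, n_contributors)
        if r > 0 and c > 0
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two positive (repos, contributors) pairs")

    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n

    # Slope of the regression line in log-log space is the scaling exponent.
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("all communities have the same repository count")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)

    beta = sxy / sxx
    prefactor = math.exp(mean_y - beta * mean_x)
    return beta, prefactor


# Synthetic communities following an exact power law with exponent 1.4
# (illustrative values only, not data from the paper).
repos = [1, 2, 4, 8, 16]
contributors = [3.0 * r ** 1.4 for r in repos]

beta, prefactor = fit_scaling_exponent(repos, contributors)
print(f"estimated exponent: {beta:.3f}, prefactor: {prefactor:.3f}")
```

An exponent above 1 is what "superlinear" means here: doubling a community's repository count more than doubles its contributor count. On real, noisy community data a log-log least-squares fit is only a first approximation, and the estimate is sensitive to how small communities are treated.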
