We measure cross-boundary collaboration in an open-source software (OSS) ecosystem by reconstructing the bipartite contributor-repository graph of 464 cybersecurity projects and 11,372 contributors active between October 2001 and May 2022, drawn from the Rawsec Cybersecurity Inventory. Louvain community detection identifies 163 non-singleton communities; per-community contributor count scales superlinearly with repository count (n_contributors ~ n_repos^1.4), and community formation follows a logistic trajectory that saturates around 2018. Three patterns support a recognition/repeat-relationship account of cross-boundary work. First, cross-community work concentrates in a thin carrier layer: only nine canonical human contributors span seven or more communities at the commit level, and they author 14% of the 4,015 merged inter-community pull requests; the top 50 cross-community contributors produce 54%. Second, boundary friction is a recognition cost, not a fixed property of the boundary: inter-community pull-request acceptance rises from 42% at breadth k=1 to 87% at k=5-9, while median latency falls from 147 h to 49 h. Third, community survival is cohort-structured: the per-cohort residualisation hazard rises by an order of magnitude between the pre-2010 and 2018 cohorts, and external community reach predicts survival mainly through size, which leaves late cohorts under-served despite a stable carrier layer. The corpus predates mainstream LLM coding assistants; this baseline of carrier-layer thinness, friction gradient, and cohort hazard informs debates on social coding as a template for digital societies and on what AI-mediated OSS ecosystems should not optimise away.
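As an illustration of the scaling relation n_contributors ~ n_repos^1.4, the exponent of such a power law can be estimated as the slope of an ordinary least-squares fit in log-log space. The sketch below is only one plausible way to do this and is not necessarily the estimation procedure used here; the function name `fit_power_law_exponent` and the synthetic community sizes are assumptions made for the example, not values from the corpus.

```python
import math


def fit_power_law_exponent(n_repos, n_contributors):
    """Fit n_contributors = prefactor * n_repos**exponent by least squares on logs.

    Returns (exponent, prefactor). All inputs must be strictly positive.
    """
    xs = [math.log(r) for r in n_repos]
    ys = [math.log(c) for c in n_contributors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the regression line in log-log space is the scaling exponent.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    prefactor = math.exp(mean_y - exponent * mean_x)
    return exponent, prefactor


# Synthetic, noise-free communities (hypothetical, NOT data from the corpus):
# each community has r repositories and 3 * r**1.4 contributors.
repos = [2, 3, 5, 8, 13, 21]
contributors = [3.0 * r ** 1.4 for r in repos]

exponent, prefactor = fit_power_law_exponent(repos, contributors)
print(f"estimated exponent: {exponent:.3f}, prefactor: {prefactor:.3f}")
```

An exponent above 1 is what "superlinear" means in this context: doubling a community's repository count more than doubles its contributor count. On real, noisy community sizes the fitted slope would carry uncertainty, so a confidence interval should accompany any reported exponent.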