Software libraries are the elementary building blocks of open source software (OSS) ecosystems, extending the capabilities of programming languages beyond their standard libraries. Although ecosystem health is often quantified using data on libraries and their interdependencies, we know little about the rate at which new libraries are developed and used. Here we study library imports in 12 programming language ecosystems across millions of Stack Overflow posts over a 15-year period. Within each ecosystem, new libraries emerge at a remarkably predictable sub-linear rate as the number of posts grows. As a consequence, the distribution of library usage frequency in every ecosystem is highly concentrated: the most widely used libraries are used many times more often than the average library. Although new libraries appear more slowly over time, novel combinations of libraries appear at an approximately linear rate, suggesting that recombination is a key innovation process in software. Newer users are more likely to use new libraries and new combinations, and we find significant variation in rates of innovation between countries. Our work links the evolution of OSS ecosystems to the literature on the dynamics of innovation, revealing how ecosystems grow and highlighting implications for sustainability.
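To make the growth claims concrete, the following is a minimal sketch of how one might count distinct libraries and distinct library pairs as posts accumulate, then estimate a growth exponent from a log-log fit. It is not the authors' pipeline. The regular expression, the function names (`extract_libraries`, `growth_curve`, `fit_exponent`), the restriction to Python-style imports, and the power-law form D(n) ∝ n^β are all assumptions made here for illustration. Under that assumed form, β < 1 would indicate sub-linear growth and β ≈ 1 approximately linear growth.

```python
import itertools
import math
import re

# Hypothetical, deliberately crude pattern: captures the top-level name in
# lines starting with `import x` or `from x import y`. It only sees the first
# name in `import a, b` and will also match prose lines that begin with
# "import" or "from"; a real pipeline would need a proper parser.
IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+([A-Za-z_]\w*)", re.MULTILINE)


def extract_libraries(post_body):
    """Return the set of top-level library names imported in one post."""
    return set(IMPORT_RE.findall(post_body))


def growth_curve(posts):
    """For posts in chronological order, return a list of tuples
    (posts seen so far, distinct libraries so far, distinct pairs so far)."""
    seen_libs = set()
    seen_pairs = set()
    curve = []
    for n, body in enumerate(posts, start=1):
        libs = extract_libraries(body)
        seen_libs |= libs
        # A "combination" is simplified here to an unordered pair of
        # libraries imported in the same post (an assumption).
        seen_pairs.update(itertools.combinations(sorted(libs), 2))
        curve.append((n, len(seen_libs), len(seen_pairs)))
    return curve


def fit_exponent(ns, counts):
    """Least-squares slope of log(count) against log(n), i.e. the exponent
    beta in the assumed power law count ~ n**beta. Zero counts are skipped."""
    pts = [(math.log(n), math.log(c)) for n, c in zip(ns, counts) if c > 0]
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx
```

In use, one would pass the post bodies of a single ecosystem in chronological order to `growth_curve`, then call `fit_exponent` separately on the library counts and the pair counts and compare the two slopes.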