Although the open source model offers many advantages for software development, open source projects are often hard to sustain. Previous research on open source sustainability has mainly focused on projects that have already reached a certain level of maturity (e.g., with communities, releases, and downstream projects). Limited attention has been paid to the development of (sustainable) open source projects in their infancy, yet we believe an understanding of early sustainability determinants is crucial for project initiators, incubators, newcomers, and users. In this paper, we explore the relationship between early participation factors and long-term project sustainability. We use a novel methodology that measures the early participation of 290,255 GitHub projects during their first three months with reference to the Blumberg model, trains an XGBoost model to predict each project's two-year sustained activity, and interprets the trained model using LIME. We quantitatively show that early participants have a positive effect on a project's future sustained activity if they have prior experience in OSS project incubation and demonstrate concentrated focus and steady commitment. Participation from non-code contributors and detailed contribution documentation also promote a project's sustained activity. For organizational projects, in contrast to individual projects, building a community that consists of more experienced core developers and more active peripheral developers is important. This study provides unique insights into the incubation and recognition of sustainable open source projects, and our interpretable prediction approach can also offer guidance to open source project initiators and newcomers.