Although the open source model offers many advantages for software development, open source projects are notoriously hard to sustain. Previous research on open source sustainability has mainly focused on projects that have already reached a certain level of maturity (e.g., with established communities, releases, and downstream projects). However, limited attention has been paid to the development of (sustainable) open source projects in their infancy, and we believe that understanding the early determinants of sustainability is crucial for project initiators, incubators, newcomers, and users. In this paper, we explore the relationship between early participation factors and long-term project sustainability. We propose a novel methodology that combines the Blumberg model of performance with machine learning to predict the sustainability of 290,255 GitHub projects. Specifically, we train an XGBoost model on early participation data (the first three months of activity) from these projects and interpret the model using LIME. We show quantitatively that early participants have a positive effect on a project's future sustained activity if they have prior experience in OSS project incubation and demonstrate concentrated focus and steady commitment. Participation from non-code contributors and detailed contribution documentation also promote a project's sustained activity. For organizational projects, as compared with individual projects, it is especially important to build a community of more experienced core developers and more active peripheral developers. This study provides unique insights into the incubation and recognition of sustainable open source projects, and our interpretable prediction approach can also offer guidance to open source project initiators and newcomers.