Open Source Software (OSS) projects follow diverse lifecycle trajectories shaped by evolving patterns of contribution, coordination, and community engagement. Understanding these trajectories is essential for stakeholders seeking to assess project organization and health at scale. However, prior work has largely relied on static or aggregated metrics, such as project age or cumulative activity, providing limited insight into how OSS sustainability unfolds over time. In this paper, we propose a hierarchical predictive framework that models OSS projects as belonging to distinct lifecycle stages grounded in established socio-technical categorizations of OSS development. Rather than treating sustainability solely as project longevity, these lifecycle stages operationalize sustainability as a multidimensional construct integrating contribution activity, community participation, and maintenance dynamics. The framework combines engineered tabular indicators with 24-month temporal activity sequences and employs a multi-stage classification pipeline to distinguish lifecycle stages associated with different coordination and participation regimes. To support transparency, we incorporate explainable AI techniques to examine the relative contribution of feature categories to model predictions. Evaluated on a large corpus of OSS repositories, the proposed approach achieves over 94\% overall accuracy in lifecycle stage classification. Attribution analyses consistently identify contribution activity and community-related features as dominant signals, highlighting the central role of collective participation dynamics.
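As an illustration of the attribution step described above, the following minimal sketch shows one way per-feature attribution scores could be aggregated into relative contributions per feature category. It is not the authors' implementation: the feature names, the category mapping, the helper `category_shares`, and the numeric scores are all hypothetical, and the scores are assumed to come from some per-feature attribution method chosen separately.

```python
from collections import defaultdict

# Hypothetical feature-to-category mapping; names are illustrative only
# and are not taken from the paper.
FEATURE_CATEGORY = {
    "commits_per_month": "contribution",
    "active_contributors": "community",
    "issue_comments": "community",
    "median_issue_close_days": "maintenance",
}


def category_shares(attributions, feature_category):
    """Sum absolute per-feature attributions by category, then normalize to shares."""
    totals = defaultdict(float)
    for feature, value in attributions.items():
        # Use the magnitude so positive and negative attributions do not cancel.
        totals[feature_category[feature]] += abs(value)
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No attribution mass at all: report zero share for every category seen.
        return {category: 0.0 for category in totals}
    return {category: total / grand_total for category, total in totals.items()}


# Made-up attribution scores for a single prediction (not results from the paper).
example = {
    "commits_per_month": 0.4,
    "active_contributors": -0.3,
    "issue_comments": 0.1,
    "median_issue_close_days": 0.2,
}
shares = category_shares(example, FEATURE_CATEGORY)
```

Averaging such shares over many predictions would give a corpus-level view of which feature categories dominate, which is the kind of comparison the attribution analysis reports.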