Predicting Open Source Software Sustainability with Deep Temporal Neural Hierarchical Architectures and Explainable AI

Open Source Software (OSS) projects follow diverse lifecycle trajectories shaped by evolving patterns of contribution, coordination, and community engagement. Understanding these trajectories is essential for stakeholders seeking to assess project organization and health at scale. However, prior work has largely relied on static or aggregated metrics, such as project age or cumulative activity, providing limited insight into how OSS sustainability unfolds over time. In this paper, we propose a hierarchical predictive framework that models OSS projects as belonging to distinct lifecycle stages grounded in established socio-technical categorizations of OSS development. Rather than treating sustainability solely as project longevity, these lifecycle stages operationalize sustainability as a multidimensional construct integrating contribution activity, community participation, and maintenance dynamics. The framework combines engineered tabular indicators with 24-month temporal activity sequences and employs a multi-stage classification pipeline to distinguish lifecycle stages associated with different coordination and participation regimes. To support transparency, we incorporate explainable AI techniques to examine the relative contribution of feature categories to model predictions. Evaluated on a large corpus of OSS repositories, the proposed approach achieves over 94% overall accuracy in lifecycle stage classification. Attribution analyses consistently identify contribution activity and community-related features as dominant signals, highlighting the central role of collective participation dynamics.
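To make the idea of a multi-stage pipeline over fused tabular and temporal inputs concrete, the following is a minimal sketch and not the authors' implementation. It swaps the paper's deep temporal neural models for a simple nearest-centroid classifier, and every name in it (`fuse`, `NearestCentroid`, `HierarchicalClassifier`, the labels `stage_A`, `stage_B`, `stage_C`, and the groups `low` and `high`) is a hypothetical placeholder; the abstract does not name the lifecycle stages or the features.

```python
from statistics import mean


def fuse(tabular, monthly):
    """Join tabular indicators with two summaries of a 24-month activity series.

    The summaries (overall mean, recent-minus-early mean) are illustrative
    assumptions, not features taken from the paper.
    """
    assert len(monthly) == 24
    trend = mean(monthly[12:]) - mean(monthly[:12])
    return list(tabular) + [mean(monthly), trend]


class NearestCentroid:
    """Stand-in classifier: predicts the label whose mean vector is closest."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


class HierarchicalClassifier:
    """Stage 1 picks a coarse group; stage 2 picks a label inside that group."""

    def __init__(self, groups):
        self.groups = groups  # maps fine label -> coarse group

    def fit(self, X, y):
        coarse = [self.groups[lab] for lab in y]
        self.top = NearestCentroid().fit(X, coarse)
        self.sub = {}
        for g in set(coarse):
            idx = [i for i, c in enumerate(coarse) if c == g]
            self.sub[g] = NearestCentroid().fit(
                [X[i] for i in idx], [y[i] for i in idx]
            )
        return self

    def predict_one(self, x):
        group = self.top.predict_one(x)
        return self.sub[group].predict_one(x)


# Toy data with hypothetical labels: one flat, one rising, one falling series.
GROUPS = {"stage_A": "low", "stage_B": "high", "stage_C": "high"}
X = [
    fuse([1.0], [1] * 24),
    fuse([10.0], list(range(1, 25))),
    fuse([10.0], list(range(24, 0, -1))),
]
y = ["stage_A", "stage_B", "stage_C"]
model = HierarchicalClassifier(GROUPS).fit(X, y)
```

Calling `model.predict_one(fuse([9.0], list(range(2, 26))))` routes a project with high, rising activity through the `high` group before choosing between its two sub-labels. Splitting the decision this way lets each stage specialize on a narrower distinction, which is one plausible motivation for a hierarchical design.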

