Leveraging Language Models to Discover Evidence-Based Actions for OSS Sustainability

When successful, Open Source Software (OSS) projects create enormous value, but most never reach a sustainable state. Recent work has produced accurate models that forecast OSS sustainability, yet these models rarely tell maintainers what to do: their features are often high-level socio-technical signals that are not directly actionable. Meanwhile, decades of empirical software engineering (SE) research have accumulated a large but underused body of evidence on concrete practices that improve project health. We close this gap by using large language models (LLMs) as evidence miners over the SE literature. We design a retrieval-augmented generation (RAG) pipeline and a two-layer prompting strategy that extract researched actionables (ReACTs): concise, evidence-linked recommendations that map to specific OSS practices. In the first layer, we systematically explore open LLMs and prompting techniques and select the best-performing combination to derive candidate ReACTs from 829 ICSE and FSE papers. In the second layer, we apply follow-up prompting to filter hallucinations, extract impact and evidence, and assess soundness and precision. Our pipeline yields 1,922 ReACTs, of which 1,312 pass strict quality criteria and are organized into practice-oriented categories that can be connected to project signals from tools like APEX. The result is a reproducible, scalable approach that turns scattered research findings into structured, evidence-based actions for guiding OSS projects toward sustainability.

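The two-layer flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `ReACT` fields, the `run_pipeline` function, and the `layer1`/`layer2` callables are hypothetical names, the retrieval step of the RAG pipeline is omitted, and simple string-matching stubs stand in for the LLM prompts so that the example runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ReACT:
    """One researched actionable (field names are assumptions, not from the paper)."""
    action: str      # the concise recommendation
    paper_id: str    # link back to the source paper
    evidence: str    # supporting evidence extracted in layer 2
    impact: str      # expected impact extracted in layer 2
    sound: bool      # layer-2 soundness judgment
    precise: bool    # layer-2 precision judgment

    def passes_quality(self) -> bool:
        # Assumed reading of "strict quality criteria": both checks must hold.
        return self.sound and self.precise


# Hypothetical prompt interfaces: layer 1 proposes candidates from a paper's
# text; layer 2 returns None for a candidate it judges to be hallucinated.
Layer1 = Callable[[str], List[str]]
Layer2 = Callable[[str, str], Optional[Dict[str, object]]]


def run_pipeline(papers: Dict[str, str], layer1: Layer1, layer2: Layer2) -> List[ReACT]:
    reacts: List[ReACT] = []
    for paper_id, text in papers.items():
        for action in layer1(text):          # layer 1: candidate ReACTs
            verdict = layer2(action, text)   # layer 2: follow-up prompting
            if verdict is None:              # drop hallucinated candidates
                continue
            reacts.append(ReACT(
                action=action,
                paper_id=paper_id,
                evidence=str(verdict["evidence"]),
                impact=str(verdict["impact"]),
                sound=bool(verdict["sound"]),
                precise=bool(verdict["precise"]),
            ))
    return reacts


# --- Stubs standing in for LLM calls (purely illustrative) ---
def stub_layer1(text: str) -> List[str]:
    prefix = "Recommend:"
    found = [ln[len(prefix):].strip() for ln in text.splitlines() if ln.startswith(prefix)]
    # Append one unsupported candidate to exercise the hallucination filter.
    return found + ["Rewrite everything in a new language"]


def stub_layer2(action: str, text: str) -> Optional[Dict[str, object]]:
    if action not in text:                   # not grounded in the paper
        return None
    return {
        "evidence": "stated in the paper text",
        "impact": "unspecified",
        "sound": True,
        "precise": len(action.split()) >= 3,  # toy proxy for precision
    }


TOY_PAPERS = {"toy-paper-1": "Recommend: Label good first issues\nRecommend: Review faster"}

if __name__ == "__main__":
    all_reacts = run_pipeline(TOY_PAPERS, stub_layer1, stub_layer2)
    strict = [r for r in all_reacts if r.passes_quality()]
    print(len(all_reacts), "ReACTs,", len(strict), "pass the quality check")
```

Keeping the two layers as separate callables mirrors the abstract's split between candidate generation and follow-up verification, so a different model or prompt could in principle be swapped into either layer without touching the other.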