Traceability links are a key information source for software developers: they connect software artifacts, for example by linking requirements to the corresponding source code. In open-source software (OSS) projects, such links play an important role, particularly between contributions (e.g., GitHub issues) and the corresponding source code. Through these links, developers can trace the discussions around a contribution and uncover design rationales, constraints, and security concerns. Previous studies have mainly examined accepted contributions, while those declined after discussion have been overlooked. Yet the discussions behind declined contributions contain valuable design rationales and implicit knowledge about software decision-making, because the reasons for a decline often reveal the criteria used to judge what should or should not be implemented. In this study, we present the first attempt to establish traceability links between declined contributions and related source code. We propose an initial linking approach and empirically analyze the generated links to identify factors that affect link generation. As our dataset, we use proposals from the official Go repository, which are GitHub issues used to propose new features or language changes. To link declined proposals to source code, we designed an LLM-driven pipeline. Our results show that the pipeline selected the correct granularity for declined proposals with an accuracy of 0.836 and generated correct links at that granularity with a mean precision of 0.643. To clarify the challenges of linking declined proposals, we performed a failure analysis. In the declined proposals for which the pipeline failed to generate links, the discussions were often redundant and lacked concrete information (e.g., how the feature should be implemented).
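To make the two reported metrics concrete, the following is a minimal sketch of how granularity accuracy and mean link precision could be computed over a set of declined proposals. It is an illustration, not the pipeline's evaluation code. The `ProposalResult` record, the granularity labels (`"file"`, `"function"`, `"package"`), and the example link identifiers are all hypothetical. The sketch also assumes that mean precision is averaged per proposal and that proposals with no generated links are excluded from the average; the abstract does not specify either choice.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProposalResult:
    """Hypothetical record of pipeline output and ground truth for one proposal."""
    predicted_granularity: str
    gold_granularity: str
    predicted_links: set = field(default_factory=set)
    gold_links: set = field(default_factory=set)


def granularity_accuracy(results: list) -> float:
    """Fraction of proposals whose predicted granularity matches the gold label."""
    if not results:
        return 0.0
    correct = sum(r.predicted_granularity == r.gold_granularity for r in results)
    return correct / len(results)


def link_precision(result: ProposalResult) -> Optional[float]:
    """Precision of generated links for one proposal; None if no links were generated."""
    if not result.predicted_links:
        return None
    hits = len(result.predicted_links & result.gold_links)
    return hits / len(result.predicted_links)


def mean_precision(results: list) -> float:
    """Per-proposal average of precision, skipping proposals without generated links
    (an assumption of this sketch)."""
    scores = [p for r in results if (p := link_precision(r)) is not None]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data, invented for illustration only.
toy_results = [
    ProposalResult("function", "function",
                   {"sort.Slice", "sort.Sort"}, {"sort.Slice"}),
    ProposalResult("file", "file",
                   {"src/net/http/server.go"}, {"src/net/http/server.go"}),
    ProposalResult("package", "file", set(), {"src/fmt/print.go"}),
]

accuracy = granularity_accuracy(toy_results)
avg_precision = mean_precision(toy_results)
```

In this toy data, the third proposal has no generated links, so it counts toward granularity accuracy but is left out of the precision average. A different convention, such as scoring it as zero precision, would lower the reported mean.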