Traceability links connect software artifacts and are a key information source for software developers. They are particularly important between contribution artifacts and the corresponding source code: through these links, developers can trace the discussions in contributions and uncover design rationales, constraints, and security concerns. Previous studies have mainly examined accepted contributions and have overlooked those declined after discussion. Yet discussions of declined contributions capture valuable design rationale and implicit decision criteria, revealing why features are accepted or rejected. Our prior work also shows that developers often revisit and resubmit declined contributions, which makes traceability to them useful. In this study, we present the first attempt to establish traceability links between declined contributions and related source code. We propose a linking approach and empirically analyze the generated links to identify the factors that affect link generation. Our dataset consists of proposals from the official Go repository, that is, GitHub issues that propose new features or language changes. To link declined proposals to source code, we design an LLM-driven pipeline. Our results show that the pipeline selected the correct granularity for each declined proposal with an accuracy of 0.836 and generated correct links at that granularity with a mean precision of 0.643. To clarify the challenges of linking declined proposals, we conduct a failure analysis of the instances in which the pipeline failed to generate links. In these cases, the discussions were often redundant and lacked concrete information (e.g., details on how the feature should be implemented).
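To make the two reported figures concrete, the following Python sketch shows one way a granularity accuracy and a mean link precision could be computed from per-proposal results. It is a minimal illustration under stated assumptions, not the study's evaluation code: the abstract does not define the metrics in detail, so the sketch assumes that precision is computed per proposal and then averaged, and that only proposals with a correctly selected granularity contribute to the mean. The `ProposalResult` record, its field names, and the granularity labels (`"package"`, `"file"`, `"function"`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProposalResult:
    """Hypothetical per-proposal record; names are illustrative, not from the study."""
    predicted_granularity: str          # granularity chosen by the pipeline (assumed labels)
    gold_granularity: str               # granularity in the ground truth
    predicted_links: set = field(default_factory=set)  # code targets the pipeline linked
    gold_links: set = field(default_factory=set)       # code targets judged correct


def granularity_accuracy(results):
    """Fraction of proposals whose predicted granularity matches the ground truth."""
    if not results:
        raise ValueError("no results to evaluate")
    correct = sum(r.predicted_granularity == r.gold_granularity for r in results)
    return correct / len(results)


def link_precision(result):
    """Precision of one proposal's links; None when no links were generated."""
    if not result.predicted_links:
        return None
    hits = len(result.predicted_links & result.gold_links)
    return hits / len(result.predicted_links)


def mean_precision(results):
    """Average per-proposal precision.

    Assumption: only proposals with a correct granularity and at least one
    generated link are averaged; the source does not state this rule.
    """
    scores = []
    for r in results:
        if r.predicted_granularity != r.gold_granularity:
            continue
        p = link_precision(r)
        if p is not None:
            scores.append(p)
    if not scores:
        raise ValueError("no proposals with links at the correct granularity")
    return sum(scores) / len(scores)


# Toy data, invented purely for illustration.
example_results = [
    ProposalResult("file", "file", {"a.go", "b.go"}, {"a.go"}),
    ProposalResult("package", "package", {"net/http"}, {"net/http"}),
    ProposalResult("function", "file", {"f"}, {"a.go"}),
]
```

On the toy data, two of the three proposals have the correct granularity, and their per-proposal precisions are 0.5 and 1.0, so `granularity_accuracy(example_results)` is 2/3 and `mean_precision(example_results)` is 0.75. A micro-averaged precision, pooling all links before dividing, would be an equally plausible reading of the abstract and can give a different value.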