R2ComSync: Improving Code-Comment Synchronization with In-Context Learning and Reranking

Code-Comment Synchronization (CCS) aims to automatically update comments to match code changes, thereby significantly reducing developers' workload during software maintenance and evolution. Previous studies have proposed various solutions with demonstrated success, but these often exhibit limitations such as limited generalization ability or the need for extensive task-specific learning resources. This motivates us to investigate the potential of Large Language Models (LLMs) in this area. However, a pilot analysis shows that LLMs fall short of State-Of-The-Art (SOTA) CCS approaches because (1) they lack instructive demonstrations for In-Context Learning (ICL) and (2) many candidates that are likely to be correct are not prioritized. To tackle these challenges, we propose R2ComSync, an ICL-based code-Comment Synchronization approach enhanced with Retrieval and Re-ranking. Specifically, R2ComSync introduces two corresponding novelties: (1) Ensemble hybrid retrieval, which gives equal weight to similarity in code-comment semantics and similarity in change patterns during retrieval, thereby creating ICL prompts with effective examples. (2) A multi-turn re-ranking strategy. We derived three significant rules through large-scale analysis of CCS samples; given the inference results of LLMs, the strategy progressively applies these three re-ranking rules to prioritize the candidates most likely to be correct. We evaluate R2ComSync using five recent LLMs on three CCS datasets covering both Java and Python, and compare it with five SOTA approaches. Extensive experiments demonstrate the superior performance of R2ComSync over other approaches. Moreover, both quantitative and qualitative analyses indicate that the comments synchronized by our approach are of significantly higher quality.
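The ensemble hybrid retrieval idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measures, so token-level Jaccard similarity is used here as a stand-in for both the semantic and the change-pattern signal, and the function names (`tokens`, `change_ops`, `hybrid_score`, `retrieve`) and the dictionary keys are hypothetical. Only the equal weighting of the two signals comes from the text.

```python
import difflib


def tokens(text):
    """Crude whitespace tokenizer (an assumption; a real system would use a code lexer)."""
    return text.replace("(", " ").replace(")", " ").split()


def jaccard(a, b):
    """Jaccard similarity between two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def change_ops(old_code, new_code):
    """Summarize a code change as a set of ('del', token) / ('add', token) edits."""
    old, new = tokens(old_code), tokens(new_code)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    ops = set()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        ops.update(("del", t) for t in old[i1:i2])
        ops.update(("add", t) for t in new[j1:j2])
    return ops


def hybrid_score(query, example):
    """Equal-weight combination of semantic and change-pattern similarity."""
    semantic = jaccard(
        tokens(query["old_code"] + " " + query["old_comment"]),
        tokens(example["old_code"] + " " + example["old_comment"]),
    )
    pattern = jaccard(
        change_ops(query["old_code"], query["new_code"]),
        change_ops(example["old_code"], example["new_code"]),
    )
    return 0.5 * semantic + 0.5 * pattern


def retrieve(query, corpus, k=2):
    """Return the k corpus examples with the highest hybrid score."""
    return sorted(corpus, key=lambda ex: hybrid_score(query, ex), reverse=True)[:k]
```

Under these assumptions, the retrieved examples (each carrying its old code, new code, and old comment) would be placed in the ICL prompt as demonstrations ahead of the query. The re-ranking stage is not sketched because the abstract does not state the three rules.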
