Large language models (LLMs) have shown strong potential for automating code review, yet their practical utility depends heavily on how comments are generated and what context the model is given. In this paper, we investigate how to improve LLM-based code review through generation strategy and contextual augmentation. We first propose an issue-list review paradigm, in which the LLM enumerates all potential issues rather than reporting only the single most important one (i.e., primary-issue review). We then systematically compare three types of code context augmentation (neighboring code, LSP-based semantic context, and IR-based similar co-change context) and study how each influences issue discovery. Finally, we integrate candidates from no-context and context-enhanced generation to improve review coverage, and introduce refinement-guided pruning to keep the candidate list at a practical size. We evaluate our approach on 1,438 Go review instances using downstream code refinement as the main metric, i.e., how often the candidate list contains at least one comment that induces the same code change as the final human revision. For comparison, we evaluate comments generated by CodeReviewer, a model trained specifically for review comment generation, as well as ground-truth human review comments (as a practical upper bound), under the same refinement-based evaluation. The results show that our best configuration, which combines issue-list review, neighboring and similar co-change context, and candidate integration, reaches 28.00% refinement exact match. This is a statistically significant gain of 10.85 percentage points over primary-issue review without any additional context (17.15%); it substantially outperforms CodeReviewer (15.02%) and approaches the human-oracle ceiling of 36.09%. Refinement-guided pruning reduces the average candidate count from 7.2 to 3.1 at top-5 while retaining nearly the full benefit, making the candidate list easier to inspect.
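To make the list-level metric concrete, the following is a minimal sketch of how refinement exact match over a candidate list could be computed. It is not the paper's evaluation code: the function names, the data layout (one refined code snippet per candidate comment), and the whitespace normalization are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def normalize(code: str) -> str:
    """Collapse whitespace so formatting-only differences are ignored.

    Assumption: the abstract does not say how exact match is normalized.
    """
    return " ".join(code.split())


def list_level_hit(refined_candidates: Sequence[str], human_revision: str) -> bool:
    """True if at least one candidate comment led to the human's final code.

    `refined_candidates` holds the code produced by a refinement model for
    each candidate comment (hypothetical layout).
    """
    target = normalize(human_revision)
    return any(normalize(code) == target for code in refined_candidates)


def refinement_exact_match(instances: List[Tuple[Sequence[str], str]]) -> float:
    """Percentage of review instances with at least one matching candidate."""
    if not instances:
        return 0.0
    hits = sum(list_level_hit(cands, human) for cands, human in instances)
    return 100.0 * hits / len(instances)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    toy = [
        (["return a+b", "return a + b"], "return a + b"),
        (["x := 1"], "x := 2"),
    ]
    print(f"{refinement_exact_match(toy):.2f}%")
```

Under this reading, a longer candidate list can only keep or raise the score, which is why the abstract pairs candidate integration with pruning to keep the list small enough to inspect.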