Repository-level code completion benefits from retrieval-augmented generation (RAG). However, controlling cross-file evidence is difficult because chunk utility is often interaction-dependent: some snippets help only when paired with complementary context, while others harm decoding when they conflict. We propose RepoShapley, a coalition-aware context filtering framework supervised by Shapley-style marginal contributions. Our module ChunkShapley constructs offline labels by (i) single-chunk probing with teacher-forced likelihood to estimate signed, weighted effects, (ii) a surrogate game that captures saturation and interference, (iii) exact Shapley computation for small retrieval sets, and (iv) bounded post-verification that selects a decoding-optimal coalition using the frozen generator. We distill verified $KEEP$ or $DROP$ decisions and retrieval triggering into a single model via discrete control tokens. Experiments across benchmarks and backbones show that RepoShapley improves completion quality while reducing harmful context and unnecessary retrieval. Code: https://anonymous.4open.science/r/a7f3c9.
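To make steps (ii) and (iii) concrete, the following is a minimal sketch of exact Shapley computation over a small retrieval set. It is an illustration under stated assumptions, not the ChunkShapley implementation: the chunk names, the signed single-chunk effects, the `tanh` saturation term, the pairwise conflict penalty, and the zero threshold for KEEP/DROP are all hypothetical choices made for this example. In particular, the bounded post-verification of step (iv) with the frozen generator is not modeled here.

```python
from itertools import combinations
from math import factorial, tanh

# Hypothetical signed single-chunk effects (stand-ins for likelihood probes).
EFFECTS = {"a": 1.0, "b": 0.8, "c": -0.5}
# Hypothetical pairwise interference penalties between conflicting chunks.
CONFLICTS = {frozenset({"b", "c"}): 0.3}


def surrogate_value(coalition):
    """Toy surrogate game: helpful evidence saturates, harmful evidence adds up."""
    positive = sum(EFFECTS[c] for c in coalition if EFFECTS[c] > 0)
    negative = sum(EFFECTS[c] for c in coalition if EFFECTS[c] < 0)
    penalty = sum(p for pair, p in CONFLICTS.items() if pair <= coalition)
    return tanh(positive) + negative - penalty


def exact_shapley(players, value):
    """Exact Shapley values by enumerating all coalitions (feasible for small n)."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Weight of a coalition of size k that excludes p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


chunks = list(EFFECTS)
phi = exact_shapley(chunks, surrogate_value)
# Simplifying assumption: label by the sign of the Shapley value.
labels = {c: ("KEEP" if phi[c] > 0 else "DROP") for c in chunks}

for c in chunks:
    print(f"{c}: phi={phi[c]:+.4f} -> {labels[c]}")
```

Enumeration costs on the order of $n \cdot 2^{n-1}$ evaluations of the value function, which is why exact computation is only practical for small retrieval sets. The Shapley values sum to the value of the full coalition minus the value of the empty one, which gives a quick sanity check on any implementation.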