Large Language Models (LLMs) excel at code-related tasks but often struggle in realistic software repositories, where project-specific APIs and cross-file dependencies are crucial. Retrieval-augmented methods mitigate this by injecting repository context at inference time. However, the inference-time latency budget is tight: either retrieval quality is sacrificed to stay within it, or the added latency degrades the user experience. We address this limitation with SpecAgent, an agent that improves both latency and code-generation quality by proactively exploring repository files during indexing and constructing speculative context that anticipates future edits in each file. Because this work runs asynchronously at indexing time, the context can be computed thoroughly while its latency is masked, and the speculative nature of the context improves code-generation quality. Additionally, we identify the problem of future context leakage in existing benchmarks, which can inflate reported performance. To address this, we construct a synthetic, leakage-free benchmark that enables a more realistic evaluation of our agent against baselines. Experiments show that SpecAgent consistently achieves absolute gains of 9-11% (48-58% relative) over the best-performing baselines, while significantly reducing inference latency.
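The split between indexing-time computation and inference-time lookup can be illustrated with a minimal sketch. This is not SpecAgent's implementation: `SpeculativeIndex`, `toy_explore`, and `build_prompt` are hypothetical names, and `toy_explore` is a trivial stand-in for the agent (it only copies function signatures from files whose module name appears in the target file). The point is only that the expensive step runs once per file during indexing, so the inference path reduces to a dictionary read.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Callable, Dict


@dataclass
class SpeculativeIndex:
    """Hypothetical store mapping a file path to its precomputed context."""
    contexts: Dict[str, str] = field(default_factory=dict)

    def build(self, repo: Dict[str, str],
              explore: Callable[[str, Dict[str, str]], str]) -> None:
        # Indexing time: may be slow and thorough, off the critical path.
        for path in repo:
            self.contexts[path] = explore(path, repo)

    def lookup(self, path: str) -> str:
        # Inference time: a dictionary read, no retrieval or agent call.
        return self.contexts.get(path, "")


def toy_explore(path: str, repo: Dict[str, str]) -> str:
    """Stand-in for the agent (assumption): gather `def` lines from other
    files whose module name is mentioned in the target file."""
    source = repo[path]
    snippets = []
    for other, text in sorted(repo.items()):
        module = PurePosixPath(other).stem
        if other != path and module in source:
            snippets.extend(
                line.strip() for line in text.splitlines()
                if line.lstrip().startswith("def ")
            )
    return "\n".join(snippets)


def build_prompt(index: SpeculativeIndex, path: str, prefix: str) -> str:
    """Prepend the precomputed context to the code being completed."""
    context = index.lookup(path)
    return f"# Repository context\n{context}\n# File: {path}\n{prefix}"
```

In a real system the exploration step would be an LLM-driven agent and the index would be refreshed as files change; both are outside the scope of this sketch.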