We present Agentic Retrieval-Augmented Code Synthesis (ARCS), a system that improves LLM-based code generation without fine-tuning. ARCS operates through a budgeted synthesize-execute-repair loop over a frozen model: it retrieves relevant code context before generation, proposes candidates, executes them against tests, and repairs them based on execution feedback. This retrieval-before-generation design reduces hallucination and accelerates convergence. We formalize ARCS as a state-action process with provable guarantees on termination, monotonic improvement, and bounded cost. A tiered controller (Small/Medium/Large) trades latency for accuracy predictably. On HumanEval, ARCS achieves up to 87.2% pass@1 with Llama-3.1-405B, surpassing CodeAgent (82.3%) while using simpler control logic than tree-search methods. On TransCoder, it achieves at least 90% accuracy on most translation pairs. On a LANL scientific corpus, it improves CodeBLEU by 0.115 over a RAG baseline. ARCS provides a practical, reproducible approach to reliable code synthesis using existing LLM checkpoints.
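The budgeted synthesize-execute-repair loop described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: `retrieve_context`, `run_tests`, and the stubbed model call are hypothetical stand-ins (a real system would query a frozen LLM and a code retrieval index), but the control flow shows the claimed properties, namely retrieval before generation, test-driven repair, and a hard budget bounding cost.

```python
def retrieve_context(task, corpus):
    """Toy retrieval stand-in: keep corpus snippets sharing a word with the task."""
    return [s for s in corpus if any(w in s for w in task.split())]

def run_tests(code, tests):
    """Execute a candidate against its tests; return (passed, error feedback)."""
    ns = {}
    try:
        exec(code, ns)
        for t in tests:
            exec(t, ns)
        return True, ""
    except Exception as e:
        return False, repr(e)

def arcs_loop(task, corpus, llm, tests, budget=3):
    """Budgeted synthesize-execute-repair loop over a frozen model (sketch)."""
    context = retrieve_context(task, corpus)   # retrieval before generation
    feedback = ""
    for step in range(budget):                 # bounded cost: at most `budget` model calls
        candidate = llm(task, context, feedback)
        passed, feedback = run_tests(candidate, tests)
        if passed:
            return candidate, step + 1         # terminate on first passing candidate
    return None, budget                        # budget exhausted without success

# Demo with a deterministic stub "model": first draft is buggy, repair fixes it.
attempts = iter([
    "def add(a, b):\n    return a - b",        # buggy first draft
    "def add(a, b):\n    return a + b",        # repaired after test feedback
])
def stub_llm(task, context, feedback):
    return next(attempts)

code, calls = arcs_loop(
    "add two numbers",
    ["def add(a, b): ..."],                    # toy retrieval corpus
    stub_llm,
    ["assert add(2, 3) == 5"],
)
```

Here the loop rejects the first draft when its test fails, feeds the error back, and accepts the repaired candidate on the second of three budgeted calls, which is the mechanism behind the monotonic-improvement and bounded-cost guarantees stated above.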