The retrieval-augmented generation (RAG) paradigm has recently attracted considerable attention for its potential to incorporate external knowledge into large language models (LLMs) without further training. Although RAG has been widely explored in natural language applications, its use in code generation remains under-explored. In this paper, we introduce Active Retrieval in Knowledge Soup (ARKS), an advanced strategy for generalizing large language models for code. Rather than relying on a single source, we construct a knowledge soup that integrates web search, documentation, execution feedback, and evolved code snippets. We employ an active retrieval strategy that iteratively refines the query and updates the knowledge soup. To assess the performance of ARKS, we compile a new benchmark comprising realistic coding problems associated with frequently updated libraries and long-tail programming languages. Experimental results on ChatGPT and CodeLlama show that ARKS substantially improves the average execution accuracy of these LLMs. Our analysis confirms the effectiveness of the proposed knowledge soup and active retrieval strategies, and offers insights into the construction of effective retrieval-augmented code generation (RACG) pipelines. Our model, code, and data are available at https://arks-codegen.github.io.
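The active retrieval loop described above can be illustrated with a minimal sketch. It is not the ARKS implementation: the names `KnowledgeSoup`, `active_retrieval`, `generate`, and `execute` are hypothetical, the token-overlap retriever is a stand-in for whatever retriever the pipeline actually uses, and appending execution feedback to the question is only one assumed way of refining the query.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


def _tokens(text: str) -> set:
    # Lowercased word tokens; underscores are kept so identifiers stay whole.
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


@dataclass
class KnowledgeSoup:
    """Hypothetical store mixing web results, docs, feedback, and code as strings."""
    entries: List[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        # Skip empty strings and exact duplicates.
        if text and text not in self.entries:
            self.entries.append(text)

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        # Toy retriever (an assumption): rank entries by shared tokens with the query.
        q = _tokens(query)
        ranked = sorted(self.entries, key=lambda e: len(q & _tokens(e)), reverse=True)
        return ranked[:k]


def active_retrieval(
    question: str,
    soup: KnowledgeSoup,
    generate: Callable[[str, List[str]], str],   # stand-in for an LLM call
    execute: Callable[[str], Tuple[bool, str]],  # returns (success, feedback)
    max_iters: int = 3,
    k: int = 2,
) -> Tuple[str, int]:
    """Iteratively retrieve, generate, execute, and refine. Returns (code, iterations)."""
    query, code, iterations = question, "", 0
    for iterations in range(1, max_iters + 1):
        context = soup.retrieve(query, k)
        code = generate(question, context)
        ok, feedback = execute(code)
        if ok:
            break
        # Update the soup with execution feedback and the attempted snippet,
        # then fold the feedback into the next query.
        soup.add(feedback)
        soup.add(code)
        query = f"{question} {feedback}"
    return code, iterations
```

In this sketch the soup grows on every failed attempt, so later retrievals can surface documentation that matches the error message rather than the original question alone.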