Large language models (LLMs) excel at implementing code from functionality descriptions but struggle with algorithmic problems, which require identifying a suitable algorithm in addition to implementing it. Moreover, LLM-generated programs lack guaranteed correctness and require human verification. To address these challenges, we propose ALGO, a framework that synthesizes Algorithmic programs with LLM-Generated Oracles to guide the generation and verify its correctness. ALGO first generates a reference oracle by prompting an LLM to exhaustively enumerate all combinations of the relevant variables. This oracle then guides an arbitrary search strategy in exploring the algorithm space and verifies the synthesized algorithms. Our study shows that the LLM-generated oracles are correct in 88% of cases. With the oracles as verifiers, ALGO can be integrated with any existing code generation model in a model-agnostic manner to enhance its performance. Experiments show that, when equipped with ALGO, we achieve an 8x better one-submission pass rate than the Codex model and a 2.6x better one-submission pass rate than CodeT, the current state-of-the-art model on CodeContests. ALGO also achieves a 1.3x better pass rate than the ChatGPT Code Interpreter on unseen problems. The problem set we used for testing, the prompts we used, the verifier and solution programs, and the test cases generated by ALGO are available at https://github.com/zkx06111/ALGO.
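To make the oracle-as-verifier idea concrete, here is a minimal, hypothetical sketch. It is not ALGO's actual code. It uses maximum subarray sum as an assumed example problem. The brute-force oracle enumerates every (start, end) pair, which is the kind of exhaustive program an LLM can usually write correctly. Candidate efficient programs are then checked against it on small random inputs. The names `oracle_max_subarray`, `candidate_max_subarray`, `buggy_max_subarray`, and `verify` are illustrative assumptions, and in ALGO the oracle would be written by an LLM rather than by hand.

```python
import itertools
import random


def oracle_max_subarray(nums):
    """Hypothetical brute-force oracle: try every non-empty slice nums[i:j]."""
    best = None
    # combinations(range(n + 1), 2) yields all pairs 0 <= i < j <= n,
    # so each slice nums[i:j] is non-empty.
    for i, j in itertools.combinations(range(len(nums) + 1), 2):
        s = sum(nums[i:j])
        if best is None or s > best:
            best = s
    return best


def candidate_max_subarray(nums):
    """Hypothetical efficient candidate (Kadane's algorithm), O(n)."""
    best = cur = nums[0]
    for x in nums[1:]:
        cur = max(x, cur + x)   # extend the current run or start a new one
        best = max(best, cur)
    return best


def buggy_max_subarray(nums):
    """Hypothetical flawed candidate: wrongly allows an empty subarray (sum 0)."""
    best = cur = 0
    for x in nums:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best


def verify(candidate, oracle, trials=500, seed=0):
    """Compare candidate vs. oracle on small random inputs.

    Returns a counterexample input if they disagree, else None.
    Inputs are kept small so the exhaustive oracle stays fast.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 8)
        nums = [rng.randint(-10, 10) for _ in range(n)]
        if candidate(nums) != oracle(nums):
            return nums
    return None


if __name__ == "__main__":
    print(verify(candidate_max_subarray, oracle_max_subarray))  # None: no mismatch found
    print(verify(buggy_max_subarray, oracle_max_subarray))      # a counterexample list
```

The design choice this illustrates is that the oracle only needs to be correct, not fast. Restricting verification to small inputs keeps exhaustive enumeration tractable, while any disagreement yields a concrete counterexample that can be fed back to guide the next generation attempt.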