ROCODE: Integrating Backtracking Mechanism and Program Analysis in Large Language Models for Code Generation

Large language models (LLMs) have recently achieved impressive performance in code generation, offering programmers revolutionary assistance in software development. However, because LLMs are auto-regressive, they are susceptible to error accumulation during code generation. Once an error is produced, an LLM cannot adjust its previous outputs and can only continue generating subsequent code conditioned on that error. Existing LLM-based approaches typically revise the code only after generation is complete, which makes accumulated errors hard to resolve and wastes significant resources. Ideally, LLMs should roll back and resolve an error as soon as it occurs during code generation, rather than building on the error and waiting for post-revision after generation. In this paper, we propose ROCODE, which integrates a backtracking mechanism and program analysis into LLMs for code generation. Specifically, we employ program analysis to perform incremental error detection during the generation process. When an error is detected, the backtracking mechanism is triggered to apply rollback strategies and constrained regeneration, thereby eliminating the error early and ensuring that generation continues from a correct basis. Experiments on multiple code generation benchmarks show that ROCODE significantly reduces the errors generated by LLMs, achieving a compilation pass rate of 99.1%. The test pass rate is improved by up to 23.8% compared to the best baseline approach. Compared to the post-revising baseline, the token cost is reduced by 19.3%. Moreover, our approach is model-agnostic and achieves consistent improvements across nine representative LLMs.
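The detect, roll back, and regenerate loop described above can be illustrated with a minimal sketch. This is not the ROCODE implementation: the LLM is replaced by a hypothetical fixed table of ranked candidate lines (`CANDIDATES`), program analysis is reduced to Python's built-in `compile` used as a syntax check, rollback removes only the offending line, and the regeneration constraint is simply a ban on the rejected line. All function names here are assumptions made for illustration.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical stand-in for an LLM: ranked candidate lines for each step.
CANDIDATES: List[List[str]] = [
    ["total = 0"],
    [
        "for i in range(5) total += i",   # top-ranked, but missing a colon
        "for i in range(5): total += i",  # valid alternative
    ],
    ["result = total * 2"],
]


def detect_error(program_lines: List[str]) -> Optional[str]:
    """Incremental check: compile the partial program.

    Returns an error message if a syntax error is found, else None.
    """
    try:
        compile("\n".join(program_lines) + "\n", "<partial>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None


def propose(step: int, banned: Set[str]) -> str:
    """Return the highest-ranked candidate for this step that is not banned."""
    for line in CANDIDATES[step]:
        if line not in banned:
            return line
    raise RuntimeError(f"no candidates left at step {step}")


def generate_with_backtracking() -> Tuple[List[str], int]:
    """Generate line by line, rolling back as soon as an error is detected."""
    program: List[str] = []
    banned: List[Set[str]] = [set() for _ in CANDIDATES]
    rollbacks = 0
    step = 0
    while step < len(CANDIDATES):
        line = propose(step, banned[step])
        program.append(line)
        if detect_error(program) is not None:
            program.pop()            # roll back the erroneous line
            banned[step].add(line)   # constrain regeneration at this step
            rollbacks += 1
            continue                 # regenerate the same step
        step += 1
    return program, rollbacks


if __name__ == "__main__":
    program, rollbacks = generate_with_backtracking()
    namespace: dict = {}
    exec("\n".join(program), namespace)
    print(rollbacks, namespace["result"])  # prints: 1 20
```

In this toy run the missing-colon line is rejected the moment it is appended, so the final statement is generated on top of a valid prefix instead of on top of the error. A post-revising approach would instead finish all three lines and only then attempt a repair.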
