Despite recent advances in LLMs, code generation remains challenging. To cope, code selection algorithms select the best program from multiple programs generated by an LLM. However, existing algorithms can fail to identify the correct program, either because they can misidentify nonequivalent programs or because they rely on an LLM and assume it always correctly determines the output for every input. We present ExPairT-LLM, an exact learning algorithm for code selection that selects a program by posing two new types of queries to an LLM oracle: pairwise membership and pairwise equivalence. These queries are simpler for LLMs and enable ExPairT-LLM to identify the correct program through a tournament, which is robust to some LLM mistakes. We evaluate ExPairT-LLM on four popular code datasets. Its pass@1 (success rate) exceeds that of the state-of-the-art code selection algorithm by 13.0% on average and by up to 27.1%. It also improves the pass@1 of LLMs performing complex reasoning by 24.0%.
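To make the idea of selecting a program through a tournament of pairwise queries concrete, here is a minimal sketch. It is not the ExPairT-LLM algorithm itself: the round-robin scoring, the function names (`select_by_tournament`, `toy_equivalence`, `toy_membership`), and the query signatures are all assumptions made for illustration. The two oracles, which in the setting described above would be answered by an LLM, are replaced here by deterministic stand-ins that consult a known reference function.

```python
from itertools import combinations
from typing import Callable, List, Optional, Sequence

Program = Callable[[int], int]


def select_by_tournament(
    programs: Sequence[Program],
    pairwise_equivalence: Callable[[Program, Program], Optional[int]],
    pairwise_membership: Callable[[int, int, int], Optional[int]],
) -> int:
    """Return the index of the program with the most pairwise wins.

    Assumed (hypothetical) query shapes:
      pairwise_equivalence(a, b) -> an input on which a and b differ, or None
      pairwise_membership(x, out_a, out_b) -> 0 or 1 for the correct output,
                                              or None if neither is judged correct
    """
    wins: List[int] = [0] * len(programs)
    for i, j in combinations(range(len(programs)), 2):
        x = pairwise_equivalence(programs[i], programs[j])
        if x is None:
            continue  # judged equivalent: no match is played
        verdict = pairwise_membership(x, programs[i](x), programs[j](x))
        if verdict == 0:
            wins[i] += 1
        elif verdict == 1:
            wins[j] += 1
    # Ties are broken in favor of the earliest candidate.
    return max(range(len(programs)), key=wins.__getitem__)


# --- Stand-in oracles (assumptions; a real system would query an LLM) ---
PROBE_INPUTS = [0, 1, 2, 3, -1]


def reference(x: int) -> int:
    """Ground truth for the toy task: square the input."""
    return x * x


def toy_equivalence(a: Program, b: Program) -> Optional[int]:
    """Return the first probe input on which the two programs disagree."""
    for x in PROBE_INPUTS:
        if a(x) != b(x):
            return x
    return None


def toy_membership(x: int, out_a: int, out_b: int) -> Optional[int]:
    """Say which of the two outputs matches the reference for input x."""
    if out_a == reference(x):
        return 0
    if out_b == reference(x):
        return 1
    return None


candidates: List[Program] = [
    lambda x: x * x,        # correct
    lambda x: x + x,        # wrong: doubles instead of squaring
    lambda x: abs(x) ** 2,  # correct, equivalent to the first
    lambda x: x ** 3,       # wrong: cubes
]

best = select_by_tournament(candidates, toy_equivalence, toy_membership)
print(best, candidates[best](5))
```

Counting wins across all pairs, rather than eliminating a program after a single loss, is one simple way to tolerate an occasional wrong oracle answer; whether ExPairT-LLM scores its tournament this way is not stated in the abstract.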