Large language models (LLMs) have exhibited remarkable ability in code generation. However, generating a correct solution in a single attempt remains a challenge. Prior work uses verification properties from software engineering to verify and re-rank solutions by majority voting, but the underlying assumption that generated verification properties are of higher quality than generated solutions may not always hold. In this paper, we treat solutions and verification properties equally, as different perspectives on an LLM's reasoning process. We propose the Multi-Perspective Self-Consistency (MPSC) framework, which incorporates both inter-consistency across outputs from different perspectives and intra-consistency among outputs within a single perspective. Specifically, we prompt LLMs to generate diverse outputs from three perspectives (Solution, Specification, and Test case) and construct a 3-partite graph over them. Using two consistency measure functions, we embed both inter- and intra-consistency information into the graph. The optimal solution is then selected by analyzing the graph. MPSC significantly boosts the performance of foundation models (ChatGPT in this paper) on various benchmarks, including HumanEval (+15.91%), MBPP (+6.43%), and CodeContests (+9.37%), even surpassing GPT-4.
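To make the graph-based selection concrete, the following is a minimal sketch of ranking solutions on a 3-partite graph. It is not the paper's algorithm: the abstract does not specify the two measure functions or the graph analysis, so this sketch assumes that inter-consistency is given as edge weights in [0, 1] between vertices of different perspectives, that intra-consistency is given as a per-vertex prior score, and that the two are combined by a simple iterative score propagation. The function name `rank_solutions`, the parameters `alpha` and `iters`, and all example data are hypothetical.

```python
from collections import defaultdict


def rank_solutions(vertices, edges, prior, alpha=0.9, iters=50):
    """Rank solution vertices of a 3-partite graph (illustrative sketch only).

    vertices: dict mapping vertex name -> perspective
              ("solution", "specification" or "test").
    edges:    dict mapping (u, v) -> inter-consistency weight in [0, 1];
              edges may only connect vertices of different perspectives.
    prior:    dict mapping vertex name -> intra-consistency score (assumed given).
    """
    # Build a symmetric adjacency map, rejecting edges inside one perspective
    # so the graph stays 3-partite.
    nbrs = defaultdict(dict)
    for (u, v), w in edges.items():
        if vertices[u] == vertices[v]:
            raise ValueError(f"edge {u}-{v} joins two '{vertices[u]}' vertices")
        if w <= 0:
            continue  # zero weight means no agreement, so no edge
        nbrs[u][v] = w
        nbrs[v][u] = w

    degree = {v: sum(nbrs[v].values()) for v in vertices}
    score = dict(prior)

    # Assumed propagation rule: each vertex mixes the degree-normalized
    # scores of its neighbors (inter-consistency) with its own prior
    # (intra-consistency). This is a stand-in for the paper's graph analysis.
    for _ in range(iters):
        updated = {}
        for v in vertices:
            agg = sum(
                w * score[u] / (degree[v] * degree[u]) ** 0.5
                for u, w in nbrs[v].items()
            )
            updated[v] = alpha * agg + (1 - alpha) * prior[v]
        score = updated

    solutions = [v for v, p in vertices.items() if p == "solution"]
    return sorted(solutions, key=lambda v: score[v], reverse=True)


# Hypothetical example: sol_a agrees with the specification and both tests,
# while sol_b agrees with only one test.
vertices = {
    "sol_a": "solution",
    "sol_b": "solution",
    "spec_1": "specification",
    "test_1": "test",
    "test_2": "test",
}
edges = {
    ("sol_a", "spec_1"): 1.0,
    ("sol_a", "test_1"): 1.0,
    ("sol_a", "test_2"): 1.0,
    ("spec_1", "test_1"): 1.0,
    ("spec_1", "test_2"): 1.0,
    ("sol_b", "test_2"): 1.0,
}
prior = {name: 1.0 for name in vertices}  # uniform intra-consistency

ranking = rank_solutions(vertices, edges, prior)
print(ranking)
```

In this sketch the solution that is consistent with more of the other perspectives' outputs ends up ranked first. In a real pipeline, the edge weights would come from actually checking each solution against each generated specification and test case, which is omitted here.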