Code Large Language Models (CodeLLMs) have ushered in a new era of code generation. However, selecting the best solution from the many candidates a CodeLLM samples remains a challenge. Previous methods frequently overlook the functional similarities and interactions between clusters of solutions, leading to suboptimal results. In this work, we introduce \textit{SRank}, a novel reranking strategy for code generation that selects the best solution by modeling the relationships between clusters of solutions. By quantifying the functional overlap between clusters, our approach yields a better ranking of code solutions. Empirical results show that our method achieves strong pass@1 scores. For instance, on the Human-Eval benchmark, we achieve 69.66\% pass@1 with Codex002, 75.31\% with WizardCoder, 53.99\% with StarCoder, and 60.55\% with CodeGen, surpassing state-of-the-art solution ranking methods such as CodeT and Coder-Reviewer on the same CodeLLMs by a significant margin ($\approx 6.1\%$ improvement on average). Even with a limited number of sampled solutions and test cases, our approach remains robust and superior, setting a new benchmark in code generation reranking.
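To make the idea of ranking by functional overlap between clusters concrete, the following is a minimal sketch rather than the actual SRank implementation. It assumes that each sampled solution has already been executed on a shared list of test inputs, that solutions with identical outputs form one cluster, that overlap is the fraction of test inputs on which two clusters agree, and that a cluster's score is its overlap with every cluster weighted by that cluster's size. The function names `cluster_by_outputs`, `overlap`, and `rank_clusters` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def cluster_by_outputs(outputs):
    """Group solution indices whose outputs on the shared test inputs are identical.

    `outputs[i]` is the sequence of outputs of solution i (an assumed input format).
    Returns a list of (signature, member_indices) pairs.
    """
    clusters = defaultdict(list)
    for idx, outs in enumerate(outputs):
        clusters[tuple(outs)].append(idx)
    return list(clusters.items())


def overlap(sig_a, sig_b):
    """Fraction of test inputs on which two clusters produce the same output."""
    if not sig_a:
        return 0.0
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def rank_clusters(outputs):
    """Score each cluster by its size-weighted overlap with all clusters (assumed scoring rule)."""
    clusters = cluster_by_outputs(outputs)
    scored = []
    for sig_i, members_i in clusters:
        score = sum(
            overlap(sig_i, sig_j) * len(members_j) for sig_j, members_j in clusters
        )
        scored.append((score, members_i))
    # Highest score first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


if __name__ == "__main__":
    # Four hypothetical solutions evaluated on three test inputs.
    sample_outputs = [(1, 2, 3), (1, 2, 3), (1, 2, 0), (9, 9, 9)]
    for score, members in rank_clusters(sample_outputs):
        print(f"{score:.3f} {members}")
```

In this toy input, the two-member cluster is ranked first because it is both the largest and partially agrees with the cluster `(1, 2, 0)`, while the cluster `(9, 9, 9)` agrees with no other cluster and is ranked last. A solution for pass@1 would then be drawn from the top-ranked cluster.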