As code generation techniques mature, selecting the correct code solution from multiple candidates has become a crucial task. This study proposes AutoTest, a novel technique that combines automated test case generation with code solution execution and uses an evolutionary genetic algorithm to optimize the selection process. First, AutoTest uses large pre-trained language models such as codegen-16B, code-davinci-002, and incoder-6B to generate code solutions and their corresponding test cases. Next, it executes the code solutions on the test cases and forms a consensus set from the results. Fine-grained ranking is then achieved through the selection, mutation, and crossover mechanisms of the evolutionary genetic algorithm, tuned by the alpha and beta parameters. Finally, the best-ranked code solution is chosen. On the HumanEval benchmark, which consists of 164 programming problems, AutoTest improves the pass@1 score by approximately 10% over the baseline method.
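The execution and consensus step described above can be illustrated with a minimal sketch. It is not AutoTest's implementation: the names `passes`, `consensus_sets`, and `rank` are hypothetical, the toy candidates and tests stand in for model-generated ones, and the scoring rule (number of agreeing solutions times number of tests passed) is an assumption chosen for illustration. The genetic-algorithm ranking is omitted because the abstract does not define the alpha and beta parameters.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, List, Tuple

Solution = Callable[[int], int]
TestCase = Tuple[int, int]  # (input, expected output)


def passes(solution: Solution, test: TestCase) -> bool:
    """Run one candidate on one test; any exception counts as a failure."""
    arg, expected = test
    try:
        return solution(arg) == expected
    except Exception:
        return False


def consensus_sets(
    solutions: Dict[str, Solution], tests: List[TestCase]
) -> Dict[FrozenSet[int], List[str]]:
    """Group candidates that pass exactly the same subset of tests."""
    groups: Dict[FrozenSet[int], List[str]] = defaultdict(list)
    for name, fn in solutions.items():
        passed = frozenset(i for i, t in enumerate(tests) if passes(fn, t))
        groups[passed].append(name)
    return dict(groups)


def rank(groups: Dict[FrozenSet[int], List[str]]) -> List[Tuple[int, List[str]]]:
    """Score each group as (#solutions * #tests passed); assumed rule, not from the paper."""
    scored = [(len(names) * len(passed), sorted(names)) for passed, names in groups.items()]
    return sorted(scored, key=lambda item: (-item[0], item[1]))


# Toy stand-ins for generated candidates to the task "square a number".
candidates: Dict[str, Solution] = {
    "sq_a": lambda x: x * x,
    "sq_b": lambda x: x ** 2,
    "dbl": lambda x: 2 * x,    # wrong solution
    "crash": lambda x: x / 0,  # always raises ZeroDivisionError
}
# Toy stand-ins for generated tests; the last one is deliberately wrong.
tests: List[TestCase] = [(2, 4), (3, 9), (0, 0), (3, 6)]

ranking = rank(consensus_sets(candidates, tests))
print(ranking[0])  # (6, ['sq_a', 'sq_b'])
```

In this toy run the two correct candidates agree on three tests and outrank the wrong candidate, even though one generated test is itself incorrect. That robustness to noisy tests is the motivation for grouping by agreement rather than trusting any single test.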