AutoTest: Evolutionary Code Solution Selection with Test Cases

As code generation techniques advance, selecting the correct code solution from multiple candidate solutions has become a crucial task. This study proposes AutoTest, a novel technique that combines automated test case generation with code solution execution and uses an evolutionary genetic algorithm to optimize the selection process. First, AutoTest uses large pre-trained language models such as codegen-16B, code-davinci-002, and incoder-6B to generate code solutions and their corresponding test cases. Next, it executes the code solutions on the test cases and forms consensus sets from the results. It then produces a fine-grained ranking through the selection, mutation, and crossover operators of the genetic algorithm, adjusting the alpha and beta parameters. Finally, it selects the best code solution. AutoTest shows significant improvements on the HumanEval benchmark, which consists of 164 programming problems: it raises the pass@1 score by approximately 10% over the baseline method.
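The abstract does not specify how consensus sets are scored or how the genetic algorithm uses alpha and beta, so the following is only a minimal sketch under stated assumptions, not AutoTest's implementation. It assumes candidate solutions are Python callables and generated tests are (arguments, expected value) pairs. Solutions that pass exactly the same tests are grouped into a consensus set, similar in spirit to CodeT's dual execution agreement, and each set is scored as (number of solutions)^alpha × (number of tests passed)^beta, which is an assumed form. The hypothetical helpers `rank_consensus_sets`, `top1_accuracy`, and `evolve_alpha_beta` then tune (alpha, beta) with selection, crossover, and mutation against a small labelled dev set, which is also an assumption.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Test = Tuple[tuple, object]  # assumed test format: (positional args, expected return value)


def passed_tests(solution: Callable, tests: Sequence[Test]) -> FrozenSet[int]:
    """Indices of the tests a candidate passes; a raised exception counts as a failure."""
    passed = set()
    for i, (args, expected) in enumerate(tests):
        try:
            if solution(*args) == expected:
                passed.add(i)
        except Exception:
            pass
    return frozenset(passed)


def rank_consensus_sets(solutions: Sequence[Callable], tests: Sequence[Test],
                        alpha: float = 1.0, beta: float = 1.0) -> List[Tuple[float, List[int]]]:
    """Group solutions that pass exactly the same tests into consensus sets and score
    each set as |solutions| ** alpha * |tests passed| ** beta (hypothetical scoring)."""
    groups: Dict[FrozenSet[int], List[int]] = defaultdict(list)
    for idx, sol in enumerate(solutions):
        groups[passed_tests(sol, tests)].append(idx)
    scored = [(len(members) ** alpha * len(passed) ** beta, members)
              for passed, members in groups.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep first-seen order
    return scored


def top1_accuracy(params: Tuple[float, float], dev_set) -> float:
    """Hypothetical fitness: fraction of labelled problems whose top-ranked solution is correct."""
    alpha, beta = params
    hits = 0
    for solutions, tests, is_correct in dev_set:
        best_set = rank_consensus_sets(solutions, tests, alpha, beta)[0][1]
        hits += is_correct[best_set[0]]
    return hits / len(dev_set)


def evolve_alpha_beta(fitness: Callable[[Tuple[float, float]], float],
                      pop_size: int = 20, generations: int = 30,
                      mutation_scale: float = 0.2, seed: int = 0) -> Tuple[float, float]:
    """Minimal genetic algorithm over (alpha, beta) pairs that maximises `fitness`."""
    rng = random.Random(seed)
    population = [(rng.uniform(0, 2), rng.uniform(0, 2)) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: the fitter half survives and becomes the parent pool.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            # Crossover: alpha from one parent, beta from the other.
            child = (mom[0], dad[1])
            # Mutation: Gaussian noise, clipped at 0 so the exponents stay non-negative.
            child = (max(0.0, child[0] + rng.gauss(0, mutation_scale)),
                     max(0.0, child[1] + rng.gauss(0, mutation_scale)))
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


# Usage on a toy problem, "return twice the input":
candidates = [lambda x: x * 2, lambda x: x + x, lambda x: x ** 2]
tests = [((1,), 2), ((2,), 4), ((3,), 6)]
print(rank_consensus_sets(candidates, tests))  # → [(6.0, [0, 1]), (1.0, [2])]
dev_set = [(candidates, tests, [True, True, False])]
best = evolve_alpha_beta(lambda p: top1_accuracy(p, dev_set))
print(best, top1_accuracy(best, dev_set))
```

Calling generated code directly, as this sketch does, is only acceptable for toy examples; a practical pipeline would run untrusted candidates in a sandboxed process with a timeout.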

