Recent advances in language modeling have enabled translating natural language into code and using execution feedback to improve code generation. However, these methods often rely heavily on pre-existing test cases, which may not always be available or comprehensive. In this work, we propose a novel approach that concurrently trains a code generation model and a test generation model, using execution feedback to improve the performance of both. We introduce two strategies for test and code data augmentation and a new scoring function for ranking code and tests. We experiment on the APPS dataset and demonstrate that our approach can effectively generate and augment test cases, filter and synthesize correct code solutions, and rank the quality of generated code and tests. The results show that our models, when iteratively trained with an increasing number of test cases and code solutions, outperform those trained on the original dataset.
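To make the execution-feedback idea concrete, below is a minimal sketch of how candidate solutions can be filtered and ranked by running them against candidate tests. The dual-agreement counting here is a generic heuristic for illustration only; it is not the paper's actual scoring function, and all names (`run_solution`, `dual_agreement_score`) are hypothetical.

```python
import subprocess
import sys

def run_solution(code: str, test_input: str, timeout: float = 2.0):
    """Execute a candidate solution on one test input; return its stdout, or None on error/timeout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=test_input, capture_output=True, text=True, timeout=timeout,
        )
        return proc.stdout.strip() if proc.returncode == 0 else None
    except subprocess.TimeoutExpired:
        return None

def dual_agreement_score(solutions, tests):
    """Score each solution by how many (input, expected_output) pairs it passes.

    A symmetric count over tests estimates test quality: tests passed by many
    candidate solutions are more likely to be correct.
    """
    sol_scores = [0] * len(solutions)
    test_scores = [0] * len(tests)
    for i, code in enumerate(solutions):
        for j, (tin, tout) in enumerate(tests):
            if run_solution(code, tin) == tout.strip():
                sol_scores[i] += 1
                test_scores[j] += 1
    ranked = sorted(zip(sol_scores, solutions), key=lambda p: -p[0])
    return ranked, test_scores

# Usage: two candidate solutions and two candidate tests for "double the input".
solutions = [
    "print(int(input()) * 2)",  # correct candidate
    "print(int(input()) + 2)",  # buggy candidate
]
tests = [("3", "6"), ("5", "10")]
ranked, test_scores = dual_agreement_score(solutions, tests)
print(ranked[0][1])  # best-ranked candidate solution
```

Solutions surviving such a filter can be added back as training data for the code model, and high-scoring tests for the test model, which is the iterative augmentation loop the abstract describes.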