A software engineering issue (SWE issue) is easier to resolve when accompanied by a reproduction test. Unfortunately, most issues do not come with functioning reproduction tests, so this paper explores how to generate them automatically. The primary challenge in this setting is that the code to be tested is either missing or wrong, as evidenced by the existence of the issue in the first place. This has held back test generation for this setting: without correct code to execute, it is difficult to leverage execution feedback to generate good tests. This paper introduces novel techniques for leveraging execution feedback despite this obstacle, implemented in a new reproduction test generator called e-Otter++. Experiments show that e-Otter++ substantially advances the state of the art for this problem, generating tests with an average fail-to-pass rate of 63% on the TDD-Bench Verified benchmark.
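The abstract reports a fail-to-pass rate without defining it. As a minimal sketch, assuming the usual meaning (a generated test counts as fail-to-pass when it fails on the code before the issue is resolved and passes on the code after), the metric could be computed as follows. The names `ReproOutcome`, `is_fail_to_pass`, and `fail_to_pass_rate` are hypothetical and are not taken from e-Otter++ or TDD-Bench Verified.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ReproOutcome:
    """Result of running one generated test on both code versions (hypothetical type)."""
    passed_before_fix: bool  # outcome on the original, buggy code
    passed_after_fix: bool   # outcome on the code with the issue resolved


def is_fail_to_pass(outcome: ReproOutcome) -> bool:
    # Assumed definition: the test must fail before the fix and pass after it.
    return (not outcome.passed_before_fix) and outcome.passed_after_fix


def fail_to_pass_rate(outcomes: Sequence[ReproOutcome]) -> float:
    # Fraction of generated tests that are fail-to-pass; 0.0 for an empty input.
    if not outcomes:
        return 0.0
    return sum(is_fail_to_pass(o) for o in outcomes) / len(outcomes)
```

Under this assumed definition, a test that passes on both versions does not reproduce the issue, and a test that fails on both versions does not validate the fix, so neither counts toward the rate.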