Tests can be useful for resolving issues in code repositories. However, relying too heavily on tests for issue resolution can yield code that passes the observed tests but misses important cases or even breaks existing functionality. This problem, called test overfitting, is exacerbated by the fact that issues usually lack readily executable tests. To compensate, several issue resolution systems use tests auto-generated from issues, which may be imperfect. Some systems even iteratively refine code and tests jointly. This paper presents the first empirical study of test overfitting in this setting.