Change And Cover: Last-Mile, Pull Request-Based Regression Test Augmentation

Software evolves constantly, with developers frequently submitting pull requests (PRs) to introduce new features or fix bugs. Testing PRs is critical to maintaining software quality. Yet, even in projects with extensive test suites, some PR-modified lines remain untested, leaving a "last-mile" regression test gap. Existing test generators typically aim to improve overall coverage, but do not specifically target the uncovered lines in PRs. We present Change And Cover (ChaCo), an LLM-based test augmentation technique that addresses this gap. It makes three contributions: (i) ChaCo targets PR-specific patch coverage, offering developers augmented tests for code while it is still on their minds. (ii) We identify the provision of suitable test context as a crucial challenge for an LLM to generate useful tests, and present two techniques to extract relevant test content, such as existing test functions, fixtures, and data generators. (iii) To make augmented tests acceptable to developers, ChaCo carefully integrates them into the existing test suite, e.g., by matching the tests' structure and style to those of the existing tests, and generates a summary of the test addition for developer review. We evaluate ChaCo on 145 PRs from three popular and complex open-source projects: SciPy, Qiskit, and Pandas. The approach helps 30% of PRs achieve full patch coverage, at a cost of $0.11, showing its effectiveness and practicality. Human reviewers find the tests to be worth adding (4.53/5.0), well integrated (4.2/5.0), and relevant to the PR (4.7/5.0). Ablations show that test context is crucial for test generation, leading to 2x coverage. We submitted 12 tests, of which 8 have already been merged, and two previously unknown bugs were exposed and fixed. We envision our approach being integrated into CI workflows, automating the last mile of regression test augmentation.
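To make the notion of patch coverage concrete, the following minimal sketch computes the fraction of PR-modified lines that are executed by the test suite and lists the lines that remain uncovered. It is an illustration only, not ChaCo's implementation: the function name `patch_coverage`, the input shapes (per-file sets of line numbers), and the example file paths are assumptions made for this sketch. It further assumes that the changed-line sets already contain only executable lines.

```python
from typing import Dict, Set, Tuple

LineMap = Dict[str, Set[int]]  # file path -> set of line numbers


def patch_coverage(changed: LineMap, covered: LineMap) -> Tuple[float, LineMap]:
    """Return (ratio of changed lines that are covered, uncovered changed lines).

    `changed` holds the lines modified by the PR (hypothetical input, e.g.
    derived from a diff); `covered` holds the lines executed by the existing
    tests (hypothetical input, e.g. derived from a coverage report).
    """
    uncovered: LineMap = {}
    total = 0
    hit = 0
    for path, lines in changed.items():
        # Changed lines in this file that no test executed.
        missed = lines - covered.get(path, set())
        total += len(lines)
        hit += len(lines) - len(missed)
        if missed:
            uncovered[path] = missed
    # A PR that changes no lines is treated as fully covered.
    ratio = hit / total if total else 1.0
    return ratio, uncovered


# Illustrative data: file names and line numbers are made up.
changed = {"pkg/core.py": {10, 11, 12, 40}, "pkg/util.py": {5}}
covered = {"pkg/core.py": {10, 11, 40, 41}}
ratio, uncovered = patch_coverage(changed, covered)
```

In this example three of the five changed lines are covered, so the uncovered entries (`pkg/core.py` line 12 and `pkg/util.py` line 5) are the "last-mile" lines that a test augmentation technique would target. Full patch coverage corresponds to a ratio of 1.0 and an empty uncovered map.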
