State-of-the-art automated test generation techniques, such as search-based testing, are usually unaware of what a developer would write as a test case. As a result, they typically produce tests that are not human-readable and may not detect all the kinds of complex bugs that developer-written tests would. In this study, we leverage Transformer-based code models to generate unit tests that can complement search-based test generation. Specifically, we use CodeT5, a state-of-the-art large code model, and fine-tune it on the downstream task of test generation. We use the Methods2test dataset for fine-tuning CodeT5 and Defects4j for project-level domain adaptation and evaluation. The main contribution of this study is a fully automated testing framework that leverages developer-written tests and available code models to generate compilable, human-readable unit tests. Results show that our approach can generate new test cases that cover lines not covered by developer-written tests. With domain adaptation, the line coverage of the model-generated unit tests increases by 49.9% (mean) and 54% (median) compared to the model without domain adaptation. Our framework can also be used as a complementary solution alongside common search-based methods, increasing overall coverage by 25.3% (mean) and 6.3% (median). It also increases the mutation score of search-based methods by killing additional mutants (up to 64 new mutants per project in our experiments).
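To make the idea of complementary coverage concrete, the following is a minimal sketch of how the gain from adding model-generated tests to a search-based suite could be computed. It is not the framework's implementation: the function name `coverage_gain`, the representation of coverage as sets of line numbers, and the reading of the reported increase as a relative gain over the search-based baseline are all assumptions made for illustration.

```python
def coverage_gain(total_lines, search_covered, model_covered):
    """Compare a search-based suite alone with the combined suite.

    Hypothetical helper: coverage is modelled as sets of covered line
    numbers, and the gain is assumed to be relative to the baseline.
    """
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")

    # Lines reached by at least one of the two test suites.
    combined = search_covered | model_covered
    # Lines that only the model-generated tests reach.
    newly_covered = model_covered - search_covered

    baseline_count = len(search_covered)
    if baseline_count == 0:
        relative_gain = None  # undefined when the baseline covers nothing
    else:
        # Computed from integer counts to avoid rounding noise.
        relative_gain = len(newly_covered) / baseline_count * 100

    return {
        "baseline": baseline_count / total_lines,
        "combined": len(combined) / total_lines,
        "newly_covered": newly_covered,
        "relative_gain_percent": relative_gain,
    }


if __name__ == "__main__":
    # Toy data: a 10-line class, not taken from the paper's experiments.
    result = coverage_gain(10, {1, 2, 3, 4}, {3, 4, 5})
    print(result)
```

In this toy example the model-generated tests reach one line that the search-based tests miss, so the combined suite covers five lines instead of four. Per-project values of this kind would then be summarized by their mean and median.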