Does mutation testing improve testing practices?

Various proxy metrics for test quality have been defined to guide developers when writing tests. Code coverage is particularly well established in practice, even though how coverage relates to test quality remains a matter of ongoing debate. Mutation testing offers a promising alternative: artificial defects can identify holes in a test suite and thus provide concrete suggestions for additional tests. Despite these advantages, mutation testing is not yet well established in practice. Until recently, mutation testing tools and techniques simply did not scale to complex systems. Although they now do scale, a remaining obstacle is the lack of evidence that writing tests for mutants actually improves test quality. In this paper, we aim to fill this gap: by analyzing a large dataset of almost 15 million mutants, we investigate how these mutants influenced developers over time and how they relate to real faults. Our analyses suggest that developers using mutation testing write more tests and actively improve their test suites with high-quality tests, so that fewer mutants remain. By analyzing a dataset of past fixes of real high-priority faults, our analyses further provide evidence that mutants are indeed coupled with real faults. In other words, had mutation testing been used for the changes that introduced the faults, it would have reported live mutants that could have prevented the bugs.
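To illustrate how an artificial defect can expose a hole in a test suite, the following minimal Python sketch generates a single mutant by replacing `<` with `<=` in a small function. This relational-operator mutation is an assumption chosen for illustration, and the names `is_minor`, `RelationalMutator`, `weak_suite`, and `strong_suite` are hypothetical. They are not taken from the study or from any particular mutation testing tool.

```python
import ast

# Hypothetical code under test.
SOURCE = """
def is_minor(age):
    return age < 18
"""


class RelationalMutator(ast.NodeTransformer):
    """Hypothetical mutation operator: rewrites every `<` into `<=`."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.LtE() if isinstance(op, ast.Lt) else op for op in node.ops]
        return node


def load(tree):
    """Compile a module AST and return the `is_minor` function it defines."""
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace["is_minor"]


original = load(ast.parse(SOURCE))
mutant = load(ast.fix_missing_locations(RelationalMutator().visit(ast.parse(SOURCE))))


def weak_suite(f):
    """Passes if f behaves correctly on inputs far from the boundary."""
    return f(10) and not f(30)


def strong_suite(f):
    """Adds a boundary test at age 18, which the `<=` mutant gets wrong."""
    return weak_suite(f) and not f(18)


def is_killed(suite, mutant_fn):
    """A mutant is killed when at least one test in the suite fails on it."""
    return not suite(mutant_fn)


print("weak suite kills mutant:", is_killed(weak_suite, mutant))
print("strong suite kills mutant:", is_killed(strong_suite, mutant))
```

Both suites pass on the original function, but only the strong suite fails on the mutant. The mutant therefore stays live under the weak suite, and that live mutant points directly at the missing boundary test (`age == 18`). This is the kind of concrete suggestion for an additional test that the abstract refers to.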