Developers create bug-reproducing tests to support debugging: such a test fails as long as the bug is present and passes once the bug has been fixed. These tests are usually integrated into existing test suites and executed regularly alongside all other tests so that future regressions are caught. Despite this co-existence with other types of tests, the properties of bug-reproducing tests have scarcely been researched, and it remains unclear whether they differ fundamentally from other tests. In this short paper, we present an initial empirical study of bug-reproducing tests. We analyze 642 bug-reproducing tests from 15 real-world Python systems. Overall, we find that bug-reproducing tests do not differ statistically significantly from other tests in LOC, number of assertions, or complexity. However, bug-reproducing tests contain slightly more try/except blocks and ``weak assertions'' (e.g.,~\texttt{assertNotEqual}). Lastly, we find that the majority (95%) of the bug-reproducing tests reproduce a single bug, while 5% reproduce multiple bugs. We conclude by discussing implications and future research directions.
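To make the compared properties concrete, the following is a minimal sketch of how the number of assertions, weak assertions, and try/except blocks could be counted per test with Python's standard `ast` module. It is not the authors' tooling. The function name `collect_test_metrics`, the `WEAK_ASSERTIONS` set (the abstract names only `assertNotEqual`), and the sample test names are assumptions made purely for illustration.

```python
import ast

# Assumption: which unittest methods count as "weak" is illustrative only;
# the abstract names just assertNotEqual as an example.
WEAK_ASSERTIONS = {"assertNotEqual", "assertIsNot", "assertIsNotNone", "assertNotIn"}


def collect_test_metrics(source: str) -> dict:
    """Count assertions, weak assertions, and try blocks for each test function."""
    metrics = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test"):
            counts = {"assertions": 0, "weak_assertions": 0, "try_blocks": 0}
            for child in ast.walk(node):
                if isinstance(child, ast.Assert):
                    # Plain `assert ...` statement (pytest style).
                    counts["assertions"] += 1
                elif isinstance(child, ast.Try):
                    counts["try_blocks"] += 1
                elif isinstance(child, ast.Call) and isinstance(child.func, ast.Attribute):
                    # Method call such as self.assertEqual(...) (unittest style).
                    name = child.func.attr
                    if name.startswith("assert"):
                        counts["assertions"] += 1
                        if name in WEAK_ASSERTIONS:
                            counts["weak_assertions"] += 1
            metrics[node.name] = counts
    return metrics


# Hypothetical test file; it is only parsed, never executed, so `parse`
# does not need to exist.
SAMPLE = '''
import unittest

class ParserTests(unittest.TestCase):
    def test_issue_123_empty_input(self):
        # Bug-reproducing test: fails while the (hypothetical) bug is present.
        try:
            result = parse("")
        except ValueError:
            self.fail("parse raised on empty input")
        self.assertNotEqual(result, None)

    def test_regular_parse(self):
        self.assertEqual(parse("a"), ["a"])
'''

if __name__ == "__main__":
    for test_name, counts in collect_test_metrics(SAMPLE).items():
        print(test_name, counts)
```

The sketch works on the syntax tree alone, so it can be run over a test suite without importing or executing the project under test. Distinguishing bug-reproducing tests from other tests would require additional information (for example, links to bug reports), which this sketch does not attempt.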