The 2019 Stack Overflow Developer Survey showed that, for the first time, Python overtook Java in popularity. The 2020 survey showed the gap between the two widening further. Despite this rapid rise, few testing and debugging tools are designed specifically for Python. This stands in stark contrast with the abundance of such tools for Java, so research on tools that help Python developers needs to advance. One factor behind the rapid growth of Java testing and debugging tools is the availability of benchmarks. A popular example is Defects4J. Its initial version contained 357 real bugs from 5 real-world Java programs, and each bug comes with a test suite that exposes it. Defects4J has been used in hundreds of testing and debugging studies and has helped push the frontier of research in these areas. Inspired by Defects4J, this project provides a benchmark database and accompanying tool containing 493 real bugs from 17 real-world Python programs. We hope this benchmark will help catalyze future work on testing and debugging tools for Python programs.