Exceptions allow developers to handle error cases that are expected to occur infrequently. Ideally, a good test suite should exercise both normal and exceptional behaviors to catch more bugs and avoid regressions. While current research analyzes exceptions that propagate to tests, it does not examine exceptions that never reach the tests. In this paper, we present an empirical study of how frequently exceptional behaviors are tested in real-world systems. We consider both exceptions that propagate to tests and those that do not. To this end, we run instrumented versions of test suites, monitor their execution, and collect information about the exceptions raised at runtime. We analyze the test suites of 25 Python systems, covering 5,372 executed methods, 17.9M calls, and 1.4M raised exceptions. We find that 21.4% of the executed methods raise exceptions at runtime. In methods that raise exceptions, a median of 1 in 10 calls exercises exceptional behavior. Close to 80% of the methods that raise exceptions do so infrequently, but about 20% raise exceptions more frequently. Finally, we discuss implications for researchers and practitioners. We suggest developing novel tools to support exercising exceptional behaviors and refactoring expensive try/except blocks. We also call attention to the fact that exception-raising behaviors are not necessarily "abnormal" or rare.
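To illustrate the kind of runtime monitoring described above, the following is a minimal sketch, not the instrumentation used in the study. It assumes Python's standard `sys.settrace` hook is an acceptable stand-in, and every name in it (`run_traced`, `parse_int`, `sample_test`, and the two counters) is hypothetical. It counts, per function, how many calls were observed and in how many of them an exception surfaced, including exceptions that are caught before they reach the test.

```python
import sys
from collections import Counter

calls = Counter()          # function name -> number of calls observed
raising_calls = Counter()  # function name -> calls in which an exception surfaced


def _global_tracer(frame, event, arg):
    """Invoked by the interpreter each time a new Python frame is entered."""
    if event != "call":
        return None
    name = frame.f_code.co_name  # simplification: a real tool would also key on the file
    calls[name] += 1
    seen = False

    def _local_tracer(frame, event, arg):
        # The "exception" event fires in every frame an exception passes
        # through, whether or not that frame later handles it.
        nonlocal seen
        if event == "exception" and not seen:
            seen = True  # count each call at most once
            raising_calls[name] += 1
        return _local_tracer

    return _local_tracer


def run_traced(func):
    """Run func with tracing enabled, restoring any previous tracer afterwards."""
    previous = sys.gettrace()
    sys.settrace(_global_tracer)
    try:
        func()
    finally:
        sys.settrace(previous)


# Hypothetical code under test: the ValueError is handled internally,
# so it never propagates to the test that calls parse_int.
def parse_int(text):
    try:
        return int(text)
    except ValueError:
        return None


def sample_test():
    for text in ["1", "2", "3", "4", "5", "6", "7", "8", "9"]:
        assert parse_int(text) == int(text)
    assert parse_int("x") is None  # the one call that exercises exceptional behavior


run_traced(sample_test)
ratio = raising_calls["parse_int"] / calls["parse_int"]
print(f"parse_int: {raising_calls['parse_int']} of {calls['parse_int']} calls raised ({ratio:.0%})")
```

In this sketch the exception raised inside `parse_int` is recorded even though `sample_test` never sees it, which is the distinction the study draws between exceptions that propagate to tests and those that do not.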