Random testing approaches work by generating inputs at random, or by selecting inputs at random from a pre-defined operational profile. A long-standing question in this and other testing contexts is: when can we stop testing? At what point can we be confident that executing further tests in this manner will not explore previously untested (and potentially buggy) software behaviors? This is analogous to the question in Machine Learning of how many training examples are required to infer an accurate model. In this paper we show how probabilistic approaches to answering this question in Machine Learning (arising from Computational Learning Theory) can be applied in our testing context. This enables us to produce an upper bound on the number of tests required to achieve a given level of adequacy. We are the first to derive such a bound from only the number of coverage targets (e.g. lines of code) in the source code, without needing to observe a sample of test executions. We validate this bound on a large set of Java units and an autonomous driving system.
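To give a feel for the kind of bound that Computational Learning Theory provides, the sketch below computes the textbook PAC sample-complexity bound for a finite hypothesis class, m >= (ln|H| + ln(1/delta)) / epsilon. This is an illustration only and is not necessarily the bound derived in the paper: the function name `pac_test_bound`, the parameters `epsilon` (tolerated error) and `delta` (tolerated probability of failure), and the choice |H| = 2^n for n coverage targets are all assumptions made here.

```python
import math


def pac_test_bound(num_targets: int, epsilon: float, delta: float) -> int:
    """Textbook PAC bound for a finite hypothesis class (illustrative only).

    Assumption (not from the paper): each subset of the coverage targets is
    one hypothesis, so |H| = 2 ** num_targets and ln|H| = num_targets * ln 2.
    Returns the smallest integer m with m >= (ln|H| + ln(1/delta)) / epsilon.
    """
    if num_targets < 0:
        raise ValueError("num_targets must be non-negative")
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")

    # Work with ln|H| directly to avoid building the huge integer 2**n.
    ln_hypotheses = num_targets * math.log(2)
    return math.ceil((ln_hypotheses + math.log(1.0 / delta)) / epsilon)


if __name__ == "__main__":
    # Hypothetical unit with 100 coverage targets, 10% error, 95% confidence.
    print(pac_test_bound(100, epsilon=0.1, delta=0.05))
```

Under these assumptions the bound grows linearly with the number of coverage targets and with 1/epsilon, but only logarithmically with 1/delta, and it needs no observed test executions as input.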