Failure indexing is a longstanding crux in software testing and debugging. Its goal is to automatically divide failures (e.g., failed test cases) into distinct groups according to their root causes, so that multiple faults in a faulty program can be handled independently and simultaneously. The field has long been plagued by two challenges: 1) The effectiveness of the division still falls short. Existing techniques employ only a limited source of run-time data (e.g., code coverage) as the failure proximity, which typically delivers unsatisfactory results. 2) The outcome is hardly comprehensible. A developer who receives a failure indexing result does not know why the failures are divided the way they are, which makes it difficult to be convinced by the result and in turn hinders its adoption. To tackle these challenges, we propose SURE, a viSUalized failuRe indExing approach using the program memory spectrum. We first collect run-time memory information at preset breakpoints during the execution of failed test cases and transform it into human-friendly images (called the program memory spectrum, PMS). Then, any pair of PMS images, serving as proxies for two failures, is fed to a trained Siamese convolutional neural network to predict the likelihood that the two failures were triggered by the same fault. Results demonstrate the effectiveness of SURE: compared with the state-of-the-art technique in this field, it achieves 101.20% and 41.38% improvements in fault number estimation, as well as 105.20% and 35.53% improvements in clustering, in simulated and real-world environments, respectively. Moreover, we carry out a human study to quantitatively evaluate the comprehensibility of PMS, revealing that this novel type of representation can help developers better comprehend failure indexing results.
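To illustrate how pairwise same-fault likelihoods could be turned into failure groups and a fault count, the following is a minimal sketch. It is not SURE's actual clustering procedure, which the abstract does not specify. The function `index_failures`, the fixed `threshold`, the union-find grouping, and the likelihood values in `scores` are all assumptions made for illustration; in practice the likelihoods would come from the trained Siamese network applied to PMS image pairs.

```python
from itertools import combinations


def index_failures(failure_ids, same_fault_prob, threshold=0.5):
    """Group failures by linking every pair whose predicted same-fault
    likelihood reaches the threshold (hypothetical grouping rule)."""
    # Union-find: each failure starts in its own group.
    parent = {f: f for f in failure_ids}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(failure_ids, 2):
        if same_fault_prob(a, b) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for f in failure_ids:
        groups.setdefault(find(f), []).append(f)
    return sorted(groups.values())


# Made-up likelihoods standing in for the Siamese network's predictions.
scores = {
    ("t1", "t2"): 0.93, ("t1", "t3"): 0.08, ("t1", "t4"): 0.05,
    ("t2", "t3"): 0.12, ("t2", "t4"): 0.10, ("t3", "t4"): 0.88,
}

groups = index_failures(["t1", "t2", "t3", "t4"], lambda a, b: scores[(a, b)])
estimated_faults = len(groups)  # one group per presumed fault
print(groups, estimated_faults)
```

Under this assumed rule, the number of resulting groups serves as the estimate of the number of faults, and each group collects the failed test cases presumed to share a root cause.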