Failure indexing is a longstanding crux in software testing and debugging. Its goal is to automatically divide failures (e.g., failed test cases) into distinct groups according to their root causes, so that multiple faults in a faulty program can be handled independently and simultaneously. The community has long been plagued by two challenges: 1) The effectiveness of the division is still far from promising. Existing techniques use only a limited source of run-time data (e.g., code coverage) as the failure proximity, which typically delivers unsatisfactory results. 2) The outcome is hard to comprehend. A developer who receives a failure indexing result does not know why the failures were divided the way they were, which makes the result hard to trust and in turn hinders its adoption. To tackle these challenges, we propose SURE, a viSUalized failuRe indExing approach using the program memory spectrum. We first collect run-time memory information at preset breakpoints during the execution of failed test cases and transform it into human-friendly images (called the program memory spectrum, PMS). Then, any pair of PMS images, each serving as a proxy for one failure, is fed to a trained Siamese convolutional neural network to predict the likelihood that the two failures were triggered by the same fault. Results demonstrate the effectiveness of SURE: compared with the state-of-the-art technique in this field, it achieves 101.20% and 41.38% improvements in fault number estimation, as well as 105.20% and 35.53% improvements in clustering, in simulated and real-world environments, respectively. Moreover, we carry out a human study to quantitatively evaluate the comprehensibility of PMS, which reveals that this novel type of representation can help developers better comprehend failure indexing results.
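To illustrate how pairwise same-fault likelihoods could yield both failure groups and an estimate of the number of faults, the following is a minimal sketch. It is not SURE's procedure: the abstract does not say how the pairwise predictions are turned into clusters, so the thresholding and connected-components grouping here are an assumed stand-in. The names `group_failures`, `same_fault_prob`, and `threshold` are hypothetical, and `same_fault_prob` stands in for the trained Siamese network applied to two PMS images.

```python
from typing import Callable, Hashable, Sequence


def group_failures(
    failures: Sequence[Hashable],
    same_fault_prob: Callable[[Hashable, Hashable], float],
    threshold: float = 0.5,
) -> list[list[Hashable]]:
    """Group failures by linking every pair whose predicted same-fault
    likelihood is at least `threshold`, then taking connected components.

    ASSUMPTION: this grouping rule is illustrative only; it is not the
    clustering step described by the paper.
    """
    # Union-find: each failure starts in its own group.
    parent = list(range(len(failures)))

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(failures)):
        for j in range(i + 1, len(failures)):
            # Hypothetical predictor: stands in for the Siamese network's
            # output on the two failures' PMS images.
            if same_fault_prob(failures[i], failures[j]) >= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    # Collect failures by root, preserving input order.
    groups: dict[int, list[Hashable]] = {}
    for i, failure in enumerate(failures):
        groups.setdefault(find(i), []).append(failure)
    return list(groups.values())
```

Under this sketch, the number of returned groups serves as the estimated number of faults. Because linking is transitive, a single overconfident pairwise prediction can merge two otherwise separate groups, so the choice of `threshold` matters.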