SURE: A Visualized Failure Indexing Approach using Program Memory Spectrum

Failure indexing is a longstanding crux in software testing and debugging. Its goal is to automatically divide failures (e.g., failed test cases) into distinct groups according to their root causes, so that multiple faults in a faulty program can be handled independently and simultaneously. The community has long been plagued by two challenges: 1) The effectiveness of the division is still far from satisfactory. Existing techniques employ only a limited source of run-time data (e.g., code coverage) as the failure proximity, which typically delivers unsatisfactory results. 2) The outcome is hard to comprehend. A developer who receives a failure indexing result does not know why the failures were divided the way they were, which makes the result hard to trust and in turn hinders its adoption. To tackle these challenges, we propose SURE, a viSUalized failuRe indExing approach using the program memory spectrum. We first collect run-time memory information at preset breakpoints during the execution of failed test cases and transform it into human-friendly images (called program memory spectrum, PMS). Then, any pair of PMS images that serve as proxies for two failures is fed to a trained Siamese convolutional neural network, which predicts the likelihood that they were triggered by the same fault. Results demonstrate the effectiveness of SURE: compared with the state-of-the-art technique in this field, it achieves 101.20% and 41.38% improvements in fault number estimation, as well as 105.20% and 35.53% improvements in clustering, in simulated and real-world environments, respectively. Moreover, we carry out a human study to quantitatively evaluate the comprehensibility of PMS, which reveals that this novel type of representation can help developers better comprehend failure indexing results.
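To make the pairwise formulation concrete, the following is a minimal sketch of how same-fault likelihoods for pairs of failures could be turned into failure groups and a fault-count estimate. It is not the authors' implementation: the abstract does not say how SURE derives clusters from pairwise predictions, so the thresholding and connected-component grouping used here are assumptions. The names `index_failures`, `toy_prob`, `TOY_SCORES`, and the `threshold` value are hypothetical, and `toy_prob` merely stands in for a trained Siamese network that scores a pair of PMS images.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence


def index_failures(
    failures: Sequence[str],
    same_fault_prob: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[List[str]]:
    """Group failures whose pairwise same-fault likelihood reaches `threshold`.

    Pairs at or above the threshold are linked, and each connected
    component of the resulting graph becomes one failure group.
    """
    parent: Dict[str, str] = {f: f for f in failures}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(failures, 2):
        if same_fault_prob(a, b) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for f in failures:
        groups.setdefault(find(f), []).append(f)
    return list(groups.values())


# Hypothetical scores standing in for the network's predictions.
TOY_SCORES = {
    frozenset({"t1", "t2"}): 0.92,
    frozenset({"t3", "t4"}): 0.88,
}


def toy_prob(a: str, b: str) -> float:
    """Return a made-up same-fault likelihood for a pair of failed tests."""
    return TOY_SCORES.get(frozenset({a, b}), 0.05)


groups = index_failures(["t1", "t2", "t3", "t4", "t5"], toy_prob)
estimated_faults = len(groups)
print(estimated_faults, groups)
```

In this toy run the five failed test cases fall into three groups, so the estimated number of faults is three. Any other clustering scheme could be substituted for the connected-component step without changing the pairwise scoring interface.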

