Failures with different root causes can significantly disturb multi-fault localization, so dividing failures into distinct groups according to the responsible faults is highly important. In such a failure indexing task, the crux lies in the failure proximity, which involves two questions: how to effectively represent failures (e.g., by extracting their signatures) and how to properly measure the distance between the proxies for those failures. Existing studies have proposed a variety of failure proximities. The most prevalent of them extract failure signatures from execution coverage or suspiciousness ranking lists and accordingly employ the Euclidean or the Kendall tau distance. However, such strategies may not properly reflect the essential characteristics of failures, resulting in unsatisfactory effectiveness. In this paper, we propose a new failure proximity, namely the program variable-based failure proximity, and present a novel failure indexing approach based on it. Specifically, the proposed approach uses the run-time values of program variables to represent failures and designs a set of rules to measure the similarity between them. Experimental results demonstrate the competitiveness of the proposed approach: compared with the state-of-the-art technique in this field, it achieves 44.12% and 27.59% improvements in estimating the number of faults, as well as 47.30% and 26.93% improvements in clustering effectiveness, in simulated and real-world environments, respectively.
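To make the two existing proximities mentioned above concrete, the following is a minimal sketch of the distances they rely on: the Euclidean distance between coverage vectors and the Kendall tau distance between suspiciousness rankings. The function names, the binary coverage encoding, and the statement identifiers are illustrative assumptions, not taken from the paper, and the sketch does not implement the proposed program variable-based proximity.

```python
from itertools import combinations
from math import sqrt


def euclidean_distance(cov_a, cov_b):
    """Euclidean distance between two equal-length coverage vectors.

    Assumption: each vector holds one entry per program statement,
    e.g. 1 if the failing run executed the statement and 0 otherwise.
    """
    if len(cov_a) != len(cov_b):
        raise ValueError("coverage vectors must have the same length")
    return sqrt(sum((a - b) ** 2 for a, b in zip(cov_a, cov_b)))


def kendall_tau_distance(rank_a, rank_b):
    """Number of statement pairs that the two rankings order differently.

    Assumption: both rankings list the same statement ids without ties,
    most suspicious first.
    """
    if len(rank_a) != len(rank_b) or set(rank_a) != set(rank_b):
        raise ValueError("rankings must contain the same statements")
    pos_a = {stmt: i for i, stmt in enumerate(rank_a)}
    pos_b = {stmt: i for i, stmt in enumerate(rank_b)}
    # A pair is discordant when its relative order differs between rankings.
    return sum(
        1
        for x, y in combinations(rank_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


if __name__ == "__main__":
    # Hypothetical coverage vectors of two failing runs over five statements.
    print(euclidean_distance([1, 1, 0, 1, 0], [1, 0, 0, 1, 1]))
    # Hypothetical suspiciousness rankings produced from the two failures.
    print(kendall_tau_distance(["s3", "s1", "s4", "s2"],
                               ["s1", "s3", "s4", "s2"]))
```

In both cases a smaller value means the two failures are treated as more likely to share a root cause. The pairwise distances would then feed a clustering step that groups the failures.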