BugLens: Leveraging Bisection for Lightweight Compiler Bug Deduplication

Random testing has proven effective for compiler validation. However, debugging the bugs it finds is challenging because many test programs are duplicates that expose the same compiler bug. Identifying these duplicates is a practical research problem known as bug deduplication. Prior approaches to compiler bug deduplication rely primarily on program analysis to extract bug-related features, which can incur substantial computational overhead and generalize poorly. This paper investigates the feasibility of using bisection, a standard debugging procedure largely overlooked in prior research on compiler bug deduplication, for this purpose. Our study shows that using bisection to locate failure-inducing commits provides a valuable deduplication criterion, albeit one that requires supplementary techniques for more accurate identification. Building on these results, we introduce BugLens, a novel deduplication method that relies primarily on bisection and additionally identifies bug-triggering optimizations to minimize false negatives. Empirical evaluations on four real-world datasets show that BugLens significantly outperforms the state-of-the-art analysis-based methods Tamer and D3, saving an average of 26.98% and 9.64% of human effort, respectively, to identify the same number of distinct bugs. Given its inherent simplicity and generalizability, bisection offers a highly practical solution for compiler bug deduplication in real-world applications.
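The core idea of using the failure-inducing commit as a deduplication criterion can be illustrated with a minimal sketch. This is not the BugLens implementation: the function names `first_bad_commit` and `group_by_culprit`, the `fails_on` callback, and the assumption of a linear commit history in which a failure, once introduced, persists in all later commits are all hypothetical choices made for illustration. The sketch also omits the bug-triggering optimization step mentioned above.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def first_bad_commit(commits: Sequence[str], fails: Callable[[str], bool]) -> str:
    """Binary-search for the earliest commit on which `fails` returns True.

    Assumptions (illustrative only): `commits` is ordered oldest to newest,
    the newest commit fails, and every commit after the first failing one
    also fails.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(commits[mid]):
            hi = mid          # failure already present: culprit is at mid or earlier
        else:
            lo = mid + 1      # still passing: culprit is strictly later
    return commits[lo]


def group_by_culprit(
    tests: Sequence[str],
    commits: Sequence[str],
    fails_on: Callable[[str, str], bool],
) -> Dict[str, List[str]]:
    """Group test programs by the commit that bisection blames for each one.

    `fails_on(test, commit)` is a hypothetical oracle that builds the compiler
    at `commit` and reports whether `test` exposes a failure.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for test in tests:
        culprit = first_bad_commit(commits, lambda c, t=test: fails_on(t, c))
        groups[culprit].append(test)
    return dict(groups)
```

Under this sketch, test programs that land in the same group are treated as candidate duplicates, so a developer would inspect one representative per group first. As the abstract notes, the commit alone is an imperfect criterion, which is why a real tool would need supplementary signals.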
