Reproducing system-level concurrency bugs requires both the input data and the precise interleaving order of system calls. This is challenging because such bugs are non-deterministic, and bug reports often lack the necessary details. Moreover, because reports are written in unstructured natural language, extracting those details is difficult. Existing tools are ill-suited to reproducing these bugs because they cannot control specific interleavings at the system-call level. To address these challenges, we propose SysPro, a novel approach that automatically extracts relevant system call names from bug reports and identifies their locations in the source code. SysPro generates input data using information retrieval, regular expression matching, and the category-partition method. The extracted input and interleaving data are then used to reproduce bugs through dynamic source code instrumentation. Our empirical study on real-world benchmarks demonstrates that SysPro is both effective and efficient at localizing and reproducing system-level concurrency bugs from bug reports.
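The abstract names regular expression matching and the category-partition method without giving details. The Python sketch below illustrates only these two ingredients, under simplifying assumptions. First, a regular expression pulls candidate system call names out of report text and filters them against a small, hypothetical syscall list. Second, a bare category-partition enumeration takes the cross product of choices and omits the constraints the full method supports. `KNOWN_SYSCALLS`, `extract_syscalls`, and `category_partition` are illustrative names, not SysPro's implementation.

```python
import itertools
import re

# Hypothetical subset of Linux system call names (assumption: a real tool
# would load a complete syscall table for the target kernel).
KNOWN_SYSCALLS = {"open", "read", "write", "close", "rename", "unlink", "fsync", "mmap"}

# Matches an identifier immediately followed by "(", e.g. "rename(" or "read()".
CALL_PATTERN = re.compile(r"\b([a-z_][a-z0-9_]*)\s*\(")


def extract_syscalls(report: str) -> list[str]:
    """Return known syscall names written as calls, in order of first appearance."""
    found: list[str] = []
    for match in CALL_PATTERN.finditer(report):
        name = match.group(1)
        if name in KNOWN_SYSCALLS and name not in found:
            found.append(name)
    return found


def category_partition(categories: dict[str, list[str]]) -> list[dict[str, str]]:
    """Enumerate test frames by picking one choice from every category.

    Simplification: this is a plain cross product; the full category-partition
    method also prunes frames using constraints between choices.
    """
    names = list(categories)
    return [
        dict(zip(names, combo))
        for combo in itertools.product(*(categories[n] for n in names))
    ]


report = "Thread A calls rename(old, new) while thread B calls unlink(new); then read() returns stale data."
print(extract_syscalls(report))
frames = category_partition({"file_state": ["exists", "missing"], "path_len": ["short", "max"]})
print(len(frames))
```

Matching only identifiers followed by `(` avoids flagging ordinary English uses of words such as "read" or "write". The trade-off is that it misses system calls a reporter mentions without parentheses.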