With the growing ubiquity of multi-core architectures, concurrent systems have become essential but increasingly prone to complex issues such as data races and deadlocks. While modern issue-tracking systems facilitate the reporting of such problems, labeling concurrency-related bug reports remains a labor-intensive and error-prone task. This paper presents a linguistic-pattern-based framework for automatically identifying concurrency bug reports. We derive 58 distinct linguistic patterns from 730 manually labeled concurrency bug reports, organized across four levels: word level (keywords), phrase level (n-grams), sentence level (semantic), and bug-report level (contextual). To assess their effectiveness, we evaluate four complementary approaches (matching, learning, prompt-based, and fine-tuning) spanning traditional machine learning, large language models (LLMs), and pre-trained language models (PLMs). Our comprehensive evaluation on 12 large-scale open-source projects (10,920 issue reports from GitHub and Jira) demonstrates that fine-tuning PLMs with linguistic-pattern-enriched inputs achieves the best performance, reaching a precision of 91% on GitHub and 93% on Jira, and maintaining strong precision (91%) on post-cutoff data. The contributions of this work include: (1) a comprehensive taxonomy of linguistic patterns for concurrency bugs, (2) a novel fine-tuning strategy that integrates domain-specific linguistic knowledge into PLMs, and (3) a curated, labeled dataset to support reproducible research. Together, these advances provide a foundation for improving the automation, precision, and interpretability of concurrency bug classification.
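To make the matching approach concrete, the sketch below illustrates word-level (keyword) and phrase-level (n-gram) pattern matching on issue-report text. The keyword and bigram lists here are illustrative placeholders, not the paper's actual 58 patterns, and the function name is hypothetical.

```python
import re

# Illustrative pattern lists (NOT the paper's actual patterns).
CONCURRENCY_KEYWORDS = {"deadlock", "race", "livelock", "mutex", "semaphore"}
CONCURRENCY_BIGRAMS = [("data", "race"), ("lock", "contention"), ("thread", "safety")]

def tokenize(text):
    # Lowercase and split on non-alphanumeric characters.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def is_concurrency_report(text):
    # Word level: any single keyword match flags the report.
    tokens = tokenize(text)
    if any(t in CONCURRENCY_KEYWORDS for t in tokens):
        return True
    # Phrase level: check adjacent token pairs against known bigrams.
    bigrams = set(zip(tokens, tokens[1:]))
    return any(ng in bigrams for ng in CONCURRENCY_BIGRAMS)

print(is_concurrency_report("App hangs: possible deadlock when two threads lock A and B"))  # True
print(is_concurrency_report("Button color is wrong on dark theme"))  # False
```

In practice, such surface-level matching serves as a baseline; the paper's sentence- and report-level patterns capture semantics and context that plain keyword lookup misses, which is where the learned approaches come in.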