Actionable Warning Identification (AWI) plays a crucial role in improving the usability of static code analyzers. With recent advances in Machine Learning (ML), various approaches have been proposed to incorporate ML techniques into AWI. These ML-based AWI approaches, which benefit from ML's strong ability to learn subtle and previously unseen patterns from historical data, have demonstrated superior performance. However, a comprehensive overview of these approaches is missing, which could hinder researchers and practitioners from understanding the current progress and discovering potential for future improvement in the ML-based AWI community. In this paper, we systematically review the state-of-the-art ML-based AWI approaches. First, we employ a meticulous survey methodology and gather 51 primary studies from 2000/01/01 to 2023/09/01. Then, we outline the typical ML-based AWI workflow, which comprises four stages: warning dataset preparation, preprocessing, AWI model construction, and evaluation. Within this workflow, we categorize ML-based AWI approaches by their warning output format. We also analyze the techniques used in each stage, along with their strengths, weaknesses, and distribution. Finally, we provide practical research directions for future ML-based AWI approaches, focusing on aspects such as data improvement (e.g., enhancing the warning labeling strategy) and model exploration (e.g., exploring large language models for AWI).