Actionable Warning Identification (AWI) plays a crucial role in improving the usability of static code analyzers. With recent advances in Machine Learning (ML), various approaches have been proposed to incorporate ML techniques into AWI. These ML-based AWI approaches, benefiting from ML's strong ability to learn subtle and previously unseen patterns from historical data, have demonstrated superior performance. However, a comprehensive overview of these approaches is missing, which could hinder researchers and practitioners from understanding the current process and discovering potential for future improvement in the ML-based AWI community. In this paper, we systematically review the state-of-the-art ML-based AWI approaches. First, we follow a rigorous survey methodology and gather 50 primary studies published between 2000/01/01 and 2023/09/01. Then, we outline the typical ML-based AWI workflow, which comprises the warning dataset preparation, preprocessing, AWI model construction, and evaluation stages. Within this workflow, we categorize ML-based AWI approaches by their warning output format. In addition, we analyze the techniques used in each stage, along with their strengths, weaknesses, and distribution. Finally, we provide practical research directions for future ML-based AWI approaches, focusing on aspects such as data improvement (e.g., enhancing the warning labeling strategy) and model exploration (e.g., exploring large language models for AWI).