Machine Learning for Actionable Warning Identification: A Comprehensive Survey

Actionable Warning Identification (AWI) plays a crucial role in improving the usability of static code analyzers. With recent advances in Machine Learning (ML), various approaches have been proposed to incorporate ML techniques into AWI. Because ML can learn subtle and previously unseen patterns from historical data, these ML-based AWI approaches have demonstrated superior performance. However, no comprehensive overview of these approaches exists, which could hinder researchers and practitioners from understanding the current state of research and from identifying opportunities for future improvement in the ML-based AWI community. In this paper, we systematically review the state-of-the-art ML-based AWI approaches. First, we apply a meticulous survey methodology and gather 51 primary studies published between January 1, 2000, and September 1, 2023. Then, we outline the typical ML-based AWI workflow, which comprises four stages: warning dataset preparation, preprocessing, AWI model construction, and evaluation. Within this workflow, we categorize ML-based AWI approaches by their warning output format. In addition, we analyze the techniques used in each stage, along with their strengths, weaknesses, and distribution. Finally, we provide practical research directions for future ML-based AWI approaches, focusing on aspects such as data improvement (e.g., enhancing the warning labeling strategy) and model exploration (e.g., exploring large language models for AWI).
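To make the four-stage workflow concrete, here is a minimal standard-library Python sketch. It is an illustration under loudly stated assumptions, not an approach taken from any of the surveyed studies. The `StaticWarning` record and its fields, the rule that a warning which disappears in a later revision counts as actionable, the hand-picked features, and the logistic-regression model are all hypothetical choices. The sketch produces two common output formats, a binary actionable/unactionable decision and a ranked list of warnings. We assume, without confirmation from the text above, that these correspond to the output-format categories the survey uses.

```python
import math
from dataclasses import dataclass


# Hypothetical warning record; every field name here is an illustrative assumption.
@dataclass
class StaticWarning:
    category: str      # analyzer rule category, e.g. "NULL_DEREF"
    priority: int      # analyzer-reported priority, 1 = highest
    file_churn: int    # number of recent edits to the enclosing file
    fixed_later: bool  # did the warning disappear in a later revision?


# Stage 1, dataset preparation: assumed labeling heuristic that treats a
# warning that disappeared in a later revision as actionable (1).
def label(w: StaticWarning) -> int:
    return 1 if w.fixed_later else 0


# Stage 2, preprocessing: one-hot category plus two scaled numeric features.
def featurize(w: StaticWarning, categories: list[str]) -> list[float]:
    onehot = [1.0 if w.category == c else 0.0 for c in categories]
    return onehot + [1.0 / w.priority, math.log1p(w.file_churn)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Stage 3, model construction: logistic regression fit by batch gradient descent.
def train(X, y, lr=0.5, epochs=500):
    weights, bias = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(weights, xi)) + bias) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        weights = [wj - lr * g / n for wj, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def score(model, x) -> float:
    weights, bias = model
    return sigmoid(sum(wj * xj for wj, xj in zip(weights, x)) + bias)


# Output format A: binary actionable / unactionable decision.
def classify(model, x, threshold=0.5) -> int:
    return int(score(model, x) >= threshold)


# Output format B: warnings ordered from most to least likely actionable.
def rank(model, warnings, categories):
    return sorted(warnings, key=lambda w: score(model, featurize(w, categories)), reverse=True)


# Stage 4, evaluation: precision and recall of the binary output.
def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

In use, one builds feature vectors with `featurize`, fits `train` on labeled warnings, and then either thresholds scores with `classify` or orders a warning backlog with `rank`. Logistic regression appears here only because it fits in a few lines of standard-library Python. It is a stand-in, not a recommendation. The hypothetical labeling rule in stage 1 is the kind of component that the "warning labeling strategy" research direction targets.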

