Dataflow analysis is a powerful code analysis technique that reasons about dependencies between program values, supporting code optimization, program comprehension, and bug detection. Existing approaches require the subject program to compile successfully and must be customized for each downstream application. This paper introduces LLMDFA, an LLM-powered dataflow analysis framework that analyzes arbitrary code snippets without a compilation infrastructure and automatically synthesizes downstream applications. Inspired by summary-based dataflow analysis, LLMDFA decomposes the problem into three sub-problems, which it resolves with several essential strategies, including few-shot chain-of-thought prompting and tool synthesis. Our evaluation shows that this design can mitigate hallucination and improve reasoning ability, achieving high precision and recall in detecting dataflow-related bugs on benchmark programs and outperforming state-of-the-art (classic) tools, including a very recent industrial analyzer.
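To make the idea of summary-based dataflow analysis concrete, the following is a minimal sketch of the classic (non-LLM) technique, not LLMDFA's implementation. It assumes a toy representation in which each value is a hypothetical `(function, variable)` pair, each function has an intra-procedural summary listing value-to-value flow edges, and call edges connect arguments to parameters and return values to call-site results. A bug is reported when a source value (here, a zero literal) reaches a sink value (here, a divisor). In LLMDFA the per-function facts would come from an LLM rather than being written by hand; the helper name `reachable_sinks` is illustrative only.

```python
from collections import deque

# Hypothetical toy representation: a program value is (function_name, local_name).
Value = tuple[str, str]


def reachable_sinks(summaries, call_edges, sources, sinks):
    """Compose per-function summaries with call edges and return every
    (source, sink) pair such that the sink is reachable from the source."""
    # Build one global flow graph from intra-procedural summary edges...
    graph: dict[Value, set[Value]] = {}
    for edges in summaries.values():
        for u, v in edges:
            graph.setdefault(u, set()).add(v)
    # ...plus inter-procedural edges (argument -> parameter, return -> call result).
    for u, v in call_edges:
        graph.setdefault(u, set()).add(v)

    bugs = []
    for src in sources:
        # Breadth-first search from each source over the composed graph.
        seen = {src}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node in sinks:
                bugs.append((src, node))
            for nxt in graph.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return bugs


# Toy program (divide-by-zero):
#   int get_zero() { int z = 0; return z; }
#   int div(int a, int b) { return a / b; }
#   int main() { int v = get_zero(); int w = v; return div(10, w); }
summaries = {
    "get_zero": [(("get_zero", "z"), ("get_zero", "ret"))],
    "main": [(("main", "v"), ("main", "w"))],
    "div": [],
}
call_edges = [
    (("get_zero", "ret"), ("main", "v")),  # return value -> call-site result
    (("main", "w"), ("div", "b")),         # argument -> parameter
]
print(reachable_sinks(summaries, call_edges, [("get_zero", "z")], {("div", "b")}))
# → [(('get_zero', 'z'), ('div', 'b'))]
```

Because each function is summarized once and the summaries are then composed, the search never re-analyzes a function body per call site; the trade-off is that this sketch ignores path conditions, so it may report flows along infeasible paths.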