Static Application Security Testing (SAST) tools based on taint analysis are widely viewed as providing higher-quality vulnerability detection results than traditional pattern-based approaches. However, static taint analysis for JavaScript poses two major challenges. First, JavaScript's dynamic features complicate the data-flow extraction required for taint tracking. Second, npm's large library ecosystem makes it difficult to identify relevant sources and sinks and to establish taint propagation across dependencies. In this paper, we present SemTaint, a multi-agent system that strategically combines the semantic understanding of Large Language Models (LLMs) with traditional static program analysis to extract per-package taint specifications, including sources, sinks, call edges, and library flow summaries. Conceptually, SemTaint uses static program analysis to compute a call graph and defers to an LLM to resolve call edges that cannot be resolved statically. Further, it uses the LLM to classify sources and sinks for a given CWE. The resulting taint specification is then provided to a SAST tool, which performs the vulnerability analysis. We integrate SemTaint with CodeQL, a state-of-the-art SAST tool, and demonstrate its effectiveness by detecting 106 of 162 vulnerabilities that CodeQL alone cannot detect. Furthermore, we find 4 previously unknown vulnerabilities in 4 popular npm packages. In doing so, we demonstrate that LLMs can practically enhance existing static program analysis algorithms, combining the strengths of symbolic reasoning and semantic understanding for improved vulnerability detection.