vEcho: A Paradigm Shift from Vulnerability Verification to Proactive Discovery with Large Language Models

Static Application Security Testing (SAST) tools often suffer from high false positive rates, causing alert fatigue that consumes valuable auditing resources. Recent efforts that use Large Language Models (LLMs) as filters offer only limited improvements: they treat the LLM as a passive, stateless classifier that lacks project-wide context and cannot learn from past analyses to discover unknown, similar vulnerabilities. In this paper, we propose vEcho, a novel framework that transforms the LLM from a passive filter into a virtual security expert capable of learning, memory, and reasoning. vEcho equips its core reasoning engine with a robust developer tool suite for deep, context-aware verification. More importantly, we introduce a novel Echoic Vulnerability Propagation (EVP) mechanism. Driven by a Cognitive Memory Module that simulates human learning, EVP enables vEcho to learn from verified vulnerabilities and proactively infer unknown, analogous flaws, shifting the paradigm from passive verification to active discovery. Extensive experiments on the CWE-Bench-Java dataset demonstrate vEcho's dual advantages over the state-of-the-art baseline, IRIS. Specifically, vEcho achieves a 65% detection rate, a 41.8% relative improvement over IRIS's 45.83%. Crucially, it also addresses alert fatigue by reducing the false positive rate to 59.78%, a 28.3% relative reduction from IRIS's 84.82%. Furthermore, vEcho proactively identified 37 additional known vulnerabilities beyond the 120 documented in the dataset, and has discovered 51 novel 0-day vulnerabilities in open-source projects.
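The relative figures quoted above can be checked with a short calculation. The following is a minimal sketch, assuming "relative improvement" and "relative reduction" both mean (new - baseline) / baseline applied to the rounded percentages in the abstract; the function and variable names are illustrative and do not come from the paper.

```python
def relative_change(baseline: float, new: float) -> float:
    """Return (new - baseline) / baseline as a signed fraction."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


# Rounded percentages as quoted in the abstract (assumed inputs).
IRIS_DETECTION, VECHO_DETECTION = 45.83, 65.0
IRIS_FPR, VECHO_FPR = 84.82, 59.78

detection_gain = relative_change(IRIS_DETECTION, VECHO_DETECTION)
fpr_change = relative_change(IRIS_FPR, VECHO_FPR)

print(f"detection rate: {detection_gain:+.1%}")   # +41.8%
print(f"false positive rate: {fpr_change:+.1%}")  # -29.5%
```

On these rounded inputs the detection-rate gain reproduces the quoted 41.8%. The false positive rate change comes out near 29.5% rather than the quoted 28.3%, so that figure presumably rests on unrounded values or on a definition the abstract does not spell out.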

