ProveRAG: Provenance-Driven Vulnerability Analysis with Automated Retrieval-Augmented LLMs

Security analysts constantly face the challenge of mitigating newly discovered vulnerabilities in real time, with over 300,000 vulnerabilities identified since 1999. The sheer volume of known vulnerabilities complicates the detection of patterns for unknown threats. LLMs can assist, but they often hallucinate and lack alignment with recent threats. Over 40,000 vulnerabilities were identified in 2024 alone, and these were introduced after the training data cutoff of most popular LLMs (e.g., GPT-5). This poses a major challenge for applying LLMs in cybersecurity, where accuracy and up-to-date information are paramount. We therefore aim to improve the adaptation of LLMs to vulnerability analysis by mimicking how an analyst performs such tasks. We propose ProveRAG, an LLM-powered system that helps analysts rapidly analyze vulnerabilities by automatically augmenting the model with data retrieved from the web, while self-evaluating its responses against verifiable evidence. ProveRAG incorporates a self-critique mechanism to reduce the omissions and hallucinations common in LLM output in cybersecurity applications. The system cross-references data from verifiable sources (NVD and CWE), giving analysts confidence in the actionable insights it provides. Our results indicate that ProveRAG delivers verifiable evidence to the user with over 99% accuracy for exploitation strategies and over 97% accuracy for mitigation strategies. By overcoming temporal and context-window limitations, ProveRAG guides analysts to secure their systems more effectively while also documenting the process for future audits.
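As a rough illustration of the self-critique idea described above, the sketch below checks each claim in a generated response against evidence sentences retrieved from a trusted source and attaches the supporting sentence as provenance. It is not ProveRAG's implementation: the abstract does not specify how the evaluation is performed, so a simple token-overlap score stands in for whatever evaluator the system actually uses. The names `Verdict`, `self_critique`, and `threshold` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Verdict:
    claim: str
    label: str     # "supported" or "unsupported"
    evidence: str  # best-matching evidence sentence, "" if unsupported


def _tokens(text: str) -> set:
    """Lowercase alphanumeric tokens; a crude stand-in for semantic matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def self_critique(claims, evidence_sentences, threshold=0.6):
    """Label each claim by its token overlap with the best evidence sentence.

    Assumption: overlap is the fraction of the claim's tokens that also
    appear in the evidence sentence. A real system would likely use an
    LLM or an entailment model here instead.
    """
    verdicts = []
    for claim in claims:
        claim_tokens = _tokens(claim)
        best_sentence, best_score = "", 0.0
        for sentence in evidence_sentences:
            if not claim_tokens:
                break  # an empty claim cannot be supported
            score = len(claim_tokens & _tokens(sentence)) / len(claim_tokens)
            if score > best_score:
                best_sentence, best_score = sentence, score
        supported = best_score >= threshold
        verdicts.append(
            Verdict(
                claim=claim,
                label="supported" if supported else "unsupported",
                evidence=best_sentence if supported else "",
            )
        )
    return verdicts
```

Given evidence such as "Upgrade to version 2.1 or later to fix the issue.", a claim like "Upgrade to version 2.1 to fix the issue" would be labeled supported and returned with that sentence attached, while a claim with no close match in the evidence would be flagged as unsupported for the analyst to review.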
