Large Language Models (LLMs) have demonstrated remarkable capabilities across various cybersecurity tasks, including vulnerability classification, detection, and patching. However, their potential for automated vulnerability report documentation and analysis remains underexplored. We present RAVEN (Retrieval Augmented Vulnerability Exploration Network), a framework that uses LLM agents and Retrieval-Augmented Generation (RAG) to synthesize comprehensive vulnerability analysis reports. Given vulnerable source code, RAVEN generates reports that follow the Google Project Zero Root Cause Analysis template. The framework comprises four modules: an Explorer agent for vulnerability identification, a RAG engine that retrieves relevant knowledge from curated databases including Google Project Zero reports and CWE entries, an Analyst agent for impact and exploitation assessment, and a Reporter agent for structured report generation. To ensure quality, RAVEN includes a task-specific LLM Judge that evaluates reports on four dimensions: structural integrity, ground truth alignment, code reasoning quality, and remediation quality. We evaluate RAVEN on 105 vulnerable code samples covering 15 CWE types from the NIST-SARD dataset. Results show an average quality score of 54.21%, supporting the effectiveness of our approach for automated vulnerability documentation.
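The four-module flow and the four judge dimensions described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not RAVEN's implementation: the `LLM` callable type, the `KnowledgeBase` keyword-overlap retriever, the prompt strings, and the function names are all hypothetical, and the unweighted mean in `overall_score` is an assumed aggregation, since the abstract does not say how the dimension scores are combined.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: any function mapping a prompt to a completion.
LLM = Callable[[str], str]

# The four judge dimensions named in the abstract.
JUDGE_DIMENSIONS = (
    "structural_integrity",
    "ground_truth_alignment",
    "code_reasoning_quality",
    "remediation_quality",
)


@dataclass
class KnowledgeBase:
    """Toy stand-in for the RAG engine: ranks documents by keyword overlap."""
    documents: List[str]

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        query_terms = set(query.lower().split())
        ranked = sorted(
            self.documents,
            key=lambda doc: len(query_terms & set(doc.lower().split())),
            reverse=True,
        )
        return ranked[:k]


def run_pipeline(code: str, llm: LLM, kb: KnowledgeBase) -> str:
    """Explorer -> RAG -> Analyst -> Reporter, each step feeding the next."""
    # Explorer: identify the vulnerability in the given source code.
    finding = llm(f"[explorer] Identify the vulnerability in:\n{code}")
    # RAG engine: fetch related knowledge (e.g. CWE entries) for the finding.
    context = "\n".join(kb.retrieve(finding))
    # Analyst: assess impact and exploitation using the retrieved context.
    analysis = llm(
        "[analyst] Assess impact and exploitation.\n"
        f"Finding: {finding}\nContext: {context}"
    )
    # Reporter: produce the structured report from the analysis.
    return llm(f"[reporter] Write a root cause analysis report.\n{analysis}")


def overall_score(scores: Dict[str, float]) -> float:
    """Unweighted mean of the four dimension scores (an assumed aggregation)."""
    missing = [d for d in JUDGE_DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing judge dimensions: {missing}")
    return sum(scores[d] for d in JUDGE_DIMENSIONS) / len(JUDGE_DIMENSIONS)
```

Passing the model in as a plain callable keeps the sketch runnable with a stub in place of a real LLM client, which also makes the order of the agent calls easy to check.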