Beyond RAG for Cyber Threat Intelligence: A Systematic Evaluation of Graph-Based and Agentic Retrieval

Cyber threat intelligence (CTI) analysts must answer complex questions over large collections of narrative security reports. Retrieval-augmented generation (RAG) systems help language models access external knowledge, but traditional vector retrieval often struggles with queries that require reasoning over relationships between entities such as threat actors, malware, and vulnerabilities. This limitation arises because relevant evidence is often distributed across multiple text fragments and documents. Knowledge graphs address this challenge by enabling structured multi-hop reasoning through explicit representations of entities and relationships. However, multiple retrieval paradigms, including graph-based, agentic, and hybrid approaches, have emerged with different assumptions and failure modes. It remains unclear how these approaches compare in realistic CTI settings and when graph grounding improves performance. We present a systematic evaluation of four RAG architectures for CTI analysis: standard vector retrieval, graph-based retrieval over a CTI knowledge graph, an agentic variant that repairs failed graph queries, and a hybrid approach combining graph queries with text retrieval. We evaluate these systems on 3,300 CTI question-answer pairs spanning factual lookups, multi-hop relational queries, analyst-style synthesis questions, and unanswerable cases. Results show that graph grounding improves performance on structured factual queries. The hybrid graph-text approach improves answer quality by up to 35 percent on multi-hop questions compared to vector RAG, while maintaining more reliable performance than graph-only systems.
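As a rough illustration of how the four retrieval paradigms named above differ, the following minimal sketch runs them over toy data. Everything in it is an assumption made for illustration and is not taken from the evaluated systems: the triples, the text chunks, the placeholder names (`APT-X`, `MalwareA`, `CVE-0000-0001`), the word-overlap scoring that stands in for embedding similarity, and the name-normalization step that stands in for agentic query repair.

```python
from collections import Counter

# Hypothetical toy knowledge graph and report chunks (illustrative only).
TRIPLES = [
    ("APT-X", "uses", "MalwareA"),
    ("MalwareA", "exploits", "CVE-0000-0001"),
]
CHUNKS = [
    "APT-X was observed deploying MalwareA against energy firms.",
    "MalwareA gains access by exploiting CVE-0000-0001 in a VPN appliance.",
    "Unrelated report about phishing awareness training.",
]


def tokens(text):
    """Lowercase bag of words; a crude stand-in for an embedding."""
    return Counter(text.lower().replace(".", " ").replace("?", " ").split())


def vector_retrieve(question, k=2):
    """Stand-in for vector RAG: rank chunks by word overlap with the question."""
    q = tokens(question)
    scored = [(sum((q & tokens(c)).values()), c) for c in CHUNKS]
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    return [chunk for score, chunk in ranked[:k] if score > 0]


def graph_query(entity, relations):
    """Graph retrieval: follow a path of relations starting from one entity."""
    frontier = {entity}
    for rel in relations:
        frontier = {o for s, r, o in TRIPLES if s in frontier and r == rel}
    return sorted(frontier)


def normalize(name):
    """Strip case, hyphens, and spaces so near-miss entity names can match."""
    return name.lower().replace("-", "").replace(" ", "")


def agentic_graph_query(entity, relations):
    """Agentic variant (assumed repair strategy): if the query returns nothing,
    retry with a known entity whose normalized name matches."""
    result = graph_query(entity, relations)
    if result:
        return result
    for candidate in sorted({s for s, _, _ in TRIPLES}):
        if normalize(candidate) == normalize(entity):
            return graph_query(candidate, relations)
    return []


def hybrid_retrieve(question, entity, relations):
    """Hybrid: return graph facts together with supporting text chunks."""
    return {
        "graph": agentic_graph_query(entity, relations),
        "text": vector_retrieve(question),
    }


if __name__ == "__main__":
    question = "Which vulnerabilities does APT-X exploit?"
    print(vector_retrieve(question))
    print(graph_query("APT-X", ["uses", "exploits"]))
    print(agentic_graph_query("apt x", ["uses", "exploits"]))
    print(hybrid_retrieve(question, "apt x", ["uses", "exploits"]))
```

In this toy setup the text-only retriever surfaces the chunk that mentions the actor but misses the chunk naming the vulnerability, because that evidence sits one hop away. The graph path reaches it directly, and the hybrid result pairs the graph answer with the retrieved text.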
