Attack technique knowledge graphs transform diverse attack knowledge into structured representations that support more effective modeling of attack procedures. Existing methods typically rely on textual data such as Cyber Threat Intelligence (CTI) reports, which are often coarse-grained and unstructured, so the resulting knowledge graphs are incomplete and inaccurate. To address these issues, we expand the attack knowledge sources by incorporating audit logs and static code analysis alongside CTI reports, providing finer-grained data for graph construction. We propose MultiKG, a fully automated framework that integrates multiple threat knowledge sources. MultiKG processes CTI reports, dynamic logs, and static code separately, then merges the resulting attack graphs into a unified attack knowledge graph. Through its system design and the use of Large Language Models (LLMs), MultiKG automates the analysis, construction, and merging of attack graphs across these sources, producing a fine-grained, multi-source attack knowledge graph. We implemented MultiKG and evaluated it on 1,015 real attack techniques and 9,006 attack intelligence entries from CTI reports. The results show that MultiKG effectively extracts attack knowledge graphs from diverse sources and aggregates them into accurate, comprehensive representations. Case studies demonstrate that the approach directly benefits security tasks such as attack reconstruction and detection.