We introduce a new method for extracting structured threat behaviors from threat intelligence text. Our method is based on a multi-stage ranking architecture that allows efficiency and effectiveness to be optimized jointly. We believe this problem formulation better reflects the real-world nature of the task, given the large number of adversary techniques and the extensive body of threat intelligence produced by security analysts. The proposed system achieves state-of-the-art performance on this task: it attains a top-3 recall of 81\% in identifying the relevant technique among 193 top-level techniques. Our tests also show that the system performs significantly better (+40\%) than widely used large language models evaluated in a zero-shot setting.