TTPXHunter: Actionable Threat Intelligence Extraction as TTPs from Finished Cyber Threat Reports

Understanding the modus operandi of adversaries helps organizations adopt effective defensive strategies and share intelligence with the wider community. This knowledge is often buried in unstructured natural language text within threat analysis reports, so a tool is needed that interprets the modus operandi described in a report's sentences and translates it into a structured format. This research introduces TTPXHunter, a methodology for automatically extracting threat intelligence in the form of Tactics, Techniques, and Procedures (TTPs) from finished cyber threat reports. It leverages state-of-the-art, cyber domain-specific natural language processing (NLP) to augment sentences for minority-class TTPs and to substantially improve how accurately TTPs are pinpointed in threat analysis reports. Threat intelligence expressed as TTPs is essential for comprehensively understanding cyber threats and for enhancing detection and mitigation strategies. We create two datasets: an augmented sentence-TTP dataset of 39,296 samples and a report-to-TTP dataset of 149 real-world cyber threat intelligence reports. We evaluate TTPXHunter on both the augmented sentence dataset and the report dataset. TTPXHunter achieves the highest performance on the augmented dataset, with an F1-score of 92.42%, and it outperforms existing state-of-the-art TTP extraction solutions with an F1-score of 97.09% on the report dataset. By offering quick, actionable insights into attacker behaviors, TTPXHunter significantly improves cybersecurity threat intelligence. This advancement automates threat intelligence analysis and provides a crucial tool for cybersecurity professionals combating cyber threats.
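
The abstract reports F1-scores at both the sentence level and the report level. It does not say how sentence predictions become a report's TTP set or how the report-level score is averaged. The sketch below shows one plausible reading and is not the authors' implementation. It assumes that a report's extracted TTPs are the union of sentence-level predictions whose confidence meets a threshold. It also assumes that F1 is micro-averaged over (report, TTP) pairs. The function names `aggregate_report_ttps` and `micro_f1`, the 0.5 threshold, and the toy technique IDs and scores are hypothetical illustrations, not data from TTPXHunter.

```python
from typing import Dict, List, Set, Tuple


def aggregate_report_ttps(sentence_preds: List[Tuple[str, float]],
                          threshold: float = 0.5) -> Set[str]:
    """Hypothetical aggregation: keep every TTP predicted for any sentence
    of the report with confidence >= threshold (assumed rule)."""
    return {ttp for ttp, score in sentence_preds if score >= threshold}


def micro_f1(predicted: Dict[str, Set[str]],
             gold: Dict[str, Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over (report, TTP) pairs
    (assumed averaging scheme; the abstract does not specify one)."""
    tp = fp = fn = 0
    for report_id, gold_ttps in gold.items():
        pred_ttps = predicted.get(report_id, set())
        tp += len(pred_ttps & gold_ttps)   # correctly extracted TTPs
        fp += len(pred_ttps - gold_ttps)   # extracted but not in ground truth
        fn += len(gold_ttps - pred_ttps)   # in ground truth but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy sentence-level outputs (technique ID, confidence); illustrative only.
    sentence_preds = {
        "report-1": [("T1566", 0.97), ("T1059", 0.91), ("T1027", 0.40)],
        "report-2": [("T1071", 0.88), ("T1105", 0.75)],
    }
    gold = {"report-1": {"T1566", "T1059", "T1105"}, "report-2": {"T1071"}}
    predicted = {rid: aggregate_report_ttps(p) for rid, p in sentence_preds.items()}
    p, r, f = micro_f1(predicted, gold)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
    # prints precision=0.75 recall=0.75 f1=0.75
```

Micro-averaging weights every (report, TTP) pair equally, so reports with many techniques influence the score more. A macro average over reports or over TTP classes would treat rare techniques differently, and that choice matters when minority-class TTPs are the focus.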