TTPXHunter: Actionable Threat Intelligence Extraction as TTPs from Finished Cyber Threat Reports

Understanding the modus operandi of adversaries helps organizations employ efficient defensive strategies and share intelligence with the community. This knowledge is often present as unstructured natural-language text within threat analysis reports. A translation tool is needed to interpret the modus operandi described in the sentences of a threat report and convert it into a structured format. This research introduces TTPXHunter, a methodology for the automated extraction of threat intelligence, in the form of Tactics, Techniques, and Procedures (TTPs), from finished cyber threat reports. It leverages state-of-the-art, cyber domain-specific natural language processing (NLP) to augment sentences for minority-class TTPs and to pinpoint the TTPs in threat analysis reports significantly more precisely. Threat intelligence expressed as TTPs is essential for comprehensively understanding cyber threats and for enhancing detection and mitigation strategies. We create two datasets: an augmented sentence-TTP dataset of 39,296 samples and a report-to-TTP dataset of 149 real-world cyber threat intelligence reports. We then evaluate TTPXHunter on both the augmented sentence dataset and the cyber threat reports. TTPXHunter achieves the highest performance, a 92.42% F1-score, on the augmented dataset, and it outperforms existing state-of-the-art solutions in TTP extraction with an F1-score of 97.09% on the report dataset. TTPXHunter significantly improves cybersecurity threat intelligence by offering quick, actionable insights into attacker behaviors. By automating threat intelligence analysis, it provides a crucial tool for cybersecurity professionals fighting cyber threats.
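To make the report-level evaluation concrete, the following is a minimal sketch of how an F1-score over extracted TTPs could be computed for a set of reports. It is not the authors' evaluation code: the function names `report_f1` and `mean_report_f1` are hypothetical, the technique identifiers are illustrative labels only, and averaging per-report scores is an assumption, since the abstract does not state how scores are aggregated.

```python
from typing import Iterable, List, Tuple


def report_f1(predicted: Iterable[str], actual: Iterable[str]) -> float:
    """F1-score between the TTP set extracted from one report and its ground truth."""
    predicted_set, actual_set = set(predicted), set(actual)
    true_positives = len(predicted_set & actual_set)
    if true_positives == 0:
        # No overlap (or an empty set on either side): score this report as 0.
        return 0.0
    precision = true_positives / len(predicted_set)
    recall = true_positives / len(actual_set)
    return 2 * precision * recall / (precision + recall)


def mean_report_f1(pairs: List[Tuple[Iterable[str], Iterable[str]]]) -> float:
    """Average the per-report F1-scores (assumed aggregation, not from the paper)."""
    scores = [report_f1(predicted, actual) for predicted, actual in pairs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Illustrative labels only; these are not results from the paper.
    reports = [
        ({"T1059", "T1566"}, {"T1059", "T1027"}),
        ({"T1059"}, {"T1059"}),
    ]
    print(mean_report_f1(reports))
```

Treating each report's TTPs as a set means a technique mentioned in several sentences is counted once, which matches a report-to-TTP mapping; a sentence-level evaluation would instead compare one predicted label per sentence.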
