Understanding the attack patterns associated with a cyberattack is crucial for comprehending an attacker's behavior and implementing appropriate mitigation measures. However, most information about new attacks is presented as unstructured text, which makes it difficult for security analysts to collect the information they need. In this paper, we present a sentence classification system that identifies the attack techniques described in natural-language sentences from cyber threat intelligence (CTI) reports. To improve classification on this low-resource task, we propose a new method that uses auxiliary data annotated with the same labels as the primary data. The system first trains the model on the augmented training data (primary plus auxiliary) and then continues training it on the primary data alone. We validate our model using the TRAM data and the MITRE ATT&CK framework. Experiments show that, compared with the baseline on the TRAM dataset, our method improves Macro-F1 by 5 to 9 percentage points while keeping Micro-F1 competitive.
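The two-stage schedule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the classifier, so a multiclass bag-of-words perceptron stands in for it, and the names `two_stage_train`, `stage1_epochs`, and `stage2_epochs`, the epoch counts, and the toy sentences are all hypothetical. The labels reuse two real ATT&CK technique IDs (T1566 Phishing, T1003 OS Credential Dumping) purely as examples. The point the sketch shows is that stage 2 keeps the weights learned in stage 1 and only narrows the training data to the primary set.

```python
import random
from collections import defaultdict


def featurize(sentence: str) -> list[str]:
    """Bag-of-words features: lowercased whitespace tokens."""
    return sentence.lower().split()


class Perceptron:
    """Multiclass perceptron used as a stand-in classifier (assumption)."""

    def __init__(self) -> None:
        # label -> token -> weight
        self.weights = defaultdict(lambda: defaultdict(float))
        self.labels: set[str] = set()

    def score(self, tokens: list[str], label: str) -> float:
        w = self.weights[label]
        return sum(w.get(t, 0.0) for t in tokens)

    def predict(self, sentence: str) -> str:
        tokens = featurize(sentence)
        # Iterate labels in sorted order so ties are broken deterministically.
        return max(sorted(self.labels), key=lambda y: self.score(tokens, y))

    def update(self, sentence: str, gold: str) -> None:
        self.labels.add(gold)
        pred = self.predict(sentence)
        if pred != gold:
            # Standard perceptron update: reward the gold label, penalize the prediction.
            for t in featurize(sentence):
                self.weights[gold][t] += 1.0
                self.weights[pred][t] -= 1.0


def train(model: Perceptron, data, epochs: int, seed: int) -> None:
    rng = random.Random(seed)
    data = list(data)  # copy so shuffling does not mutate the caller's list
    for _ in range(epochs):
        rng.shuffle(data)
        for sentence, label in data:
            model.update(sentence, label)


def two_stage_train(primary, auxiliary, stage1_epochs=5, stage2_epochs=5, seed=0):
    """Hypothetical helper: stage 1 on primary + auxiliary, stage 2 on primary only."""
    model = Perceptron()
    # Stage 1: augmented data = primary + auxiliary, sharing one label set.
    train(model, primary + auxiliary, stage1_epochs, seed)
    # Stage 2: keep the same weights and continue training on primary data only.
    train(model, primary, stage2_epochs, seed + 1)
    return model


# Toy, invented data (not from TRAM).
primary = [
    ("the actor sent spearphishing emails with malicious attachments", "T1566"),
    ("malware dumped credentials from lsass memory", "T1003"),
]
auxiliary = [
    ("phishing messages delivered a weaponized document", "T1566"),
    ("the tool extracted password hashes from memory", "T1003"),
]

model = two_stage_train(primary, auxiliary)
print(model.predict("spearphishing emails carried malicious attachments"))  # T1566
```

In a real system the perceptron would be replaced by whatever model the paper trains; the schedule itself (augmented data first, then primary data only, without reinitializing the model) is the part this sketch is meant to illustrate.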