Cyber threat attribution is the process of identifying the actor behind an attack in cyberspace. Accurate and timely attribution helps deter future attacks by enabling appropriate and timely defense mechanisms. Security analysts still prefer to perform attribution through manual analysis of attack patterns gathered from honeypot deployments, intrusion detection systems, firewalls, and trace-back procedures. Such attack patterns are low-level Indicators of Compromise (IOC). They represent the Tactics, Techniques, and Procedures (TTP) and software tools used by adversaries in their campaigns. Adversaries rarely re-use low-level IOC, and these indicators can also be manipulated, resulting in false and unfair attribution. Two problems must be addressed before the effectiveness of low-level and high-level IOC can be empirically evaluated and compared. First, recent research has discussed the ineffectiveness of low-level IOC for cyber threat attribution only intuitively; an empirical measure of their effectiveness on a real-world dataset is missing. Second, the available dataset for high-level IOC contains a single instance for each predictive class label, so it cannot be used directly to train machine learning models. To address these problems, we empirically evaluate the effectiveness of low-level IOC on a real-world dataset built specifically for comparative analysis with high-level IOC. The experimental results show that models trained on high-level IOC attribute cyberattacks with an accuracy of 95%, compared with 40% for models trained on low-level IOC.