Principled Data-Driven Decision Support for Cyber-Forensic Investigations

In the wake of a cybersecurity incident, it is crucial to promptly discover how the threat actors breached security in order to assess the impact of the incident and to develop and deploy countermeasures that can protect against further attacks. To this end, defenders can launch a cyber-forensic investigation, which discovers the techniques that the threat actors used in the incident. A fundamental challenge in such an investigation is deciding which techniques to investigate first: investigating each technique requires time and effort, but forensic analysts cannot know which techniques were actually used before investigating them. To ensure prompt discovery, it is imperative to provide decision support that can help forensic analysts with this prioritization. A recent study demonstrated that data-driven decision support, based on a dataset of prior incidents, can provide state-of-the-art prioritization. However, this data-driven approach, called DISCLOSE, is based on a heuristic that utilizes only a subset of the available information and does not approximate optimal decisions. To improve upon this heuristic, we introduce a principled approach for data-driven decision support for cyber-forensic investigations. We formulate the decision-support problem as a Markov decision process whose states represent the states of a forensic investigation. To solve the decision problem, we propose a method based on Monte Carlo tree search, which relies on k-NN regression over prior incidents to estimate state-transition probabilities. We evaluate our proposed approach on multiple versions of the MITRE ATT&CK dataset, which is a knowledge base of adversarial techniques and tactics based on real-world cyber incidents, and demonstrate that our approach outperforms DISCLOSE in terms of techniques discovered per effort spent.
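
To make the k-NN step concrete, the following is a minimal sketch of how prior incidents could be used to estimate the probability that a not-yet-investigated technique was used, given what the investigation has established so far. It is an illustration under stated assumptions, not the method evaluated in this work: the function name `knn_technique_probabilities`, the mismatch-count distance, the unweighted neighbor average, and the placeholder technique names are all hypothetical choices made for this example.

```python
from typing import Dict, FrozenSet, List


def knn_technique_probabilities(
    prior_incidents: List[FrozenSet[str]],
    found: FrozenSet[str],
    ruled_out: FrozenSet[str],
    candidates: List[str],
    k: int = 3,
) -> Dict[str, float]:
    """Estimate P(technique was used) for each candidate technique.

    Each prior incident is the set of techniques observed in that incident.
    The investigation state is the pair (found, ruled_out): techniques
    already confirmed as used and techniques already confirmed as not used.
    """
    if not prior_incidents:
        raise ValueError("at least one prior incident is required")

    def distance(incident: FrozenSet[str]) -> int:
        # Assumed distance: number of investigated techniques on which the
        # prior incident disagrees with the current investigation state.
        missing = len(found - incident)          # found now, absent in prior
        contradicted = len(ruled_out & incident)  # ruled out now, present in prior
        return missing + contradicted

    # sorted() is stable, so ties keep the original dataset order.
    neighbors = sorted(prior_incidents, key=distance)[:k]

    # Unweighted k-NN regression: fraction of neighbors containing the technique.
    return {
        tech: sum(tech in incident for incident in neighbors) / len(neighbors)
        for tech in candidates
    }


# Invented toy incidents with placeholder technique names
# (not drawn from the MITRE ATT&CK dataset).
toy_incidents = [
    frozenset({"tech_A", "tech_B", "tech_C"}),
    frozenset({"tech_A", "tech_B", "tech_D"}),
    frozenset({"tech_E", "tech_C"}),
    frozenset({"tech_E", "tech_D", "tech_B"}),
]

estimates = knn_technique_probabilities(
    toy_incidents,
    found=frozenset({"tech_A"}),
    ruled_out=frozenset({"tech_C"}),
    candidates=["tech_B", "tech_D", "tech_E"],
    k=2,
)
```

In a tree-search setting, estimates of this kind could serve as the probabilities of the two outcomes (technique used or not used) when simulating the investigation of a candidate technique from a given state.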
