Principled Data-Driven Decision Support for Cyber-Forensic Investigations

In the wake of a cybersecurity incident, it is crucial to promptly discover how the threat actors breached security in order to assess the impact of the incident and to develop and deploy countermeasures that can protect against further attacks. To this end, defenders can launch a cyber-forensic investigation, which discovers the techniques that the threat actors used in the incident. A fundamental challenge in such an investigation is deciding which techniques to investigate first: investigating each technique requires time and effort, but forensic analysts cannot know which techniques were actually used before investigating them. To ensure prompt discovery, it is imperative to provide decision support that helps forensic analysts with this prioritization. A recent study demonstrated that data-driven decision support, based on a dataset of prior incidents, can provide state-of-the-art prioritization. However, this data-driven approach, called DISCLOSE, is based on a heuristic that uses only a subset of the available information and does not approximate optimal decisions. To improve upon this heuristic, we introduce a principled approach for data-driven decision support for cyber-forensic investigations. We formulate the decision-support problem as a Markov decision process whose states represent the states of a forensic investigation. To solve the decision problem, we propose a method based on Monte Carlo tree search, which relies on k-nearest-neighbor (k-NN) regression over prior incidents to estimate state-transition probabilities. We evaluate our proposed approach on multiple versions of the MITRE ATT&CK dataset, which is a knowledge base of adversarial techniques and tactics based on real-world cyber incidents, and demonstrate that our approach outperforms DISCLOSE in terms of techniques discovered per effort spent.
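To make the k-NN estimation step concrete, the following is a minimal sketch of how the probability that a not-yet-investigated technique was used might be estimated from prior incidents. It is an illustration under stated assumptions, not the method evaluated in the paper: the mismatch-count distance, the function names, the value of k, and the toy incident data are all hypothetical, and the greedy selection at the end merely stands in for the Monte Carlo tree search, which would use such estimates as transition probabilities.

```python
from typing import FrozenSet, List, Set

# Toy data (hypothetical): each prior incident is the set of technique IDs used in it.
PRIOR_INCIDENTS: List[FrozenSet[str]] = [
    frozenset({"T1566", "T1059", "T1003"}),
    frozenset({"T1566", "T1059", "T1486"}),
    frozenset({"T1190", "T1059", "T1003"}),
    frozenset({"T1190", "T1505"}),
]


def mismatch(incident: FrozenSet[str], confirmed: Set[str], ruled_out: Set[str]) -> int:
    """Assumed distance: number of investigated techniques on which the
    prior incident disagrees with the current investigation state."""
    return len(confirmed - incident) + len(ruled_out & incident)


def knn_technique_probability(
    incidents: List[FrozenSet[str]],
    confirmed: Set[str],
    ruled_out: Set[str],
    technique: str,
    k: int = 3,
) -> float:
    """Estimate P(technique was used | state) as the fraction of the k
    nearest prior incidents that contain the technique."""
    if not incidents:
        raise ValueError("at least one prior incident is required")
    # sorted() is stable, so ties keep their original dataset order.
    ranked = sorted(incidents, key=lambda inc: mismatch(inc, confirmed, ruled_out))
    neighbors = ranked[:k]
    return sum(technique in inc for inc in neighbors) / len(neighbors)


def greedy_next_technique(
    incidents: List[FrozenSet[str]],
    confirmed: Set[str],
    ruled_out: Set[str],
    candidates: Set[str],
    k: int = 3,
) -> str:
    """Greedy stand-in for the planner (NOT tree search): pick the candidate
    with the highest estimated probability, breaking ties alphabetically."""
    return max(
        sorted(candidates, reverse=True),
        key=lambda t: knn_technique_probability(incidents, confirmed, ruled_out, t, k),
    )


if __name__ == "__main__":
    state_confirmed = {"T1566"}   # investigated and found to be used
    state_ruled_out = {"T1505"}   # investigated and found not to be used
    for t in ("T1059", "T1003", "T1190"):
        p = knn_technique_probability(PRIOR_INCIDENTS, state_confirmed, state_ruled_out, t, k=2)
        print(t, p)
```

In this sketch, an investigation state is the pair of confirmed and ruled-out technique sets; a planner would query the estimate for each candidate action to weigh the two possible outcomes of investigating it (used or not used).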
