This research proposes a novel reinforcement learning (RL) model to optimise malware forensics investigations during cyber incident response. It aims to improve forensic efficiency by reducing false negatives and adapting current practices to evolving malware signatures. The proposed RL framework leverages Q-learning within a Markov Decision Process (MDP) formulation to train the system to identify malware patterns in live memory dumps, thereby automating forensic tasks. The RL model is grounded in a detailed malware workflow diagram that guides the analysis of malware artefacts using static and behavioural techniques as well as machine learning algorithms. Furthermore, the work seeks to address challenges in the UK justice system by ensuring the accuracy of forensic evidence. Testing and evaluation were conducted in controlled environments, using datasets created on Windows operating systems to simulate malware infections. The experimental results demonstrate that RL improves malware detection rates over conventional methods, with performance varying according to the complexity of the environment and the learning rate. The study concludes that while RL offers promising potential for automating malware forensics, its efficacy across diverse malware types requires ongoing refinement of reward systems and feature extraction methods.