This paper investigates the application of Deep Reinforcement Learning (DRL) to attributing malware to specific Advanced Persistent Threat (APT) groups through detailed behavioural analysis. By analysing over 3,500 malware samples from 12 distinct APT groups, the study uses tools such as Cuckoo Sandbox to extract behavioural data, providing deep insight into the operational patterns of the malware. The research demonstrates that the DRL model significantly outperforms traditional machine learning approaches such as SGD, SVC, KNN, MLP, and Decision Tree classifiers, achieving a test accuracy of 89.27%. It highlights the model's capability to handle complex, variable, and evasive malware attributes. Furthermore, the paper discusses the considerable computational resources and extensive data dependencies required to deploy these advanced AI models in cybersecurity frameworks. Future research is directed towards enhancing the efficiency of DRL models, expanding the diversity of the datasets, addressing ethical concerns, and leveraging Large Language Models (LLMs) to refine reward mechanisms and optimise the DRL framework. By showcasing the transformative potential of DRL in malware attribution, this research advocates a responsible and balanced approach to AI integration, with the goal of advancing cybersecurity through more adaptable, accurate, and robust systems.