Malware is a significant security concern in today's digital landscape: it can destroy or disable operating systems, steal sensitive user information, and consume valuable disk space. However, current malware detection methods, both static and dynamic, struggle to identify newly developed (``zero-day'') malware and, in the dynamic case, are limited by their reliance on customized virtual machine (VM) environments. To overcome these limitations, we propose a novel malware detection approach that combines deep learning, mathematical techniques, and network science. Our approach draws on both static and dynamic analysis and uses the Low-Level Virtual Machine (LLVM) to represent each application as a complex network. The resulting network topologies are fed into the GraphSAGE architecture, with operation names used as node features, to efficiently distinguish benign from malicious software. Importantly, the GraphSAGE models base their predictions on the topological geometry of the network, which enables them to detect state-of-the-art malware and to prevent the damage it could otherwise cause during execution in a VM. To evaluate our approach, we conduct a study on a dataset of source code from 24,376 applications written in C/C++, drawn directly from widely recognized malware and various types of benign software. Our approach achieves an Area Under the Receiver Operating Characteristic Curve (AUROC) of 99.85%. These results represent a substantial improvement in malware detection, offering a more accurate and efficient solution than current state-of-the-art methods.
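To make the pipeline concrete, the following is a minimal, untrained sketch of graph-level classification with GraphSAGE-style mean aggregation, where each node is an instruction labeled by its operation name (one-hot encoded). Everything here is an illustrative assumption rather than the authors' implementation: the abstract does not specify how edges are built, so the opcode vocabulary (`OPCODES`), the toy edge list, the two-layer depth, the hidden size of 8, the mean-pool readout, and the random weights are all hypothetical. It uses pure Python to stay dependency-free and aggregates over the full neighbourhood rather than sampling neighbours.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical opcode vocabulary; a real system would use every LLVM IR
# instruction name observed in the training corpus.
OPCODES = ["alloca", "load", "store", "call", "icmp", "br", "ret"]

Matrix = List[List[float]]


def one_hot(op: str) -> List[float]:
    """Encode an operation name as a one-hot node-feature vector."""
    vec = [0.0] * len(OPCODES)
    vec[OPCODES.index(op)] = 1.0
    return vec


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Multiply matrix w (rows x cols) by vector x (cols)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sage_mean_layer(h: Matrix, adj: Dict[int, List[int]],
                    w_self: Matrix, w_neigh: Matrix) -> Matrix:
    """One GraphSAGE layer with a mean aggregator:
    h_v' = ReLU(W_self h_v + W_neigh * mean_{u in N(v)} h_u), then L2-normalized."""
    out = []
    for v, h_v in enumerate(h):
        neigh = adj.get(v, [])
        if neigh:
            agg = [sum(h[u][i] for u in neigh) / len(neigh) for i in range(len(h_v))]
        else:
            agg = [0.0] * len(h_v)  # isolated node: no neighbour signal
        z = [max(0.0, a + b) for a, b in zip(matvec(w_self, h_v), matvec(w_neigh, agg))]
        norm = math.sqrt(sum(x * x for x in z)) or 1.0  # avoid dividing by zero
        out.append([x / norm for x in z])
    return out


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random weights in [-1, 1]; stand-ins for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def classify(ops: List[str], edges: List[Tuple[int, int]],
             hidden: int = 8, seed: int = 0) -> float:
    """Return an (untrained) malware score in (0, 1) for one program graph."""
    rng = random.Random(seed)
    adj: Dict[int, List[int]] = {}
    for src, dst in edges:  # treat edges as undirected for aggregation
        adj.setdefault(src, []).append(dst)
        adj.setdefault(dst, []).append(src)
    h = [one_hot(op) for op in ops]
    dim = len(OPCODES)
    h = sage_mean_layer(h, adj, random_matrix(hidden, dim, rng), random_matrix(hidden, dim, rng))
    h = sage_mean_layer(h, adj, random_matrix(hidden, hidden, rng), random_matrix(hidden, hidden, rng))
    graph_emb = [sum(col) / len(h) for col in zip(*h)]  # mean-pool readout over nodes
    w_out = random_matrix(1, hidden, rng)[0]
    logit = sum(w * x for w, x in zip(w_out, graph_emb))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid -> probability of "malicious"


if __name__ == "__main__":
    ops = ["alloca", "store", "load", "call", "icmp", "br", "ret"]
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
    print(f"malware score: {classify(ops, edges):.3f}")
```

Because the weights are random, the score carries no meaning until the model is trained on labeled benign and malicious graphs; AUROC would then be computed from the scores on held-out programs. A production version would more likely rely on an existing GraphSAGE layer, such as `SAGEConv` in PyTorch Geometric, than on hand-written loops.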