Malware can severely compromise the integrity and trustworthiness of information, and it evolves constantly. Existing feature fusion-based detection methods generally overlook the correlations between features, and simply concatenating features weakens the model's representational ability, leading to low detection accuracy. Moreover, these methods are susceptible to concept drift and the resulting significant degradation in model performance. To address these challenges, we introduce MFGraph, a feature graph-based malware detection method that characterizes applications by learning feature-to-feature relationships, improving detection accuracy while mitigating the impact of concept drift. In MFGraph, we construct a feature graph from static features extracted from binary PE files, then apply a deep graph convolutional network to learn a representation of the feature graph. Finally, we employ the representation vectors obtained from the output of a three-layer perceptron to differentiate between benign and malicious software. We evaluated our method on the EMBER dataset; the experimental results show that it achieves an AUC score of 0.98756 on the malware detection task, outperforming the baseline models. Furthermore, the AUC score of MFGraph decreases by only 5.884% over one year, indicating that it is the least affected by concept drift among the compared models.
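The pipeline described above (feature graph, graph convolution, three-layer perceptron) can be illustrated with a minimal NumPy sketch of a forward pass. This is not the MFGraph implementation: the number of nodes, the adjacency matrix, the layer sizes, the mean-pooling readout, and all function names are hypothetical choices made for illustration, and the weights are random rather than trained. The graph convolution uses the standard symmetric normalization, which the abstract does not confirm as the variant used.

```python
import numpy as np


def normalize_adjacency(adj):
    """Standard GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def relu(x):
    return np.maximum(x, 0.0)


def gcn_forward(node_feats, adj, gcn_weights):
    """Stack graph-convolution layers, then mean-pool nodes into one vector.

    Mean pooling is an assumed readout, chosen only for simplicity.
    """
    a_norm = normalize_adjacency(adj)
    h = node_feats
    for w in gcn_weights:
        h = relu(a_norm @ h @ w)
    return h.mean(axis=0)


def mlp_score(graph_vec, mlp_weights):
    """Three-layer perceptron with a sigmoid output (probability of 'malicious')."""
    h = graph_vec
    for w, b in mlp_weights[:-1]:
        h = relu(h @ w + b)
    w, b = mlp_weights[-1]
    logit = (h @ w + b).item()
    return 1.0 / (1.0 + np.exp(-logit))


# Hypothetical toy input: 5 nodes (e.g. one per static feature group),
# each carrying an 8-dimensional embedding.
rng = np.random.default_rng(0)
num_nodes, feat_dim, hidden = 5, 8, 16
node_feats = rng.normal(size=(num_nodes, feat_dim))

# Hypothetical undirected feature-to-feature edges.
adj = np.zeros((num_nodes, num_nodes))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4), (1, 3)]:
    adj[i, j] = adj[j, i] = 1.0

# Randomly initialized (untrained) weights, for shape illustration only.
gcn_weights = [
    rng.normal(scale=0.1, size=(feat_dim, hidden)),
    rng.normal(scale=0.1, size=(hidden, hidden)),
]
mlp_weights = [
    (rng.normal(scale=0.1, size=(hidden, 16)), np.zeros(16)),
    (rng.normal(scale=0.1, size=(16, 8)), np.zeros(8)),
    (rng.normal(scale=0.1, size=(8, 1)), np.zeros(1)),
]

graph_vec = gcn_forward(node_feats, adj, gcn_weights)
score = mlp_score(graph_vec, mlp_weights)
print(f"graph vector shape: {graph_vec.shape}, malicious score: {score:.4f}")
```

Because each graph-convolution layer mixes a node's embedding with those of its neighbors, the pooled vector depends on relationships between features and not only on their concatenated values, which is the property the abstract contrasts with plain feature concatenation.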