Malware is constantly evolving and can severely compromise the integrity and trustworthiness of information. Existing feature fusion-based detection methods generally overlook the correlations between features, and simply concatenating features weakens the model's representational capacity, which results in low detection accuracy. Moreover, these methods are susceptible to concept drift, which causes significant degradation of the model. To address these challenges, we introduce MFGraph, a feature graph-based malware detection method that characterizes applications by learning feature-to-feature relationships, improving detection accuracy while mitigating the impact of concept drift. MFGraph constructs a feature graph from static features extracted from binary PE files and then applies a deep graph convolutional network to learn a representation of that graph. Finally, we use the representation vectors output by a three-layer perceptron to distinguish benign from malicious software. We evaluated MFGraph on the EMBER dataset, where it achieves an AUC of 0.98756 on the malware detection task, outperforming the baseline models. Furthermore, the AUC of MFGraph decreases by only 5.884% over one year, indicating that it is the least affected by concept drift among the evaluated models.
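The pipeline described above (feature graph, graph convolution, perceptron classifier) can be illustrated with a minimal, dependency-free sketch. It assumes the feature graph is already available as an adjacency matrix over feature nodes with a feature vector per node; the abstract does not say how nodes and edges are defined. The symmetric normalization D^{-1/2}(A + I)D^{-1/2} is the standard GCN propagation rule, while the two-layer GCN depth, mean-pooling readout, ReLU activations, sigmoid output, and the function name `mfgraph_like_score` are assumptions for illustration, not details from the paper. Weights are random and untrained, so the score only demonstrates the data flow.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Standard GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def gcn_layer(a_norm, h, w):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    return relu(matmul(matmul(a_norm, h), w))


def random_matrix(rows, cols, rng):
    # Untrained weights; a real model would learn these.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def mfgraph_like_score(adj, node_features, hidden=8, seed=0):
    """Hypothetical helper: feature graph -> GCN -> readout -> 3-layer MLP -> score."""
    rng = random.Random(seed)
    a_norm = normalize_adjacency(adj)
    h = node_features
    in_dim = len(node_features[0])
    # Two stacked GCN layers (assumed depth; the paper's exact depth is not given here).
    for out_dim in (hidden, hidden):
        h = gcn_layer(a_norm, h, random_matrix(in_dim, out_dim, rng))
        in_dim = out_dim
    # Mean-pool node embeddings into one graph-level representation vector (assumed readout).
    z = [[sum(col) / len(h) for col in zip(*h)]]
    # Three-layer perceptron: two ReLU hidden layers plus a linear output layer.
    for out_dim in (hidden, hidden):
        z = relu(matmul(z, random_matrix(len(z[0]), out_dim, rng)))
    logit = matmul(z, random_matrix(len(z[0]), 1, rng))[0][0]
    # Sigmoid maps the logit to a "probability of malicious" score.
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Toy 4-node feature graph with 3-dimensional node features (made-up values).
    adj = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]
    features = [[0.2, 0.0, 1.0], [0.5, 0.1, 0.0], [0.0, 0.9, 0.3], [0.7, 0.4, 0.2]]
    print(mfgraph_like_score(adj, features))
```

Mean pooling is used here only because it is the simplest permutation-invariant readout; the paper may use a different readout or classifier head.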