Malware detection has become a major concern due to the growing number and complexity of malware. Traditional detection methods based on signatures and heuristics generalize poorly to unknown attacks and can be easily circumvented with obfuscation techniques. In recent years, Machine Learning (ML), and notably Deep Learning (DL), has achieved impressive results in malware detection by learning useful representations from data, and has become preferred over traditional methods. More recently, applying such techniques to graph-structured data has achieved state-of-the-art performance in various domains and has shown promising results in learning more robust representations of malware. Yet no literature review focusing on graph-based deep learning for malware detection exists. In this survey, we provide an in-depth literature review that summarizes existing works and unifies them under common approaches and architectures. We notably show that Graph Neural Networks (GNNs) achieve competitive results in learning robust embeddings from malware represented as expressive graph structures, enabling efficient detection by downstream classifiers. This paper also reviews the adversarial attacks used to fool graph-based detection methods. Finally, we discuss open challenges and future research directions.