Malware poses a significant threat to the security of computer systems and networks, and detecting it requires sophisticated techniques for analyzing program behavior and functionality. Traditional signature-based detection methods have become ineffective against new and unknown malware because malware evolves rapidly. One of the most promising ways to overcome the limitations of signature-based detection is to use control flow graphs (CFGs). A CFG captures the structural information of a program by representing its possible execution paths as a graph, where nodes represent instructions and edges represent control flow dependencies. Machine learning (ML) algorithms are used to extract features from CFGs and to classify the corresponding programs as malicious or benign. In this survey, we review selected state-of-the-art methods for CFG-based malware detection using ML, focusing on the different ways of extracting, representing, and classifying CFG features. Specifically, we present a comprehensive overview of the types of CFG features that have been used and of the ML algorithms that have been applied to CFG-based malware detection. We also provide an in-depth analysis of the challenges and limitations of these approaches, suggest potential solutions to some open problems, and outline promising future directions for research in this field.
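To make the graph representation described above concrete, the following minimal Python sketch builds a toy CFG from a list of control flow edges and derives a few simple structural features of the kind an ML classifier could take as input. The function names (`build_cfg`, `cfg_features`), the toy program, and the chosen feature set are illustrative assumptions, not methods taken from the surveyed work.

```python
def build_cfg(edges):
    """Build an adjacency map {node: set(successors)} from (src, dst) pairs."""
    cfg = {}
    for src, dst in edges:
        cfg.setdefault(src, set()).add(dst)
        cfg.setdefault(dst, set())  # nodes without successors still appear as keys
    return cfg


def cfg_features(cfg):
    """Compute a small, illustrative set of structural features for one CFG."""
    num_nodes = len(cfg)
    num_edges = sum(len(successors) for successors in cfg.values())
    return {
        "num_nodes": num_nodes,
        "num_edges": num_edges,
        # Nodes with more than one successor correspond to branch points.
        "num_branches": sum(1 for s in cfg.values() if len(s) > 1),
        "max_out_degree": max((len(s) for s in cfg.values()), default=0),
        # Cyclomatic complexity E - N + 2, assuming one connected component.
        "cyclomatic_complexity": num_edges - num_nodes + 2,
    }


# Hypothetical toy program with a single loop:
# entry -> check, check -> body, body -> check, check -> exit
toy_edges = [
    ("entry", "check"),
    ("check", "body"),
    ("body", "check"),
    ("check", "exit"),
]
features = cfg_features(build_cfg(toy_edges))
print(features)
```

In practice, the CFG would be recovered from a binary by a disassembler or binary analysis framework, and the resulting feature vectors (or the graph itself) would be passed to a classifier. The hand-written edge list here only stands in for that step.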