The introduction of transformers has been an important breakthrough for AI research and applications, as transformers are the foundation of generative AI. A promising application domain for transformers is cybersecurity, in particular malware analysis, because transformer models are flexible in handling long sequential features and capturing contextual relationships. However, as the use of transformers for malware analysis is still in its infancy, it is critical to evaluate, systematize, and contextualize the existing literature to foster future research. This Systematization of Knowledge (SoK) paper aims to provide a comprehensive analysis of transformer-based approaches designed for malware analysis. Based on our systematic analysis of existing work, we propose taxonomies organized by: (a) how different transformers are adapted, organized, and modified across various use cases; and (b) how diverse feature types and their representation capabilities are reflected. We also provide an inventory of datasets used to explore multiple research avenues in the use of transformers for malware analysis, and we discuss open challenges and future research directions. We believe that this SoK paper will help the research community gain detailed insights from existing work and will serve as a foundational resource for novel research using transformers for malware analysis.