Recently, deep learning (DL) approaches to vulnerability detection have gained significant traction, often surpassing traditional static code analysis tools in effectiveness. In this study, we explore a novel approach to vulnerability detection that applies tools from topological data analysis (TDA) to the attention matrices of the BERT model. Our findings reveal that traditional machine learning (ML) techniques, when trained on the topological features extracted from these attention matrices, perform competitively with pre-trained language models such as CodeBERTa. This suggests that TDA tools, including persistent homology, can effectively capture semantic information critical for identifying vulnerabilities.
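To make the pipeline concrete, the sketch below shows one way topological features could be extracted from a single attention matrix: symmetrise attention weights into a distance matrix, compute the 0-dimensional persistence barcode (connected components under a Vietoris-Rips filtration, which reduces to minimum-spanning-tree edge weights), and summarise the bars as scalar features for a classical ML model. This is a minimal illustration, not the paper's actual pipeline; the distance transform, the toy matrix, and the feature choices are all assumptions.

```python
def h0_barcode(dist):
    """Finite death times of H0 bars for a Vietoris-Rips filtration.

    In dimension 0, every point is born at filtration value 0 and the finite
    bars die exactly at the minimum-spanning-tree edge weights of the
    distance graph, so Kruskal's algorithm with union-find suffices.
    """
    n = len(dist)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted((dist[i][j], i, j)
                   for i in range(n) for j in range(i + 1, n))
    deaths = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(w)  # one connected component dies at value w
    return deaths  # n - 1 finite bars; one component persists forever


def attention_to_distance(att):
    """Symmetrise an attention matrix and map high attention to low distance
    (an assumed transform; other choices are possible)."""
    n = len(att)
    return [[0.0 if i == j else 1.0 - 0.5 * (att[i][j] + att[j][i])
             for j in range(n)] for i in range(n)]


# Toy 3x3 "attention" matrix (rows sum to 1), purely illustrative.
att = [[0.8, 0.1, 0.1],
       [0.1, 0.8, 0.1],
       [0.4, 0.4, 0.2]]
bars = h0_barcode(attention_to_distance(att))
# Example scalar features one might feed to a classical ML classifier:
features = [sum(bars), max(bars), len(bars)]
```

In practice one would aggregate such features over every attention head and layer of the model; libraries such as Ripser can replace the hand-rolled barcode computation and also provide higher-dimensional homology.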