Timely software updates are vital for countering the growing number of security vulnerabilities. However, some software vendors silently patch vulnerabilities without creating CVE entries or even describing the security issue in their change logs. It is therefore critical to identify these hidden security patches and defend against potential N-day attacks. Researchers have applied various machine learning techniques to identify security patches in open-source software, leveraging the syntactic and semantic features of code changes and commit messages. However, none of these solutions can be applied directly to binary code, whose instructions and program flow may vary dramatically across compilation configurations. In this paper, we propose BinGo, a new security patch detection system for binary code. The main idea is to represent binary code as code property graphs, which enables a comprehensive understanding of program flow, and to apply a language model to each basic block of binary code to capture instruction semantics. BinGo consists of four phases: patch data pre-processing, graph extraction, embedding generation, and graph representation learning. Because no binary security patch dataset exists, we construct one by compiling the pre-patch and post-patch source code of the Linux kernel. Our experimental results show that BinGo achieves up to 80.77% accuracy in identifying security patches between two neighboring versions of binary code. Moreover, BinGo effectively reduces the false positives and false negatives caused by different compilers and optimization levels.
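To make the graph-plus-embedding idea in the abstract concrete, the following is a minimal sketch of its general shape, not BinGo's actual implementation. Everything in it is an assumption made for illustration: the names `embed_block`, `message_pass`, `graph_vector`, and `DIM` are hypothetical, a hashed bag of tokens stands in for the language model, a single round of mean aggregation stands in for graph representation learning, and the assembly snippets are invented toy inputs.

```python
import hashlib
from typing import Dict, List, Tuple

DIM = 8  # toy embedding size (assumption, chosen only for readability)


def embed_block(instructions: List[str], dim: int = DIM) -> List[float]:
    """Hash each instruction token into a bucket and count occurrences.

    This is a stand-in for a learned language model over a basic block.
    """
    vec = [0.0] * dim
    for ins in instructions:
        for tok in ins.replace(",", " ").split():
            bucket = int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim
            vec[bucket] += 1.0
    total = sum(vec)
    return [v / total for v in vec] if total else vec


def message_pass(
    feats: Dict[int, List[float]], edges: List[Tuple[int, int]]
) -> Dict[int, List[float]]:
    """One aggregation round: average a node's vector with its predecessors'."""
    incoming = {node: [node] for node in feats}
    for src, dst in edges:
        incoming[dst].append(src)
    return {
        node: [
            sum(feats[s][i] for s in sources) / len(sources)
            for i in range(len(feats[node]))
        ]
        for node, sources in incoming.items()
    }


def graph_vector(
    blocks: Dict[int, List[str]], edges: List[Tuple[int, int]]
) -> List[float]:
    """Embed every basic block, propagate along edges, then mean-pool."""
    feats = {node: embed_block(ins) for node, ins in blocks.items()}
    feats = message_pass(feats, edges)
    dim = len(next(iter(feats.values())))
    return [sum(f[i] for f in feats.values()) / len(feats) for i in range(dim)]


# Toy pre-patch function: copies without checking the length.
pre_blocks = {0: ["mov eax, [rdi]", "call memcpy"], 1: ["ret"]}
pre_edges = [(0, 1)]

# Toy post-patch function: a bounds check adds a block and a branch.
post_blocks = {
    0: ["mov eax, [rdi]", "cmp eax, 64", "ja 2"],
    1: ["call memcpy"],
    2: ["ret"],
}
post_edges = [(0, 1), (0, 2), (1, 2)]

pre_vec = graph_vector(pre_blocks, pre_edges)
post_vec = graph_vector(post_blocks, post_edges)
```

In a real system, a classifier would consume representations of the pre-patch and post-patch graphs to decide whether the change is a security patch. The sketch stops at producing one fixed-size vector per version, since the abstract does not specify the model details.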