Protocol reverse engineering (PRE) aims to infer the specification of a network protocol when its source code is unavailable. Field inference, which recovers field formats and semantics, is a crucial step in PRE, and binary-analysis-based techniques are one major category of approaches to it. However, such techniques face two key challenges: (1) format inference is fragile when the logic for processing input messages varies across protocol implementations, and (2) semantic inference is limited by inadequate and inaccurate inference rules. To tackle these challenges, we present BinPRE, a binary-analysis-based PRE tool. BinPRE incorporates (1) an instruction-based semantic similarity analysis strategy for format extraction; (2) a novel library of atomic semantic detectors that improves the adequacy of semantic inference; and (3) a cluster-and-refine paradigm that further improves semantic inference accuracy. We evaluated BinPRE against five existing PRE tools: Polyglot, AutoFormat, Tupni, BinaryInferno, and DynPRE. The results on eight widely used protocols show that BinPRE outperforms these prior tools in both format and semantic inference. BinPRE achieves a perfection score of 0.73 on format extraction and F1-scores of 0.74 and 0.81 on semantic inference of field types and field functions, respectively. The field inference results of BinPRE improved the effectiveness of protocol fuzzing, achieving 5-29% higher branch coverage than the results of the best prior PRE tool. BinPRE also helped discover one new zero-day vulnerability that could not have been found otherwise.
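The abstract reports format extraction as a perfection score and semantic inference as F1-scores without defining them. The Python sketch below shows one plausible way such scores can be computed. It assumes that perfection is the fraction of ground-truth fields whose exact byte span is recovered, and that a semantic label counts as correct only when both the field span and the label match. These definitions, the helper names (`boundaries_to_fields`, `perfection`, `semantic_f1`), and the Modbus-like example message are illustrative assumptions, not BinPRE's evaluation code.

```python
from typing import List, Set, Tuple

# A field is a byte span (start offset, end offset), end exclusive.
Field = Tuple[int, int]


def boundaries_to_fields(starts: List[int], msg_len: int) -> Set[Field]:
    """Convert field start offsets into (start, end) spans covering the message."""
    # Offset 0 always starts a field; ignore offsets outside the message.
    ordered = sorted({s for s in starts if 0 < s < msg_len} | {0})
    ends = ordered[1:] + [msg_len]
    return set(zip(ordered, ends))


def perfection(true_fields: Set[Field], inferred_fields: Set[Field]) -> float:
    """Assumed definition: share of true fields whose exact span was inferred."""
    if not true_fields:
        return 0.0
    return len(true_fields & inferred_fields) / len(true_fields)


def semantic_f1(true_labels: Set[Tuple[Field, str]],
                predicted_labels: Set[Tuple[Field, str]]) -> float:
    """F1 over (span, label) pairs; a hit needs both span and label to match."""
    tp = len(true_labels & predicted_labels)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted_labels)
    recall = tp / len(true_labels)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical 8-byte Modbus/TCP-like request: transaction ID (2 bytes),
    # protocol ID (2), length (2), unit ID (1), function code (1).
    truth = boundaries_to_fields([0, 2, 4, 6, 7], 8)
    inferred = boundaries_to_fields([2, 4, 7], 8)  # length and unit ID merged
    print(perfection(truth, inferred))

    # Hypothetical function labels for a few fields.
    true_labels = {((0, 2), "session_id"), ((4, 6), "length"), ((7, 8), "command")}
    predicted = {((0, 2), "session_id"), ((4, 7), "length"), ((7, 8), "command")}
    print(round(semantic_f1(true_labels, predicted), 3))
```

In this example the inferred format merges the length and unit ID fields into one span, so three of the five true spans are recovered exactly and the perfection is 0.6; the merged span also causes the "length" label to miss, so two of three labels match. Scoring exact spans rather than individual boundaries is a deliberately strict choice: a single misplaced boundary invalidates both neighbouring fields.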