Inferring protocol formats is critical for many security applications. However, existing format-inference techniques often miss many formats because almost all of them rely on dynamic analysis driven by a limited number of network packets: if a feature is not present in the input packets, it will be missing from the inferred formats. We develop a novel static program analysis for format inference. Static analysis does not rely on any input packets and can achieve high coverage by scanning every piece of code. However, to be both efficient and precise, we must address two challenges, namely path explosion and disordered path constraints. To this end, our approach uses abstract interpretation to produce a novel data structure called the abstract format graph, which confines precise but costly operations to small regions and thus ensures precision and efficiency at the same time. Our inferred formats have high coverage and precisely specify both field boundaries and semantic constraints among packet fields. Our evaluation shows that we can infer the format of a protocol within one minute with over 95% precision and recall, substantially outperforming four baseline techniques. Our inferred formats can substantially enhance existing protocol fuzzers, improving coverage by 20% to 260% and discovering 53 zero-day vulnerabilities, 47 of which have been assigned CVEs. We also present case studies of applying our inferred formats to other security applications, including traffic auditing and intrusion detection.
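To make "field boundaries and semantic constraints among packet fields" concrete, the sketch below shows one possible way to represent and check an inferred format. It is a minimal illustration under stated assumptions, not the abstract format graph or the output representation used by the approach above: the `Field` and `Format` classes, the predicate-based constraints, and the toy protocol `TOY` (a 1-byte type, a 2-byte big-endian length, then a payload) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical representation: a constraint is a predicate over the parsed
# field values and the raw packet bytes.
Constraint = Callable[[Dict[str, int], bytes], bool]


@dataclass(frozen=True)
class Field:
    name: str
    offset: int  # byte offset from the start of the packet (field boundary)
    length: int  # field size in bytes


@dataclass
class Format:
    fields: List[Field]
    constraints: List[Constraint]

    def parse(self, packet: bytes) -> Dict[str, int]:
        """Split the packet at the field boundaries and decode each field as a big-endian integer."""
        values: Dict[str, int] = {}
        for f in self.fields:
            end = f.offset + f.length
            if end > len(packet):
                raise ValueError(f"packet too short for field {f.name!r}")
            values[f.name] = int.from_bytes(packet[f.offset:end], "big")
        return values

    def accepts(self, packet: bytes) -> bool:
        """Return True if the packet fits the field layout and satisfies every semantic constraint."""
        try:
            values = self.parse(packet)
        except ValueError:
            return False
        return all(check(values, packet) for check in self.constraints)


# Hypothetical toy protocol: [type:1][length:2][payload:length]
TOY = Format(
    fields=[Field("type", 0, 1), Field("length", 1, 2)],
    constraints=[
        lambda v, p: v["type"] in (1, 2),          # enumerated value constraint
        lambda v, p: v["length"] == len(p) - 3,    # length field must match payload size
    ],
)
```

For example, `TOY.accepts(bytes([1, 0, 2, 0xAA, 0xBB]))` holds because the length field (2) matches the two payload bytes, whereas a packet whose length field disagrees with its payload is rejected. Encoding constraints as predicates keeps the sketch short; a realistic format would also need variable-length and optional fields, which this illustration deliberately omits.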