Security Knowledge-Guided Fuzzing of Deep Learning Libraries

Many Deep Learning (DL) fuzzers have been proposed in the literature. However, most of them focus only on the high-level APIs exposed to users, which leaves a large number of APIs used by library developers untested. Additionally, they rely on general input generation rules, such as random value generation and boundary-input generation, which are ineffective at producing DL-specific malformed inputs. To fill this gap, we first conduct an empirical study of the root causes of 447 historical security vulnerabilities in two of the most popular DL libraries, PyTorch and TensorFlow, to characterize and understand the malicious inputs that trigger them. From this study, we derive 18 rules for constructing malicious inputs, which we believe can be used to generate effective malformed inputs for testing DL libraries. We further design and implement Orion, a new fuzzer that tests DL libraries using these malformed input generation rules mined from real-world DL security vulnerabilities. Specifically, Orion first collects API invocation code from various sources, such as API documentation, source code, developer tests, and publicly available repositories on GitHub. Orion then instruments these code snippets to dynamically trace execution information for each API, such as the types, shapes, and values of its parameters. Finally, Orion combines the malformed input generation rules with the dynamic execution information to create inputs for testing DL libraries. Our evaluation on TensorFlow and PyTorch shows that Orion reports 143 bugs, 68 of which were previously unknown. Among the 68 new bugs, 58 have been fixed or confirmed by developers after we reported them, and the rest are awaiting confirmation. Compared to the state-of-the-art DL fuzzers FreeFuzz and DocTer, Orion detects 21% and 34% more bugs, respectively.
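
The trace-then-mutate workflow described above can be illustrated with a minimal sketch. It is not Orion's implementation: the `trace` decorator, the `RULES` table, and the toy API `element_at` are all hypothetical names introduced here, and the three mutations per type are simple stand-ins rather than the 18 rules mined in the study. The sketch assumes a seed call has already been collected (for example from documentation or a developer test), records the type, shape, and value of each argument, and then replays the API with one argument mutated at a time.

```python
import functools

TRACE = {}  # hypothetical store: API name -> list of recorded calls


def describe(value):
    """Record the type, shape (for list-like values), and value of one argument."""
    shape = (len(value),) if isinstance(value, (list, tuple)) else None
    return {"type": type(value).__name__, "shape": shape, "value": value}


def trace(func):
    """Instrument an API so that each call logs its positional arguments."""
    @functools.wraps(func)
    def wrapper(*args):
        TRACE.setdefault(func.__name__, []).append([describe(a) for a in args])
        return func(*args)
    return wrapper


# Illustrative mutation rules keyed by argument type (assumed, not the paper's rules).
RULES = {
    "int": [lambda v: -v - 1, lambda v: 0, lambda v: 2**63],
    "list": [lambda v: [], lambda v: v * 1000, lambda v: [None] + v[1:]],
}


def malformed_inputs(record):
    """Yield argument lists in which exactly one argument is mutated by one rule."""
    for i, arg in enumerate(record):
        for rule in RULES.get(arg["type"], []):
            mutated = [a["value"] for a in record]
            mutated[i] = rule(arg["value"])
            yield mutated


@trace
def element_at(values, index):
    """Toy stand-in for a library API; it deliberately lacks an upper-bound check."""
    if index < 0:
        raise ValueError("index must be non-negative")
    return values[index]


def fuzz(func, name):
    """Replay mutated inputs against the uninstrumented API and classify outcomes."""
    outcomes = []
    for record in list(TRACE[name]):
        for args in malformed_inputs(record):
            try:
                func(*args)
                outcomes.append((args, "ok"))
            except ValueError:
                outcomes.append((args, "rejected"))  # input validated gracefully
            except Exception as exc:  # unexpected failure: candidate bug
                outcomes.append((args, type(exc).__name__))
    return outcomes


element_at([1, 2, 3], 1)  # seed call, as if collected from docs or tests
results = fuzz(element_at.__wrapped__, "element_at")
```

In this toy setting, an outcome labeled with an exception name other than `ValueError` marks an input the API failed to validate. A fuzzer for a real DL library would additionally need to watch for crashes of the process itself, such as segmentation faults, which this in-process sketch cannot observe.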
