Security Knowledge-Guided Fuzzing of Deep Learning Libraries

Many deep learning (DL) fuzzers have recently been proposed for testing DL libraries. However, they either generate inputs without guidance (e.g., ignoring the relationships between API arguments) or support only a limited set of corner-case test inputs. Furthermore, a substantial number of developer APIs that are crucial for library development remain untested, as they are typically poorly documented and lack clear usage guidelines. To fill this gap, we propose Orion, a novel fuzzer that combines guided test input generation and corner-case test input generation based on a set of fuzzing rules constructed from historical data known to trigger vulnerabilities in the implementation of DL APIs. To extract the fuzzing rules, we first conduct an empirical study of the root causes of 376 vulnerabilities in two of the most popular DL libraries, PyTorch and TensorFlow. We then construct the rules from the root causes of these historical vulnerabilities. Our evaluation shows that Orion reports 135 vulnerabilities in the latest releases of TensorFlow and PyTorch, 76 of which have been confirmed by the library developers. Of the 76 confirmed vulnerabilities, 69 were previously unknown and 7 had already been fixed. The remaining reports are awaiting confirmation. For end-user APIs, Orion detected 31.8% and 90% more vulnerabilities on TensorFlow and PyTorch, respectively, than the state-of-the-art conventional fuzzer, DeepRel. Compared with the state-of-the-art LLM-based DL fuzzer, AtlasFuzz, Orion detected 13.63% more vulnerabilities on TensorFlow and 18.42% more on PyTorch. For developer APIs, Orion detected 117% more vulnerabilities on TensorFlow and 100% more on PyTorch than FreeFuzz, the most relevant fuzzer for developer APIs.
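
To make the two generation modes concrete, the following is a minimal sketch of how guided generation and rule-based corner-case generation might be combined. It is not Orion's implementation: the argument names (`shape`, `dim`, `scale`), the five rules, and the function names are all hypothetical and chosen only for illustration, and the abstract does not list the actual rules derived from the 376 studied vulnerabilities.

```python
import math
import random
from typing import Any, Callable, Dict, List, Tuple

Args = Dict[str, Any]


def generate_valid_args(rng: random.Random) -> Args:
    """Guided generation (hypothetical API): `dim` is drawn from the range
    that is valid for the rank of `shape`, so the arguments are consistent."""
    rank = rng.randint(1, 4)
    shape = [rng.randint(1, 8) for _ in range(rank)]
    dim = rng.randrange(-rank, rank)  # valid axis index for this rank
    return {"shape": shape, "dim": dim, "scale": rng.uniform(0.1, 10.0)}


# Hypothetical corner-case rules. Each one copies a valid input and breaks
# exactly one assumption that an API implementation might fail to check.
def negative_dimension(args: Args) -> Args:
    return {**args, "shape": [-1] + list(args["shape"][1:])}


def zero_sized_dimension(args: Args) -> Args:
    return {**args, "shape": [0] + list(args["shape"][1:])}


def out_of_range_index(args: Args) -> Args:
    return {**args, "dim": len(args["shape"]) + 1}


def extreme_integer(args: Args) -> Args:
    return {**args, "dim": 2**63 - 1}


def nan_scalar(args: Args) -> Args:
    return {**args, "scale": math.nan}


RULES: List[Callable[[Args], Args]] = [
    negative_dimension,
    zero_sized_dimension,
    out_of_range_index,
    extreme_integer,
    nan_scalar,
]


def fuzz_inputs(n_seeds: int, seed: int = 0) -> List[Tuple[str, Args]]:
    """Return (label, args) pairs: each valid seed input, followed by one
    mutated variant per corner-case rule."""
    rng = random.Random(seed)
    cases: List[Tuple[str, Args]] = []
    for _ in range(n_seeds):
        valid = generate_valid_args(rng)
        cases.append(("valid", valid))
        for rule in RULES:
            cases.append((rule.__name__, rule(valid)))
    return cases


if __name__ == "__main__":
    for label, args in fuzz_inputs(n_seeds=1):
        print(label, args)
```

In a real harness, each generated case would be passed to the API under test in an isolated process, and crashes or sanitizer reports would be recorded; that part is omitted here.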
