BertRLFuzzer: A BERT and Reinforcement Learning based Fuzzer

We present BertRLFuzzer, a novel BERT- and reinforcement learning (RL)-based fuzzer for finding security vulnerabilities. Given a list of seed inputs, the fuzzer applies grammar-adhering and attack-provoking mutation operations to them to generate candidate attack vectors. The key insight of BertRLFuzzer is the combined use of two machine learning concepts. The first is semi-supervised learning with language models (e.g., BERT), which enables BertRLFuzzer to learn (relevant fragments of) the grammar of a victim application, as well as attack patterns, without requiring the user to specify them explicitly. The second is RL with the BERT model as the agent, which guides the fuzzer to efficiently learn grammar-adhering and attack-provoking mutation operators. The RL-guided feedback loop enables BertRLFuzzer to automatically search the space of attack vectors and exploit the weaknesses of the given victim application without labeled training data. Together, these two features make BertRLFuzzer extensible: the user can extend it to a variety of victim applications and attack vectors automatically, i.e., without explicitly modifying the fuzzer or providing a grammar. To establish the efficacy of BertRLFuzzer, we compare it against a total of 13 black-box and white-box fuzzers over a benchmark of 9 victim websites. We observe a significant improvement in time to first attack (54% less than the nearest competing tool), time to find all vulnerabilities (40-60% less than the nearest competing tool), and attack rate (4.4% more attack vectors generated than the nearest competing tool). Our experiments show that the combination of the BERT model and RL-based learning makes BertRLFuzzer an effective, adaptive, easy-to-use, automatic, and extensible fuzzer.
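
To make the RL-guided feedback loop concrete, the following minimal sketch shows how reward from a victim application can steer the choice of mutation operators. It is an illustration under strong assumptions, not BertRLFuzzer's implementation: the BERT agent is replaced by a simple epsilon-greedy bandit, and the mutation operators, the `toy_victim` oracle, the token-list input format, and all parameter names are hypothetical.

```python
import random

# Hypothetical mutation operators (not taken from the paper).
# Each one maps a token list to a new token list.
def insert_quote(tokens, rng):
    pos = rng.randrange(len(tokens) + 1)
    return tokens[:pos] + ["'"] + tokens[pos:]

def append_tautology(tokens, rng):
    return tokens + ["OR", "1=1"]

def append_comment(tokens, rng):
    return tokens + ["--"]

def drop_token(tokens, rng):
    if len(tokens) <= 1:
        return list(tokens)
    pos = rng.randrange(len(tokens))
    return tokens[:pos] + tokens[pos + 1:]

OPERATORS = [insert_quote, append_tautology, append_comment, drop_token]

def toy_victim(tokens):
    """Stand-in oracle (assumption): flags an input as an attack when a
    quote appears before a tautology. A real fuzzer would query the
    victim application instead."""
    if "'" in tokens and "1=1" in tokens:
        return tokens.index("'") < tokens.index("1=1")
    return False

def fuzz(seeds, steps=200, epsilon=0.1, max_len=12, seed=0):
    """Epsilon-greedy bandit over mutation operators, a deliberately
    simplified stand-in for the BERT-based RL agent."""
    rng = random.Random(seed)
    value = [0.0] * len(OPERATORS)  # running mean reward per operator
    count = [0] * len(OPERATORS)    # number of times each operator was used
    pool = [list(s) for s in seeds]
    attacks = []
    for _ in range(steps):
        parent = rng.choice(pool)
        untried = [i for i, c in enumerate(count) if c == 0]
        if untried:
            a = untried[0]                        # try every operator once
        elif rng.random() < epsilon:
            a = rng.randrange(len(OPERATORS))     # explore
        else:
            a = max(range(len(OPERATORS)), key=lambda i: value[i])  # exploit
        child = OPERATORS[a](parent, rng)
        reward = 1.0 if toy_victim(child) else 0.0
        count[a] += 1
        value[a] += (reward - value[a]) / count[a]  # incremental mean update
        if reward:
            attacks.append(child)
        elif len(child) <= max_len:
            pool.append(child)  # keep non-attacking inputs as future parents
    return attacks, value

attacks, values = fuzz([["admin", "'"]])
print(len(attacks), [round(v, 2) for v in values])
```

No labeled data is involved: the only learning signal is whether the oracle reports a successful attack, which is the property the feedback loop relies on. Learning the grammar itself, which the paper attributes to the BERT model, is outside the scope of this sketch.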
