FuzzSlice: Pruning False Positives in Static Analysis Warnings Through Function-Level Fuzzing

Manual confirmation of static analysis reports is a daunting task, both because of the large number of warnings and because of the high density of false positives among them. Fuzzing techniques have been proposed to verify static analysis warnings. However, a major limitation is that fuzzing the whole project to reach all static analysis warnings is not feasible: it can take several days, and the machine time required grows exponentially for a linear increase in code coverage. Therefore, we propose FuzzSlice, a novel framework that automatically prunes possible false positives among static analysis warnings. Unlike prior work, which mostly focuses on confirming true positives among static analysis warnings and therefore requires end-to-end fuzzing, FuzzSlice focuses on ruling out potential false positives, which make up the majority of static analysis reports. The key insight underlying our work is that a warning that does not yield a crash when fuzzed at the function level within a given time budget is a possible false positive. To achieve this, FuzzSlice first generates compilable code slices at the function level and then fuzzes these code slices instead of the entire binary. FuzzSlice is also unlikely to misclassify a true bug as a false positive, because a crashing input can be reproduced by a fuzzer at the function level as well. We evaluate FuzzSlice on the Juliet synthetic dataset and on real-world complex C projects. According to its ground truth, the Juliet dataset contains 864 false positives, all of which FuzzSlice detects. For the open-source repositories, we were able to get developers from two of the repositories to independently label the warnings. FuzzSlice automatically identifies 33 of the 53 false positives confirmed by developers in these two repositories. Thus, FuzzSlice reduces false positives by 62.26% (33/53) in the open-source repositories and by 100% in the Juliet dataset.
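
The key insight above can be illustrated with a toy sketch. This is not FuzzSlice itself: the two target functions, the random input generator, the iteration budget, and the use of a Python IndexError as a stand-in for a memory-safety crash are all assumptions made for illustration. A real setup would compile a C code slice and run it under a coverage-guided fuzzer with a sanitizer.

```python
import random

BUF_SIZE = 8  # hypothetical fixed-size buffer in the flagged function


def guarded_copy(data: bytes) -> None:
    """Hypothetical flagged function whose length check makes the warning a false positive."""
    buf = [0] * BUF_SIZE
    n = min(len(data), BUF_SIZE)  # bounds check prevents any overflow
    for i in range(n):
        buf[i] = data[i]


def unguarded_copy(data: bytes) -> None:
    """Hypothetical flagged function with a real out-of-bounds write."""
    buf = [0] * BUF_SIZE
    for i in range(len(data)):
        buf[i] = data[i]  # raises IndexError once len(data) > BUF_SIZE


def classify(target, budget: int = 1000, seed: int = 0) -> str:
    """Fuzz one function in isolation for `budget` random inputs.

    The iteration budget stands in for the time budget; IndexError
    stands in for a sanitizer-detected crash (both are assumptions).
    """
    rng = random.Random(seed)
    for _ in range(budget):
        length = rng.randrange(0, 33)
        data = bytes(rng.randrange(256) for _ in range(length))
        try:
            target(data)
        except IndexError:
            return "crash reproduced"
    return "possible false positive"


if __name__ == "__main__":
    print("guarded_copy:", classify(guarded_copy))
    print("unguarded_copy:", classify(unguarded_copy))
```

In this sketch the guarded function survives the whole budget and is labeled a possible false positive, while the unguarded one crashes as soon as an input longer than the buffer is generated. The label stays "possible" because a finite budget cannot prove the absence of a bug.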
