Many software projects rely on manual code review to catch defects and vulnerabilities before integration. However, reviewers often work under time pressure and rely primarily on static inspection, leaving the dynamic behavior of the program unexplored. Dynamic analyses could reveal such behavior, but they are rarely integrated into reviews. Among them, fuzzing is typically applied later in development to uncover crashing bugs, yet its ability to exercise code with diverse inputs makes it promising for exposing non-crashing but unexpected behaviors earlier. Without suitable mechanisms for analyzing program behavior, however, the rich data produced during fuzzing remains inaccessible to reviewers, limiting its practical value in this context. We hypothesize that unexpected variations in program behavior can signal potential bugs. The impact of code changes can be captured automatically at runtime. Representing program behavior as likely invariants (dynamic properties consistently observed at specific program points) can provide practical signals of behavioral change. Such signals offer a way to distinguish intended changes from unexpected behavioral shifts introduced by code changes. We present FuzzSight, a framework that leverages likely invariants inferred from non-crashing fuzzing inputs to highlight behavioral differences across program versions. By surfacing these differences, it indicates which code blocks may need closer attention. In our evaluation, FuzzSight flagged 75% of regression bugs and up to 80% of vulnerabilities uncovered by 24-hour fuzzing. It also outperformed static application security testing (SAST) in identifying buggy code blocks, achieving ten times higher detection rates with fewer false alarms. In summary, FuzzSight demonstrates the potential and value of leveraging fuzzing and invariant analysis for early-stage code review, bridging static inspection with dynamic behavioral insights.
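To make the idea of comparing likely invariants across versions concrete, the following is a minimal illustrative sketch and not FuzzSight's actual implementation. It assumes that each non-crashing run has already been reduced to `(program_point, variable, value)` observations with integer values, and it infers only three toy properties (constant, non-negative, non-zero). The names `infer_invariants` and `diff_invariants`, the program point `parse:exit`, and the variable `len` are all hypothetical.

```python
from collections import defaultdict


def infer_invariants(traces):
    """Infer toy likely invariants from (program_point, variable, value) tuples.

    Returns a dict mapping (program_point, variable) to a set of property
    strings that held for every observed value.
    """
    observed = defaultdict(list)
    for point, var, value in traces:
        observed[(point, var)].append(value)

    invariants = {}
    for key, values in observed.items():
        props = set()
        lo, hi = min(values), max(values)
        if lo == hi:
            # The variable took a single value in every observation.
            props.add(f"== {lo}")
        else:
            if lo >= 0:
                props.add(">= 0")
            if all(v != 0 for v in values):
                props.add("!= 0")
        invariants[key] = props
    return invariants


def diff_invariants(old, new):
    """Report program points whose likely invariants differ between versions."""
    changes = {}
    for key in sorted(old.keys() | new.keys()):
        before = old.get(key, set())
        after = new.get(key, set())
        if before != after:
            changes[key] = {"lost": before - after, "gained": after - before}
    return changes


# Hypothetical observations from fuzzing the old and new program versions.
old_traces = [("parse:exit", "len", 4), ("parse:exit", "len", 8), ("parse:exit", "len", 1)]
new_traces = [("parse:exit", "len", 4), ("parse:exit", "len", 8), ("parse:exit", "len", -1)]

changes = diff_invariants(infer_invariants(old_traces), infer_invariants(new_traces))
print(changes)
```

In this toy example, the new version loses the property `>= 0` for `len` at `parse:exit`, which is the kind of behavioral shift a reviewer might want to inspect even though no input crashed the program.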