Fuzzing has become a popular technique for automatically detecting vulnerabilities and bugs by generating unexpected inputs. In recent years, the fuzzing process has been integrated into continuous integration workflows (i.e., continuous fuzzing), enabling short and frequent testing cycles. Despite its widespread adoption, prior research has not examined whether the effectiveness of continuous fuzzing varies across programming languages. This study conducts a large-scale cross-language analysis to examine how fuzzing bug characteristics and detection efficiency differ among languages. We analyze 61,444 fuzzing bugs and 999,248 builds from 559 OSS-Fuzz projects categorized by primary language. Our findings reveal that (i) C++ and Rust exhibit higher fuzzing bug detection frequencies, (ii) Rust and Python show low vulnerability ratios but tend to expose more critical vulnerabilities, (iii) crash types vary across languages and unreproducible bugs are more frequent in Go but rare in Rust, and (iv) Python attains higher patch coverage but suffers from longer time-to-detection. These results demonstrate that fuzzing behavior and effectiveness are strongly shaped by language design, providing insights for language-aware fuzzing strategies and tool development.