Modern fuzzers increasingly use Large Language Models (LLMs) to generate structured inputs, but LLM-driven fuzzing is sensitive to prompt initialization and sampling variance, which can reduce exploration efficiency and lead to redundant inputs. We present FunFuzz, a multi-island evolutionary fuzzing framework that runs several isolated searches in parallel and periodically migrates high-value candidates to maintain diversity. FunFuzz derives initial generation prompts from documentation and initializes islands with topic-specific instructions, then continuously adapts prompts using feedback-guided selection. During fuzzing, candidates are prioritized by incremental compiler coverage, while compiler-internal failure signals are used to identify crash-inducing inputs. We evaluate FunFuzz on compiler fuzzing, where inputs are source programs and success is measured by compiler coverage and unique compiler-internal failures. Across repeated 24-hour campaigns on GCC and Clang, FunFuzz achieves higher compiler coverage than previous LLM-driven baselines and discovers more unique failure-triggering inputs.
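The multi-island loop described above can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not FunFuzz's implementation: the names `run_islands`, `toy_mutate`, and `toy_coverage` are hypothetical, candidates are plain strings rather than source programs, the set of characters in a string stands in for compiler coverage, a random mutation stands in for LLM generation with adapted prompts, and the ring-shaped migration and the global coverage set are illustrative choices that the abstract does not specify. Crash detection is not modeled.

```python
import random


def toy_coverage(program):
    # Hypothetical stand-in for compiler coverage: the set of distinct characters.
    return set(program)


def toy_mutate(program, rng):
    # Hypothetical stand-in for LLM-driven generation: append one random symbol.
    return program + rng.choice("abcdefgh")


def run_islands(seeds, mutate, coverage_of, rounds=20, migrate_every=5, rng=None):
    """Toy island-model search: one isolated pool per seed, periodic migration."""
    rng = rng or random.Random(0)
    islands = [[seed] for seed in seeds]  # each island starts from its own seed
    covered = set()                       # coverage observed so far (assumed global)
    for seed in seeds:
        covered |= coverage_of(seed)

    for round_no in range(1, rounds + 1):
        for pool in islands:
            child = mutate(rng.choice(pool), rng)
            gain = coverage_of(child) - covered
            if gain:                      # keep only candidates that add coverage
                covered |= gain
                pool.append(child)

        if round_no % migrate_every == 0:
            # Assumed ring migration: copy each island's highest-coverage
            # candidate into the next island's pool.
            best = [max(pool, key=lambda c: len(coverage_of(c))) for pool in islands]
            for i, candidate in enumerate(best):
                target = islands[(i + 1) % len(islands)]
                if candidate not in target:
                    target.append(candidate)

    return islands, covered


if __name__ == "__main__":
    pools, seen = run_islands(["a", "e"], toy_mutate, toy_coverage)
    print(len(pools), "islands;", len(seen), "coverage items observed")
```

In this sketch a candidate survives only if it contributes coverage not yet observed, which mirrors the idea of prioritizing by incremental coverage, and migration lets one island's best candidate seed exploration in another while the pools otherwise stay isolated.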