Prior work on traditional program fuzzing has confirmed that fuzzing effectiveness is strongly affected by redundancy among initial seeds, and has proposed a series of seed selection methods in response. Compared to traditional fuzzing, JVM fuzzing has distinctive characteristics: the code under test is large and intricate, and the input programs carry both syntactic and semantic features. However, it remains unclear whether existing seed selection methods are suitable for JVM fuzzing and whether exploiting program features can improve effectiveness. To address this, we devise 10 initial seed selection methods in three families: coverage-based, prefuzz-based, and program-feature-based. We then conduct an empirical study on three JVM implementations to evaluate these methods within two state-of-the-art fuzzing techniques (JavaTailor and VECT). Specifically, we examine performance from three aspects: (i) effectiveness and efficiency using widely studied initial seeds, (ii) effectiveness using programs in the wild, and (iii) the ability to detect new bugs. First, the evaluation shows that the program-feature-based method that uses the control flow graph not only has a significantly lower time overhead (i.e., 30 s) but also outperforms the other methods, achieving a 142% to 269% improvement over the full set of initial seeds. Second, initial seed selection greatly improves the quality of wild programs used as seeds and is complementary in effectiveness, since it detects new behaviors. Third, given the same testing period, initial seed selection improves the JVM fuzzing techniques by detecting more unknown bugs. In particular, 21 of the 25 detected bugs have been confirmed or fixed by developers. This work takes the first look at initial seed selection in JVM fuzzing and confirms its importance for fuzzing effectiveness and efficiency.