PerfGen: Automated Performance Benchmark Generation for Big Data Analytics

Many symptoms of poor performance in big data analytics, such as computational skew, data skew, and memory skew, are input dependent. However, because inputs that can trigger such symptoms are lacking, big data analytics are hard to debug and test. We design PerfGen to automatically generate inputs for performance testing. PerfGen overcomes three challenges that arise when automated fuzz testing is applied naively to performance testing. First, typical grey-box fuzzing relies on coverage as its guidance signal and is thus unlikely to trigger interesting performance behavior. PerfGen therefore provides performance monitor templates that a user can extend to serve as guidance metrics for grey-box fuzzing. Second, performance symptoms may occur at an intermediate or later stage of a big data analytics pipeline. PerfGen thus uses a phased fuzzing approach, which first identifies symptom-causing inputs at an intermediate stage and then converts them to inputs at the beginning of the program with a pseudo-inverse function generated by a large language model. Third, PerfGen defines sets of skew-inspired input mutations, which increase the chance of inducing performance problems. We evaluate PerfGen using four case studies. PerfGen achieves at least an 11x speedup over a traditional fuzzing approach when generating inputs to trigger performance symptoms. Additionally, identifying intermediate inputs first and then converting them to original inputs enables PerfGen to generate such workloads in less than 0.004% of the iterations required by a baseline approach.
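
The idea of replacing coverage with a performance metric as the fuzzing guidance signal, combined with a skew-inspired mutation, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not PerfGen's actual API: the names `partition_skew`, `mutate_key_skew`, and `fuzz_for_skew` are hypothetical, records are modeled as `(int_key, value)` pairs, partitioning is a simple `key % num_partitions`, and the loop fuzzes the input of a single stage directly, so the phased approach and the pseudo-inverse function are not modeled.

```python
import random
from collections import Counter


def partition_skew(records, num_partitions=4):
    """Hypothetical monitor: largest partition size divided by the mean partition size."""
    if not records:
        return 0.0
    # Assumption: integer keys are assigned to partitions by key % num_partitions.
    sizes = Counter(key % num_partitions for key, _ in records)
    mean_size = len(records) / num_partitions
    return max(sizes.values()) / mean_size


def mutate_key_skew(records, rng):
    """Hypothetical skew-inspired mutation: append copies of one existing record."""
    key, value = rng.choice(records)
    copies = rng.randint(1, 5)
    # Build a new list so the caller's input is left unchanged.
    return records + [(key, value)] * copies


def fuzz_for_skew(seed, threshold, max_iters=1000, rng_seed=0):
    """Keep a mutant only if it raises the monitored metric; stop at the threshold."""
    rng = random.Random(rng_seed)
    best, best_score = list(seed), partition_skew(seed)
    for iteration in range(1, max_iters + 1):
        candidate = mutate_key_skew(best, rng)
        score = partition_skew(candidate)
        if score > best_score:
            best, best_score = candidate, score
        if best_score >= threshold:
            return best, best_score, iteration
    return best, best_score, max_iters


if __name__ == "__main__":
    seed_input = [(k, 1) for k in range(8)]  # evenly spread keys, skew ratio 1.0
    skewed, score, iters = fuzz_for_skew(seed_input, threshold=2.0)
    print(f"skew ratio {score:.2f} after {iters} iterations, {len(skewed)} records")
```

The design choice the sketch highlights is that the acceptance test depends only on the monitored metric, so a mutant that exercises no new code can still be kept if it pushes the workload closer to the symptom.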
