Optimization benchmarks play a fundamental role in assessing algorithm performance; however, existing artificial benchmarks often fail to capture the diversity and irregularity of real-world problem structures, while benchmarks derived from real-world problems are costly and difficult to construct. To address these challenges, we propose an evolutionary automatic benchmark generation framework, termed the LLM-driven evolutionary benchmark generator (LLM-EBG), in which a large language model (LLM) acts as a generative evolutionary operator that creates and evolves benchmark problems within a flexible, expressive representation space. As a case study, we generate unconstrained single-objective continuous minimization problems, represented as mathematical expressions, that are designed to induce significant performance differences between a genetic algorithm (GA) and differential evolution (DE). Experimental results show that LLM-EBG produces benchmark problems on which the designated target algorithm outperforms the comparative algorithm in more than 80\% of trials. Furthermore, exploratory landscape analysis reveals that benchmarks favoring GA are highly sensitive to variable scaling, demonstrating that the proposed framework can generate problems with distinct geometric characteristics that reflect the intrinsic search behaviors of different optimization algorithms.
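The evolutionary loop described above can be sketched minimally in Python. This is not the paper's implementation: the LLM operator is replaced by a hypothetical `llm_mutate` stub that perturbs a coefficient, and the two competing optimizers are replaced by cheap random-search stand-ins with different budgets; the fitness of a candidate benchmark is the performance gap between them, as in the framework's objective.

```python
import math
import random
import re


def evaluate_expr(expr, x):
    # Evaluate a 1-D benchmark expression string, e.g. "abs(x) + sin(5*x)".
    return eval(expr, {"sin": math.sin, "cos": math.cos, "abs": abs, "x": x})


def random_search(expr, steps, seed):
    # Stand-in for a real optimizer (GA or DE in the paper):
    # best objective value found by uniform random sampling on [-5, 5].
    rng = random.Random(seed)
    return min(evaluate_expr(expr, rng.uniform(-5.0, 5.0)) for _ in range(steps))


def llm_mutate(expr, rng):
    # Placeholder for the LLM generative operator (hypothetical):
    # rewrite the first numeric coefficient to keep the sketch self-contained.
    return re.sub(r"\d+", str(rng.randint(2, 9)), expr, count=1)


def performance_gap(expr):
    # Fitness of a candidate benchmark: how much the "weak" solver
    # (small budget) is beaten by the "strong" solver (large budget).
    return random_search(expr, steps=50, seed=2) - random_search(expr, steps=400, seed=3)


def evolve_benchmark(generations=10, seed=1):
    # (1+1)-style evolutionary loop over benchmark expressions:
    # keep a child only if it widens the performance gap.
    rng = random.Random(seed)
    best = "abs(x) + sin(5*x)"
    best_gap = performance_gap(best)
    for _ in range(generations):
        child = llm_mutate(best, rng)
        child_gap = performance_gap(child)
        if child_gap > best_gap:
            best, best_gap = child, child_gap
    return best, best_gap
```

In the actual framework, `llm_mutate` would prompt the LLM with parent expressions and performance feedback, and `performance_gap` would run full GA and DE trials on each candidate problem.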