Benchmarking in continuous black-box optimisation is hindered by the limited structural diversity of existing test suites such as BBOB. We explore whether large language models embedded in an evolutionary loop can be used to design optimisation problems with clearly defined high-level landscape characteristics. Using the LLaMEA framework, we guide an LLM to generate problem code from natural-language descriptions of target properties, including multimodality, separability, basin-size homogeneity, search-space homogeneity, and global-local optima contrast. Inside the loop, we score candidates with ELA-based property predictors. We introduce an ELA-space fitness-sharing mechanism that increases population diversity and steers the generator away from redundant landscapes. Complementary basin-of-attraction analysis, statistical testing, and visual inspection verify that many of the generated functions indeed exhibit the intended structural traits. In addition, a t-SNE embedding shows that they expand the BBOB instance space rather than forming an unrelated cluster. The resulting library provides a broad, interpretable, and reproducible set of benchmark problems for landscape analysis and downstream tasks such as automated algorithm selection.