LLaMEA-SAGE: Guiding Automated Algorithm Design with Structural Feedback from Explainable AI

Large language models have enabled automated algorithm design (AAD) by generating optimization algorithms directly from natural-language prompts. While evolutionary frameworks such as LLaMEA demonstrate strong exploratory capabilities across the algorithm design space, their search is driven entirely by fitness feedback, leaving substantial information about the generated code unused. We propose a mechanism for guiding AAD with feedback built from graph-theoretic and complexity features extracted from the abstract syntax trees (ASTs) of the generated algorithms, using a surrogate model learned over an archive of evaluated solutions. Using explainable AI techniques, we identify the features that most affect performance and translate them into natural-language mutation instructions that steer subsequent LLM-based code generation without restricting expressivity. We introduce LLaMEA-SAGE, which integrates this feature-driven guidance into LLaMEA, and evaluate it across several benchmarks. In a small controlled experiment, the proposed structured guidance reaches the same performance as vanilla LLaMEA faster. In a larger-scale experiment using the MA-BBOB suite from the GECCO-MA-BBOB competition, our guided approach achieves superior performance compared to state-of-the-art AAD methods. These results demonstrate that signals derived from code can effectively bias LLM-driven algorithm evolution, bridging the gap between code structure and human-understandable performance feedback in automated algorithm design.
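
To make the feedback loop described above concrete, the following is a minimal sketch and not the LLaMEA-SAGE implementation. It assumes a tiny archive of (source code, fitness) pairs, extracts a few illustrative AST features, and turns the most influential feature into a natural-language mutation instruction. The feature names, the helper functions `ast_features`, `pearson`, and `mutation_hint`, and the wording of the instruction are all hypothetical. A plain Pearson correlation stands in for the surrogate model and the explainable AI attribution step, which the abstract does not specify in detail.

```python
import ast
import math


def ast_features(source):
    """Compute a few illustrative structural features from Python source."""
    tree = ast.parse(source)

    def depth(node):
        # Depth of the AST below (and including) this node.
        children = list(ast.iter_child_nodes(node))
        return 1 + max((depth(c) for c in children), default=0)

    nodes = list(ast.walk(tree))
    branching = (ast.If, ast.For, ast.While, ast.BoolOp, ast.IfExp)
    return {
        "node_count": float(len(nodes)),
        "max_depth": float(depth(tree)),
        "loop_count": float(sum(isinstance(n, (ast.For, ast.While)) for n in nodes)),
        # Rough complexity proxy: 1 + number of branching constructs.
        "branching": 1.0 + sum(isinstance(n, branching) for n in nodes),
    }


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def mutation_hint(archive):
    """Build a mutation instruction from (source, fitness) pairs.

    Assumes higher fitness is better. Correlation is a stand-in for
    the surrogate model plus feature attribution (an assumption).
    """
    feats = [ast_features(src) for src, _ in archive]
    fitness = [fit for _, fit in archive]
    scores = {
        name: pearson([f[name] for f in feats], fitness) for name in feats[0]
    }
    name = max(scores, key=lambda k: abs(scores[k]))
    direction = "increase" if scores[name] > 0 else "decrease"
    return f"Mutate the algorithm so as to {direction} its {name}."


if __name__ == "__main__":
    # Toy archive with made-up fitness values, for illustration only.
    archive = [
        ("x = 1\n", 0.1),
        ("for i in range(3):\n    x = i\n", 0.5),
        ("for i in range(3):\n    for j in range(3):\n        x = i + j\n", 0.9),
    ]
    print(mutation_hint(archive))
```

In a real pipeline the returned string would be appended to the mutation prompt sent to the LLM, so the guidance biases generation without constraining what code the model may write.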
