Expensive optimization problems (EOPs) are black-box tasks with costly objective evaluations and no gradient access, which makes the evaluation budget the key bottleneck. Surrogate-assisted evolutionary algorithms (SAEAs) reduce the number of evaluations by relying on surrogate predictions, but conventional surrogates often require frequent retraining as the population evolves, which adds computational overhead. This paper proposes R2SAEA, an evolutionary algorithm assisted by a reinforcement-trained, relation-based large language model (LLM) surrogate. We cast relation-based surrogate modeling as an in-context pairwise reasoning task. To enable efficient inference inside the evolutionary loop, we develop an anchor-based iterative context construction strategy that reduces prompt complexity from quadratic to linear in the population size, and a voting-based aggregation scheme that converts predicted relations into scores for offspring selection. We further build a reinforcement learning (RL) pipeline from evolutionary trajectories and fine-tune Qwen2.5 with Group Relative Policy Optimization (GRPO). Experiments on single- and multi-objective benchmarks show improved relation prediction and state-of-the-art optimization performance compared with strong SAEA baselines and general-purpose LLMs. Quantization also enables efficient edge deployment, supporting a zero-shot surrogate paradigm without per-generation retraining. Code and models are available at https://github.com/Septend9/R2SAEA.
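To make the anchor-based comparison and voting-based aggregation described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a minimization problem, a fixed number k of anchors drawn from already evaluated solutions, and a hypothetical `relation(candidate, anchor)` callable that stands in for the LLM's pairwise prediction. The function names `pick_anchors`, `vote_scores`, and `select_offspring`, and the rule of spreading anchors evenly over the fitness ranking, are illustrative assumptions. With k anchors and N offspring the sketch issues k * N relation queries, rather than comparing all pairs.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
# Hypothetical stand-in for the LLM surrogate:
# returns True if the candidate is predicted to be better than the anchor.
Relation = Callable[[Vector, Vector], bool]


def pick_anchors(archive: List[Vector], fitness: List[float], k: int) -> List[Vector]:
    """Pick k anchors spread evenly over the fitness ranking (assumed rule, minimization)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    order = sorted(range(len(archive)), key=lambda i: fitness[i])
    if k >= len(order):
        return [archive[i] for i in order]
    if k == 1:
        return [archive[order[0]]]
    step = (len(order) - 1) / (k - 1)
    return [archive[order[round(j * step)]] for j in range(k)]


def vote_scores(offspring: List[Vector], anchors: List[Vector], relation: Relation) -> List[int]:
    """Score each offspring by the number of anchors it is predicted to beat."""
    return [sum(1 for a in anchors if relation(c, a)) for c in offspring]


def select_offspring(offspring: List[Vector], scores: List[int], n_keep: int) -> List[Vector]:
    """Keep the n_keep offspring with the most votes (ties keep original order)."""
    ranked = sorted(range(len(offspring)), key=lambda i: -scores[i])
    return [offspring[i] for i in ranked[:n_keep]]


def sphere(x: Vector) -> float:
    """Toy objective used only for this demonstration."""
    return sum(v * v for v in x)


def toy_relation(candidate: Vector, anchor: Vector) -> bool:
    """Toy oracle replacing the LLM: compares true objective values."""
    return sphere(candidate) < sphere(anchor)


archive = [[3.0, 0.0], [2.0, 0.0], [1.0, 0.0], [0.5, 0.0], [4.0, 0.0]]
fitness = [sphere(x) for x in archive]
anchors = pick_anchors(archive, fitness, k=3)

offspring = [[0.1, 0.0], [1.5, 0.0], [3.5, 0.0], [5.0, 0.0]]
scores = vote_scores(offspring, anchors, toy_relation)
survivors = select_offspring(offspring, scores, n_keep=2)
print(scores, survivors)
```

In an actual surrogate-assisted loop the toy oracle would be replaced by a model query, and only the selected survivors would be passed to the expensive objective function.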