Meta-Black-Box Optimization (MetaBBO) is an emerging research direction in the optimization community, in which an algorithm design policy is meta-learned via reinforcement learning to enhance optimization performance. To date, the reward functions in existing MetaBBO works have been designed by human experts, which introduces design bias and the risk of reward hacking. In this paper, we use a Large Language Model~(LLM) as an automated reward discovery tool for MetaBBO, which we call READY. Specifically, we address both effectiveness and efficiency. On the effectiveness side, we borrow the idea of evolution of heuristics and introduce a tailored evolution paradigm into the iterative LLM-based program search process, which ensures continuous improvement. On the efficiency side, we additionally introduce a multi-task evolution architecture that supports parallel reward discovery for diverse MetaBBO approaches; this parallel process also benefits from knowledge sharing across tasks, accelerating convergence. Empirical results demonstrate that the reward functions discovered by our approach can boost existing MetaBBO works, underscoring the importance of reward design in MetaBBO. READY's project is available at https://anonymous.4open.science/r/ICML_READY-747F.
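The iterative LLM-based search with elitist selection described above can be sketched as a minimal loop. This is a hedged illustration, not the paper's implementation: `llm_propose` is a hypothetical stand-in for an LLM call (here stubbed as a random mutation of a scaling constant), `evaluate` is a toy fitness proxy over per-step improvements, and all function names are assumptions.

```python
import random

def llm_propose(parent_code: str) -> str:
    # Stub standing in for an LLM query: perturbs a scaling constant in the
    # candidate reward program. A real system would prompt the LLM with the
    # parent program, its fitness, and the MetaBBO task description.
    k = random.uniform(0.5, 2.0)
    return f"lambda improvement: {k:.3f} * max(improvement, 0.0)"

def evaluate(reward_code: str, trajectory) -> float:
    # Toy fitness proxy: total reward assigned along a sample optimization
    # trajectory. In MetaBBO this would be the downstream optimization
    # performance of an agent meta-trained with this reward function.
    reward_fn = eval(reward_code)
    return sum(reward_fn(step) for step in trajectory)

def evolve_reward(generations: int = 10, pop_size: int = 4) -> str:
    trajectory = [0.3, -0.1, 0.5, 0.2]  # per-step improvements (toy data)
    population = ["lambda improvement: max(improvement, 0.0)"]
    for _ in range(generations):
        # Elitism: keep the best candidate so far, then ask the LLM stub
        # for pop_size variations of it.
        parent = max(population, key=lambda c: evaluate(c, trajectory))
        children = [llm_propose(parent) for _ in range(pop_size)]
        population = [parent] + children
    return max(population, key=lambda c: evaluate(c, trajectory))
```

Because the parent always survives each generation, the best fitness is non-decreasing across iterations, mirroring the "continuous improvement" guarantee the evolution paradigm aims for; the multi-task architecture would run several such loops in parallel with candidates shared across tasks.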