Black-Box Optimization (BBO) has found successful applications in many fields of science and engineering. Recently, there has been growing interest in meta-learning particular components of BBO algorithms to speed up optimization and eliminate tedious hand-crafted heuristics. As an extension, learning the entire algorithm from data requires the least expert effort and provides the most flexibility. In this paper, we propose RIBBO, a method that learns a BBO algorithm from offline data via reinforcement learning in an end-to-end fashion. RIBBO employs expressive sequence models to fit the optimization histories produced by multiple behavior algorithms and tasks, leveraging the in-context learning ability of large models to extract task information and make decisions accordingly. Central to our method is augmenting the optimization histories with \textit{regret-to-go} tokens, which represent the performance of an algorithm as the cumulative regret over the future part of the history. Integrating regret-to-go tokens enables RIBBO to automatically generate sequences of query points that satisfy a user-desired regret, which is verified by its consistently strong empirical performance on diverse problems, including BBO benchmark functions, hyper-parameter optimization, and robot control.
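To make the regret-to-go augmentation concrete, one plausible formalization (assuming maximization and a known task optimum $f^*$) is $R_t = \sum_{i=t}^{T} (f^* - y_i)$, i.e., the cumulative regret accumulated from step $t$ to the end of the history. The following minimal Python sketch illustrates this construction; the function name and example values are illustrative and not taken from the paper.

```python
import numpy as np

def regret_to_go(ys: np.ndarray, f_star: float) -> np.ndarray:
    """Return R_t = sum_{i >= t} (f_star - y_i) for every step t,
    i.e., the cumulative regret over the future part of the history.
    Assumes maximization and a known task optimum f_star (illustrative)."""
    inst_regret = f_star - ys                # instantaneous regret r_i
    # reverse cumulative sum: R_t aggregates regret from step t to the end
    return np.cumsum(inst_regret[::-1])[::-1]

# Example: a 5-step history approaching an optimum of 1.0
ys = np.array([0.2, 0.5, 0.8, 0.9, 1.0])
print(regret_to_go(ys, f_star=1.0))          # [1.6, 0.8, 0.3, 0.1, 0.0]
```

Under this reading, a history whose regret-to-go tokens decay to zero corresponds to a well-performing algorithm, so conditioning generation on a small user-specified regret-to-go steers the sequence model toward high-quality query points.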