Meta-Bayesian optimisation (meta-BO) aims to improve the sample efficiency of Bayesian optimisation by leveraging data from related tasks. While previous methods successfully meta-learn either a surrogate model or an acquisition function independently, joint training of both components remains an open challenge. This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures. Because labelled acquisition data are not available, we train this end-to-end framework with reinforcement learning (RL). We observe that training transformer-based neural processes from scratch with RL is challenging due to insufficient supervision, especially when rewards are sparse. We formalise this claim with a combinatorial analysis showing that regret, a widely used reward signal, exhibits a logarithmic sparsity pattern in trajectory lengths. To tackle this problem, we augment the RL objective with an auxiliary task that guides part of the architecture to learn a valid probabilistic model as an inductive bias. We demonstrate that our method achieves state-of-the-art regret against various baselines on standard hyperparameter optimisation tasks and also outperforms competing methods on the real-world problems of mixed-integer programming tuning, antibody design, and logic synthesis for electronic design automation.
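To give some intuition for the sparsity claim, the sketch below counts how often a regret-style reward is non-zero along a trajectory. It is not the combinatorial analysis referred to above. It rests on simplifying assumptions: the policy samples points independently and uniformly at random, the reward at a step is non-zero only when the best value seen so far improves, and the helper names (`harmonic`, `count_improvements`, `mean_improvements`) are hypothetical. Under these assumptions the expected number of rewarded steps in a trajectory of length T is the harmonic number H_T, which grows like log T.

```python
import random


def harmonic(T):
    """Harmonic number H_T = 1 + 1/2 + ... + 1/T (grows like log T)."""
    return sum(1.0 / t for t in range(1, T + 1))


def count_improvements(T, rng):
    """Count the steps at which the running best strictly improves.

    Assumption: function values are i.i.d. uniform draws, standing in for
    a policy that queries the objective at random.
    """
    best = float("-inf")
    improvements = 0
    for _ in range(T):
        y = rng.random()
        if y > best:  # a regret-style reward would be non-zero only here
            best = y
            improvements += 1
    return improvements


def mean_improvements(T, n_runs=2000, seed=0):
    """Monte Carlo estimate of the expected number of rewarded steps."""
    rng = random.Random(seed)
    total = sum(count_improvements(T, rng) for _ in range(n_runs))
    return total / n_runs


if __name__ == "__main__":
    for T in (10, 100, 1000):
        print(T, round(mean_improvements(T), 2), round(harmonic(T), 2))
```

Running the script prints, for each trajectory length, the simulated average next to H_T. The two should be close, which suggests that only a logarithmic fraction of steps carries a learning signal in this toy setting.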