Offline model-based policy optimization seeks to optimize a learned surrogate objective function without querying the true oracle objective during optimization. However, the surrogate model's predictions are frequently inaccurate along the optimization trajectory. To address this limitation, we propose generative adversarial Bayesian optimization (GABO) using adaptive source critic regularization, a task-agnostic framework for Bayesian optimization that employs a Lipschitz-bounded source critic model to constrain the optimization trajectory to regions where the surrogate function is reliable. We show that, under certain assumptions on the continuous input space prior, our algorithm dynamically adjusts the strength of the source critic regularization. GABO outperforms existing baselines on several offline optimization tasks across a variety of scientific domains. Our code is available at https://github.com/michael-s-yao/gabo
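To make the idea of source critic regularization concrete, the following is a minimal sketch and not the GABO implementation from the repository. It assumes a penalized objective of the form surrogate(x) minus lam times the gap between the critic's mean score on the offline data and its score on the candidate x. Everything named here is hypothetical: `surrogate` is a toy function, `critic` is a hand-built 1-Lipschitz stand-in (negative distance to the nearest reference point) rather than a trained model, and `lam` is held fixed, whereas the abstract describes a strength that is adjusted dynamically.

```python
import math


def surrogate(x):
    """Toy surrogate (hypothetical): keeps growing far away from the offline data."""
    return sum(x)


def critic(x, reference):
    """Hand-built 1-Lipschitz stand-in for a trained source critic.

    Returns the negative Euclidean distance from x to its nearest reference
    point, so points close to the offline data receive higher scores.
    """
    return -min(math.dist(x, r) for r in reference)


def regularized_objective(x, reference, lam):
    """Surrogate value minus lam times the critic gap between the data and x."""
    mean_ref_score = sum(critic(r, reference) for r in reference) / len(reference)
    penalty = mean_ref_score - critic(x, reference)  # zero on the data, grows with distance
    return surrogate(x) - lam * penalty


# Offline dataset (assumed for illustration) and two candidate designs.
reference = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
near, far = (0.5, 0.5), (10.0, 10.0)

for lam in (0.0, 5.0):
    best = max((near, far), key=lambda x: regularized_objective(x, reference, lam))
    print(f"lam={lam}: preferred candidate = {best}")
```

With `lam=0.0` the unregularized surrogate prefers the far candidate, where its prediction is an extrapolation. With `lam=5.0` the penalty outweighs that gain and the candidate inside the data region is preferred. Choosing `lam` adaptively, as the abstract describes, is outside the scope of this sketch.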