Optimizing efficiently over discrete data with only a few available target observations is a challenge in Bayesian optimization. We propose a continuous relaxation of the objective function and show that inference and optimization remain computationally tractable. In particular, we consider the setting where very few observations are available and budgets are strict, motivated by optimizing protein sequences for biochemical properties that are expensive to evaluate. Our approach has two advantages: the problem is treated in a continuous setting, and available prior knowledge over sequences can be incorporated directly. More specifically, we use available and learned distributions over the problem domain to weight the Hellinger distance, which yields a covariance function. We show that the resulting acquisition function can be optimized with either continuous or discrete optimization algorithms, and we empirically assess our method on two biochemical sequence optimization tasks.
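The following minimal sketch illustrates how a weighted Hellinger distance can yield a covariance function. It is not the authors' implementation. It assumes each sequence position is relaxed to a probability vector over the alphabet. It also assumes the weighting enters as a hypothetical array `W` of positive per-entry weights, standing in for the prior and learned distributions; the paper's exact weighting may differ. The function names `one_hot`, `weighted_hellinger_sq` and `hellinger_rbf_kernel` are illustrative. The (weighted) Hellinger distance is a Euclidean distance between square-root vectors. Wrapping it in a squared exponential therefore gives a positive semi-definite kernel.

```python
import numpy as np

def one_hot(seq, alphabet):
    """Encode a discrete sequence as an (L, A) matrix of one-hot rows."""
    idx = {a: i for i, a in enumerate(alphabet)}
    X = np.zeros((len(seq), len(alphabet)))
    for pos, a in enumerate(seq):
        X[pos, idx[a]] = 1.0
    return X

def weighted_hellinger_sq(P, Q, W):
    """Squared weighted Hellinger distance between two relaxed sequences.

    P, Q: (L, A) arrays; row l is a probability vector over the alphabet at
          position l (the continuous relaxation of a sequence).
    W:    (L, A) array of positive weights (hypothetical stand-in for the
          prior/learned distributions used to weight the distance).
    """
    diff = np.sqrt(P) - np.sqrt(Q)
    # Per-position Hellinger terms, weighted entry-wise and summed over positions.
    return 0.5 * np.sum(W * diff ** 2)

def hellinger_rbf_kernel(X, Y, W, lengthscale=1.0):
    """Gram matrix with k(x, y) = exp(-d_W(x, y)^2 / (2 * lengthscale^2))."""
    K = np.empty((len(X), len(Y)))
    for i, P in enumerate(X):
        for j, Q in enumerate(Y):
            d2 = weighted_hellinger_sq(P, Q, W)
            K[i, j] = np.exp(-d2 / (2.0 * lengthscale ** 2))
    return K

# Usage: two sequences that differ at one position, with unit weights.
alphabet = "ACDE"
x = one_hot("ACD", alphabet)
y = one_hot("ACE", alphabet)
W = np.ones((3, len(alphabet)))
print(weighted_hellinger_sq(x, y, W))  # 1.0, i.e. one mismatching position
print(hellinger_rbf_kernel([x, y], [x, y], W))
```

With one-hot inputs and unit weights, the squared distance reduces to the Hamming distance. On discrete sequences, the kernel therefore agrees with a Hamming-based squared exponential, yet it stays defined on the relaxed points in between. Continuous optimizers can search over those points, and discrete optimizers can still work directly on sequences.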