This paper considers the problem of offline optimization, where the objective function is unknown except for a collection of ``offline'' data examples. While recent years have seen a flurry of work applying various machine learning techniques to the offline optimization problem, the majority of this work has focused on learning a surrogate of the unknown objective function and then applying existing optimization algorithms. While modeling the unknown objective function is intuitive and appealing, from the learning point of view it also makes it very difficult to tune the learner's objective to match the objective of optimization. Instead of learning and then optimizing the unknown objective function, in this paper we take a less intuitive but more direct view: optimization can be thought of as a process of sampling from a generative model. To learn an effective generative model from the offline data examples, we consider the standard technique of ``re-weighting'', and our main technical contribution is a probably approximately correct (PAC) lower bound on the natural optimization objective, which allows us to jointly learn a weight function and a score-based generative model. The robustly competitive performance of the proposed approach is demonstrated via empirical studies on standard offline optimization benchmarks.