While machine learning models are typically trained to solve prediction problems, we often want to use them for optimization problems. For example, given a dataset of proteins and their corresponding fluorescence levels, we might want to find a new protein with the highest possible fluorescence. This kind of data-driven optimization (DDO) presents a range of challenges beyond those in standard prediction problems, since we need models that successfully predict the performance of new designs that are better than the best designs seen in the training set. It is not clear theoretically when existing approaches can even outperform the naive approach that simply selects the best design in the dataset. In this paper, we study how structure can enable sample-efficient data-driven optimization. To formalize the notion of structure, we introduce functional graphical models (FGMs) and show theoretically how they enable principled data-driven optimization by decomposing the original high-dimensional optimization problem into smaller sub-problems. This allows us to derive much more practical regret bounds for DDO, and the result implies that DDO with FGMs can achieve nearly optimal designs in situations where naive approaches fail due to insufficient coverage of the offline data. We further present a data-driven optimization algorithm that infers the FGM structure itself, either over the original input variables or over a latent variable representation of the inputs.
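To make the decomposition idea concrete, the following is a minimal toy sketch, not the paper's algorithm. It assumes a hypothetical objective that is a sum of single-variable terms (an FGM with no edges) and assumes this structure is known in advance rather than inferred. The names `TABLES`, `one_hot`, `naive_best`, and `structured_best` are illustrative. The offline data varies only one variable at a time, so the jointly optimal design never appears in the dataset.

```python
import numpy as np

# Hypothetical ground truth (an assumption for illustration):
# f(x) = f_1(x_1) + f_2(x_2) + f_3(x_3), each variable taking values 0..2.
TABLES = np.array([[0.0, 1.0, 3.0],
                   [0.5, 2.0, 1.0],
                   [1.0, 0.0, 2.5]])
D, K = TABLES.shape  # D variables, K values per variable


def f(x):
    """Evaluate the toy objective at a design x (tuple of ints)."""
    return float(sum(TABLES[i, v] for i, v in enumerate(x)))


# Offline dataset with poor coverage: a baseline design plus designs
# that change exactly one variable away from the baseline.
data = [(0,) * D]
for i in range(D):
    for v in range(1, K):
        x = [0] * D
        x[i] = v
        data.append(tuple(x))
y = np.array([f(x) for x in data])

# Naive approach: return the best design already in the dataset.
naive_best = data[int(np.argmax(y))]


def one_hot(x):
    """Concatenate one one-hot block per variable (one block per clique)."""
    phi = np.zeros(D * K)
    for i, v in enumerate(x):
        phi[i * K + v] = 1.0
    return phi


# Structured approach: fit an additive model by least squares, then
# solve each small sub-problem (one per variable) independently.
Phi = np.stack([one_hot(x) for x in data])
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
w = w.reshape(D, K)
structured_best = tuple(int(v) for v in np.argmax(w, axis=1))

print("naive:", naive_best, f(naive_best))
print("structured:", structured_best, f(structured_best))
```

In this toy setting the naive approach is limited to the best observed design, `(2, 0, 0)` with value 4.5, while the structured approach recovers `(2, 1, 2)` with value 7.5 even though that design is absent from the data. The gain depends entirely on the assumed additive structure being correct.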