In zeroth-order optimization, we seek to minimize a function $d(\cdot)$, which may encode combinatorial feasibility, using only function evaluations. We focus on the setting where solutions must also satisfy qualitative constraints or conform to a complex prior distribution. To address this, we introduce a new framework in which such constraints are represented by an initial generative prior $L(\cdot)$, for example, a Large Language Model (LLM). The objective is to find solutions $s$ that minimize $d(s)$ while having high probability under $L(s)$, effectively sampling from a target distribution proportional to $L(s) \cdot e^{-T \cdot d(s)}$ for a temperature parameter $T$. While this framework aligns with classical Model-Based Optimization (e.g., the Cross-Entropy method), existing theory is ill-suited for deriving sample-complexity bounds for black-box deep generative models. We therefore propose a novel learning assumption, which we term \emph{coarse learnability}: an agent with access to a polynomial number of samples can learn a model whose point-wise density approximates the target within a polynomial factor. Leveraging this assumption, we design an iterative algorithm that employs a Metropolis-Hastings correction to provably approximate the target distribution using a polynomial number of samples. To the best of our knowledge, this is one of the first works to establish such sample-complexity guarantees for model-based optimization with deep generative priors. We provide two lines of evidence supporting the coarse learnability assumption. Theoretically, we show that maximum likelihood estimation naturally induces the required coverage properties, both for standard exponential families and for misspecified models. Empirically, we demonstrate that LLMs can adapt their learned distributions to zeroth-order feedback to solve combinatorial optimization problems.
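The sampling scheme described above can be illustrated with a minimal sketch. The target is the unnormalized density $L(s) \cdot e^{-T \cdot d(s)}$, and a "coarse" learned model serves as an independence proposal that a Metropolis-Hastings step corrects toward the target. The toy prior `L`, objective `d`, bitstring domain, and proposal parameter `P_ONE` below are all illustrative stand-ins, not the paper's actual models:

```python
import math
import random

random.seed(0)

n = 8      # toy domain: solutions are bitstrings of length n
T = 2.0    # temperature weighting the objective d(s)

def L(s):
    """Hypothetical generative prior: prefers strings whose first bit is 1."""
    return 0.9 if s[0] == 1 else 0.1

def d(s):
    """Zeroth-order objective to minimize: here, the number of zero bits."""
    return s.count(0)

def pi_unnorm(s):
    """Unnormalized target density  L(s) * exp(-T * d(s))."""
    return L(s) * math.exp(-T * d(s))

# Coarse proposal q: stands in for a learned model whose point-wise density
# matches the target only up to a (polynomial) factor, per coarse learnability.
P_ONE = 0.7

def q_sample():
    return tuple(1 if random.random() < P_ONE else 0 for _ in range(n))

def q_density(s):
    return math.prod(P_ONE if b == 1 else 1 - P_ONE for b in s)

def metropolis_hastings(steps=5000):
    """Independence sampler: propose from q, accept/reject toward the target."""
    s = q_sample()
    samples = []
    for _ in range(steps):
        s_new = q_sample()
        # MH acceptance ratio for an independence proposal.
        accept = (pi_unnorm(s_new) * q_density(s)) / (pi_unnorm(s) * q_density(s_new))
        if random.random() < min(1.0, accept):
            s = s_new
        samples.append(s)
    return samples

samples = metropolis_hastings()
tail = samples[len(samples) // 2:]           # discard burn-in
mean_d = sum(d(s) for s in tail) / len(tail)  # average objective under the chain
```

Even with a crude proposal, the correction step concentrates the chain on solutions with low $d(s)$ that remain plausible under the prior; the chain's average objective is far below that of the raw proposal distribution.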