Sample-Efficient Optimization over Generative Priors via Coarse Learnability

We study zeroth-order optimization in which solutions must minimize a cost $d(s)$ while retaining high probability under a complex generative prior $L(s)$ (e.g., a parameterized model). This reduces to sampling from a target distribution proportional to $L(s) e^{-T \cdot d(s)}$. Because classical model-based optimization (MBO) lacks finite-sample guarantees for expressive approximate learners, we introduce "coarse learnability", a flexible statistical assumption requiring only that a learned model cover the target's probability mass to within a polynomial factor. Leveraging this assumption, we design an iterative MBO algorithm, \alift, whose sample correction step provably approximates the target using only a polynomial number of samples. We apply this framework to the global optimization of non-convex objectives bounded by a quadratic envelope in $\mathbb{R}^n$, and show that the assumption is naturally satisfied for a family of "optimistic" posterior distributions. This implies a sample complexity of $\widetilde{O}(\log(1/\varepsilon))$ to reach global $\varepsilon$-optimality, a rate characteristic of optimistic space-partitioning methods. We further justify coarse learnability theoretically as an assumption on generative priors, proving that in simple settings parametric maximum likelihood estimation and over-smoothed kernel density estimators naturally satisfy it. Finally, one motivation for our framework comes from inference-time alignment. Although our primary contribution concerns the theoretical foundations of MBO, we provide qualitative evidence that, in simple settings, even primitive LLMs can shift their distributions toward lower-cost regions when fine-tuned with zeroth-order feedback.
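To make the target distribution proportional to $L(s) e^{-T \cdot d(s)}$ concrete, the following is a minimal sketch of one-shot self-normalized importance resampling from prior draws. It is not the iterative algorithm described above, and it involves no learned model or coarse-learnability check. The function name `tilted_resample`, the standard normal stand-in for the prior $L$, and the quadratic cost are all assumptions chosen for illustration.

```python
import numpy as np


def tilted_resample(prior_samples, cost, temperature, n_out, rng):
    """Resample prior draws so they approximately follow L(s) * exp(-T * d(s)).

    prior_samples: array of shape (n, dim), assumed drawn from the prior L.
    cost: vectorized zeroth-order oracle mapping (n, dim) -> (n,) costs d(s).
    """
    costs = cost(prior_samples)
    # Importance weight of the target relative to the prior is exp(-T * d(s)).
    log_w = -temperature * costs
    log_w = log_w - log_w.max()  # subtract the max for numerical stability
    weights = np.exp(log_w)
    weights = weights / weights.sum()  # self-normalize
    idx = rng.choice(len(prior_samples), size=n_out, replace=True, p=weights)
    return prior_samples[idx], weights


rng = np.random.default_rng(0)
# Hypothetical stand-in prior: L = N(0, 1) in one dimension.
prior = rng.normal(loc=0.0, scale=1.0, size=(20000, 1))


def quadratic_cost(s):
    # Hypothetical cost d(s) = (s - 2)^2, evaluated only through samples.
    return np.sum((s - 2.0) ** 2, axis=1)


resampled, weights = tilted_resample(
    prior, quadratic_cost, temperature=1.0, n_out=5000, rng=rng
)
print(resampled.mean(), resampled.var())
```

For this particular choice the target is itself Gaussian, with mean $4/3$ and variance $1/3$, so the empirical moments of `resampled` can be checked against those values. Plain reweighting of this kind degrades quickly when the prior places little mass on low-cost regions, which is the regime that iterative, model-based schemes are meant to address.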
