Bayesian optimization in large unstructured discrete spaces is often hindered by the computational cost of maximizing acquisition functions due to the absence of gradients. We propose a scalable alternative based on Thompson sampling that eliminates the need for acquisition function maximization by directly parameterizing the probability that a candidate yields the maximum reward. Our approach, Thompson Sampling via Fine-Tuning (ToSFiT), leverages the prior knowledge embedded in prompt-conditioned large language models and incrementally adapts them toward the posterior. Theoretically, we derive a novel regret bound for a variational formulation of Thompson sampling that matches the strong guarantees of its standard counterpart. Our analysis reveals the critical role of careful adaptation to the posterior probability of maximality -- a principle that underpins our ToSFiT algorithm. Empirically, we validate our method on three diverse tasks: FAQ response refinement, thermally stable protein search, and quantum circuit design. Within a collection of methods covering Bayesian optimization, reinforcement learning, and evolutionary search, ToSFiT exhibits both state-of-the-art sample efficiency and computational efficiency.
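To make the core idea concrete, here is a minimal sketch of *standard* Thompson sampling on a small discrete candidate set (not the ToSFiT algorithm itself, which replaces the explicit posterior with a fine-tuned language model). The candidate set, Bernoulli reward rates, and Beta priors below are illustrative assumptions; the point is that each round selects a candidate by an argmax over posterior *samples*, so no acquisition function is ever maximized.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete candidate set with unknown Bernoulli reward rates.
true_rewards = np.array([0.2, 0.5, 0.8])
n = len(true_rewards)

# Independent Beta(1, 1) priors per candidate (a standard conjugate setup,
# not the prompt-conditioned LLM prior used by ToSFiT).
successes = np.ones(n)
failures = np.ones(n)

for t in range(500):
    # Draw one reward estimate per candidate from the current posterior...
    theta = rng.beta(successes, failures)
    # ...and play the candidate whose sampled value is maximal: selection is
    # a cheap argmax over samples, with no acquisition-function optimization.
    a = int(np.argmax(theta))
    reward = rng.random() < true_rewards[a]
    successes[a] += reward
    failures[a] += 1 - reward

# Play counts concentrate on the truly best candidate over time.
counts = successes + failures - 2
print(int(np.argmax(counts)))  # index of the most-played candidate
```

Because each candidate is played with exactly its posterior probability of being the maximizer, the loop above implicitly targets the same "probability of maximality" that the abstract describes ToSFiT parameterizing directly.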