Bayesian optimization in large unstructured discrete spaces is often hindered by the computational cost of maximizing acquisition functions in the absence of gradients. We propose a scalable alternative based on Thompson sampling that eliminates the need for acquisition-function maximization by directly parameterizing the probability that a candidate yields the maximum reward. Our approach, Thompson Sampling via Fine-Tuning (ToSFiT), leverages the prior knowledge embedded in prompt-conditioned large language models and incrementally adapts them toward the posterior. Theoretically, we derive a novel regret bound for a variational formulation of Thompson sampling that matches the strong guarantees of its standard counterpart. Our analysis reveals the critical role of careful adaptation to the posterior probability of maximality, a principle that underpins our ToSFiT algorithm. Empirically, we validate our method on three diverse tasks: FAQ response refinement, thermally stable protein search, and quantum circuit design. We demonstrate that online fine-tuning substantially improves sample efficiency with negligible impact on computational cost.
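To make the setting concrete, the sketch below illustrates classical Thompson sampling over a finite discrete candidate set, the scheme the abstract builds on: sample a plausible reward for every candidate from its posterior and query the argmax, so no acquisition function is ever explicitly maximized. This is a minimal illustration with independent per-candidate Gaussian reward models, not the authors' ToSFiT method; all names and parameters here are assumptions for exposition.

```python
import math
import random

def thompson_sampling(candidates, reward_fn, rounds=100, noise=1.0, seed=0):
    """Toy Thompson sampling over a discrete candidate set.

    Each candidate gets an independent Gaussian posterior over its mean
    reward (zero-mean unit-variance prior, known observation noise).
    """
    rng = random.Random(seed)
    # Posterior state per candidate: [posterior mean, observation count].
    stats = {c: [0.0, 0] for c in candidates}
    best, best_reward = None, float("-inf")

    for _ in range(rounds):
        # Draw one plausible mean reward per candidate from its posterior;
        # the posterior std shrinks as a candidate accumulates observations.
        def draw(c):
            mean, n = stats[c]
            return rng.gauss(mean, math.sqrt(noise / (n + 1)))

        # Query the candidate whose sampled reward is largest -- this is the
        # Thompson step; no separate acquisition maximization is needed.
        choice = max(candidates, key=draw)
        r = reward_fn(choice)

        # Conjugate Gaussian update of the chosen candidate's posterior.
        mean, n = stats[choice]
        stats[choice] = [(mean * n + r) / (n + 1), n + 1]

        if r > best_reward:
            best, best_reward = choice, r
    return best
```

In the discrete, unstructured regime the abstract targets, enumerating `candidates` is exactly what becomes intractable at scale; replacing the explicit per-candidate posterior with a prompt-conditioned language model that is fine-tuned toward the posterior probability of maximality is the substitution the paper proposes.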