Large Language Models (LLMs) can encode complex relationships in their latent spaces, yet harnessing them for optimization under uncertainty remains challenging. We address this gap with a novel architecture that reframes LLM finetuning as Gaussian process (GP) marginal likelihood optimization via deep kernel methods. We introduce LLM-based deep kernels, jointly optimized with GPs to preserve the benefits of both: LLMs provide a rich and flexible input space for Bayesian optimization, while GPs model this space with predictive uncertainty for more efficient sampling. Applied to Buchwald-Hartwig reaction optimization, our method nearly doubles the discovery rate of high-performing reactions compared to static LLM embeddings (from 24% to 43% coverage of the top 5% of reactions in just 50 optimization iterations). We also observe a 14% improvement over domain-specific representations, without requiring specialized features. Extensive empirical evaluation across 19 benchmarks, ranging from general chemistry to reaction and molecular property optimization, demonstrates our method's robustness, generality, and consistent improvements across (1) tasks, (2) LLM architectures (encoder, decoder, encoder-decoder), (3) pretraining domains (chemistry-related or general-purpose), and (4) hyperparameter settings (tuned once on a single dataset). Finally, we explain these improvements: joint LLM-GP optimization through marginal likelihood implicitly performs contrastive learning, aligning representations to produce (1) better-structured embedding spaces, (2) improved uncertainty calibration, and (3) more efficient sampling, all without requiring any external loss. This work provides both practical advances in sample-efficient optimization and insights into what makes Bayesian optimization effective.
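To make the joint optimization concrete, the sketch below shows one way an LLM encoder can act as a deep-kernel feature extractor whose weights are trained, together with the kernel hyperparameters, by maximizing the exact GP log marginal likelihood. This is a minimal, illustrative sketch only: it assumes a Hugging Face encoder ("bert-base-uncased" as a stand-in), mean-pooled embeddings, and an RBF kernel; the specific encoders, pooling, kernel, and optimizer used in our experiments may differ.

```python
# Minimal sketch: jointly training an LLM-based deep kernel and GP
# hyperparameters by maximizing the exact GP log marginal likelihood.
# Assumes a Hugging Face encoder ("bert-base-uncased" as a stand-in);
# the models, kernel, and training details in the paper may differ.
import torch
from transformers import AutoModel, AutoTokenizer

encoder = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Learnable GP hyperparameters (log-parameterized to stay positive).
log_lengthscale = torch.nn.Parameter(torch.zeros(()))
log_outputscale = torch.nn.Parameter(torch.zeros(()))
log_noise = torch.nn.Parameter(torch.tensor(-2.0))

def embed(texts):
    """Map reaction/molecule strings to mean-pooled LLM embeddings."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state            # (B, T, D)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)             # (B, D)

def rbf_kernel(z):
    """RBF kernel on LLM embeddings with learnable scale parameters."""
    sq_dists = torch.cdist(z, z).pow(2)
    return log_outputscale.exp() * torch.exp(-0.5 * sq_dists / log_lengthscale.exp() ** 2)

def neg_log_marginal_likelihood(texts, y):
    """Exact GP negative log marginal likelihood; gradients flow into the LLM."""
    z = embed(texts)
    k = rbf_kernel(z) + log_noise.exp() * torch.eye(len(y))
    prior = torch.distributions.MultivariateNormal(torch.zeros(len(y)), covariance_matrix=k)
    return -prior.log_prob(y)

params = list(encoder.parameters()) + [log_lengthscale, log_outputscale, log_noise]
optimizer = torch.optim.AdamW(params, lr=1e-4)

# Toy data: reaction strings with scalar objective values (e.g., yields).
texts = ["CCO.CC(=O)Cl>>CCOC(C)=O", "c1ccccc1Br.CN>>c1ccccc1NC"]
y = torch.tensor([0.82, 0.35])

for _ in range(10):
    optimizer.zero_grad()
    loss = neg_log_marginal_likelihood(texts, y)
    loss.backward()
    optimizer.step()
```

Because the marginal likelihood is the only training signal, backpropagating it through the encoder is what implicitly reshapes the embedding space; in a full Bayesian optimization loop the resulting GP posterior would then drive acquisition-based sampling.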