Large Language Models (LLMs) can encode complex relationships in their latent spaces, yet harnessing them for optimization under uncertainty remains challenging. We address this gap with a novel architecture that reframes LLM finetuning as Gaussian process (GP) marginal likelihood optimization via deep kernel methods. We introduce LLM-based deep kernels, jointly optimized with GPs to preserve the benefits of both: LLMs provide a rich and flexible input space for Bayesian optimization, while GPs model this space with predictive uncertainty for more efficient sampling. Applied to Buchwald-Hartwig reaction optimization, our method nearly doubles the discovery rate of high-performing reactions compared to static LLM embeddings (from 24% to 43% coverage of the top 5% of reactions in just 50 optimization iterations). We also observe a 14% improvement over domain-specific representations, without requiring specialized features. Extensive empirical evaluation across 19 benchmarks, ranging from general chemistry to reaction and molecular property optimization, demonstrates our method's robustness, generality, and consistent improvements across (1) tasks, (2) LLM architectures (encoder, decoder, encoder-decoder), (3) pretraining domains (chemistry-related or general-purpose), and (4) hyperparameter settings (tuned once on a single dataset). Finally, we explain these improvements: joint LLM-GP optimization through marginal likelihood implicitly performs contrastive learning, aligning representations to produce (1) better-structured embedding spaces, (2) improved uncertainty calibration, and (3) more efficient sampling, all without requiring any external loss. This work provides both practical advances in sample-efficient optimization and insights into what makes Bayesian optimization effective.
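To make the joint optimization concrete, the sketch below shows one way an LLM encoder can act as a deep-kernel feature extractor whose weights are trained, together with the kernel hyperparameters, by maximizing the exact GP log marginal likelihood. This is a minimal, illustrative sketch only: it assumes a Hugging Face encoder ("bert-base-uncased" as a stand-in), mean-pooled embeddings, and an RBF kernel; the specific encoders, pooling, kernel, and optimizer used in our experiments may differ.

```python
# Minimal sketch: jointly training an LLM-based deep kernel and GP
# hyperparameters by maximizing the exact GP log marginal likelihood.
# Assumes a Hugging Face encoder ("bert-base-uncased" as a stand-in);
# the models, kernel, and training details in the paper may differ.
import torch
from transformers import AutoModel, AutoTokenizer

encoder = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Learnable GP hyperparameters (log-parameterized to stay positive).
log_lengthscale = torch.nn.Parameter(torch.zeros(()))
log_outputscale = torch.nn.Parameter(torch.zeros(()))
log_noise = torch.nn.Parameter(torch.tensor(-2.0))

def embed(texts):
    """Map reaction/molecule strings to mean-pooled LLM embeddings."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state            # (B, T, D)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)             # (B, D)

def rbf_kernel(z):
    """RBF kernel on LLM embeddings with learnable scale parameters."""
    sq_dists = torch.cdist(z, z).pow(2)
    return log_outputscale.exp() * torch.exp(-0.5 * sq_dists / log_lengthscale.exp() ** 2)

def neg_log_marginal_likelihood(texts, y):
    """Exact GP negative log marginal likelihood; gradients flow into the LLM."""
    z = embed(texts)
    k = rbf_kernel(z) + log_noise.exp() * torch.eye(len(y))
    prior = torch.distributions.MultivariateNormal(torch.zeros(len(y)), covariance_matrix=k)
    return -prior.log_prob(y)

params = list(encoder.parameters()) + [log_lengthscale, log_outputscale, log_noise]
optimizer = torch.optim.AdamW(params, lr=1e-4)

# Toy data: reaction strings with scalar objective values (e.g., yields).
texts = ["CCO.CC(=O)Cl>>CCOC(C)=O", "c1ccccc1Br.CN>>c1ccccc1NC"]
y = torch.tensor([0.82, 0.35])

for _ in range(10):
    optimizer.zero_grad()
    loss = neg_log_marginal_likelihood(texts, y)
    loss.backward()
    optimizer.step()
```

Because the marginal likelihood is the only training signal, backpropagating it through the encoder is what implicitly reshapes the embedding space; in a full Bayesian optimization loop the resulting GP posterior would then drive acquisition-based sampling.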