Large language models (LLMs) have recently shown significant potential in various biological tasks such as protein engineering and molecule design. These tasks typically involve black-box discrete sequence optimization, where the challenge lies in generating sequences that are not only biologically feasible but also adhere to hard, fine-grained constraints. However, LLMs often struggle with such constraints, especially in biological contexts where verifying candidate solutions is costly and time-consuming. In this study, we explore the possibility of employing LLMs as highly constrained bilevel optimizers through a methodology we refer to as Language Model Optimization with Margin Expectation (LLOME). This approach combines offline and online optimization, using a limited number of oracle evaluations to iteratively improve the sequences generated by the LLM. We additionally propose a novel training objective, Margin-Aligned Expectation (MargE), that trains the LLM to smoothly interpolate between the reward and reference distributions. Lastly, we introduce a synthetic test suite that bears strong geometric similarity to real biophysical problems and enables rapid evaluation of LLM optimizers without time-consuming lab validation. Our findings reveal that, in comparison to genetic algorithm baselines, LLMs find solutions with significantly lower regret while requiring fewer test function evaluations. However, we also observe that LLMs exhibit moderate miscalibration, are susceptible to generator collapse, and have difficulty finding the optimal solution when no explicit ground-truth rewards are available.