The objective of drug discovery is to identify chemical compounds that possess specific pharmaceutical properties toward a binding target. Existing large language models (LLMs) can achieve high token-matching likelihood scores for molecule generation. However, relying solely on LLM decoding often yields molecules that are either invalid due to a single misused token, or suboptimal due to unbalanced exploration and exploitation stemming from the LLM's prior experience. Here we propose ERP, Entropy-Reinforced Planning for Transformer Decoding, which employs an entropy-reinforced planning algorithm to enhance the Transformer decoding process and strike a balance between exploitation and exploration. ERP aims to improve multiple properties simultaneously relative to direct sampling from the Transformer. We evaluated ERP on the SARS-CoV-2 virus (3CLPro) and human cancer cell target protein (RTCB) benchmarks and demonstrated that, on both benchmarks, ERP consistently outperforms the current state-of-the-art algorithm by 1-5 percent and baselines by 5-10 percent. Moreover, this improvement is robust across Transformer models trained with different objectives. Finally, to further illustrate the capabilities of ERP, we tested our algorithm on three code-generation benchmarks and outperformed the current state-of-the-art approach there as well. Our code is publicly available at: https://github.com/xuefeng-cs/ERP.