T-Rex: Text-assisted Retrosynthesis Prediction

As a fundamental task in computational chemistry, retrosynthesis prediction aims to identify a set of reactants that can synthesize a target molecule. Existing template-free approaches consider only the graph structure of the target molecule and often fail to generalize to rare reaction types and large molecules. Here, we propose T-Rex, a text-assisted retrosynthesis prediction approach that exploits pre-trained text language models, such as ChatGPT, to assist the generation of reactants. T-Rex first uses ChatGPT to generate a description of the target molecule and ranks candidate reaction centers based on both the description and the molecular graph. It then re-ranks these candidates by querying descriptions for each reactant and examining which group of reactants can best synthesize the target molecule. We observed that T-Rex substantially outperformed graph-based state-of-the-art approaches on two datasets, indicating the effectiveness of considering text information. We further found that T-Rex outperformed a variant that uses only the ChatGPT-based description without the re-ranking step, demonstrating that our framework outperforms a straightforward integration of ChatGPT and graph information. Collectively, we show that text generated by pre-trained language models can substantially improve retrosynthesis prediction, opening up new avenues for exploiting ChatGPT to advance computational chemistry. The code is available at https://github.com/lauyikfung/T-Rex.
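
The two-stage procedure described in the abstract (rank candidate reaction centers, then re-rank by reactant descriptions) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Candidate` type, the `two_stage_rank` function, the `top_k` cutoff, and the three callables (`describe`, `center_score`, `reactant_score`) are all hypothetical placeholders. In particular, `describe` stands in for a language-model query, and the target is passed as a plain string rather than a molecular graph.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate: a proposed reaction center and the reactants it implies."""
    reaction_center: str
    reactants: Tuple[str, ...]


def two_stage_rank(
    target: str,
    candidates: Sequence[Candidate],
    describe: Callable[[str], str],
    center_score: Callable[[str, str, Candidate], float],
    reactant_score: Callable[[str, List[str]], float],
    top_k: int = 3,
) -> List[Candidate]:
    """Rank candidates by reaction center, then re-rank the shortlist by reactants.

    All scoring functions are supplied by the caller; higher scores are better.
    """
    # Stage 1: obtain a text description of the target and score each
    # candidate reaction center using both the target and its description.
    target_text = describe(target)
    ranked = sorted(
        candidates,
        key=lambda c: center_score(target, target_text, c),
        reverse=True,
    )
    shortlist = ranked[:top_k]

    # Stage 2: describe every reactant of each shortlisted candidate and
    # re-rank by how well that group of reactants is judged to form the target.
    def reactant_key(c: Candidate) -> float:
        descriptions = [describe(r) for r in c.reactants]
        return reactant_score(target, descriptions)

    return sorted(shortlist, key=reactant_key, reverse=True)
```

In this sketch the second stage only reorders the `top_k` candidates kept by the first stage, so a candidate dropped in stage 1 cannot be recovered later. Whether T-Rex itself truncates the candidate list this way is an assumption of the sketch, not something the abstract states.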

