log-RRIM: Yield Prediction via Local-to-global Reaction Representation Learning and Interaction Modeling

Accurate prediction of chemical reaction yields is crucial for optimizing organic synthesis, potentially reducing the time and resources spent on experimentation. With the rise of artificial intelligence (AI), there is growing interest in leveraging AI-based methods to accelerate yield prediction without conducting in vitro experiments. We present log-RRIM, a graph transformer-based framework for predicting chemical reaction yields. Our approach implements a local-to-global reaction representation learning strategy: it first captures detailed molecule-level information and then models and aggregates intermolecular interactions, so that the impact of molecular fragments of varying sizes on yield is accounted for. Another key feature of log-RRIM is a cross-attention mechanism that focuses on the interplay between reagents and reaction centers. This design reflects a fundamental principle of chemical reactions: reagents play a crucial role in bond-breaking and bond-forming processes, which ultimately affect reaction yields. log-RRIM outperforms existing methods in our experiments, especially for medium- to high-yielding reactions, supporting its reliability as a predictor. Its modeling of reactant-reagent interactions and its sensitivity to small molecular fragments make it a valuable tool for reaction planning and optimization in chemical synthesis. The data and code for log-RRIM are available at https://github.com/ninglab/Yield_log_RRIM.
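
To make the cross-attention idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes reaction-center atoms act as queries and reagent atoms act as keys and values in standard scaled dot-product attention. The function names (`matmul`, `softmax`, `cross_attention`), the feature size, and the random inputs are all hypothetical and chosen only for illustration; the actual architecture is defined in the repository linked above.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over a single list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(center_feats, reagent_feats, w_q, w_k, w_v):
    """Scaled dot-product cross-attention (illustrative assumption).

    Queries come from reaction-center atoms; keys and values come from
    reagent atoms, so each center atom gathers reagent information.
    """
    q = matmul(center_feats, w_q)
    k = matmul(reagent_feats, w_k)
    v = matmul(reagent_feats, w_v)
    scale = math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Hypothetical helper: Gaussian random matrix for the demo."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d = 4                                      # feature size (made up)
center_feats = random_matrix(3, d, rng)    # 3 reaction-center atoms
reagent_feats = random_matrix(5, d, rng)   # 5 reagent atoms
w_q, w_k, w_v = (random_matrix(d, d, rng) for _ in range(3))

updated, weights = cross_attention(center_feats, reagent_feats, w_q, w_k, w_v)
```

Here `updated` holds one reagent-aware vector per reaction-center atom, and each row of `weights` is a probability distribution over the reagent atoms. In a full model these vectors would be aggregated with the molecule-level representations before the yield is regressed; that step is omitted here.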

