Molecular representation learning (MRL) has long been crucial to drug discovery and materials science, and it has progressed significantly thanks to advances in natural language processing (NLP) and graph neural networks (GNNs). NLP-based methods treat molecules as one-dimensional token sequences, whereas GNNs treat them as two-dimensional topological graphs. Depending on the message-passing algorithm used, GNNs differ in how well they detect chemical environments and predict molecular properties. Herein, we propose Directed Graph Attention Networks (D-GATs): expressive GNNs with directed bonds. The key to our strategy is to treat the molecular graph as a directed graph and to update the bond states and atom states with a scaled dot-product attention mechanism. This allows the model to better capture the substructures of the molecular graph, i.e., functional groups. Compared with other GNNs and Message Passing Neural Networks (MPNNs), D-GATs outperform the state of the art on 13 of 15 important molecular property prediction benchmarks.
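To make the idea of directed bonds with scaled dot-product attention concrete, the following is a minimal NumPy sketch, not the D-GAT architecture itself. It assumes that each undirected bond is split into two directed bonds and that an atom attends over the states of its incoming bonds. The function names (`directed_edges`, `attention_update`), the weight matrices `Wq`, `Wk`, `Wv`, and the choice to update only atom states are illustrative assumptions.

```python
import numpy as np

def directed_edges(bonds):
    """Split each undirected bond (i, j) into directed bonds i->j and j->i."""
    return [(i, j) for i, j in bonds] + [(j, i) for i, j in bonds]

def softmax(x):
    # Subtract the maximum for numerical stability.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_update(atom_h, bond_h, edges, Wq, Wk, Wv):
    """Hypothetical atom update: each atom attends over its incoming directed bonds.

    atom_h: (n_atoms, d) atom states; bond_h: (n_edges, d) directed-bond states.
    Wq, Wk, Wv: (d, d) projection matrices (assumed square for simplicity).
    """
    d = Wq.shape[1]
    new_h = atom_h.copy()
    for a in range(atom_h.shape[0]):
        incoming = [e for e, (_, dst) in enumerate(edges) if dst == a]
        if not incoming:
            continue  # isolated atom: keep its state unchanged
        q = atom_h[a] @ Wq                   # query from the atom, shape (d,)
        K = bond_h[incoming] @ Wk            # keys from incoming bonds, (n_in, d)
        V = bond_h[incoming] @ Wv            # values from incoming bonds, (n_in, d)
        weights = softmax(K @ q / np.sqrt(d))  # scaled dot-product attention
        new_h[a] = weights @ V
    return new_h

# Toy example: the heavy-atom skeleton C-C-O with random states.
rng = np.random.default_rng(0)
edges = directed_edges([(0, 1), (1, 2)])
atom_h = rng.normal(size=(3, 4))
bond_h = rng.normal(size=(len(edges), 4))
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
updated = attention_update(atom_h, bond_h, edges, Wq, Wk, Wv)
```

Because the bonds `i->j` and `j->i` carry separate states, the two directions of a bond can carry different information, which is the intuition behind treating the molecular graph as a directed graph. A fuller model would also update the bond states; that step is omitted here.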