Graph Transformer (GT) models have recently been widely adopted for Molecular Property Prediction (MPP) because they reliably characterize the latent relationships among graph nodes (i.e., the atoms in a molecule). However, most existing GT-based methods explore only the basic interactions between pairs of atoms, and thus fail to consider the important interactions among critical motifs (e.g., functional groups consisting of several atoms). Because motifs are significant patterns that strongly influence molecular properties (e.g., toxicity and solubility), overlooking motif interactions inevitably hinders the effectiveness of MPP. To address this issue, we propose a novel Atom-Motif Contrastive Transformer (AMCT), which explores not only atom-level interactions but also motif-level interactions. Since the atom and motif representations of a given molecule are two different views of the same instance, they can be naturally aligned to generate self-supervisory signals for model training. Meanwhile, the same motif can occur in different molecules, so we also employ a contrastive loss to maximize the representation agreement of identical motifs across molecules. Finally, to clearly identify the motifs that are critical in determining the properties of each molecule, we further incorporate a property-aware attention mechanism into our learning framework. We extensively evaluate AMCT on seven popular benchmark datasets, and both quantitative and qualitative results demonstrate its effectiveness compared with state-of-the-art methods.
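The atom-motif alignment described in the abstract can be illustrated with a small sketch. The abstract does not state the exact form of the contrastive objective, so the code below assumes a symmetric InfoNCE loss with in-batch negatives: the atom-level and motif-level embeddings of the same molecule form a positive pair, and the embeddings of the other molecules in the batch act as negatives. The function names (`info_nce`, `atom_motif_alignment_loss`), the temperature value, and the toy embeddings are all hypothetical and chosen only for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss (an assumed objective, not confirmed by the abstract).

    anchors[i] should agree with positives[i]; every other positives[j]
    in the batch serves as a negative.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [dot(anchor, candidate) / temperature for candidate in p]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_denominator - logits[i]
    return total / len(a)


def atom_motif_alignment_loss(atom_views, motif_views, temperature=0.1):
    """Symmetric loss: atom view -> motif view and motif view -> atom view."""
    return 0.5 * (
        info_nce(atom_views, motif_views, temperature)
        + info_nce(motif_views, atom_views, temperature)
    )


# Toy batch of three molecules with 2-D embeddings (made-up numbers).
atom_views = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned_motif_views = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
# Rotate the motif views so that each one is paired with the wrong molecule.
shuffled_motif_views = aligned_motif_views[1:] + aligned_motif_views[:1]

aligned_loss = atom_motif_alignment_loss(atom_views, aligned_motif_views)
shuffled_loss = atom_motif_alignment_loss(atom_views, shuffled_motif_views)
print("aligned:", aligned_loss, "shuffled:", shuffled_loss)
```

Under these assumptions, the loss is lower when each molecule's two views are paired correctly than when they are mismatched. The same `info_nce` helper could plausibly be reused for the cross-molecule term, with embeddings of the same motif taken from different molecules as positive pairs, although the abstract does not say how that term is actually constructed.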