Graph Multi-Similarity Learning for Molecular Property Prediction

Accurate molecular property prediction relies on effective representation learning, which must capture the diverse relationships between molecules characterized by multi-similarity (self-similarity and relative similarities). However, current molecular representation learning methods do not fully explore multi-similarity and often underestimate the complexity of relationships between molecules. Moreover, previous multi-similarity approaches require positive and negative pairs to be specified so that distinct predefined weights can be assigned to different relative similarities, which can introduce bias. In this work, we introduce the Graph Multi-Similarity Learning for Molecular Property Prediction (GraphMSL) framework, together with a novel approach for formulating a generalized multi-similarity metric without defining positive and negative pairs. In each chemical modality space considered (e.g., molecular depiction images, fingerprints, NMR, and SMILES), we first define a self-similarity metric (i.e., the similarity between an anchor molecule and another molecule) and then transform it into a generalized multi-similarity metric for the anchor through a pair weighting function. Experiments on MoleculeNet datasets validate the efficacy of the multi-similarity metric in GraphMSL. Furthermore, the metrics from all modalities are integrated into a multimodal multi-similarity metric, which shows potential to further improve performance. The focus of the model can also be redirected or customized by altering the fusion function. Finally, post-hoc analyses of the learned representations show that GraphMSL is effective in drug discovery evaluations.
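
The abstract does not specify the pair weighting function or the fusion function, so the following is only a minimal sketch under stated assumptions, not the GraphMSL implementation. It uses Tanimoto similarity on fingerprint bit sets as an example self-similarity metric, assumes a temperature-scaled softmax as the pair weighting function (turning an anchor's similarities to all other molecules into one normalized distribution, with no positive/negative labels), and assumes a weighted average as the fusion function. The function names, the temperature `tau`, the SMILES similarity values, and the modality weights are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def multi_similarity(sims: Sequence[float], tau: float = 0.1) -> List[float]:
    """Hypothetical pair weighting: temperature-scaled softmax over the anchor's
    self-similarities, so every pair gets a relative weight without labeling
    any pair as positive or negative."""
    scaled = [s / tau for s in sims]
    m = max(scaled)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(per_modality: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Hypothetical fusion: weighted average of per-modality distributions.
    Changing `weights` shifts which modality the fused metric emphasizes."""
    wsum = sum(weights[k] for k in per_modality)
    n = len(next(iter(per_modality.values())))
    return [sum(weights[k] * per_modality[k][j] for k in per_modality) / wsum
            for j in range(n)]


# Toy anchor molecule and three other molecules (hypothetical fingerprints).
anchor = {1, 2, 3, 4}
others = [{1, 2, 3, 5}, {1, 9}, {7, 8}]
fp_sims = [tanimoto(anchor, o) for o in others]
print(fp_sims)  # [0.6, 0.2, 0.0]
fp_target = multi_similarity(fp_sims)

# Hypothetical self-similarities from a second modality (e.g., SMILES embeddings).
smiles_target = multi_similarity([0.5, 0.5, 0.1])

fused = fuse({"fingerprint": fp_target, "smiles": smiles_target},
             {"fingerprint": 0.7, "smiles": 0.3})
print([round(p, 3) for p in fused])
```

Because the softmax assigns a weight to every pair, the relative ordering of similarities is preserved in a single distribution, which is one way to avoid predefined positive/negative weighting; any other normalized weighting function could be substituted under the same interface.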

翻译：提升分子性质预测的准确性依赖于有效且高效的表征学习。关键在于融合由分子间多相似性（自相似性与相对相似性）刻画的多样分子关系。然而，当前的分子表征学习方法在探索多相似性方面存在不足，且常低估分子间关系的复杂性。此外，以往的多相似性方法需指定正负样本对，为不同相对相似性赋予预定义的差异化权重，这可能引入潜在偏差。本文提出面向分子性质预测的图多相似性学习框架（GraphMSL），并引入无需定义正负样本对的广义多相似性度量新方法。在考虑的每个化学模态空间（如分子结构图像、指纹图谱、核磁共振谱及SMILES表示）中，我们首先定义自相似性度量（即锚定分子与另一分子的相似度），进而通过样本对加权函数将其转化为锚定分子的广义多相似性度量。GraphMSL在MoleculeNet数据集上验证了多相似性度量的有效性。此外，所有模态的度量被整合为多模态多相似性度量，展现出提升性能的潜力。同时，通过调整融合函数可重新定向或定制模型焦点。最后但同样重要的是，GraphMSL通过对所学表征的事后分析，在药物发现评估中证明了其有效性。