Effective molecular representation learning is essential for molecular property prediction. Contrastive learning, a prominent self-supervised approach to molecular representation learning, relies on constructing positive and negative pairs. However, this binary categorization of similarity oversimplifies complex molecular relationships and ignores the degree of relative similarity among molecules, which limits the effectiveness and generality of the learned representations. To address this challenge, we propose Graph Multi-Similarity Learning for Molecular Property Prediction (GraphMSL). GraphMSL incorporates a generalized multi-similarity metric on a continuous scale, capturing both self-similarity and relative similarities. The unimodal multi-similarity metrics are derived from various chemical modalities, and fusing these metrics into a multimodal form significantly enhances the effectiveness of GraphMSL. In addition, the flexibility of the fusion function can reshape the focus of the model to convey different chemical semantics. GraphMSL proves effective in drug discovery evaluations across various downstream tasks and in post-hoc analysis of the learned representations. Its notable performance suggests significant potential for the exploration of new drug candidates.
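The idea of replacing binary positive/negative pairs with continuous, fused similarity targets can be illustrated with a minimal NumPy sketch. This is not the GraphMSL implementation: the abstract does not specify the loss or the fusion function, so the weighted-average fusion, the soft-target cross-entropy loss, the temperature value, and all names (`fuse_similarities`, `soft_similarity_loss`, `sim_fingerprint`, `sim_descriptor`) are assumptions chosen for illustration only.

```python
import numpy as np

def softmax_rows(x, temperature=1.0):
    """Row-wise softmax with a temperature, shifted for numerical stability."""
    z = x / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fuse_similarities(sims, weights):
    """Hypothetical fusion function: a convex combination of unimodal
    similarity matrices. Changing the weights shifts which modality
    dominates the fused target."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * s for w, s in zip(weights, sims))

def soft_similarity_loss(embeddings, target_sim, temperature=0.1):
    """Assumed loss: cross-entropy between the row-normalised target
    similarities and the similarities induced by the embeddings.
    The diagonal is kept, so self-similarity is part of the target."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    pred = softmax_rows(z @ z.T, temperature)       # cosine similarities
    target = softmax_rows(target_sim, temperature)  # continuous targets
    return float(-(target * np.log(pred + 1e-12)).sum(axis=1).mean())

# Toy similarity matrices for four molecules from two hypothetical modalities.
sim_fingerprint = np.array([[1.0, 0.8, 0.2, 0.1],
                            [0.8, 1.0, 0.3, 0.2],
                            [0.2, 0.3, 1.0, 0.6],
                            [0.1, 0.2, 0.6, 1.0]])
sim_descriptor = np.array([[1.0, 0.5, 0.4, 0.3],
                           [0.5, 1.0, 0.6, 0.2],
                           [0.4, 0.6, 1.0, 0.7],
                           [0.3, 0.2, 0.7, 1.0]])

fused = fuse_similarities([sim_fingerprint, sim_descriptor], weights=[0.7, 0.3])

# Stand-in for encoder outputs; a real model would produce these from graphs.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 8))
loss = soft_similarity_loss(embeddings, fused)
```

In this sketch, a binary contrastive setup would correspond to a target matrix containing only zeros and ones, whereas the fused matrix assigns every pair a graded value, so the loss penalises the encoder for misordering moderately similar molecules as well as for confusing positives with negatives.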