Molecular representation learning is fundamental to many drug-related applications. Most existing molecular pre-training models are limited to a single modality, either the SMILES string or the molecular graph. To leverage both modalities effectively, we argue that it is critical to capture the fine-grained 'semantics' shared between SMILES and graph, because subtle sequence/graph differences can lead to opposite molecular properties. In this paper, we propose a universal SMILES-graph representation learning model, namely UniMAP. First, an embedding layer produces token representations for the SMILES string and node/edge representations for the graph. A multi-layer Transformer then performs deep cross-modality fusion. Specifically, four pre-training tasks are designed for UniMAP: Multi-Level Cross-Modality Masking (CMM), SMILES-Graph Matching (SGM), Fragment-Level Alignment (FLA), and Domain Knowledge Learning (DKL). In this way, both global (i.e., SGM and DKL) and local (i.e., CMM and FLA) alignments are integrated to achieve comprehensive cross-modality fusion. We evaluate UniMAP on various downstream tasks: molecular property prediction, drug-target affinity prediction, and drug-drug interaction prediction. Experimental results show that UniMAP outperforms current state-of-the-art pre-training methods. We also visualize the learned representations to demonstrate the effect of multi-modality integration.
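To make the SMILES side of the pipeline concrete, the sketch below pairs a common regex-based SMILES tokenizer with a toy token-masking step in the spirit of CMM. The tokenizer pattern and the `mask_tokens` helper are illustrative assumptions for this sketch, not UniMAP's actual implementation.

```python
import re
import random

# A common regex-based SMILES tokenizer (an illustrative assumption;
# the abstract does not specify UniMAP's tokenizer). Multi-character
# tokens such as bracket atoms, Cl, and Br are matched first.
SMILES_TOKENS = re.compile(
    r"\[[^\]]+\]|Br|Cl|@@|@|%\d{2}|[BCNOPSFIbcnops]|[=#\\/().+\-]|\d"
)

def tokenize(smiles):
    """Split a SMILES string into chemically meaningful tokens."""
    return SMILES_TOKENS.findall(smiles)

def mask_tokens(tokens, mask_ratio=0.15, seed=0):
    """Replace a random subset of tokens with [MASK], mimicking the
    sequence half of a masked-modeling pre-training task."""
    rng = random.Random(seed)
    n = max(1, int(len(tokens) * mask_ratio))
    masked = set(rng.sample(range(len(tokens)), n))
    return ["[MASK]" if i in masked else t for i, t in enumerate(tokens)]

tokens = tokenize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
print(tokens)
print(mask_tokens(tokens))
```

In a cross-modality setting, the masked SMILES tokens would be predicted using both the remaining sequence context and the corresponding graph nodes, which is what forces the model to align the two views.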