Self-supervised molecular representation learning is critical for molecule-based tasks such as AI-assisted drug discovery. Recent studies leverage both 2D and 3D information for representation learning, but rely on straightforward alignment strategies that treat each modality separately. In this work, we introduce a novel "blend-then-predict" self-supervised learning method (MoleBLEND). It blends atom relations from different modalities into one unified relation matrix for encoding, then recovers modality-specific information for both 2D and 3D structures. By treating atom relationships as anchors, seemingly dissimilar 2D and 3D manifolds are organically aligned and integrated at the fine-grained relation level. Extensive experiments show that MoleBLEND achieves state-of-the-art performance across major 2D and 3D benchmarks. We further provide theoretical insights from the perspective of mutual-information maximization, demonstrating that our method unifies contrastive, generative (inter-modal prediction), and mask-then-predict (intra-modal prediction) objectives in a single, cohesive blend-then-predict framework.
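The abstract does not specify how the relation matrices are built or blended, so the following is only a minimal sketch of the blend-then-predict idea under stated assumptions. It assumes that the 2D relation between two atoms is their shortest-path (hop) distance over the bond graph, that the 3D relation is the Euclidean distance between their coordinates, and that each atom pair is randomly assigned to exactly one modality. The function names `shortest_path_matrix`, `euclidean_distance_matrix`, and `blend_relations`, as well as the parameter `p_2d`, are hypothetical. This is an illustration, not MoleBLEND's actual implementation.

```python
import math
import random
from collections import deque


def shortest_path_matrix(num_atoms, bonds):
    """Assumed 2D relation: hop distance between atoms over the bond graph (BFS)."""
    adj = [[] for _ in range(num_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    inf = float("inf")  # atoms in disconnected fragments stay at infinity
    spd = [[inf] * num_atoms for _ in range(num_atoms)]
    for src in range(num_atoms):
        spd[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if spd[src][v] == inf:
                    spd[src][v] = spd[src][u] + 1
                    queue.append(v)
    return spd


def euclidean_distance_matrix(coords):
    """Assumed 3D relation: pairwise Euclidean distance between atom coordinates."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def blend_relations(rel_2d, rel_3d, p_2d=0.5, seed=0):
    """Hypothetical blending rule: each unordered atom pair (i, j) keeps its relation
    from exactly one modality, chosen at random (2D with probability p_2d).

    Returns the blended matrix of (modality, value) entries that an encoder would
    embed, plus the (i, j, target) entries of each modality that were not shown
    and therefore must be predicted.
    """
    rng = random.Random(seed)
    n = len(rel_2d)
    blended = [[None] * n for _ in range(n)]
    hidden_2d, hidden_3d = [], []
    for i in range(n):
        for j in range(i, n):  # upper triangle incl. diagonal; relations are symmetric
            if rng.random() < p_2d:
                entry = ("2d", rel_2d[i][j])
                hidden_3d.append((i, j, rel_3d[i][j]))  # 3D relation hidden for this pair
            else:
                entry = ("3d", rel_3d[i][j])
                hidden_2d.append((i, j, rel_2d[i][j]))  # 2D relation hidden for this pair
            blended[i][j] = blended[j][i] = entry  # mirror to keep the matrix symmetric
    return blended, hidden_2d, hidden_3d
```

For a three-atom chain with bonds `(0, 1)` and `(1, 2)`, `blend_relations` splits the six upper-triangle entries (including the diagonal) between `hidden_2d` and `hidden_3d`. A training objective in this spirit would ask the encoder, given only the blended matrix, to recover both lists, so that every atom pair links the two modalities. Storing the modality tag alongside each value reflects the assumption that 2D and 3D relations would be embedded separately rather than mixed as raw numbers.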