Tensor-DTI: Enhancing Biomolecular Interaction Prediction with Contrastive Embedding Learning

From arXiv. Accepted at the Generative and Experimental Perspectives for Biomolecular Design Workshop at ICLR 2025 and at the Learning Meaningful Representations of Life Workshop at ICLR 2025.

Accurate drug-target interaction (DTI) prediction is essential for computational drug discovery, yet existing models often rely on predefined single-modality molecular descriptors or on sequence-based embeddings with limited representational power. We propose Tensor-DTI, a contrastive learning framework that integrates multimodal embeddings from molecular graphs, protein language models, and binding-site predictions to improve interaction modeling. Tensor-DTI employs a siamese dual-encoder architecture, enabling it to capture both chemical and structural interaction features while distinguishing interacting from non-interacting pairs. Evaluations on multiple DTI benchmarks show that Tensor-DTI outperforms existing sequence-based and graph-based models. We also run large-scale inference experiments on CDK2 across billion-scale chemical libraries, where Tensor-DTI produces chemically plausible hit distributions even when CDK2 is withheld from training. In enrichment studies against Glide docking and the Boltz-2 co-folder, Tensor-DTI remains competitive on CDK2 and, under strict family-holdout splits, reduces the screening budget needed to recover moderate fractions of high-affinity ligands for out-of-family targets. Additionally, we explore its applicability to protein-RNA and peptide-protein interactions. Our findings highlight the benefits of combining multimodal information with contrastive objectives, both to improve interaction-prediction accuracy and to provide more interpretable, reliability-aware models for virtual screening.
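The abstract describes a dual-encoder trained contrastively to separate interacting from non-interacting drug-target pairs, but it does not give the exact architecture or loss. The sketch below is a hypothetical illustration under stated assumptions: each modality's precomputed embedding (for example from a molecular-graph model or a protein language model) passes through its own linear projection head into a shared space, pairs are scored by cosine similarity, and a squared variant of the standard margin-based contrastive loss pulls interacting pairs together while pushing non-interacting pairs below a margin. The names `project`, `cosine`, `contrastive_loss`, `batch_loss`, the margin of 0.5, and the toy weights are all assumptions, not Tensor-DTI's actual components.

```python
import math
from typing import List, Sequence

Vector = List[float]


def project(x: Sequence[float], weights: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical projection head: computes W @ x for a (d_out x d_in) weight matrix."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(t * t for t in v)) or 1.0
    return [t / norm for t in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors in the shared embedding space."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def contrastive_loss(sim: float, label: int, margin: float = 0.5) -> float:
    """Squared margin-based contrastive loss on a cosine similarity (assumed form).

    label == 1 (interacting pair): (1 - sim)^2 pulls the two embeddings together.
    label == 0 (non-interacting pair): max(0, sim - margin)^2 pushes the
    similarity below `margin`; pairs already below it contribute nothing.
    """
    if label == 1:
        return (1.0 - sim) ** 2
    return max(0.0, sim - margin) ** 2


def batch_loss(ligands, proteins, labels, w_lig, w_prot, margin: float = 0.5) -> float:
    """Mean contrastive loss over a batch of (ligand, protein, label) triples."""
    losses = [
        contrastive_loss(cosine(project(lig, w_lig), project(prot, w_prot)), y, margin)
        for lig, prot, y in zip(ligands, proteins, labels)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy 3-D "embeddings" standing in for graph-model and protein-LM outputs.
    ligand = [1.0, 0.0, 2.0]
    protein = [0.5, 1.0, 1.0]
    # Toy heads that keep dimensions 0 and 2, projecting both into a shared 2-D space.
    w = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    print(batch_loss([ligand, ligand], [protein, protein], [1, 0], w, w))
```

In the toy run both projections point in the same direction, so the pair costs almost nothing when labelled interacting and about 0.25 when labelled non-interacting; the printed batch mean is therefore about 0.125. Separate heads per modality are used here because drug and protein embeddings usually differ in dimension; whether Tensor-DTI shares weights between towers is not stated in the abstract.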

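The abstract reports enrichment as the screening budget needed to recover a given fraction of high-affinity ligands. One common way to compute such a number is sketched below, assuming (the abstract does not define it) that it means: rank the library by predicted score and find the smallest top fraction that contains the requested share of known actives. The function `screening_budget` and its arguments are hypothetical names.

```python
import math
from typing import Sequence


def screening_budget(scores: Sequence[float], is_active: Sequence[bool], recall: float) -> float:
    """Fraction of the ranked library that must be screened to recover `recall` of the actives.

    Compounds are ranked by descending score (higher = predicted more likely to bind).
    Ties keep their original order because Python's sort is stable.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    if not 0.0 < recall <= 1.0:
        raise ValueError("recall must be in (0, 1]")
    n_active = sum(1 for a in is_active if a)
    if n_active == 0:
        raise ValueError("need at least one active compound")
    # Small tolerance guards against float error such as 0.3 * 10 == 3.0000000000000004.
    needed = max(1, math.ceil(recall * n_active - 1e-9))
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    found = 0
    for rank, idx in enumerate(order, start=1):
        if is_active[idx]:
            found += 1
            if found >= needed:
                return rank / len(scores)
    return 1.0


if __name__ == "__main__":
    # Toy library of 10 compounds, 4 of which are (hypothetical) high-affinity actives.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
    actives = [True, False, False, True, False, False, False, False, True, True]
    for r in (0.5, 0.75, 1.0):
        print(r, screening_budget(scores, actives, r))
```

Under a random ranking, recovering a fraction r of the actives costs on average about r of the library, so budgets well below r indicate useful enrichment. In the toy example, half of the four actives already sit within the top 40% of the ranking.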