Therapeutic peptides are a distinct class of pharmaceutical agents that are crucial for treating human diseases. Deep generative models have recently shown remarkable potential for generating therapeutic peptides, but existing approaches use either sequence or structure information alone, which limits generation performance. In this study, we propose a Multi-Modal Contrastive Diffusion model (MMCD) that fuses the sequence and structure modalities in a diffusion framework to co-generate novel peptide sequences and structures. Specifically, MMCD builds a sequence-modal and a structure-modal diffusion model and devises a multi-modal contrastive learning strategy, with inter-contrastive and intra-contrastive components applied at each diffusion timestep, to capture the consistency between the two modalities and boost model performance. The inter-contrastive component aligns the sequences and structures of peptides by maximizing the agreement of their embeddings, while the intra-contrastive component differentiates therapeutic from non-therapeutic peptides by maximizing the disagreement of their sequence and structure embeddings simultaneously. Extensive experiments demonstrate that MMCD outperforms other state-of-the-art deep generative methods in generating therapeutic peptides across various metrics, including antimicrobial/anticancer score, diversity, and peptide docking.
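The two contrastive objectives described in the abstract can be illustrated with a small sketch. The abstract does not state the exact loss formulation, so the InfoNCE-style and supervised-contrastive-style forms below are assumptions, as are the function names (`inter_contrastive`, `intra_contrastive`), the cosine similarity, and the temperature `tau`. The sketch operates on plain embedding vectors and omits the diffusion models entirely; in MMCD such terms would be evaluated at each diffusion timestep.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def inter_contrastive(seq_emb, struct_emb, tau=0.1):
    """Assumed InfoNCE-style loss across modalities.

    The i-th sequence embedding is pulled toward the i-th structure
    embedding (same peptide) and pushed away from the structures of
    the other peptides in the batch.
    """
    n = len(seq_emb)
    total = 0.0
    for i in range(n):
        logits = [cosine(seq_emb[i], struct_emb[j]) / tau for j in range(n)]
        log_denom = math.log(sum(math.exp(x) for x in logits))
        total += log_denom - logits[i]
    return total / n


def intra_contrastive(emb, is_therapeutic, tau=0.1):
    """Assumed supervised-contrastive-style loss within one modality.

    Peptides with the same label (therapeutic or not) are positives;
    peptides with the other label act as negatives.
    """
    n = len(emb)
    total, count = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if is_therapeutic[j] == is_therapeutic[i]]
        if not positives:
            continue  # no same-label partner in the batch for this anchor
        logits = {j: cosine(emb[i], emb[j]) / tau for j in others}
        log_denom = math.log(sum(math.exp(x) for x in logits.values()))
        total += sum(log_denom - logits[j] for j in positives) / len(positives)
        count += 1
    return total / count if count else 0.0


# Toy 2-D embeddings (hypothetical values, for illustration only).
seq = [[1.0, 0.0], [0.0, 1.0]]
struct_aligned = [[1.0, 0.0], [0.0, 1.0]]
struct_swapped = [[0.0, 1.0], [1.0, 0.0]]
print(inter_contrastive(seq, struct_aligned), inter_contrastive(seq, struct_swapped))

emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(intra_contrastive(emb, [True, True, False, False]))
```

Under these assumptions, the inter-modal loss is lower when each sequence embedding is closest to its own structure embedding, and the intra-modal loss is lower when therapeutic and non-therapeutic peptides form separate clusters. How the two terms are weighted against the diffusion objectives is not specified in the abstract and is left out here.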