D-Flow: Multi-modality Flow Matching for D-peptide Design

Note from arXiv: The paper has been withdrawn due to an oversight in authorship confirmation and final draft approval. Not all listed co-authors reviewed or consented to the submission, including the corresponding authorship designation. The withdrawal allows for proper review and consent from all authors before resubmission.

Proteins play crucial roles in biological processes, and therapeutic peptides are emerging as promising pharmaceutical agents. They open new possibilities for targeting binding sites that were previously undruggable. While deep learning (DL) has advanced peptide discovery, generating D-proteins composed of D-amino acids remains challenging due to the scarcity of natural examples. This paper proposes D-Flow, a full-atom flow-based framework for de novo D-peptide design. D-Flow is conditioned on receptor binding and uses a comprehensive representation of peptide structure that incorporates backbone frames, side-chain angles, and discrete amino acid types. To address the lack of training data for D-proteins, a mirror-image algorithm is implemented that converts the chirality of L-receptors. Furthermore, D-Flow's capacity is enhanced by integrating large protein language models (PLMs) with structural awareness through a lightweight structural adapter. A two-stage training pipeline and a controlling toolkit also enable D-Flow to transition from general protein design to targeted binder design while preserving pretraining knowledge. Extensive experimental results on the PepMerge benchmark demonstrate D-Flow's effectiveness, particularly in designing peptides composed entirely of D-residues. This approach represents a significant advancement in computational D-peptide design, offering unique opportunities for bioorthogonal and stable molecular tools and diagnostics. The code is available at https://github.com/smiles724/PeptideDesign.
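The abstract does not spell out how the mirror-image algorithm works, so the following is only a minimal sketch of the underlying geometric idea, not the authors' implementation. It assumes that chirality is converted by reflecting all atom coordinates through a plane (here, by negating one axis). The function names `mirror_coords` and `signed_volume` and the toy coordinates are hypothetical and chosen purely for illustration.

```python
import numpy as np


def mirror_coords(coords, axis=0):
    """Reflect an (N, 3) coordinate array through the plane perpendicular to `axis`.

    Assumption: a plain reflection is used to invert chirality; the paper's
    actual algorithm may differ.
    """
    mirrored = np.array(coords, dtype=float, copy=True)
    mirrored[:, axis] *= -1.0
    return mirrored


def signed_volume(center, a, b, c):
    """Triple product of three substituent vectors around a center atom.

    Its sign encodes handedness and flips under any reflection.
    """
    return float(np.dot(a - center, np.cross(b - center, c - center)))


# Toy coordinates (hypothetical, not a real residue): CA, N, C, CB.
residue = np.array([
    [0.0, 0.0, 0.0],  # CA (stereocenter)
    [1.0, 0.0, 0.0],  # N
    [0.0, 1.0, 0.0],  # C
    [0.0, 0.0, 1.0],  # CB
])

mirrored = mirror_coords(residue)

v_before = signed_volume(*residue)
v_after = signed_volume(*mirrored)


def pairwise_distances(x):
    """All-pairs Euclidean distances for an (N, 3) array."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)


# Reflection keeps every interatomic distance but inverts handedness.
distances_preserved = np.allclose(
    pairwise_distances(residue), pairwise_distances(mirrored)
)
```

Because a reflection preserves all interatomic distances while inverting the handedness of every stereocenter, a mirrored L-structure is geometrically a valid D-structure. Dihedral angles, including side-chain torsions, change sign under the same operation.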
