Proteins play crucial roles in biological processes, and therapeutic peptides are emerging as promising pharmaceutical agents. They open new possibilities for targeting binding sites that were previously considered undruggable. Although deep learning has advanced peptide discovery, generating D-proteins composed of D-amino acids remains challenging because of the scarcity of natural examples. This paper proposes D-Flow, a full-atom flow-based framework for de novo D-peptide design. D-Flow is conditioned on receptor binding and uses a comprehensive representation of peptide structure that incorporates backbone frames, side-chain angles, and discrete amino acid types. To address the lack of training data for D-proteins, we implement a mirror-image algorithm that converts the chirality of L-receptors. Furthermore, we enhance D-Flow's capacity by integrating large protein language models, made structure-aware through a lightweight structural adapter. A two-stage training pipeline and a controlling toolkit also enable D-Flow to transition from general protein design to targeted binder design while preserving pre-training knowledge. Extensive experimental results on the PepMerge benchmark demonstrate D-Flow's effectiveness, particularly in developing peptides composed entirely of D-residues. This approach represents a significant advancement in computational D-peptide design, offering unique opportunities for bioorthogonal and stable molecular tools and diagnostics. The code is available at https://github.com/smiles724/PeptideDesign.
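The geometric idea behind a mirror-image conversion can be illustrated with a minimal sketch. It is not the D-Flow implementation: the coordinates are hypothetical toy values, and the helper names `mirror` and `signed_volume` are introduced here only for illustration. The sketch relies on a standard fact: reflecting all atoms through a plane preserves every interatomic distance but flips the sign of the scalar triple product around a stereocenter, which is what distinguishes an L-residue from its D-counterpart.

```python
from typing import List, Tuple

Vec = Tuple[float, float, float]


def mirror(coords: List[Vec]) -> List[Vec]:
    """Reflect every point through the plane x = 0 (assumed choice of mirror plane)."""
    return [(-x, y, z) for x, y, z in coords]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def signed_volume(center: Vec, a: Vec, b: Vec, c: Vec) -> float:
    """Scalar triple product (a-center) . ((b-center) x (c-center)).

    Its sign encodes the handedness of the three substituents around `center`.
    """
    u, v, w = _sub(a, center), _sub(b, center), _sub(c, center)
    cross = (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )
    return u[0] * cross[0] + u[1] * cross[1] + u[2] * cross[2]


# Hypothetical toy coordinates (angstroms) for CA, N, C, CB of one residue.
residue: List[Vec] = [
    (0.0, 0.0, 0.0),       # CA (stereocenter)
    (1.458, 0.0, 0.0),     # N
    (-0.55, 1.42, 0.0),    # C
    (-0.53, -0.77, -1.2),  # CB
]

mirrored = mirror(residue)

v_orig = signed_volume(*residue)
v_mirr = signed_volume(*mirrored)

# The two values have equal magnitude and opposite sign.
print(v_orig, v_mirr)
```

Applying the same reflection to every atom of a receptor or a peptide yields its enantiomer, and applying it twice recovers the original coordinates. How D-Flow uses this within its training pipeline is described only at the level stated above.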