Proteins play crucial roles in biological processes, and therapeutic peptides are emerging as promising pharmaceutical agents. They open new possibilities for targeting binding sites that were previously considered undruggable. While deep learning (DL) has advanced peptide discovery, generating D-proteins composed of D-amino acids remains challenging due to the scarcity of natural examples. This paper proposes D-Flow, a full-atom flow-based framework for \textit{de novo} D-peptide design. D-Flow is conditioned on the target receptor and uses a comprehensive representation of peptide structure that incorporates backbone frames, side-chain angles, and discrete amino acid types. To address the lack of training data for D-proteins, we implement a mirror-image algorithm that converts the chirality of L-receptors. Furthermore, we enhance D-Flow's capacity by integrating large protein language models (PLMs), which gain structural awareness through a lightweight structural adapter. A two-stage training pipeline and a controlling toolkit also enable D-Flow to transition from general protein design to targeted binder design while preserving pretraining knowledge. Extensive experimental results on the PepMerge benchmark demonstrate D-Flow's effectiveness, particularly in designing peptides composed entirely of D-residues. This approach represents a significant advancement in computational D-peptide design, offering unique opportunities for bioorthogonal and stable molecular tools and diagnostics. The code is available at~\url{https://github.com/smiles724/PeptideDesign}.
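The geometric fact behind a mirror-image conversion can be illustrated with a minimal sketch. It is not D-Flow's implementation. It only shows that reflecting every atom of a complex through a plane (here z → -z, an arbitrary choice) preserves all interatomic distances, so a binding interface is kept intact, while the handedness at each Cα stereocenter flips, which turns L-residues into D-residues. The function names `mirror`, `signed_volume`, and `pairwise_distances`, along with the toy coordinates, are hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def mirror(coords: List[Vec3]) -> List[Vec3]:
    """Reflect every atom through the xy-plane (z -> -z).

    A reflection is an improper transformation: it preserves all pairwise
    distances but inverts the handedness of every stereocenter.
    """
    return [(x, y, -z) for (x, y, z) in coords]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def signed_volume(n: Vec3, ca: Vec3, c: Vec3, cb: Vec3) -> float:
    """Triple product (N-CA) . ((C-CA) x (CB-CA)).

    Its sign encodes the handedness of the CA stereocenter, and any
    reflection flips that sign.
    """
    u, v, w = _sub(n, ca), _sub(c, ca), _sub(cb, ca)
    cross = (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )
    return u[0] * cross[0] + u[1] * cross[1] + u[2] * cross[2]


def pairwise_distances(coords: List[Vec3]) -> List[float]:
    """All i<j interatomic distances, used to check that geometry is preserved."""
    return [
        math.dist(coords[i], coords[j])
        for i in range(len(coords))
        for j in range(i + 1, len(coords))
    ]


# Hypothetical toy coordinates (not real residue geometry): N, CA, C, CB of
# one peptide residue, followed by two receptor atoms near the interface.
complex_l = [
    (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
    (3.0, 1.0, 2.0), (2.5, -1.0, 1.5),
]
complex_d = mirror(complex_l)

print(signed_volume(*complex_l[:4]))  # 1.0
print(signed_volume(*complex_d[:4]))  # -1.0
print(all(math.isclose(a, b) for a, b in
          zip(pairwise_distances(complex_l), pairwise_distances(complex_d))))  # True
```

The sketch checks only that the sign flips, because which sign corresponds to L and which to D depends on the atom ordering chosen for the triple product. Any reflection plane gives the same mirrored complex up to a rigid rotation, so z → -z is simply a convenient choice.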