Macrocyclic peptides are promising therapeutic candidates for intracellular targets, but their design requires simultaneous control over non-natural monomer chemistry, ring topology, membrane permeability, and target binding. Existing SMILES- or HELM-string generative models either operate in long atom-level sequence spaces or treat monomers as symbolic tokens with limited chemical grounding. We introduce PepALD, an Autoregressive Latent Diffusion (ALD) foundation model for \textit{de novo} macrocyclic peptide generation. The model represents HELM monomers with structured chemical embeddings, generates each residue through context-conditioned diffusion in a chemically informed latent space, predicts R-group-aware ring closures during autoregressive generation, and aligns the denoiser to affinity rewards using winner-protected diffusion-adapted preference optimization. In silico experiments demonstrate PepALD's generation quality and reward-optimization performance against representative peptide generation baselines.