GeoCycler: Reward-Aligned 3D Diffusion for Constraint-Conditioned Cyclic Peptide Design

Jingjie Zhang, Hanqun Cao, Haosen Shi, He Mutian, Yu Wang, Zijun Gao, Fang Wu, Xiaojun Yao, Chang-Yu Hsieh, Sinno Jialin Pan, Pranam Chatterjee, Chunbin Gu, Pheng-Ann Heng

Cyclic peptides are attractive therapeutic modalities because their closed-ring topology can improve stability and target specificity. However, de novo cyclic peptide design remains challenging for diffusion generators, as macrocyclization requires satisfying sparse, non-smooth, and compositional geometric constraints. Existing constraint-conditioned methods largely rely on inference-time guidance, which can steer samples toward desired closures but does not directly change the learned generative distribution. We propose GeoCycler, a reward-weighted diffusion alignment framework for training conditional latent diffusion models toward macrocyclization feasibility. GeoCycler introduces a type-gated stair reward that activates distance-based shaping only when prerequisite residue or linker types are satisfied, providing dense geometric feedback while avoiding misleading signals from chemically incompatible anchors. Together with positive-only reward weighting and replay-based stabilization, GeoCycler aligns a single generator across multiple cyclization topologies. On the LNR benchmark, GeoCycler improves pass@5 closure success over strong guidance-based baselines across stapled, head-to-tail, disulfide, and bicyclic settings. In particular, it improves head-to-tail success by 20.8 percentage points over CP-Composer while maintaining comparable amino-acid and backbone-dihedral statistics. These results suggest that training-time alignment to sparse geometric constraints is a promising alternative to relying solely on post hoc sampling-time correction for cyclic peptide generation.
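
To make the type-gated stair reward and positive-only weighting concrete, the following is a minimal Python sketch of one way such components might look. It is not the authors' formulation: the abstract gives no formulas, so the function names, the distance thresholds (8, 6, and 4 Å), the type bonus of 0.25, and the sum-to-one normalization are all assumptions chosen for illustration.

```python
def stair_reward(type_ok, distance, thresholds=(8.0, 6.0, 4.0), type_bonus=0.25):
    """Hypothetical type-gated stair reward in [0, 1].

    type_ok:    whether the anchor residues/linker have the prerequisite types.
    distance:   anchor-to-anchor distance (assumed to be in angstroms).
    thresholds: assumed stair levels; each level cleared adds one step.
    """
    if not type_ok:
        # Gate closed: chemically incompatible anchors get no distance shaping.
        return 0.0
    # Count how many stair levels the distance has cleared.
    steps_cleared = sum(distance <= t for t in thresholds)
    # Fixed bonus for correct types plus an equal share per cleared step.
    return type_bonus + (1.0 - type_bonus) * steps_cleared / len(thresholds)


def positive_only_weights(rewards, baseline=0.0):
    """Hypothetical positive-only weighting: clip at zero, then normalize.

    Samples at or below the baseline get weight 0, so they do not
    contribute to the weighted training loss at all.
    """
    raw = [max(r - baseline, 0.0) for r in rewards]
    total = sum(raw)
    if total == 0.0:
        return raw  # no positive sample in this batch
    return [w / total for w in raw]


if __name__ == "__main__":
    samples = [(False, 3.0), (True, 10.0), (True, 5.0), (True, 3.0)]
    rewards = [stair_reward(ok, d) for ok, d in samples]
    print(rewards)
    print(positive_only_weights(rewards, baseline=0.25))
```

In this sketch a sample with the wrong anchor types receives zero reward even when its anchors are close, which mirrors the stated goal of avoiding misleading signals from chemically incompatible anchors. Among type-compatible samples, the reward rises in steps as the closure distance shrinks.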
