Antimicrobial peptide discovery is challenged by the astronomical size of peptide space and the relative scarcity of active peptides. Generative models provide continuous latent "maps" of peptide space, but they conventionally ignore decoder-induced geometry and rely on flat Euclidean metrics, which makes exploration and optimization distorted and inefficient. Prior manifold-based remedies assume a fixed intrinsic dimensionality, an assumption that breaks down in practice for peptide data. Here, we introduce PepCompass, a geometry-aware framework for peptide exploration and optimization. At its core, we define a Union of $\kappa$-Stable Riemannian Manifolds $\mathbb{M}^{\kappa}$, a family of decoder-induced manifolds that captures local geometry while ensuring computational stability. We propose two local exploration methods: Second-Order Riemannian Brownian Efficient Sampling, which provides a convergent second-order approximation to Riemannian Brownian motion, and Mutation Enumeration in Tangent Space, which reinterprets tangent directions as discrete amino-acid substitutions. Combining these yields Local Enumeration Bayesian Optimization (LE-BO), an efficient algorithm for local activity optimization. Finally, we introduce Potential-minimizing Geodesic Search (PoGS), which interpolates between prototype embeddings along property-enriched geodesics, biasing discovery toward seeds, i.e., peptides with favorable activity. In vitro validation confirms the effectiveness of PepCompass: PoGS yields four novel seeds, and subsequent optimization with LE-BO discovers 25 highly active peptides with broad-spectrum activity, including activity against resistant bacterial strains. These results demonstrate that geometry-informed exploration provides a powerful new paradigm for antimicrobial peptide design.
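The abstract does not state how the decoder-induced geometry is computed. A common construction, assumed here and not taken from PepCompass, is the pullback metric $G(z) = J(z)^\top J(z)$, where $J(z)$ is the Jacobian of the decoder at latent point $z$. The sketch below uses a hypothetical toy decoder `toy_decoder` in place of a peptide decoder. It shows (i) the pullback metric, (ii) a drift-free, first-order Euler step of Riemannian Brownian motion that samples increments with covariance $\Delta t\, G(z)^{-1}$, and (iii) an approximate Riemannian length of a latent path. The Brownian step is a deliberate simplification and is not the second-order scheme or the $\kappa$-stable construction described above; all function names are illustrative.

```python
import numpy as np


def toy_decoder(z: np.ndarray) -> np.ndarray:
    """Hypothetical smooth decoder R^2 -> R^3 standing in for a peptide decoder."""
    return np.array([z[0], z[1], z[0] ** 2 + z[1] ** 2])


def jacobian(f, z: np.ndarray, h: float = 1e-5) -> np.ndarray:
    """Central finite-difference Jacobian of f at z, shape (output_dim, latent_dim)."""
    cols = []
    for i in range(z.size):
        e = np.zeros_like(z)
        e[i] = h
        cols.append((f(z + e) - f(z - e)) / (2 * h))
    return np.stack(cols, axis=1)


def pullback_metric(f, z: np.ndarray, ridge: float = 1e-8) -> np.ndarray:
    """Decoder-induced metric G(z) = J^T J, plus a small ridge for numerical stability."""
    J = jacobian(f, z)
    return J.T @ J + ridge * np.eye(z.size)


def brownian_step(f, z: np.ndarray, dt: float, rng: np.random.Generator) -> np.ndarray:
    """First-order (drift-free) step: z + sqrt(dt) * xi with xi ~ N(0, G(z)^{-1})."""
    G = pullback_metric(f, z)
    L = np.linalg.cholesky(G)  # G = L L^T with L lower-triangular
    xi = np.linalg.solve(L.T, rng.standard_normal(z.size))  # Cov(xi) = (L L^T)^{-1} = G^{-1}
    return z + np.sqrt(dt) * xi


def decoded_length(f, path: np.ndarray) -> float:
    """Approximate Riemannian length of a latent path as the sum of decoder-space chord lengths."""
    decoded = np.array([f(z) for z in path])
    return float(np.sum(np.linalg.norm(np.diff(decoded, axis=0), axis=1)))
```

As a usage example, consider the straight latent segment from $(-1, 0)$ to $(1, 0)$. Under a flat Euclidean metric it has length 2, but `decoded_length` on a 201-point discretization gives roughly 2.96. This gap illustrates the kind of distortion that a flat latent metric ignores. At $z = (1, 0)$ the pullback metric is approximately $\mathrm{diag}(5, 1)$, so Brownian steps move less along the direction the decoder stretches most.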