Common deep learning approaches for antibody engineering focus on modeling the marginal distribution of sequences. By treating sequences as independent samples, however, these methods overlook affinity maturation, the evolutionary process by which antibodies explore the underlying fitness landscape and a rich, largely untapped source of information. In contrast, classical phylogenetic models explicitly represent evolutionary dynamics but lack the expressivity to capture complex epistatic interactions. We bridge this gap with CoSiNE, a continuous-time Markov chain parameterized by a deep neural network. Mathematically, we prove that CoSiNE provides a first-order approximation to the intractable sequential point-mutation process, capturing epistatic effects with an error bound that is quadratic in branch length. Empirically, CoSiNE outperforms state-of-the-art language models in zero-shot variant effect prediction by explicitly disentangling selection from context-dependent somatic hypermutation. Finally, we introduce Guided Gillespie, a classifier-guided sampling scheme that steers CoSiNE at inference time, enabling efficient optimization of antibody binding affinity toward specific antigens.
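As a toy illustration of the two technical claims above, the following Python sketch (NumPy only) builds an exact sequential point-mutation CTMC over binary sequences of length 3. Each site's flip rate depends on the other sites, which stands in for a neural rate model; `W`, `J`, and `site_rates` are hypothetical. The sketch compares the exact transition kernel exp(tQ) with a factorized surrogate in which sites mutate independently at the rates they have in the parent sequence. This surrogate is our assumption about what a first-order approximation can look like, not CoSiNE's actual parameterization. Its gap to the exact kernel shrinks roughly as t², mirroring the quadratic-in-branch-length error bound.

The last function is a Gillespie simulator whose rates are reweighted by a hypothetical binding classifier `binding_prob` using the Bayes-rule ratio p(bind | y) / p(bind | x) raised to a guidance strength `gamma`. This is one common form of classifier guidance for CTMCs and may differ from the Guided Gillespie scheme itself.

```python
import itertools

import numpy as np

L = 3  # toy sequence length (hypothetical)
STATES = [tuple(s) for s in itertools.product((0, 1), repeat=L)]
INDEX = {s: k for k, s in enumerate(STATES)}

# Hypothetical context-dependent (epistatic) rates: the rate of flipping
# site i depends on the other sites through the coupling matrix J.
W = np.array([0.2, -0.3, 0.1])
J = np.array([[0.0, 0.8, -0.5],
              [0.8, 0.0, 0.4],
              [-0.5, 0.4, 0.0]])


def site_rates(x):
    """Flip rate of each site of the binary sequence x."""
    return np.exp(W + J @ np.asarray(x, dtype=float))


def neighbors(x):
    """All sequences one point mutation away from x, ordered by site."""
    out = []
    for i in range(L):
        y = list(x)
        y[i] = 1 - y[i]
        out.append(tuple(y))
    return out


def generator():
    """Generator Q of the exact sequential point-mutation CTMC."""
    Q = np.zeros((len(STATES), len(STATES)))
    for x in STATES:
        rates = site_rates(x)
        for rate, y in zip(rates, neighbors(x)):
            Q[INDEX[x], INDEX[y]] = rate
        Q[INDEX[x], INDEX[x]] = -rates.sum()
    return Q


def expm(A, terms=20):
    """Matrix exponential by scaling and squaring a truncated Taylor series."""
    norm = np.abs(A).sum(axis=1).max()
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    B = A / 2**s
    E = np.eye(len(A))
    term = np.eye(len(A))
    for k in range(1, terms):
        term = term @ B / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E


def factorized_transition(t):
    """Surrogate kernel: sites mutate independently over a branch of length t,
    each at the rate it has in the parent sequence (no back-mutation)."""
    P = np.zeros((len(STATES), len(STATES)))
    for x in STATES:
        stay = np.exp(-t * site_rates(x))  # per-site probability of no change
        for y in STATES:
            changed = np.array(x) != np.array(y)
            P[INDEX[x], INDEX[y]] = np.prod(np.where(changed, 1 - stay, stay))
    return P


def approx_error(t):
    """Worst-case total-variation gap between exact and surrogate kernels."""
    exact = expm(t * generator())
    return 0.5 * np.abs(exact - factorized_transition(t)).sum(axis=1).max()


def binding_prob(x, beta=2.0):
    """Hypothetical classifier p(bind | x) that favors sequences with more 1s."""
    return 1.0 / (1.0 + np.exp(-beta * (sum(x) - L / 2)))


def guided_gillespie(x0, t_max, rng, gamma=0.0):
    """Gillespie simulation up to time t_max. Each mutation rate x -> y is
    multiplied by (p(bind|y) / p(bind|x))**gamma; gamma = 0 is unguided."""
    x, t = tuple(x0), 0.0
    while True:
        ys = neighbors(x)
        ratios = np.array([binding_prob(y) / binding_prob(x) for y in ys])
        rates = site_rates(x) * ratios**gamma
        total = rates.sum()
        t += rng.exponential(1.0 / total)  # waiting time to the next event
        if t > t_max:
            return x
        x = ys[rng.choice(L, p=rates / total)]  # pick which site mutates
```

In this toy setting, shrinking t by a factor of 10 should shrink `approx_error(t)` by roughly a factor of 100, and calling `guided_gillespie` with `gamma > 0` shifts the sampled sequences toward those the hypothetical classifier scores highly. The surrogate is accurate to first order because, over a short branch, the probability of two or more mutations is itself O(t²).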