Effective population size (Ne(t)) is a fundamental parameter in population genetics and phylodynamics that quantifies genetic diversity and reveals demographic history. Coalescent-based methods enable the inference of Ne(t) trajectories through time from phylogenies reconstructed from molecular sequence data. Understanding the ecological and environmental drivers of population dynamics requires linking Ne(t) to external covariates. Existing approaches typically impose log-linear relationships between covariates and Ne(t), which may fail to capture complex biological processes and can introduce bias when the true relationship is nonlinear. We present a flexible Bayesian framework that integrates covariates into coalescent models with piecewise-constant Ne(t) through a Gaussian process (GP) prior. The GP, a distribution over functions, naturally accommodates nonlinear covariate effects without restrictive parametric assumptions. This formulation improves estimation of covariate-Ne(t) relationships, mitigates bias under nonlinear associations, and yields interpretable uncertainty quantification that varies across the covariate space. To balance global covariate-driven patterns with local temporal dynamics, we couple the GP prior with a Gaussian Markov random field that enforces smoothness in Ne(t) trajectories. Through simulation studies and three empirical applications - yellow fever virus dynamics in Brazil (2016-2018), late-Quaternary musk ox demography, and HIV-1 CRF02-AG evolution in Cameroon - we demonstrate that our method both confirms linear relationships where appropriate and reveals nonlinear covariate effects that would otherwise be missed or mischaracterized. This framework advances phylodynamic inference by enabling more accurate and biologically realistic modeling of how environmental and epidemiological factors shape population size through time.
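To illustrate how a GP over covariate values can be combined with a Gaussian Markov random field over time, the following minimal Python sketch draws one prior sample of log Ne(t) on a grid of piecewise-constant intervals. It assumes, purely for illustration, that the two components add on the log scale, that the GP uses a squared-exponential kernel, and that the GMRF is a first-order random walk. The function names (`sq_exp_kernel`, `sample_log_ne`) and all hyperparameter values are hypothetical and are not taken from the method described above.

```python
import numpy as np


def sq_exp_kernel(x, amplitude=1.0, length_scale=1.0):
    """Squared-exponential covariance between all pairs of covariate values."""
    diff = x[:, None] - x[None, :]
    return amplitude**2 * np.exp(-0.5 * (diff / length_scale) ** 2)


def sample_log_ne(covariate, rng, amplitude=1.0, length_scale=1.0,
                  rw_sd=0.1, jitter=1e-6):
    """Draw one prior sample of log Ne on a grid of intervals.

    Assumption (illustrative only): log Ne_i = f(x_i) + w_i, where
    f ~ GP(0, k) captures the covariate effect and w is a first-order
    random walk (a simple GMRF) capturing local temporal structure.
    """
    n = covariate.shape[0]
    # GP draw evaluated at the observed covariate values; the jitter term
    # keeps the Cholesky factorization numerically stable.
    K = sq_exp_kernel(covariate, amplitude, length_scale) + jitter * np.eye(n)
    f = np.linalg.cholesky(K) @ rng.standard_normal(n)
    # Random walk anchored at zero: increments are i.i.d. N(0, rw_sd^2).
    w = np.concatenate(([0.0], np.cumsum(rw_sd * rng.standard_normal(n - 1))))
    return f + w, f, w


# Example: a covariate that varies nonlinearly over 40 time intervals.
rng = np.random.default_rng(0)
covariate = np.sin(np.linspace(0.0, 3.0, 40))
log_ne, gp_part, gmrf_part = sample_log_ne(covariate, rng)
ne = np.exp(log_ne)  # piecewise-constant Ne values, one per interval
```

In this sketch the GP component depends on time only through the covariate, so intervals with similar covariate values share similar effects, while the random-walk component ties neighboring intervals together regardless of the covariate. A full analysis would place this prior inside a coalescent likelihood and sample the posterior, which is not shown here.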