We construct a reliable method for estimating evolutionary parameters within the Wright-Fisher model, which describes changes in allele frequencies due to selection and genetic drift, from time-series data. Such data exist for biological populations, for example from artificial evolution experiments, and for the cultural evolution of behavior, for example in linguistic corpora that document the historical usage of different words with similar meanings. Our method of analysis builds on a Beta-with-Spikes approximation to the distribution of allele frequencies predicted by the Wright-Fisher model. We introduce a self-contained scheme for estimating the parameters in the approximation and demonstrate its robustness with synthetic data, especially in the strong-selection and near-extinction regimes where previous approaches fail. We then apply the method to allele frequency data for baker's yeast (Saccharomyces cerevisiae), finding a significant signal of selection in cases where independent evidence supports such a conclusion. Finally, in the context of a historical spelling reform in the Spanish language, we demonstrate the possibility of detecting time-points at which evolutionary parameters change.