In the Admixture Model, the probability that an individual carries a given allele at a particular marker depends on the allele frequencies in $K$ ancestral populations and on the proportions of the individual's genome that originate from each of these populations; markers are assumed to be independent. The Linkage Model is a Hidden Markov Model (HMM) that extends the Admixture Model by accounting for linkage between neighboring loci. We prove consistency and asymptotic normality of the maximum likelihood estimators (MLEs) of individual ancestry in the Linkage Model, complementing earlier results for the Admixture Model \citep{pfaff2004information, pfaffelhuber2022central, HEINZEL2025}. We then use these results to prove that a statistical test for model selection between the Admixture Model and the Linkage Model is an asymptotic level-$\alpha$ test. Finally, we demonstrate the practical relevance of our results by applying this test to real data from the 1000 Genomes Project.
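As a concrete illustration of the Admixture Model described above, the following minimal Python sketch (not the authors' implementation) computes the log-likelihood for one individual and an MLE of its ancestry proportions $q = (q_1, \dots, q_K)$. It assumes diploid genotypes coded as the number $g_m \in \{0, 1, 2\}$ of copies of a reference allele at marker $m$. It also assumes ancestral allele frequencies $p_{k,m}$ (stored as `p[m][k]`) that are treated as known and lie strictly between 0 and 1, and that each allele copy independently carries the reference allele with probability $\sum_k q_k p_{k,m}$. The EM optimizer and all function names are illustrative choices, not taken from the paper, and the Linkage Model's HMM is not covered.

```python
import math
import random


def allele_prob(q, p_m):
    """Probability that one allele copy at marker m is the reference allele: sum_k q_k * p_{k,m}."""
    return sum(qk * pkm for qk, pkm in zip(q, p_m))


def log_likelihood(q, genotypes, p):
    """Admixture-model log-likelihood (up to binomial constants) for one diploid individual.

    genotypes[m] in {0, 1, 2} counts reference alleles at marker m;
    p[m][k] is the reference-allele frequency at marker m in ancestral population k.
    Markers are treated as independent, as in the Admixture Model.
    """
    ll = 0.0
    for g, p_m in zip(genotypes, p):
        pi = allele_prob(q, p_m)
        ll += g * math.log(pi) + (2 - g) * math.log(1.0 - pi)
    return ll


def em_ancestry(genotypes, p, n_iter=200):
    """EM estimate of q, with the allele frequencies p treated as known (illustrative choice)."""
    K = len(p[0])
    M = len(genotypes)
    q = [1.0 / K] * K  # start from equal ancestry proportions
    for _ in range(n_iter):
        counts = [0.0] * K
        for g, p_m in zip(genotypes, p):
            pi = allele_prob(q, p_m)
            for k in range(K):
                # Expected number of the two allele copies at marker m that stem from population k.
                counts[k] += g * q[k] * p_m[k] / pi + (2 - g) * q[k] * (1 - p_m[k]) / (1 - pi)
        # Each marker contributes exactly 2 expected copies, so q stays on the simplex.
        q = [c / (2 * M) for c in counts]
    return q


def simulate_genotypes(q, p, rng):
    """Draw genotypes under the Admixture Model: two independent allele copies per marker."""
    return [sum(rng.random() < allele_prob(q, p_m) for _ in range(2)) for p_m in p]


# Usage on simulated data with K = 2 ancestral populations and hypothetical frequencies.
rng = random.Random(1)
M, K = 2000, 2
p = [[rng.uniform(0.05, 0.95) for _ in range(K)] for _ in range(M)]
true_q = [0.3, 0.7]
g = simulate_genotypes(true_q, p, rng)
q_hat = em_ancestry(g, p)
print([round(x, 3) for x in q_hat])
```

With many independent markers, the estimate should land close to `true_q`, which is the kind of consistency the abstract refers to for the Admixture Model. EM is used only because its updates are short and never decrease the likelihood; any numerical maximizer of `log_likelihood` over the simplex would serve.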