This manuscript explores the intersection of genomics and phenotypic prediction, focusing on the statistical innovation needed to handle the complications introduced by noisy covariates and confounders. The primary emphasis is on developing robust statistical models tailored to genomic prediction from single nucleotide polymorphism (SNP) data in plant and animal breeding and in multi-field trials. The manuscript highlights the importance of incorporating the estimated effects of all marker loci into the statistical framework while reducing the high dimensionality of the data without discarding critical information. This paper introduces a new robust statistical framework for genomic prediction that combines one-stage and two-stage linear mixed model analyses with the popular robust minimum density power divergence estimator (MDPDE) to estimate genetic effects on phenotypic traits. Extensive empirical experiments on artificial datasets and an application to a real-life maize breeding dataset show that the proposed MDPDE-based genomic prediction and associated heritability estimation procedures outperform existing competitors. The results demonstrate the robustness and accuracy of the proposed MDPDE-based approaches, especially in the presence of data contamination, and emphasize their potential for improving breeding programs and advancing the genomic prediction of phenotypic traits.
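To illustrate how an MDPDE downweights contaminated observations, the following minimal sketch fits a simple normal linear regression y = Xβ + e, e ~ N(0, σ²), by solving the normal-model MDPDE estimating equations with a fixed-point (iteratively reweighted least squares) iteration. It is not the paper's one-stage or two-stage mixed-model procedure: it assumes i.i.d. normal errors and has no random marker effects or variance components, and the helper name `mdpde_normal_regression`, the tuning value α = 0.5, and the simulated data are hypothetical choices made only for demonstration. Each observation receives the weight exp(−α r²/(2σ²)), so observations with large residuals contribute almost nothing; as α → 0 all weights tend to 1 and the estimator reduces to maximum likelihood (ordinary least squares for β).

```python
import numpy as np


def mdpde_normal_regression(X, y, alpha=0.5, max_iter=500, tol=1e-8):
    """Hypothetical helper: MDPDE of (beta, sigma) for y = X @ beta + e,
    e ~ N(0, sigma^2) i.i.d., found by fixed-point iteration on the
    normal-model estimating equations (not the paper's mixed model)."""
    n = len(y)
    # Ordinary least squares start (not robust; used only as a starting value)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.mean((y - X @ beta) ** 2)
    # Constant alpha / (1 + alpha)^(3/2) coming from the model-integral term
    c = alpha / (1.0 + alpha) ** 1.5
    for _ in range(max_iter):
        r = y - X @ beta
        # Density-power weights: large residuals get weights close to zero
        w = np.exp(-alpha * r**2 / (2.0 * sigma2))
        # Weighted least-squares update of beta
        sw = np.sqrt(w)
        beta_new, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        # Variance update solving the sigma estimating equation
        r_new = y - X @ beta_new
        sigma2_new = np.sum(w * r_new**2) / (np.sum(w) - n * c)
        converged = (np.max(np.abs(beta_new - beta)) < tol
                     and abs(sigma2_new - sigma2) < tol)
        beta, sigma2 = beta_new, sigma2_new
        if converged:
            break
    return beta, np.sqrt(sigma2)


# Simulated example: true intercept 1, slope 2, error standard deviation 1
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)
y[:50] += 20.0  # contaminate 10% of the responses with gross outliers

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_dpd, sigma_dpd = mdpde_normal_regression(X, y, alpha=0.5)
print("OLS:  ", beta_ols)
print("MDPDE:", beta_dpd, "sigma:", sigma_dpd)
```

With 10% of the responses shifted by 20 units, the OLS intercept is pulled roughly two units away from its true value, whereas the MDPDE fit should stay close to (1, 2). Larger α buys more robustness at some loss of efficiency on clean data, which is the usual trade-off when choosing the MDPDE tuning parameter.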