We propose double machine learning procedures for estimating the genetic relatedness between two traits in a model-free framework. Most existing methods require specifying parametric models that link the traits to the genetic variants, and the bias caused by model misspecification can yield misleading statistical results. Moreover, semiparametric efficiency bounds for estimators of genetic relatedness are still lacking. We develop semiparametric efficient, model-free estimators and construct valid confidence intervals for two important measures of genetic relatedness, genetic covariance and genetic correlation, allowing both continuous and discrete responses. Based on the derived efficient influence functions, we propose an estimator of the genetic covariance that is consistent as long as one of the two genetic values is consistently estimated. The data on the two traits may be collected from the same group or from different groups of individuals. Numerical studies illustrate the proposed procedures, and we apply them to genome-wide association study data from Carworth Farms White mice.
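To make the double-robustness claim concrete, the following is a minimal sketch under stated assumptions, not the estimator as implemented in the paper. It assumes both traits are observed on the same individuals, takes the genetic value of a trait to be its conditional mean given the genotypes, f(X) = E[Y | X] and g(X) = E[Z | X], and takes the genetic covariance to be Cov(f(X), g(X)). The score Y·ĝ + Z·f̂ − f̂·ĝ is the standard doubly robust score for E[f(X)g(X)]: its bias equals −E[(f − f̂)(g − ĝ)], which vanishes when either fitted genetic value is consistent. The names `fit_linear` and `cross_fit_genetic_covariance` are hypothetical, and least squares stands in for whatever machine learning method one would actually use.

```python
import numpy as np


def fit_linear(X, y):
    """Placeholder learner: least squares with an intercept.

    Returns a function that predicts the trait from new genotypes.
    """
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def cross_fit_genetic_covariance(X, y, z, fit=fit_linear, n_folds=2, seed=0):
    """Cross-fitted plug-in estimate of Cov(E[Y|X], E[Z|X]).

    Assumes y and z are measured on the same individuals (rows of X).
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)

    f_hat = np.empty(n)  # out-of-fold predictions of E[Y | X]
    g_hat = np.empty(n)  # out-of-fold predictions of E[Z | X]
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        f_hat[held_out] = fit(X[train], y[train])(X[held_out])
        g_hat[held_out] = fit(X[train], z[train])(X[held_out])

    # Doubly robust score for E[f(X) g(X)]; subtracting the product of the
    # trait means turns the second moment into a covariance.
    score = y * g_hat + z * f_hat - f_hat * g_hat
    return score.mean() - y.mean() * z.mean()
```

Any learner with the same fit-then-predict shape can be passed as `fit`. Cross-fitting, where each individual's genetic values are predicted by a model trained on the other folds, is the usual device for keeping flexible learners from overfitting their own scores.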