We propose semiparametric efficient estimators of genetic relatedness between two traits in a model-free framework. Most existing methods require specifying parametric models for the traits and genetic variants, and bias from model misspecification can yield misleading statistical results. Moreover, semiparametric efficiency bounds for estimators of genetic relatedness have not yet been established. We develop semiparametric efficient estimators using machine learning methods and construct valid confidence intervals for two important measures of genetic relatedness, genetic covariance and genetic correlation, allowing both continuous and discrete responses. Based on the derived efficient influence functions, we propose an estimator of the genetic covariance that is consistent as long as one of the genetic values is consistently estimated. The data for the two traits may be collected from the same group or from different groups of individuals. Numerical studies illustrate the proposed procedures, and we apply them to Carworth Farms White mice genome-wide association study data.