Estimating the sharing of genetic effects across different conditions is important to many statistical analyses of genomic data. The patterns of sharing arising from these data are often highly heterogeneous. To flexibly model these heterogeneous sharing patterns, Urbut et al. (2019) proposed the multivariate adaptive shrinkage (MASH) method to jointly analyze genetic effects across multiple conditions. However, multivariate analyses using MASH, like other multivariate analyses, require good estimates of the sharing patterns, and estimating these patterns efficiently and accurately remains challenging. Here we describe new empirical Bayes methods that improve on existing methods in both speed and accuracy. The two key ideas are: (1) adaptive regularization to improve accuracy in settings with many conditions; and (2) faster model-fitting algorithms that exploit analytical results on covariance estimation. In simulations, we show that the new methods provide better model fits, better out-of-sample performance, and improved power and accuracy in detecting the true underlying signals. In an analysis of eQTLs in 49 human tissues, our new analysis pipeline achieves better model fits and better out-of-sample performance than the existing MASH analysis pipeline. We have implemented the new methods, which we call "Ultimate Deconvolution", in an R package, udr, available on GitHub.
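To illustrate the kind of analytical covariance result that can speed up model fitting, here is a minimal sketch of one well-known closed-form update. It assumes observations x_j ~ N(0, U + I), where the residual covariance is the identity (for example, after rescaling the data), and that each observation has a weight w_j, such as an EM responsibility for one mixture component. Under these assumptions, the U that maximizes the weighted log-likelihood can be obtained by taking an eigendecomposition of the weighted sample covariance, subtracting 1 from each eigenvalue, and truncating negative values at zero. The function name `truncated_eig_update` is hypothetical, and this is not presented as the udr implementation.

```python
import numpy as np


def truncated_eig_update(X, w):
    """Hypothetical closed-form M-step for one component's covariance U.

    Assumption: x_j ~ N(0, U + I), i.e. the residual covariance is the identity.
    Maximizes sum_j w_j * log N(x_j; 0, U + I) over positive semidefinite U.
    X has shape (n, p); w has shape (n,) with nonnegative weights.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    # Weighted sample covariance: S = sum_j w_j x_j x_j^T / sum_j w_j.
    S = (X * w[:, None]).T @ X / w.sum()
    # Eigendecomposition of the symmetric matrix S = Q diag(lam) Q^T.
    lam, Q = np.linalg.eigh(S)
    # Remove the unit residual variance, truncating negative eigenvalues at 0.
    lam_U = np.maximum(lam - 1.0, 0.0)
    # Rebuild U = Q diag(lam_U) Q^T (broadcasting scales column i by lam_U[i]).
    return (Q * lam_U) @ Q.T


X = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 0.5], [0.0, -0.5]])
w = np.ones(4)
U = truncated_eig_update(X, w)
print(np.round(U, 6))
```

Here the weighted sample covariance is diag(4.5, 0.125), so the update gives U = diag(3.5, 0). The second direction's sample variance is below the assumed residual variance of 1, so it is shrunk to zero rather than yielding a negative variance. Because the update needs only one eigendecomposition per component, it avoids iterative inner optimization, which is the general source of speed-ups from closed-form covariance results.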