High-dimensional data often arise in clinical genomics research, where the aim is to identify predictors relevant to a particular trait. One way to improve predictive performance is to include information on the predictors derived from prior knowledge or previous studies. Such information is also referred to as ``co-data''. To this end, we develop a novel Bayesian model for including co-data in a high-dimensional regression framework, called Informative Horseshoe regression (infHS). The proposed approach regresses the prior variances of the regression parameters on the co-data variables, improving variable selection and prediction. We implement both a Gibbs sampler and a variational approximation algorithm. The former suits applications of moderate dimension that target posterior inference in addition to prediction, whereas the computational efficiency of the latter allows handling a very large number of variables. We show the benefits of including co-data with a simulation study. Finally, we demonstrate that infHS outperforms competing approaches in two genomics applications.
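To make the idea of regressing prior variances on co-data concrete, the sketch below assumes one simple parameterization: a horseshoe prior whose half-Cauchy local scale for coefficient j is multiplied by exp(z_j^T gamma), so that the log prior variance of each coefficient is linear in its co-data. The functions `codata_prior_sd` and `sample_coefficients`, the log-linear link and the fixed weights `gamma` are hypothetical illustrations and not necessarily the exact infHS specification. The sketch only draws from such a prior; it does not implement the Gibbs sampler or the variational algorithm.

```python
import numpy as np


def codata_prior_sd(Z, gamma, tau, lam):
    """Prior standard deviation of each coefficient beta_j.

    Hypothetical parameterization (an assumption, not the paper's model):
    sd_j = tau * lam_j * exp(z_j @ gamma), so log(prior variance_j)
    is linear in the co-data row z_j.
    """
    return tau * lam * np.exp(Z @ gamma)


def sample_coefficients(Z, gamma, tau, rng):
    """Draw beta from a horseshoe-type prior modulated by co-data."""
    p = Z.shape[0]
    # Half-Cauchy(0, 1) local scales, as in the standard horseshoe prior
    lam = np.abs(rng.standard_cauchy(p))
    sd = codata_prior_sd(Z, gamma, tau, lam)
    # Conditionally on the scales, each beta_j is Gaussian with mean zero
    return rng.normal(0.0, sd)


# Two groups of predictors: an intercept column plus a 0/1 co-data indicator
p = 1000
Z = np.column_stack([np.ones(p), np.repeat([0.0, 1.0], p // 2)])
gamma = np.array([0.0, np.log(4.0)])  # group 2 gets a 4x larger prior scale
rng = np.random.default_rng(0)
beta = sample_coefficients(Z, gamma, tau=0.1, rng=rng)
```

With the group indicator weighted by log(4), predictors in the second group receive prior scales four times larger, so their sampled coefficients tend to be larger in magnitude. In an actual fit, the co-data weights would be learned from the data rather than fixed by hand.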