In this paper, we propose Varying Effects Regression with Graph Estimation (VERGE), a novel Bayesian method for feature selection in regression. The model is designed to leverage the complex structure of data sets arising from genomics or imaging studies. We distinguish between predictors, the features used in the outcome prediction model, and subject-level covariates, which modulate the effects of the predictors on the outcome. We construct a varying-coefficient modeling framework in which we infer a network among the predictors and use this network information to encourage the selection of related predictors. We employ spike-and-slab variable selection priors that enable the selection of both network-linked predictors and the covariates that modify their effects. Simulation studies demonstrate that our method outperforms existing methods in both feature selection and predictive accuracy. We illustrate VERGE with an application characterizing the influence of gut microbiome features on obesity, in which we identify a set of microbial taxa and their ecological dependence relations. In this application, subject-level covariates, including sex and dietary intake variables, are allowed to modify the coefficients of the microbiome predictors, providing additional insight into the interplay between these factors.