In this paper, we study the Bayesian multi-task variable selection problem, where the goal is to select the active variables across multiple related data sets simultaneously. Our proposed method generalizes the spike-and-slab prior to multiple data sets, and we prove its posterior consistency in high-dimensional regimes. To compute the posterior distribution, we propose a novel variational Bayes algorithm based on the recently developed "sum of single effects" model of Wang et al. (2020). Finally, motivated by differential gene network analysis in biology, we extend our method to the joint learning of multiple directed acyclic graphical models. Simulation studies and an analysis of real gene expression data demonstrate the effectiveness of the proposed method.