We discuss the use of empirical Bayes for data integration, in the sense of transfer learning. Our main interest is in settings where one wishes to learn structure (e.g. feature selection) but only has access to incomplete data from previous studies, such as summaries, estimates, or lists of relevant features. We contrast full Bayes with empirical Bayes, and develop a computational framework for the latter. We discuss how empirical Bayes attains consistent variable selection under weaker conditions (sparsity and beta-min assumptions) than full Bayes and other standard criteria do, and how it attains faster convergence rates. Our high-dimensional regression examples show that fully Bayesian inference enjoys excellent properties, and that data integration with empirical Bayes can offer moderate yet meaningful improvements in practice.