Supervised learning problems with side information in the form of a network arise frequently in applications in genomics, proteomics and neuroscience. For example, in genetic applications, the network side information can accurately capture background biological information on the intricate relations among the relevant genes. In this paper, we initiate a study of Bayes optimal learning in high-dimensional linear regression with network side information. To this end, we first introduce a simple generative model (called the Reg-Graph model) which posits a joint distribution for the supervised data and the observed network through a common set of latent parameters. Next, we introduce an iterative algorithm based on Approximate Message Passing (AMP) which is provably Bayes optimal under very general conditions. In addition, we characterize the limiting mutual information between the latent signal and the observed data, and thus precisely quantify the statistical impact of the network side information. Finally, supporting numerical experiments suggest that the introduced algorithm has excellent performance in finite samples.