Supervised learning problems with side information in the form of a network arise frequently in applications in genomics, proteomics and neuroscience. For example, in genetic applications, the network side information can accurately capture background biological information on the intricate relations among the relevant genes. In this paper, we initiate a study of Bayes optimal learning in high-dimensional linear regression with network side information. To this end, we first introduce a simple generative model (called the Reg-Graph model) which posits a joint distribution for the supervised data and the observed network through a common set of latent parameters. Next, we introduce an iterative algorithm based on Approximate Message Passing (AMP) which is provably Bayes optimal under very general conditions. In addition, we characterize the limiting mutual information between the latent signal and the data observed, and thus precisely quantify the statistical impact of the network side information. Finally, supporting numerical experiments suggest that the introduced algorithm has excellent performance in finite samples.