Studying the conditional independence structure among many variables from few observations is challenging. Gaussian Graphical Models (GGMs) tackle this problem by encouraging sparsity in the precision matrix through an $l_p$ regularization with $p\leq1$. However, since the objective is highly non-convex for sub-$l_1$ pseudo-norms, most approaches rely on the $l_1$ norm. In this case, frequentist approaches make it possible to elegantly compute the solution path as a function of the shrinkage parameter $\lambda$. Instead of optimizing the penalized likelihood, the Bayesian formulation introduces a Laplace prior on the precision matrix. However, posterior inference for different $\lambda$ values requires repeated runs of expensive Gibbs samplers. We propose a general framework for variational inference in GGMs that unifies the benefits of the frequentist and Bayesian frameworks. Specifically, we approximate the posterior with a matrix-variate Normalizing Flow defined on the space of symmetric positive definite matrices. As a key improvement on previous work, we train a continuum of sparse regression models jointly for all regularization parameters $\lambda$ and all $l_p$ norms, including non-convex sub-$l_1$ pseudo-norms. This is achieved by conditioning the flow on $p>0$ and on the shrinkage parameter $\lambda$. A single model then gives access to (i) the evolution of the posterior for any $\lambda$ and any $l_p$ (pseudo-)norm, (ii) the marginal log-likelihood for model selection, and (iii) the frequentist solution paths, recovered as the MAP estimate obtained through simulated annealing.
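To make the quantity under discussion concrete, the following is a minimal sketch of an unnormalized log-posterior for a precision matrix under an $l_p$ penalty. It assumes the standard zero-mean Gaussian likelihood written in terms of a sample covariance `S` and a penalty applied to off-diagonal entries only; the function name, the argument names, and the choice to leave the diagonal unpenalized are illustrative assumptions, not details taken from the text above.

```python
import numpy as np


def lp_penalized_log_density(omega, S, n, lam, p):
    """Unnormalized log-posterior of a precision matrix (illustrative sketch).

    omega : candidate precision matrix, shape (d, d)
    S     : sample covariance matrix, shape (d, d)
    n     : number of observations
    lam   : shrinkage parameter (lambda > 0)
    p     : exponent of the l_p (pseudo-)norm, p > 0
    """
    omega = np.asarray(omega, dtype=float)
    S = np.asarray(S, dtype=float)

    # The density is only defined on symmetric positive definite matrices.
    if not np.allclose(omega, omega.T):
        return -np.inf
    if np.any(np.linalg.eigvalsh(omega) <= 0):
        return -np.inf

    # Gaussian log-likelihood up to an additive constant:
    # (n / 2) * (log det(omega) - tr(S @ omega))
    _, logdet = np.linalg.slogdet(omega)
    log_lik = 0.5 * n * (logdet - np.trace(S @ omega))

    # l_p penalty on the off-diagonal entries (assumption: the diagonal is
    # left unpenalized; each edge is counted twice by symmetry).
    off_diag = omega[~np.eye(omega.shape[0], dtype=bool)]
    penalty = lam * np.sum(np.abs(off_diag) ** p)

    return log_lik - penalty
```

For $p=1$ the penalty corresponds to a Laplace prior on the off-diagonal entries, and for $p<1$ it becomes non-convex. Evaluating the function on a grid of `lam` and `p` values shows how the two parameters reshape the same target, which is the dependence a conditional flow would have to capture.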