We consider the problem of learning a graph that models the statistical relations among $d$ variables from a dataset of $n$ samples $X \in \mathbb{R}^{n \times d}$. Standard approaches amount to searching for a precision matrix $\Theta$, representative of a Gaussian graphical model, that adequately explains the data. However, most maximum likelihood-based estimators require storing the $d^{2}$ values of the empirical covariance matrix, which can become prohibitive in a high-dimensional setting. In this work, we adopt a compressive viewpoint and aim to estimate a sparse $\Theta$ from a \emph{sketch} of the data, i.e., a low-dimensional vector of size $m \ll d^{2}$ carefully designed from $X$ using non-linear random features. Under certain assumptions on the spectrum of $\Theta$ (or its condition number), we show that it is possible to estimate $\Theta$ from a sketch of size $m=\Omega\left((d+2k)\log(d)\right)$, where $k$ is the maximal number of edges of the underlying graph. These information-theoretic guarantees are inspired by compressed sensing theory and involve restricted isometry properties and instance optimal decoders. We investigate the possibility of achieving practical recovery with an iterative algorithm based on the graphical lasso, viewed as a specific denoiser. We compare our approach with the graphical lasso on synthetic datasets, demonstrating that it performs favorably even when the dataset is compressed.
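To make the notion of a sketch concrete, the following is a minimal illustration rather than the method of this work. It assumes zero-mean data and a hypothetical quadratic feature map $x \mapsto (a_j^\top x)^2$ with Gaussian directions $a_j$; the abstract does not specify the actual random features. The names `quadratic_sketch` and `A`, and the constant 1 in the choice of $m$, are likewise assumptions made for the example. It shows that averaging such features over the samples gives a vector of size $m$ that depends on the data only through the empirical covariance, which never needs to be stored.

```python
import numpy as np

def quadratic_sketch(X, A):
    """Average the features (a_j^T x_i)^2 over the rows x_i of X.

    X: (n, d) data matrix, assumed zero-mean.
    A: (m, d) matrix whose rows a_j are random directions (hypothetical choice).
    Returns a vector of size m.
    """
    n = X.shape[0]
    proj = X @ A.T                      # (n, m) projections a_j^T x_i
    return (proj ** 2).sum(axis=0) / n  # empirical mean of each feature

rng = np.random.default_rng(0)
n, d, k = 500, 20, 30

# Sketch size following the (d + 2k) log(d) scaling; the constant 1 is arbitrary.
m = int((d + 2 * k) * np.log(d))

X = rng.standard_normal((n, d))
A = rng.standard_normal((m, d))
z = quadratic_sketch(X, A)

# Sanity check: the same vector computed from the full d x d empirical covariance.
S = X.T @ X / n
z_from_cov = np.einsum("jd,de,je->j", A, S, A)  # a_j^T S a_j for each j
```

Here `z` is computed in a single pass over the samples, while `z_from_cov` is formed only to check that the two agree. Recovering a sparse $\Theta$ from `z` is the decoding problem the abstract refers to and is not attempted in this example.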