We consider the problem of learning a graph modeling the statistical relations of the $d$ variables of a dataset with $n$ samples $X \in \mathbb{R}^{n \times d}$. Standard approaches amount to searching for a precision matrix $\Theta$ representative of a Gaussian graphical model that adequately explains the data. However, most maximum-likelihood-based estimators require storing the $d^{2}$ values of the empirical covariance matrix, which can become prohibitive in a high-dimensional setting. In this work, we adopt a compressive viewpoint and aim to estimate a sparse $\Theta$ from a sketch of the data, i.e., a low-dimensional vector of size $m \ll d^{2}$ carefully designed from $X$ using nonlinear random features. Under certain assumptions on the spectrum of $\Theta$ (or its condition number), we show that $\Theta$ can be estimated from a sketch of size $m=\Omega((d+2k)\log(d))$, where $k$ is the maximal number of edges of the underlying graph. These information-theoretic guarantees are inspired by compressed sensing theory and involve restricted isometry properties and instance optimal decoders. We investigate the possibility of achieving practical recovery with an iterative algorithm based on the graphical lasso, viewed as a specific denoiser. We compare our approach with the graphical lasso on synthetic datasets, demonstrating that our approach performs favorably even when the dataset is compressed.
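To make the notion of a sketch concrete, here is a minimal Python illustration of compressing a dataset into a vector of size $m$ without ever forming the $d \times d$ empirical covariance. The abstract does not specify the feature map, so this example assumes generic random Fourier features (averaged complex exponentials of random projections), which may differ from the nonlinear features actually used in the paper. The names `sketch_dataset`, `W`, and `batch_size` are hypothetical, and the constant hidden in $\Omega(\cdot)$ is set to 1 purely for illustration.

```python
import numpy as np

def sketch_dataset(X, W, batch_size=1024):
    """Average complex-exponential random features over the rows of X.

    Assumption: generic random Fourier features exp(i * w_j^T x); this is
    an illustrative choice, not necessarily the paper's feature map.
    Only an m-dimensional accumulator is kept in memory.
    """
    n = X.shape[0]
    m = W.shape[1]
    z = np.zeros(m, dtype=np.complex128)
    for start in range(0, n, batch_size):
        batch = X[start:start + batch_size]        # (b, d) block of samples
        z += np.exp(1j * (batch @ W)).sum(axis=0)  # accumulate the features
    return z / n                                   # empirical mean = sketch

rng = np.random.default_rng(0)
n, d, k = 5000, 20, 30                 # samples, variables, assumed edge budget
m = int((d + 2 * k) * np.log(d))       # illustrative size, constant set to 1
X = rng.standard_normal((n, d))        # placeholder synthetic data
W = rng.standard_normal((d, m))        # random projection directions
z = sketch_dataset(X, W)
```

Because the sketch is a plain average over samples, it can be computed in one streaming pass and merged across data shards; the decoding step (recovering $\Theta$ from `z`) is not shown here.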