Graphical model selection seems impossible when many pairs of variables are never jointly observed, because it requires inferring conditional dependencies without observing the corresponding marginal dependencies. This under-explored statistical problem arises, for example, in neuroimaging when different, partially overlapping subsets of neurons are recorded in non-simultaneous sessions. We call this statistical challenge the "Graph Quilting" problem. We study it in the context of sparse inverse covariance learning, focusing on Gaussian graphical models. In this setting we show that missing parts of the covariance matrix yield an unidentifiable precision matrix, and the precision matrix is what specifies the graph. Nonetheless, we show that, under mild conditions, the edges connecting jointly observed pairs of nodes can be correctly identified. We also show that a minimal superset of the edges connecting variables that are never jointly observed can be recovered. Thus, conditional relationships can be inferred even when the corresponding marginal relationships are unobserved, which is a surprising result. To this end, we propose an $\ell_1$-regularized graph estimator based on the partially observed likelihood. We provide performance guarantees both in the population setting and in high-dimensional finite-sample settings. We illustrate our approach on synthetic data and by learning functional neural connectivity from calcium imaging data.
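The following minimal NumPy sketch illustrates the setting. It builds a mask of jointly observed pairs from overlapping, non-simultaneous recording sessions. It then fits an $\ell_1$-regularized precision estimate using only the observed covariance entries. The sketch rests on one assumption that is ours, not necessarily the authors' exact formulation: the partially observed likelihood simply drops the terms $\hat\Sigma_{ij}\Theta_{ij}$ for pairs never recorded together. Under that assumption the estimator minimizes $-\log\det\Theta + \sum_{(i,j)\in O}\hat\Sigma_{ij}\Theta_{ij} + \lambda\sum_{i\neq j}|\Theta_{ij}|$ over positive definite $\Theta$, where $O$ is the set of jointly observed pairs. Everything else is hypothetical and chosen for illustration: the helper names (`observed_mask`, `quilted_covariance`, `partially_observed_glasso`), the proximal-gradient solver, the chain-graph ground truth, the two sessions, and $\lambda = 0.05$. The step that recovers the minimal edge superset for never-jointly-observed pairs is not shown.

```python
import numpy as np


def observed_mask(blocks, p):
    """p x p boolean mask: True where variables i and j share a recording session."""
    mask = np.zeros((p, p), dtype=bool)
    for idx in blocks:
        mask[np.ix_(idx, idx)] = True
    return mask


def quilted_covariance(data_by_block, blocks, p):
    """Sample-size-weighted average of per-session covariances on jointly observed entries.

    Entries for pairs never recorded together are left at 0; the mask excludes them.
    """
    total = np.zeros((p, p))
    weight = np.zeros((p, p))
    for X, idx in zip(data_by_block, blocks):
        n = X.shape[0]
        S_block = np.cov(X, rowvar=False, bias=True)  # per-session MLE covariance
        total[np.ix_(idx, idx)] += n * S_block
        weight[np.ix_(idx, idx)] += n
    return np.divide(total, weight, out=np.zeros((p, p)), where=weight > 0)


def soft_threshold_offdiag(A, t):
    """Soft-threshold off-diagonal entries by t; the diagonal is not penalized."""
    out = np.sign(A) * np.maximum(np.abs(A) - t, 0.0)
    np.fill_diagonal(out, np.diag(A))
    return out


def partially_observed_glasso(S, mask, lam, max_iter=500, tol=1e-6):
    """Proximal-gradient sketch of an ASSUMED objective (not necessarily the paper's):
        min_{Theta > 0} -log det(Theta) + sum_{(i,j) in O} S_ij Theta_ij
                        + lam * sum_{i != j} |Theta_ij|
    """
    S_obs = np.where(mask, S, 0.0)  # never-jointly-observed pairs contribute nothing

    def smooth(T):
        _, logdet = np.linalg.slogdet(T)
        return -logdet + np.sum(S_obs * T)

    Theta = np.diag(1.0 / np.diag(S))  # needs every variable observed in some session
    f = smooth(Theta)
    for _ in range(max_iter):
        grad = S_obs - np.linalg.inv(Theta)
        grad = (grad + grad.T) / 2  # remove round-off asymmetry
        step = 1.0
        while True:
            cand = soft_threshold_offdiag(Theta - step * grad, step * lam)
            try:
                np.linalg.cholesky(cand)  # reject candidates outside the PD cone
            except np.linalg.LinAlgError:
                step /= 2
                continue
            f_cand = smooth(cand)
            diff = cand - Theta
            # backtracking sufficient-decrease test for proximal gradient
            if f_cand <= f + np.sum(grad * diff) + np.sum(diff**2) / (2 * step) or step < 1e-10:
                break
            step /= 2
        delta = np.max(np.abs(cand - Theta))
        Theta, f = cand, f_cand
        if delta < tol:
            break
    return Theta


# Hypothetical example: chain graph 0-1-2-3-4-5, two overlapping sessions.
rng = np.random.default_rng(0)
p = 6
Theta_true = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
Sigma_true = np.linalg.inv(Theta_true)
blocks = [[0, 1, 2, 3], [2, 3, 4, 5]]
data = [rng.multivariate_normal(np.zeros(len(b)), Sigma_true[np.ix_(b, b)], size=500)
        for b in blocks]

mask = observed_mask(blocks, p)
S = quilted_covariance(data, blocks, p)
Theta_hat = partially_observed_glasso(S, mask, lam=0.05)
print(mask[0, 5], mask[2, 3])  # prints: False True
```

In this toy design, the pairs (0, 4), (0, 5), (1, 4) and (1, 5) are never recorded together, so their covariances never enter the objective. Proximal gradient with backtracking is used here only because the Cholesky check keeps every iterate positive definite. Other graphical lasso solvers could likely be adapted to the masked objective.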