Making Multi-Axis Gaussian Graphical Models Scalable to Millions of Samples and Features

Gaussian graphical models can be used to extract conditional dependencies between the features of a dataset. This is often done under the assumption that the samples are independent, an assumption that is rarely satisfied in practice. However, state-of-the-art approaches that avoid this assumption are not scalable, with $O(n^3)$ runtime and $O(n^2)$ space complexity. In this paper, we introduce a method that has $O(n^2)$ runtime and $O(n)$ space complexity, without assuming independence. We validate our model on both synthetic and real-world datasets, showing that our method's accuracy is comparable to that of prior work. We demonstrate that our approach can be used on unprecedentedly large datasets, such as a real-world 1,000,000-cell scRNA-seq dataset; this was impossible with previous approaches. Our method maintains the flexibility of prior work, such as the ability to handle multi-modal tensor-variate datasets and to work with data of arbitrary marginal distributions. An additional advantage is that, unlike in prior work, our method's hyperparameters are easily interpretable.
