Hypergraphs are useful mathematical representations of overlapping and nested subsets of interacting units, including groups of genes or brain regions, economic cartels, political or military coalitions, and groups of products that are purchased together. Despite the vast range of applications, the statistical analysis of hypergraphs is challenging: There are many hyperedges of small and large sizes, and hyperedges can overlap or be nested. Existing approaches to hypergraphs are either not scalable or achieve scalability at the expense of model realism. We develop a statistical framework that enables scalable estimation, simulation, and model assessment of hypergraph models, which is supported by non-asymptotic and asymptotic theoretical guarantees. First, we introduce a novel model of hypergraphs capturing core-periphery structure in addition to proximity, by embedding units in an unobserved hyperbolic space. Second, we achieve scalability by developing manifold optimization algorithms for learning hyperbolic space models based on samples from a population hypergraph. Third, we provide non-asymptotic and asymptotic theoretical guarantees for learning hyperbolic space models based on samples from a population hypergraph. We use the proposed statistical framework to detect core-periphery structure along with proximity among U.S. politicians based on historical media reports.
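As a rough illustration of why a hyperbolic embedding can encode core-periphery structure alongside proximity, the sketch below computes distances in the Poincaré disk. The choice of the Poincaré disk model, the function name `poincare_distance`, and the example coordinates are assumptions made for illustration only; the abstract does not state which model of hyperbolic space or which parameterization the framework uses.

```python
import math


def poincare_distance(x, y):
    """Hyperbolic distance between two points inside the unit disk.

    Assumes the Poincare disk model (an illustrative choice, not taken
    from the abstract). Both points must have Euclidean norm < 1.
    """
    def sq_norm(v):
        return sum(c * c for c in v)

    diff = sq_norm([a - b for a, b in zip(x, y)])
    denom = (1.0 - sq_norm(x)) * (1.0 - sq_norm(y))
    return math.acosh(1.0 + 2.0 * diff / denom)


# Hypothetical positions: one "core" unit at the center of the disk and
# two "peripheral" units near the boundary, in different directions.
core = (0.0, 0.0)
periphery_a = (0.9, 0.0)
periphery_b = (0.0, 0.9)

d_core_periphery = poincare_distance(core, periphery_a)
d_periphery_periphery = poincare_distance(periphery_a, periphery_b)

print("core to periphery:     ", round(d_core_periphery, 3))
print("periphery to periphery:", round(d_periphery_periphery, 3))
```

In this geometry a unit near the center is comparatively close to every other unit, while two units near the boundary are far apart unless they point in similar directions. Under the stated assumption, radial position can play the role of core versus periphery and angular position the role of proximity.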