We study the implications of the modeling choice to use a graph, instead of a hypergraph, to represent real-world interconnected systems whose constituent relationships are inherently of higher order. Such a modeling choice typically involves an underlying projection process that maps the original hypergraph onto a graph, and is common in graph-based analysis. While hypergraph projection can lead to the loss of higher-order relations, very few studies have examined the consequences of doing so or how to remedy them. This work fills this gap in two ways: (1) we develop an analysis based on graph and set theory, showing two ubiquitous patterns of hyperedges that are at the root of structural information loss in all hypergraph projections; we also quantify the combinatorial impossibility of recovering the lost higher-order structures when no additional information is provided; (2) we nonetheless seek to recover the higher-order structures lost in hypergraph projection, and in light of the findings in (1) we propose relaxing the problem into a learning-based setting. Under this setting, we develop a learning-based hypergraph reconstruction method based on an important statistic of hyperedge distributions that we identify. Our reconstruction method is evaluated on 8 real-world datasets under different settings and exhibits consistently good performance. We also demonstrate the benefits of the reconstructed hypergraphs through use cases in protein ranking and link prediction.
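To make the notion of information loss under projection concrete, the following minimal sketch assumes an unweighted clique-expansion projection, in which every hyperedge is replaced by all pairwise edges among its nodes. The function name `project` and the three toy hypergraphs are hypothetical illustrations, not taken from the work itself, and they are not necessarily the two hyperedge patterns identified in the analysis; the projection variant actually used may also differ (for example, by retaining edge multiplicities).

```python
from itertools import combinations


def project(hyperedges):
    """Clique-expand each hyperedge into unweighted pairwise edges.

    Assumption: unweighted clique expansion; other projection variants exist.
    """
    edges = set()
    for hyperedge in hyperedges:
        # Sort so that each undirected edge has one canonical (u, v) form.
        for u, v in combinations(sorted(hyperedge), 2):
            edges.add((u, v))
    return edges


# Three distinct toy hypergraphs on the nodes {1, 2, 3} (illustrative only).
h_pairs = [{1, 2}, {2, 3}, {1, 3}]   # three pairwise relations
h_triple = [{1, 2, 3}]               # a single three-way relation
h_nested = [{1, 2, 3}, {1, 2}]       # a hyperedge contained in another

# All three hypergraphs project onto the same triangle graph.
same_projection = project(h_pairs) == project(h_triple) == project(h_nested)
print(same_projection)
```

Because all three inputs yield the same triangle, the projected graph alone cannot tell them apart, which is the kind of ambiguity that motivates relaxing reconstruction into a learning-based setting.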