Hypergraphs naturally represent group interactions, which are omnipresent in many domains: collaborations of researchers, co-purchases of items, and joint interactions of proteins, to name a few. In this work, we propose tools for answering the following questions: (Q1) what are the structural design principles of real-world hypergraphs? (Q2) how can we compare the local structures of hypergraphs of different sizes? (Q3) how can we identify the domain from which a hypergraph originates? We first define hypergraph motifs (h-motifs), which describe the overlapping patterns of three connected hyperedges. Then, we define the significance of each h-motif in a hypergraph as its number of occurrences relative to that in properly randomized hypergraphs. Lastly, we define the characteristic profile (CP) as the vector of the normalized significance of every h-motif. Regarding Q1, we find that the occurrences of h-motifs in 11 real-world hypergraphs from 5 domains are clearly distinguished from those in randomized hypergraphs. We then demonstrate that CPs capture local structural patterns unique to each domain, so that comparing the CPs of hypergraphs addresses Q2 and Q3. We extend the concept of CP to represent the connectivity pattern of each node or hyperedge as a vector, which proves useful in node classification and hyperedge prediction. Our algorithmic contribution is MoCHy, a family of parallel algorithms for counting the occurrences of h-motifs in a hypergraph. We theoretically analyze their speed and accuracy and show empirically that the advanced approximate version, MoCHy-A+, is more accurate than the basic approximate version and faster than the exact version. Furthermore, we explore ternary hypergraph motifs, which extend h-motifs by taking into account not only the presence but also the cardinality of intersections among hyperedges. This extension proves beneficial for all of the previously mentioned applications.
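To make the definitions above concrete, the following is a minimal Python sketch, not the authors' implementation. The function names `overlap_pattern` and `characteristic_profile`, the smoothing constant `eps`, and the exact form of the significance ratio are assumptions chosen for illustration; the abstract only states that significance compares real counts with randomized counts and that the CP is the normalized significance vector. The sketch also does not merge patterns that are equivalent under reordering of the three hyperedges, and it does not check that the hyperedges are connected.

```python
import math
from typing import List, Sequence, Set, Tuple


def overlap_pattern(a: Set[int], b: Set[int], c: Set[int]) -> Tuple[bool, ...]:
    """Return which of the seven Venn regions of three hyperedges are non-empty.

    Region order (an arbitrary convention for this sketch):
    a only, b only, c only, (a and b) only, (b and c) only, (c and a) only,
    and the triple intersection.
    """
    regions = (
        a - b - c,
        b - c - a,
        c - a - b,
        (a & b) - c,
        (b & c) - a,
        (c & a) - b,
        a & b & c,
    )
    return tuple(len(region) > 0 for region in regions)


def characteristic_profile(
    real_counts: Sequence[float],
    random_counts: Sequence[float],
    eps: float = 1.0,
) -> List[float]:
    """Compute a normalized significance vector from per-motif counts.

    Assumed significance form: (real - random) / (real + random + eps).
    The vector is then scaled to unit Euclidean length.
    """
    delta = [
        (real - rand) / (real + rand + eps)
        for real, rand in zip(real_counts, random_counts)
    ]
    norm = math.sqrt(sum(d * d for d in delta))
    if norm == 0.0:
        # Every motif is exactly as frequent as in the randomized hypergraphs.
        return [0.0] * len(delta)
    return [d / norm for d in delta]


if __name__ == "__main__":
    # Three hyperedges forming a chain: a overlaps b, b overlaps c.
    print(overlap_pattern({1, 2}, {2, 3}, {3, 4}))
    # Hypothetical counts for two motifs, for illustration only.
    print(characteristic_profile([30, 10], [10, 30]))
```

Because the profile is scaled to unit length, two hypergraphs of very different sizes yield vectors on the same scale, which is what makes a direct comparison (Q2) and a domain-level comparison (Q3) possible.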