Graph Representation Learning (GRL) has become central to characterizing the structure of complex networks and to performing tasks such as link prediction, node classification, network reconstruction, and community detection. Although numerous generative GRL models have been proposed, many have prohibitive computational requirements that hamper large-scale network analysis, fewer are able to explicitly account for structure emerging at multiple scales, and only a few explicitly respect important network properties such as homophily and transitivity. This paper proposes a novel, scalable graph representation learning method named the Hierarchical Block Distance Model (HBDM). The HBDM imposes a multiscale block structure akin to stochastic block modeling (SBM) and accounts for homophily and transitivity by accurately approximating the latent distance model (LDM) throughout the inferred hierarchy. The HBDM naturally accommodates unipartite, directed, and bipartite networks, while the hierarchy is designed to ensure linearithmic time and space complexity, enabling the analysis of very large-scale networks. We evaluate the performance of the HBDM on massive networks consisting of millions of nodes. Importantly, we find that the proposed HBDM framework significantly outperforms recent scalable approaches in all considered downstream tasks. Surprisingly, we observe superior performance even when imposing ultra-low two-dimensional embeddings, which facilitate accurate direct and hierarchy-aware network visualization and interpretation.
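To make the idea of approximating a latent distance model through a block structure concrete, the following is a minimal sketch and not the HBDM implementation. It assumes a common Poisson-style LDM in which the rate of a pair is exp(gamma_i + gamma_j - ||z_i - z_j||); this functional form, the single flat level of blocks, and the names `pairwise_rates`, `exact_rate_sum`, and `block_rate_sum` are all illustrative assumptions that the abstract does not specify. The sketch compares the exact sum of pairwise rates with an approximation that is exact inside each block and uses centroid distances between blocks. It does not reproduce the multiscale hierarchy or the linearithmic complexity claimed for the HBDM.

```python
import numpy as np


def pairwise_rates(z, gamma):
    """Rates exp(gamma_i + gamma_j - ||z_i - z_j||) for all pairs (assumed LDM form)."""
    diff = z[:, None, :] - z[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.exp(gamma[:, None] + gamma[None, :] - dist)


def exact_rate_sum(z, gamma):
    """Sum of rates over all unordered pairs i < j; quadratic in the number of nodes."""
    rates = pairwise_rates(z, gamma)
    upper = np.triu_indices(len(z), k=1)
    return float(rates[upper].sum())


def block_rate_sum(z, gamma, labels):
    """Approximate the same sum with one flat level of blocks (hypothetical helper).

    Pairs inside a block are summed exactly. Pairs in different blocks are
    replaced by the distance between the two block centroids.
    """
    blocks = np.unique(labels)
    total = 0.0
    centroids, masses = [], []
    for b in blocks:
        members = labels == b
        total += exact_rate_sum(z[members], gamma[members])
        centroids.append(z[members].mean(axis=0))
        masses.append(np.exp(gamma[members]).sum())
    for k in range(len(blocks)):
        for l in range(k + 1, len(blocks)):
            d = np.linalg.norm(centroids[k] - centroids[l])
            total += masses[k] * masses[l] * np.exp(-d)
    return float(total)


# Two tight, well-separated clusters in a two-dimensional latent space.
rng = np.random.default_rng(0)
z = np.vstack([rng.normal([0.0, 0.0], 0.01, size=(20, 2)),
               rng.normal([5.0, 0.0], 0.01, size=(20, 2))])
gamma = np.zeros(40)
labels = np.repeat([0, 1], 20)

exact = exact_rate_sum(z, gamma)
approx = block_rate_sum(z, gamma, labels)
print(exact, approx)
```

When blocks are compact relative to the distances between them, the centroid approximation stays close to the exact sum; placing every node in its own block recovers the exact value.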