A simple statistic for determining the dimensionality of complex networks

Detecting the dimensionality of graphs is a central topic in machine learning. While the problem has been tackled both empirically and theoretically, existing methods have several drawbacks. On the one hand, empirical tools are computationally heavy and lack a theoretical foundation. On the other hand, theoretical approaches do not apply to graphs with heterogeneous degree distributions, which are common among complex real-world networks. To address these drawbacks, we consider geometric inhomogeneous random graphs (GIRGs), a random graph model that captures a variety of properties observed in practice. These include a heterogeneous degree distribution and a non-vanishing clustering coefficient, i.e., the probability that two random neighbours of a vertex are adjacent. In GIRGs, $n$ vertices are distributed on a $d$-dimensional torus, and weights are assigned to the vertices according to a power-law distribution. Two vertices are then connected with a probability that depends on their distance and their weights. Our first result shows that the clustering coefficient of GIRGs decreases exponentially in the number of dimensions $d$, provided that $d$ is at most logarithmic in $n$. This gives a first theoretical explanation for the low dimensionality of real-world networks observed by Almagro et al. [Nature '22]. Our second result is a linear-time algorithm for determining the dimensionality of a given GIRG. We prove that our algorithm returns the correct number of dimensions with high probability when the input is a GIRG. Our algorithm thus bridges the gap between theory and practice: it comes with a rigorous proof of correctness and, as our experiments on real-world instances indicate, yields results comparable to those of prior empirical approaches.
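To make the model and the statistic concrete, the following is a minimal Python sketch, not the authors' implementation and not their dimension-detection algorithm. It samples a small GIRG-style graph and measures its clustering coefficient. It assumes one common GIRG parametrization: uniform positions on the unit torus, Pareto weights with $P(w > t) = t^{1-\beta}$, and edge probability $\min(1, w_u w_v / (W \cdot \mathrm{vol}))^{\alpha}$, where $W$ is the total weight and $\mathrm{vol}$ is the volume of the $L_\infty$ ball whose radius is the distance between $u$ and $v$. The helper names (`sample_girg`, `average_clustering`), the choice of norm, the volume normalization and the default values of `beta` and `alpha` are illustrative assumptions. The clustering coefficient is taken as the average local clustering over vertices of degree at least 2, which is one of several standard variants.

```python
import random
from itertools import combinations


def torus_distance(x, y):
    """L-infinity distance on the unit torus [0, 1)^d (assumed choice of norm)."""
    return max(min(abs(a - b), 1.0 - abs(a - b)) for a, b in zip(x, y))


def sample_girg(n, d, beta=2.5, alpha=2.0, seed=0):
    """Naive O(n^2) sampler for a GIRG-style graph (hypothetical helper).

    Assumptions: positions uniform on the torus, Pareto weights with
    P(w > t) = t^(1 - beta) for t >= 1, and edge probability
    min(1, w_u * w_v / (W * vol)) ** alpha, where W is the total weight and
    vol = (2 * dist)^d is the volume of the L-infinity ball of radius dist.
    """
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(d)] for _ in range(n)]
    # paretovariate(k) has minimum 1 and tail P(w > t) = t^(-k), so k = beta - 1.
    weights = [rng.paretovariate(beta - 1) for _ in range(n)]
    total = sum(weights)
    adj = [set() for _ in range(n)]
    for u, v in combinations(range(n), 2):
        vol = (2.0 * torus_distance(pos[u], pos[v])) ** d
        ratio = weights[u] * weights[v] / (total * vol) if vol > 0 else 1.0
        # Clamp before exponentiating to avoid float overflow for tiny volumes.
        if rng.random() < min(1.0, ratio) ** alpha:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def local_clustering(adj, v):
    """Fraction of neighbour pairs of v that are adjacent; None if deg(v) < 2."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return None
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def average_clustering(adj):
    """Mean local clustering over vertices with at least two neighbours."""
    values = [local_clustering(adj, v) for v in range(len(adj))]
    values = [c for c in values if c is not None]
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    # Clustering for samples of growing dimension; values depend on seed and n.
    for d in (1, 2, 4, 8):
        adj = sample_girg(n=400, d=d, seed=1)
        print(d, round(average_clustering(adj), 3))
```

Running the script prints the average clustering for $d = 1, 2, 4, 8$. In line with the first result above, one would expect the values to shrink as $d$ grows, but a single small sample only gives an indication. The pairwise loop is quadratic and is meant for readability on small $n$ only.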

