Detecting the dimensionality of graphs is a central topic in machine learning. While the problem has been tackled both empirically and theoretically, existing methods have several drawbacks. On the one hand, empirical tools are computationally expensive and lack a theoretical foundation. On the other hand, theoretical approaches do not apply to graphs with heterogeneous degree distributions, which are common among complex real-world networks. To address these drawbacks, we consider geometric inhomogeneous random graphs (GIRGs), a random graph model that captures a variety of properties observed in practice. These include a heterogeneous degree distribution and a non-vanishing clustering coefficient, i.e., the probability that two random neighbours of a vertex are adjacent. In GIRGs, $n$ vertices are distributed on a $d$-dimensional torus, and each vertex is assigned a weight drawn from a power-law distribution. Two vertices are then connected with a probability that depends on their distance and their weights. Our first result shows that the clustering coefficient of GIRGs decays exponentially in the number of dimensions $d$, as long as $d$ is at most logarithmic in $n$. This gives a first theoretical explanation for the low dimensionality of real-world networks observed by Almagro et al. [Nature '22]. Our second result is a linear-time algorithm for determining the dimensionality of a given GIRG. We prove that our algorithm returns the correct number of dimensions with high probability when the input is a GIRG. As a result, our algorithm bridges the gap between theory and practice: it not only comes with a rigorous proof of correctness but also yields results comparable to those of prior empirical approaches, as indicated by our experiments on real-world instances.
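To make the model and the clustering coefficient concrete, here is a minimal sketch of a naive $O(n^2)$ GIRG sampler together with a clustering-coefficient computation. It assumes one common instantiation from the GIRG literature, not necessarily the exact one analysed here: positions are uniform on $[0,1)^d$ with the maximum-norm torus distance, weights are Pareto-distributed with exponent `beta`, and vertices $u, v$ are joined with probability $\min\{1, (w_u w_v / (W \lVert x_u - x_v \rVert^d))^{\alpha}\}$, where $W$ is the total weight. The parameter names `beta`, `alpha`, `w_min` and all function names are hypothetical, and this is not the linear-time dimension-detection algorithm described above.

```python
import math
import random
from itertools import combinations


def torus_distance(x, y):
    """Maximum-norm distance between two points on the unit torus [0, 1)^d."""
    return max(min(abs(a - b), 1.0 - abs(a - b)) for a, b in zip(x, y))


def sample_girg(n, d, beta=2.5, alpha=2.0, w_min=1.0, seed=None):
    """Naive O(n^2) sampler for one ASSUMED GIRG instantiation (see lead-in).

    Weights: Pareto with density proportional to w^(-beta), w >= w_min.
    Positions: uniform on the d-dimensional unit torus.
    Edge {u, v}: present with probability min(1, (w_u * w_v / (W * dist^d))^alpha).
    """
    rng = random.Random(seed)
    # Inverse-transform sampling: P(w > t) = (t / w_min)^(1 - beta); 1 - random() lies in (0, 1].
    weights = [w_min * (1.0 - rng.random()) ** (-1.0 / (beta - 1.0)) for _ in range(n)]
    positions = [[rng.random() for _ in range(d)] for _ in range(n)]
    total_weight = sum(weights)
    adj = [set() for _ in range(n)]
    for u, v in combinations(range(n), 2):
        denom = total_weight * torus_distance(positions[u], positions[v]) ** d
        ratio = weights[u] * weights[v] / denom if denom > 0.0 else math.inf
        # Cap at 1 before exponentiating so that huge ratios cannot overflow.
        p = 1.0 if ratio >= 1.0 else ratio ** alpha
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def clustering_coefficient(adj):
    """Average, over vertices of degree >= 2, of the fraction of adjacent neighbour pairs.

    Skipping vertices of degree < 2 is one common convention (an assumption here).
    """
    local = []
    for neighbours in adj:
        k = len(neighbours)
        if k < 2:
            continue
        closed = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
        local.append(closed / (k * (k - 1) / 2))
    return sum(local) / len(local) if local else 0.0
```

For example, one could compare `clustering_coefficient(sample_girg(2000, d, seed=0))` for several small values of `d`; the first result above predicts exponential decay in $d$, although estimates at small $n$ are noisy and the quadratic sampler limits how large $n$ can be.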