Detecting the dimensionality of graphs is a central topic in machine learning. While the problem has been tackled both empirically and theoretically, existing methods have several drawbacks. On the one hand, empirical tools are computationally heavy and lack a theoretical foundation. On the other hand, theoretical approaches do not apply to graphs with heterogeneous degree distributions, which are common in complex real-world networks. To address these drawbacks, we consider geometric inhomogeneous random graphs (GIRGs) as a random graph model that captures a variety of properties observed in practice. These include a heterogeneous degree distribution and a non-vanishing clustering coefficient, i.e., the probability that two random neighbours of a vertex are adjacent. In GIRGs, $n$ vertices are distributed on a $d$-dimensional torus and weights are assigned to the vertices according to a power-law distribution. Two vertices are then connected with a probability that depends on their distance and their weights. Our first result shows that the clustering coefficient of GIRGs decays exponentially in the number of dimensions when the latter is at most logarithmic in $n$. This gives a first theoretical explanation for the low dimensionality of real-world networks observed by Almagro et al. [Nature '22]. Our second result is a linear-time algorithm for determining the dimensionality of a given GIRG. We prove that our algorithm returns the correct number of dimensions with high probability when the input is a GIRG. As a result, our algorithm bridges the gap between theory and practice: it not only comes with a rigorous proof of correctness but also yields results comparable to those of prior empirical approaches, as indicated by our experiments on real-world instances.
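To make the model concrete, the following is a minimal Python sketch of a GIRG sampler together with an average local clustering coefficient. It is an illustration under stated assumptions, not the linear-time algorithm from the abstract. The assumptions, none of which come from the abstract, are: a threshold connection rule (two vertices are joined exactly when the volume of the max-norm ball spanned by their torus distance is at most $w_u w_v / W$, where $W$ is the total weight), Pareto-distributed weights with density exponent `beta`, and a naive quadratic-time pair loop. The names `torus_distance`, `sample_girg`, and `average_clustering` are hypothetical helpers.

```python
import random
from itertools import combinations


def torus_distance(x, y):
    """Max-norm distance between two points on the unit torus [0, 1)^d."""
    return max(min(abs(a - b), 1.0 - abs(a - b)) for a, b in zip(x, y))


def sample_girg(n, d, beta=2.5, seed=0):
    """Sample a threshold GIRG (assumed variant) and return adjacency sets.

    Positions are uniform on the d-dimensional unit torus. Weights follow a
    Pareto law with minimum 1 and density exponent beta (requires beta > 2).
    """
    rng = random.Random(seed)
    positions = [[rng.random() for _ in range(d)] for _ in range(n)]
    # paretovariate(a) has tail P(w > t) = t^(-a), i.e. density exponent a + 1.
    weights = [rng.paretovariate(beta - 1.0) for _ in range(n)]
    total_weight = sum(weights)

    adjacency = [set() for _ in range(n)]
    # Naive O(n^2) loop over all pairs; fine for small illustrative inputs.
    for u, v in combinations(range(n), 2):
        dist = torus_distance(positions[u], positions[v])
        # Assumed rule: connect when the max-norm ball volume (2 * dist)^d
        # is at most w_u * w_v / W (capped at 1, the volume of the torus).
        if (2.0 * dist) ** d <= min(1.0, weights[u] * weights[v] / total_weight):
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def average_clustering(adjacency):
    """Average, over vertices of degree >= 2, of the fraction of
    neighbour pairs that are themselves adjacent."""
    values = []
    for neighbours in adjacency:
        if len(neighbours) < 2:
            continue
        pairs = list(combinations(sorted(neighbours), 2))
        closed = sum(1 for a, b in pairs if b in adjacency[a])
        values.append(closed / len(pairs))
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    # Compare the empirical clustering coefficient across dimensions.
    for dim in (1, 2, 4, 8):
        graph = sample_girg(n=500, d=dim, seed=42)
        print(dim, round(average_clustering(graph), 3))
```

Running the script prints one clustering estimate per dimension, so the dependence on $d$ described in the first result can be inspected on small samples. Estimates at this size are noisy, so they should be read as qualitative only.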