Community Recovery in the Geometric Block Model

From arXiv: 60 pages, 18 figures. Accepted at the Journal of Machine Learning Research (JMLR). Shorter versions were accepted at AAAI 2018 (see arXiv:1709.05510) and RANDOM 2019 (see arXiv:1804.05013).

To capture inherent geometric features of many community detection problems, we propose a new random graph model of communities that we call the \textit{Geometric Block Model}. The geometric block model builds on \emph{random geometric graphs} (Gilbert, 1961), one of the basic models of random graphs for spatial networks, in the same way that the well-studied stochastic block model builds on Erd\H{o}s-R\'{e}nyi random graphs. It is also a natural extension of random community models inspired by recent theoretical and practical advances in community detection. To analyze the geometric block model, we first provide new connectivity results for \emph{random annulus graphs}, which are generalizations of random geometric graphs. The connectivity properties of geometric graphs have been studied since their introduction, and analyzing them has been difficult because edges are formed in a correlated way. We then use the connectivity results for random annulus graphs to provide necessary and sufficient conditions for efficient recovery of communities in the geometric block model. We show that a simple triangle-counting algorithm for detecting communities in the geometric block model is near-optimal. For this we consider two regimes of graph density. In the regime where the average degree of the graph grows logarithmically with the number of vertices, we show that our algorithm performs extremely well, both theoretically and practically. In contrast, the triangle-counting algorithm is far from optimal for the stochastic block model in the logarithmic degree regime. We also look at the regime where the average degree of the graph grows linearly with the number of vertices $n$, so that storing the graph requires $\Theta(n^2)$ memory. We show that our algorithm needs to store only $O(n \log n)$ edges in this regime to recover the latent communities.
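The triangle-counting idea can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not stated in the abstract: the one-dimensional model on the unit circle, the two connection radii `r_s` and `r_d`, the function names `sample_gbm` and `triangle_filter_communities`, and above all the single `threshold` on common neighbors, which stands in for whatever decision rule the paper actually analyzes.

```python
import random
from itertools import combinations


def sample_gbm(n, r_s, r_d, seed=0):
    """Sample an illustrative two-community geometric block model.

    Assumption: vertices get uniform positions on the unit circle, and a
    pair is joined when its circular distance is at most r_s (same
    community) or at most r_d (different communities), with r_s > r_d.
    """
    rng = random.Random(seed)
    pos = [rng.random() for _ in range(n)]
    label = [i % 2 for i in range(n)]  # two balanced communities
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        d = abs(pos[u] - pos[v])
        d = min(d, 1.0 - d)  # distance measured around the circle
        if d <= (r_s if label[u] == label[v] else r_d):
            adj[u].add(v)
            adj[v].add(u)
    return adj, label


def triangle_filter_communities(adj, threshold):
    """Keep each edge that lies in at least `threshold` triangles, then
    return the connected components of the surviving graph.

    The single threshold is a simplification chosen for this sketch.
    """
    kept = {v: set() for v in adj}
    for u in adj:
        for v in adj[u]:
            # Triangles through edge (u, v) = common neighbors of u and v.
            if u < v and len(adj[u] & adj[v]) >= threshold:
                kept[u].add(v)
                kept[v].add(u)

    seen, components = set(), []
    for start in kept:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # depth-first search over surviving edges
            x = stack.pop()
            comp.add(x)
            for y in kept[x] - seen:
                seen.add(y)
                stack.append(y)
        components.append(comp)
    return components
```

On a toy graph made of two 4-cliques joined by a single bridge edge, the bridge lies in no triangle, so `triangle_filter_communities(adj, threshold=1)` drops it and returns the two cliques as separate components. For graphs drawn with `sample_gbm`, a suitable threshold depends on `n`, `r_s`, and `r_d` and would have to be chosen from the analysis in the paper.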
