Community Recovery in the Geometric Block Model

From arXiv: 53 pages, 18 figures. Accepted at the Journal of Machine Learning Research (JMLR). Shorter versions were accepted at AAAI 2018 (see arXiv:1709.05510) and RANDOM 2019 (see arXiv:1804.05013). arXiv admin note: text overlap with arXiv:1804.05013

To capture the inherent geometric features of many community detection problems, we propose a new random graph model of communities that we call the Geometric Block Model. The geometric block model builds on random geometric graphs (Gilbert, 1961), one of the basic models of random graphs for spatial networks, in the same way that the well-studied stochastic block model builds on Erdős-Rényi random graphs. It is also a natural extension of random community models inspired by recent theoretical and practical advances in community detection. To analyze the geometric block model, we first provide new connectivity results for random annulus graphs, which generalize random geometric graphs. The connectivity properties of geometric graphs have been studied since their introduction, and analyzing them has been more difficult than for their Erdős-Rényi counterparts because edge formation is correlated. We then use the connectivity results for random annulus graphs to provide necessary and sufficient conditions for efficient recovery of communities in the geometric block model. We show that a simple triangle-counting algorithm for detecting communities in the geometric block model is near-optimal. For this we consider two regimes of graph density. In the regime where the average degree of the graph grows logarithmically with the number of vertices, we show that our algorithm performs extremely well, both theoretically and practically. In contrast, the triangle-counting algorithm is far from optimal for the stochastic block model in the logarithmic degree regime. We run simulations on both real and synthetic datasets to show the superior performance of both the new model and our algorithm.
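As a rough illustration of the triangle-counting idea, the sketch below samples a small two-community graph and recovers the communities from triangle counts. It is not the algorithm as specified in the paper. Everything concrete in it is an assumption made for illustration: latent positions drawn uniformly on a circle of circumference 1, the two connection radii `r_in` and `r_out`, the single one-sided `threshold`, and the function names `sample_gbm` and `recover_by_triangles`. The intuition is that an edge between nearby vertices of the same community tends to sit in many triangles, while a cross-community edge sits in few, so discarding low-count edges leaves components that align with the communities.

```python
import random
from itertools import combinations


def circle_distance(a, b):
    """Distance between two points on a circle of circumference 1."""
    d = abs(a - b)
    return min(d, 1.0 - d)


def sample_gbm(n, r_in, r_out, seed=0):
    """Sample a two-community geometric graph (illustrative assumption).

    Same-community pairs are joined when their latent positions are within
    r_in; cross-community pairs when within r_out (with r_out < r_in).
    """
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]       # two balanced communities
    pos = [rng.random() for _ in range(n)]   # latent positions on the circle
    adj = [set() for _ in range(n)]
    for u, v in combinations(range(n), 2):
        radius = r_in if labels[u] == labels[v] else r_out
        if circle_distance(pos[u], pos[v]) <= radius:
            adj[u].add(v)
            adj[v].add(u)
    return labels, adj


def recover_by_triangles(adj, threshold):
    """Keep an edge only if it lies in at least `threshold` triangles, then
    return the connected components of the kept edges, largest first."""
    n = len(adj)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u in range(n):
        for v in adj[u]:
            # Common neighbours of u and v = triangles through edge (u, v).
            if u < v and len(adj[u] & adj[v]) >= threshold:
                parent[find(u)] = find(v)

    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values(), key=len, reverse=True)


if __name__ == "__main__":
    labels, adj = sample_gbm(400, r_in=0.2, r_out=0.02, seed=0)
    components = recover_by_triangles(adj, threshold=40)
    print([len(c) for c in components[:5]])
```

The threshold of 40 is hand-picked for these particular parameters: with 400 vertices, a cross-community edge has about 16 common neighbours in expectation, while a same-community edge has at least about 40. A principled choice would depend on the number of vertices and the two radii.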
