Scalable community detection in massive networks via predictive assignment

Massive network datasets are becoming increasingly common in scientific applications. Existing community detection methods face significant computational challenges on such networks for two reasons. First, the full network must be stored and analyzed on a single server, leading to high memory costs. Second, existing methods typically apply matrix factorization or iterative optimization to the full network, resulting in long runtimes. We propose a strategy called "predictive assignment" to enable computationally efficient community detection while ensuring statistical accuracy. The core idea is to avoid large-scale matrix computations by breaking the task into one smaller matrix computation plus a large number of vector computations that can be carried out in parallel. Under the proposed method, community detection is first carried out on a small subgraph to estimate the relevant model parameters. Each remaining node is then assigned to a community based on these estimates. We prove that predictive assignment achieves strong consistency under the stochastic blockmodel and its degree-corrected version. We also demonstrate the empirical performance of predictive assignment on simulated networks and two large real-world datasets: DBLP (Digital Bibliography & Library Project), a computer science bibliographical database, and the Twitch Gamers Social Network.
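The two-stage idea described above can be illustrated with a minimal sketch under the plain stochastic blockmodel. The abstract does not specify the subgraph clustering method or the exact assignment rule, so the choices here are assumptions made for illustration only: spectral clustering on the sampled subgraph, a plug-in estimate of the block probability matrix, and a nearest-row rule that compares each remaining node's edge frequencies to the estimated rows. All function names (`simulate_sbm`, `spectral_cluster`, `estimate_B`, `predictive_assignment`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def simulate_sbm(n, B, rng):
    """Draw a symmetric adjacency matrix from a stochastic blockmodel."""
    K = B.shape[0]
    z = rng.integers(K, size=n)                # true community labels
    P = B[z][:, z]                             # pairwise edge probabilities
    upper = np.triu(rng.random((n, n)) < P, k=1)
    A = (upper | upper.T).astype(float)        # symmetric, no self-loops
    return A, z


def kmeans(X, K, n_iter=50):
    """Plain Lloyd's algorithm with deterministic farthest-point init."""
    centers = [X[0]]
    for _ in range(1, K):
        d = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = np.argmin(d2, axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(0)
    return labels


def spectral_cluster(A, K):
    """Cluster nodes using the K leading eigenvectors (by |eigenvalue|)."""
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(vals))[-K:]
    return kmeans(vecs[:, top], K)


def estimate_B(A, labels, K):
    """Plug-in estimate of block connection probabilities."""
    B = np.zeros((K, K))
    for k in range(K):
        for l in range(K):
            block = A[np.ix_(labels == k, labels == l)]
            if k == l:
                m = block.shape[0]
                B[k, l] = block.sum() / max(m * (m - 1), 1)
            else:
                B[k, l] = block.mean() if block.size else 0.0
    return B


def predictive_assignment(A, K, m, rng):
    """Stage 1: cluster a subgraph of m nodes. Stage 2: assign the rest."""
    n = A.shape[0]
    sub = rng.choice(n, size=m, replace=False)
    rest = np.setdiff1d(np.arange(n), sub)

    # Stage 1: the only matrix computation, on an m x m subgraph.
    A_sub = A[np.ix_(sub, sub)]
    sub_labels = spectral_cluster(A_sub, K)
    B_hat = estimate_B(A_sub, sub_labels, K)

    # Stage 2: one independent vector computation per remaining node.
    # ASSUMPTION: nearest-row rule, chosen here for illustration.
    Z = np.eye(K)[sub_labels]                  # m x K membership indicators
    sizes = np.maximum(Z.sum(0), 1)
    freq = A[np.ix_(rest, sub)] @ Z / sizes    # edge frequency per community
    dist = ((freq[:, None, :] - B_hat[None]) ** 2).sum(-1)

    labels = np.empty(n, dtype=int)
    labels[sub] = sub_labels
    labels[rest] = np.argmin(dist, axis=1)
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B = np.array([[0.30, 0.05], [0.05, 0.30]])
    A, z = simulate_sbm(600, B, rng)
    labels = predictive_assignment(A, K=2, m=150, rng=rng)
    # Labels are only identified up to permutation, so check both matchings.
    acc = max(np.mean(labels == z), np.mean(labels != z))
    print(f"accuracy: {acc:.3f}")
```

In this sketch the second stage touches only the rows of the adjacency matrix that link each remaining node to the subgraph, so the per-node work is independent and could be distributed across workers. For readability the demo holds the full adjacency matrix in memory, which a genuinely large-scale implementation would avoid.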

