An Eigengap Ratio Test for Determining the Number of Communities in Network Data

To characterize the community structure in network data, researchers have introduced various block-type models, including the stochastic block model, degree-corrected stochastic block model, mixed membership block model, degree-corrected mixed membership block model, and others. A critical step in applying these models effectively is determining the number of communities in the network. However, to our knowledge, existing methods for estimating the number of network communities either require model estimation or cannot simultaneously account for network sparsity and a divergent number of communities. In this paper, we propose an eigengap-ratio-based test that addresses these challenges. The test is straightforward to compute, requires no parameter tuning, and can be applied to a wide range of block models without the need to estimate network distribution parameters. Furthermore, it is effective for both dense and sparse networks with a divergent number of communities. We show that the proposed test statistic converges to a function of the type-I Tracy-Widom distributions under the null hypothesis, and that the test is asymptotically powerful under alternatives. Simulation studies on both dense and sparse networks demonstrate the efficacy of the proposed method. Three real-world examples are presented to illustrate the usefulness of the proposed test.
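The abstract does not give the form of the test statistic, so the following is only a minimal sketch of the general idea behind eigengap ratios, not the paper's test. It assumes a balanced stochastic block model and a simple ratio of consecutive gaps between absolute eigenvalues of the adjacency matrix. The function names `simulate_sbm` and `eigengap_ratios`, the ratio definition, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_sbm(n, num_blocks, p_in, p_out, seed=0):
    """Draw a symmetric 0/1 adjacency matrix from a balanced stochastic block model.

    Illustrative helper (not from the paper): nodes in the same block connect
    with probability p_in, nodes in different blocks with probability p_out.
    """
    rng = np.random.default_rng(seed)
    labels = np.arange(n) % num_blocks  # (nearly) equal-sized communities
    same_block = labels[:, None] == labels[None, :]
    probs = np.where(same_block, p_in, p_out)
    # Sample each unordered pair once; the strict upper triangle excludes self-loops.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T).astype(float)


def eigengap_ratios(adj, k_max):
    """Return r_j = (s_j - s_{j+1}) / (s_{j+1} - s_{j+2}) for j = 1..k_max.

    Here s_1 >= s_2 >= ... are the absolute eigenvalues of adj. This ratio is an
    assumed, simplified definition used only to illustrate the concept.
    """
    s = np.sort(np.abs(np.linalg.eigvalsh(adj)))[::-1]
    gaps = s[:-1] - s[1:]
    return gaps[:k_max] / gaps[1:k_max + 1]


adj = simulate_sbm(n=300, num_blocks=3, p_in=0.30, p_out=0.05, seed=0)
ratios = eigengap_ratios(adj, k_max=6)
for j, r in enumerate(ratios, start=1):
    print(f"j={j}: eigengap ratio = {r:.2f}")
```

In this simulated setting the ratio at j = 3 should be large, because the three community-driven eigenvalues separate from the bulk of the spectrum while consecutive bulk eigenvalues lie close together. With equal-sized blocks the second and third eigenvalues are nearly tied, so the ratio at j = 1 can also be large, which is one reason a raw ratio is not a substitute for a calibrated test. The normalization and the Tracy-Widom calibration mentioned in the abstract are not reproduced here.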

