In the study of time-dependent (i.e., temporal) networks, researchers often examine the evolution of communities, which are sets of densely connected nodes that are connected sparsely to other nodes. An increasingly prominent approach to studying community structure in temporal networks is statistical inference. In the present paper, we study the performance of a class of statistical-inference methods for community detection in temporal networks. We represent temporal networks as multilayer networks, with each layer encoding a time step, and we illustrate that statistical-inference models that generate community assignments via either a uniform distribution on community assignments or discrete-time Markov processes are biased against generating communities with large or small numbers of nodes. In particular, we demonstrate that statistical-inference methods that use such generative models tend to identify community structure poorly in networks with large or small communities. To rectify this issue, we introduce a novel statistical model that generates the community assignments of the nodes in a given layer (i.e., at a given time) using all of the community assignments in the previous layer. We prove results that guarantee that our approach greatly mitigates the bias against large and small communities, so using our generative model is beneficial for studying community structure in networks with large or small communities. Our code is available at https://github.com/tfaust0196/TemporalCommunityComparison.
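To make the size bias concrete, the following minimal sketch samples community assignments for a single layer under one reading of the uniform model, namely that each node receives a label independently and uniformly at random (which is equivalent to a uniform distribution over all labeled assignments). The function names, the parameter values, and this reading of the model are illustrative assumptions and are not taken from the repository above. Under this assumption, each community size is binomial with mean `num_nodes / num_communities`, so very large or very small communities are rarely generated.

```python
import random
from collections import Counter


def sample_uniform_assignment(num_nodes, num_communities, rng):
    """Assign each node independently and uniformly at random to a community.

    Assumption: this is one reading of a 'uniform distribution on
    community assignments'; it is not the paper's implementation.
    """
    return [rng.randrange(num_communities) for _ in range(num_nodes)]


def community_sizes(assignment, num_communities):
    """Return the number of nodes in each community, including empty ones."""
    counts = Counter(assignment)
    return [counts.get(c, 0) for c in range(num_communities)]


def size_extremes(num_nodes=1000, num_communities=5, num_trials=200, seed=0):
    """Return the smallest and largest community size seen over many samples."""
    rng = random.Random(seed)
    smallest, largest = num_nodes, 0
    for _ in range(num_trials):
        assignment = sample_uniform_assignment(num_nodes, num_communities, rng)
        sizes = community_sizes(assignment, num_communities)
        smallest = min(smallest, min(sizes))
        largest = max(largest, max(sizes))
    return smallest, largest


if __name__ == "__main__":
    lo, hi = size_extremes()
    # The expected size is 1000 / 5 = 200; the observed extremes stay close to it.
    print(f"smallest community: {lo}, largest community: {hi}")
```

With these hypothetical settings, the observed sizes stay within a narrow band around 200 nodes, so a network whose true structure has, say, one community of 700 nodes and several communities of 75 nodes is very unlikely to be generated by this model.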