We study the behavior of a label propagation algorithm (LPA) on the Erd\H{o}s-R\'enyi random graph $\mathcal{G}(n,p)$. Initially, each vertex of the given network receives a random label in the interval $[0,1]$. Then, in each round of LPA, every vertex switches its label to the majority label in its neighborhood (including its own label). In the first round, ties are broken towards smaller labels, while in each subsequent round, ties are broken uniformly at random. The algorithm terminates once all labels stay the same in two consecutive iterations. LPA has been used successfully in practice to detect communities in networks (corresponding to vertex sets with the same label after termination of the algorithm). Perhaps surprisingly, LPA's performance on dense random graphs is hard to analyze, and convergence to consensus was previously known only when $np\ge n^{3/4+\varepsilon}$, where LPA converges in three rounds. By defining an alternative label attribution procedure which converges to the label propagation algorithm after three rounds, a careful multi-stage exposure of the edges allows us to break the $n^{3/4+\varepsilon}$ barrier and show that, when $np \ge n^{5/8+\varepsilon}$, a.a.s.\ the algorithm terminates with a single label. Moreover, we show that, if $np\gg n^{2/3}$, a.a.s.\ this label is the smallest one, whereas if $n^{5/8+\varepsilon}\le np\ll n^{2/3}$, the surviving label is a.a.s.\ not the smallest one. En passant, we show a presumably new monotonicity lemma for Binomial random variables that might be of independent interest.
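The process described above can be simulated directly. The following is a minimal sketch, not the analysis technique of the paper: it samples $\mathcal{G}(n,p)$, assigns uniform labels in $[0,1)$, and runs synchronous rounds with the stated tie-breaking rules. The names `gnp`, `lpa_round`, `run_lpa` and the safety cap `max_rounds` are illustrative assumptions, not taken from the source.

```python
import random
from collections import Counter


def gnp(n, p, rng):
    """Sample adjacency lists of G(n, p): each pair is an edge independently w.p. p."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def lpa_round(adj, labels, first_round, rng):
    """One synchronous round: every vertex adopts the most frequent label
    in its closed neighborhood (its neighbors plus itself)."""
    new_labels = []
    for v, nbrs in enumerate(adj):
        counts = Counter(labels[u] for u in nbrs)
        counts[labels[v]] += 1  # the vertex's own label also counts
        best = max(counts.values())
        tied = [lab for lab, c in counts.items() if c == best]
        # Round 1: ties go to the smallest label; later rounds: uniform at random.
        new_labels.append(min(tied) if first_round else rng.choice(tied))
    return new_labels


def run_lpa(n, p, seed=0, max_rounds=100):
    """Run LPA until the labels are unchanged by a round.

    max_rounds is an assumed safety cap, not part of the algorithm as described.
    Returns (final labels, number of rounds executed, smallest initial label).
    """
    rng = random.Random(seed)
    adj = gnp(n, p, rng)
    labels = [rng.random() for _ in range(n)]
    initial_min = min(labels)
    for t in range(1, max_rounds + 1):
        new_labels = lpa_round(adj, labels, t == 1, rng)
        if new_labels == labels:  # no label changed: terminate
            return labels, t, initial_min
        labels = new_labels
    return labels, max_rounds, initial_min
```

For example, `run_lpa(300, 0.5)` returns the final labels, the round count, and the smallest initial label, so one can check whether a single label survives and whether it is the smallest one. Simulations at small $n$ need not reflect the asymptotic thresholds stated above; they only illustrate the dynamics.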