Community detection is a popular approach to understanding how interactions are organized in static networks. For that purpose, the Clique Percolation Method (CPM), which relies on the percolation of k-cliques, is a well-studied technique that offers several advantages. Studying interactions that occur over time is also useful in various contexts, and such interactions can be modeled with the link stream formalism. The Dynamic Clique Percolation Method (DCPM) has been proposed to extend CPM to temporal networks. However, existing implementations cannot handle massive datasets. We present a novel algorithm that adapts CPM to link streams and reduces computation time compared with the existing DCPM method. We evaluate it experimentally on real datasets and show that it scales to massive link streams. For example, it obtains a complete set of communities in under twenty-five minutes for a dataset with thirty million links, which the state of the art fails to achieve even after a week of computation. We further show that our method yields communities similar to those of DCPM, but slightly more aggregated. We demonstrate the relevance of the obtained communities in real-world cases and show that they provide information on the importance of vertices in the link streams.
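To make "percolation of k-cliques" concrete, here is a minimal sketch of classic CPM on a *static* graph. It is not the link-stream algorithm described above, nor DCPM. It uses a standard equivalent formulation: enumerate maximal cliques of size at least k (Bron–Kerbosch with pivoting), then merge any two cliques that share at least k − 1 vertices. The function names `maximal_cliques` and `k_clique_communities` and the toy edge list are hypothetical illustrations, not taken from the paper.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Yield maximal cliques as frozensets (Bron-Kerbosch with pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


def k_clique_communities(edges, k):
    """Static CPM sketch: union of k-cliques linked by (k-1)-vertex overlaps."""
    adj = {}
    for u, v in edges:  # assumes an undirected graph without self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]

    # Union-find over clique indices: adjacent cliques percolate together.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, c in enumerate(cliques):
        groups.setdefault(find(i), set()).update(c)
    return [frozenset(g) for g in groups.values()]


if __name__ == "__main__":
    # Two triangles sharing an edge, a bridge edge (4, 5), and a third triangle.
    edges = [(1, 2), (2, 3), (1, 3), (2, 4), (3, 4),
             (4, 5), (5, 6), (6, 7), (5, 7)]
    print(sorted(sorted(c) for c in k_clique_communities(edges, 3)))
```

With k = 3, triangles {1, 2, 3} and {2, 3, 4} share two vertices and percolate into one community, while {5, 6, 7} stays separate because the bridge (4, 5) is not part of any triangle. The pairwise overlap check is quadratic in the number of cliques, which hints at why scaling clique percolation to large, time-varying data calls for a more careful algorithm.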