Finding dense subgraphs is a core problem in graph mining with many applications in diverse domains. At the same time, many real-world networks vary over time; that is, the dataset can be represented as a sequence of graph snapshots. Hence, it is natural to ask how to find dense subgraphs in a temporal network when those subgraphs are allowed to vary over time to a certain degree. In this paper, we search for dense subgraphs that have large pairwise Jaccard similarity coefficients. More formally, given a set of graph snapshots and a weight $\lambda$, we find a collection of dense subgraphs such that the sum of the densities of the induced subgraphs plus the sum of the Jaccard indices, weighted by $\lambda$, is maximized. We prove that this problem is NP-hard. To discover dense subgraphs with good objective value, we present an iterative algorithm that runs in $\mathcal{O}(n^2k^2 + m \log n + k^3 n)$ time per iteration, and a greedy algorithm that runs in $\mathcal{O}(n^2k^2 + m \log n + k^3 n)$ time, where $k$ is the length of the graph sequence and $n$ and $m$ denote the number of nodes and the total number of edges, respectively. We show experimentally that our algorithms are efficient, that they can find the ground truth in synthetic datasets, and that they provide interpretable results on real-world datasets. Finally, we present a case study that shows the usefulness of our problem.
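The objective described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: exactly one node subset is chosen per snapshot, the density of a subset $S$ is the number of induced edges divided by $|S|$, the Jaccard term sums over all unordered pairs of subsets, and two empty subsets have Jaccard index 1. The function names `density`, `jaccard`, and `objective` are hypothetical. The sketch only evaluates the objective for a given collection of subsets; it is not the iterative or greedy algorithm.

```python
from itertools import combinations


def density(edges, nodes):
    """Induced edges divided by |nodes| (assumed density definition)."""
    if not nodes:
        return 0.0
    induced = sum(1 for u, v in edges if u in nodes and v in nodes)
    return induced / len(nodes)


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b|; two empty sets give 1.0 (assumed convention)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def objective(snapshots, subsets, lam):
    """Sum of densities plus lam times the sum of pairwise Jaccard indices.

    snapshots: list of edge lists, one per graph snapshot
    subsets:   list of node sets, one per snapshot (assumption)
    lam:       the weight lambda from the problem statement
    """
    dens = sum(density(e, s) for e, s in zip(snapshots, subsets))
    sim = sum(jaccard(a, b) for a, b in combinations(subsets, 2))
    return dens + lam * sim


# Toy example: a triangle in the first snapshot, a path in the second.
snapshots = [[(1, 2), (2, 3), (1, 3)], [(1, 2), (2, 3), (3, 4)]]
subsets = [{1, 2, 3}, {1, 2, 3, 4}]
print(objective(snapshots, subsets, lam=2.0))  # 1.0 + 0.75 + 2.0 * 0.75 = 3.25
```

A larger $\lambda$ rewards subsets that overlap heavily across snapshots, while a smaller $\lambda$ lets each snapshot favor its own densest subgraph.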