Finding densely connected groups of nodes in networks is a widely used analysis tool in graph mining. A popular approach is to search for subgraphs with a high average degree. While useful, such subgraphs can be difficult to interpret. On the other hand, many real-world networks carry additional information, and we are specifically interested in networks with labels on edges. In this paper, we study the problem of finding sets of labels that induce dense subgraphs. We consider two notions of density: the average degree, and the number of edges minus $\alpha$ times the number of nodes, where $\alpha$ is a parameter. There are many ways to induce a subgraph from a set of labels, and we study two cases. First, we study conjunctive-induced dense subgraphs, where each subgraph edge must carry all labels in the set. Second, we study disjunctive-induced dense subgraphs, where each subgraph edge must carry at least one label in the set. We show that both problems are NP-hard. Because of this hardness, we resort to greedy heuristics. We show that the greedy search can be implemented efficiently: the running times for finding conjunctive-induced and disjunctive-induced dense subgraphs are in $O(p \log k)$ and $O(p \log^2 k)$, respectively, where $p$ is the number of edge-label pairs and $k$ is the number of labels. Our experimental evaluation demonstrates that we can recover the ground truth in synthetic graphs and that we can find interpretable subgraphs in real-world networks.
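The definitions in the abstract can be made concrete with a small sketch. The Python below is illustrative only and is not the paper's algorithm: it assumes that the induced subgraph consists of the selected edges together with their endpoints, that "average degree" density is measured as edges divided by nodes (some authors use twice that value), and that a graph is stored as a dictionary from edges to label sets. The names `induce`, `avg_degree_density`, `alpha_density`, and `naive_greedy` are hypothetical. The greedy loop is a plain hill-climbing variant that recomputes every candidate from scratch, so it does not achieve the $O(p \log k)$ or $O(p \log^2 k)$ running times stated above.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set, Tuple

Edge = Tuple[str, str]
LabeledEdges = Dict[Edge, FrozenSet[str]]


def induce(edges: LabeledEdges, labels: Iterable[str], mode: str) -> Set[Edge]:
    """Select edges by label set: 'and' = edge has all labels, 'or' = at least one."""
    wanted = set(labels)
    if mode == "and":
        return {e for e, ls in edges.items() if wanted <= ls}
    if mode == "or":
        return {e for e, ls in edges.items() if wanted & ls}
    raise ValueError("mode must be 'and' or 'or'")


def nodes_of(sub: Set[Edge]) -> Set[str]:
    """Assumption: the subgraph's nodes are the endpoints of its edges."""
    return {v for e in sub for v in e}


def avg_degree_density(sub: Set[Edge]) -> float:
    """Assumed convention: |E| / |V| (0.0 for an empty subgraph)."""
    n = len(nodes_of(sub))
    return len(sub) / n if n else 0.0


def alpha_density(sub: Set[Edge], alpha: float) -> float:
    """Number of edges minus alpha times the number of nodes."""
    return len(sub) - alpha * len(nodes_of(sub))


def naive_greedy(
    edges: LabeledEdges, mode: str, score: Callable[[Set[Edge]], float]
) -> Tuple[Set[str], float]:
    """Add the single best label while the score strictly improves.

    Illustrative hill climbing only; every candidate is rescored from scratch.
    """
    all_labels = sorted(set().union(*edges.values()))
    chosen: Set[str] = set()
    best = score(induce(edges, chosen, mode))
    while True:
        candidates = [
            (score(induce(edges, chosen | {lab}, mode)), lab)
            for lab in all_labels
            if lab not in chosen
        ]
        if not candidates:
            break
        cand_score, cand_label = max(candidates, key=lambda t: t[0])
        if cand_score <= best:
            break
        chosen.add(cand_label)
        best = cand_score
    return chosen, best


# A toy labeled graph (made-up data, for illustration only).
example: LabeledEdges = {
    ("a", "b"): frozenset({"x", "y"}),
    ("a", "c"): frozenset({"x", "y"}),
    ("b", "c"): frozenset({"x"}),
    ("c", "d"): frozenset({"y"}),
    ("d", "e"): frozenset({"z"}),
}

if __name__ == "__main__":
    conj = induce(example, {"x", "y"}, "and")  # edges carrying both x and y
    disj = induce(example, {"x", "y"}, "or")   # edges carrying x or y
    print(sorted(conj), avg_degree_density(conj))
    print(sorted(disj), avg_degree_density(disj), alpha_density(disj, 0.5))
    print(naive_greedy(example, "or", avg_degree_density))
```

On this toy graph the label set {x, y} selects two edges in the conjunctive case and four in the disjunctive case, which shows how the same labels can induce subgraphs of different density depending on the induction rule.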