A structural graph summary is a small graph representation that preserves the structural information necessary for a given task. The summary is used instead of the original graph to complete the task faster. We introduce multi-view structural graph summaries and propose an algorithm for merging two summaries. We conduct a theoretical analysis of our algorithm. We run experiments on three datasets, contributing two new ones. The datasets come from different domains (web graph, source code, and news) and differ in size; the interpretation of a view depends on the domain: pay-level domains on the web, control vs. data flow in source code, and news broadcasters in news. We experiment with three graph summary models: attribute collection, class collection, and their combination. We observe that merging two structural summaries has an upper bound of quadratic complexity, but under reasonable assumptions its worst-case complexity is linear. The running time of merging has a strong linear correlation with the number of edges in the two summaries. The experiments therefore support the assumption that the quadratic upper bound is not tight and that linear complexity is possible. Furthermore, our experiments show that, when merging multiple structural summaries, always merging the two summaries with the fewest edges is the most efficient strategy.
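The smallest-first strategy for merging multiple summaries can be illustrated with a short sketch. It is not the merge algorithm proposed in this work: each summary is assumed, purely for illustration, to be a set of `(source, label, target)` edge triples, and the hypothetical `merge_two` is a plain set union standing in for the real merge. The names `merge_two` and `merge_smallest_first` are likewise assumptions. The `cost` counter sums the edges of every merged pair and serves as a rough proxy for running time, in line with the observed linear correlation between merge time and edge count.

```python
import heapq
from itertools import count


def merge_two(a: frozenset, b: frozenset) -> frozenset:
    # Placeholder merge (assumption): union of the two edge sets.
    # The actual summary-merging algorithm is not reproduced here.
    return a | b


def merge_smallest_first(summaries):
    """Repeatedly merge the two summaries with the fewest edges.

    Returns the final merged summary and the total number of edges
    passed to merge_two, used as a stand-in for running time.
    """
    tie = count()  # unique tie-breaker so the heap never compares the sets
    heap = [(len(s), next(tie), frozenset(s)) for s in summaries]
    if not heap:
        return frozenset(), 0
    heapq.heapify(heap)

    cost = 0
    while len(heap) > 1:
        n_a, _, a = heapq.heappop(heap)  # smallest summary by edge count
        n_b, _, b = heapq.heappop(heap)  # second smallest
        cost += n_a + n_b
        merged = merge_two(a, b)
        heapq.heappush(heap, (len(merged), next(tie), merged))
    return heap[0][2], cost
```

Keeping the summaries in a min-heap keyed by edge count means the large summaries are touched as late, and therefore as rarely, as possible, which is the same greedy idea used when building Huffman trees.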