Summarizing web graphs is challenging due to the heterogeneity of the modeled information and its changes over time. We investigate the use of neural networks for lifelong graph summarization. Given a snapshot of the web graph at one point in time, we train the networks to summarize its vertices. We then apply the trained network to summarize the vertices of the changed graph at the next point in time. Subsequently, we continue training and evaluating the network to perform lifelong graph summarization. We use the GNNs Graph-MLP and GraphSAINT, as well as an MLP baseline, to summarize the temporal graphs. We compare $1$-hop and $2$-hop summaries. We investigate the impact of reusing parameters from a previous snapshot by measuring the backward and forward transfer and the forgetting rate of the neural networks. Our extensive experiments on ten weekly snapshots of a web graph with over $100$M edges, sampled in 2012 and 2022, show that all networks predominantly use $1$-hop information to determine the summary, even when performing $2$-hop summarization. Due to the heterogeneity of web graphs, in some snapshots, the $2$-hop summary produces over ten times more vertex summaries than the $1$-hop summary. When using the network trained on the last snapshot from 2012 and applying it to the first snapshot of 2022, we observe a strong drop in accuracy. We attribute this drop over the ten-year gap to the strongly increased heterogeneity of the web graph in 2022.
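The distinction between $1$-hop and $2$-hop summaries can be illustrated with a generic hash-based summarization scheme in the spirit of Weisfeiler-Leman refinement. This is a minimal sketch under assumed data structures (adjacency lists and vertex type labels), not the paper's exact summary model: two vertices receive the same $k$-hop summary exactly when their labeled $k$-hop neighborhoods agree, so deeper hops can only split, never merge, summaries.

```python
def k_hop_summary(adj, labels, k):
    """Assign each vertex a summary hash of its own label and the
    (recursively summarized) labels of its neighbors, repeated k times.
    Hash-based refinement sketch; illustrative, not the paper's model."""
    summary = dict(labels)  # 0-hop summary: the vertex's own label
    for _ in range(k):
        new = {}
        for v, neighbors in adj.items():
            # Sort neighbor summaries so the hash ignores neighbor order.
            new[v] = hash((summary[v],
                           tuple(sorted(summary[u] for u in neighbors))))
        summary = new
    return summary

# Toy labeled graph (hypothetical data): "a" and "d" are both Persons
# linked to an Org, so they share a 1-hop summary; their Org neighbors
# differ at distance 2, so the 2-hop summary separates them.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": ["e"], "e": ["d"]}
labels = {"a": "Person", "b": "Org", "c": "Person",
          "d": "Person", "e": "Org"}
s1 = k_hop_summary(adj, labels, 1)
s2 = k_hop_summary(adj, labels, 2)
```

In this toy graph the $2$-hop summary partitions the vertices more finely than the $1$-hop summary, mirroring the abstract's observation that $2$-hop summarization can produce many more vertex summaries on heterogeneous graphs.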