This study addresses the problem of balancing graph summarization against graph change detection. Graph summarization compresses a large-scale graph into a smaller one, but it leaves open the question of how far the original graph should be compressed. We answer this question from the perspective of graph change detection, which aims to detect statistically significant changes in a stream of summary graphs. If the compression rate is too high, important changes may be overlooked; if it is too low, false alarms may increase and more memory is required. There is thus a trade-off between the compression rate in graph summarization and the accuracy of change detection. We propose a novel quantitative methodology that balances this trade-off and thereby achieves reliable graph summarization and change detection simultaneously. We introduce the probabilistic structure of a hierarchical latent variable model into a graph and, on the basis of the minimum description length principle, design a parameterized summary graph. The parameter specifying the summary graph is then optimized so that the Type I error probability of change detection (the probability of raising a false alarm) is guaranteed to be less than a given confidence level. We first provide a theoretical framework connecting graph summarization with change detection, and then empirically demonstrate its effectiveness on synthetic and real datasets.
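The trade-off described above can be illustrated with a minimal sketch. It is not the hierarchical latent variable model proposed in this study: it assumes a plain block-level summary in which each pair of node groups is described by a single edge density, uses a crude two-part code length in the spirit of the minimum description length principle, and scores a change by the number of bits saved when two snapshots are coded separately rather than jointly. All function names, the parameter cost `0.5 * log2(n + 1)`, and the toy graphs are assumptions made for illustration only.

```python
import math
from itertools import combinations


def binary_entropy(p):
    """Entropy of a Bernoulli(p) source in bits; zero when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def block_counts(edges, blocks):
    """Count possible node pairs and observed edges per unordered block pair.

    `edges` holds distinct undirected edges without self-loops;
    `blocks` maps each node to its group label (the summary).
    """
    possible, present = {}, {}
    for u, v in combinations(sorted(blocks), 2):
        key = tuple(sorted((blocks[u], blocks[v])))
        possible[key] = possible.get(key, 0) + 1
        present.setdefault(key, 0)
    for u, v in edges:
        key = tuple(sorted((blocks[u], blocks[v])))
        present[key] += 1
    return possible, present


def code_length(possible, present):
    """Two-part code length in bits: edges given densities, plus densities."""
    total = 0.0
    for key, n in possible.items():
        total += n * binary_entropy(present[key] / n)  # data cost
        total += 0.5 * math.log2(n + 1)  # assumed cost of one density
    return total


def change_score(edges_before, edges_after, blocks):
    """Bits saved by coding two snapshots separately instead of jointly."""
    possible, before = block_counts(edges_before, blocks)
    _, after = block_counts(edges_after, blocks)
    pooled_possible = {k: 2 * n for k, n in possible.items()}
    pooled_present = {k: before[k] + after[k] for k in possible}
    joint = code_length(pooled_possible, pooled_present)
    separate = code_length(possible, before) + code_length(possible, after)
    return joint - separate


# Toy stream: two triangles are rewired into edges that cross the two groups.
before = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
after = [(0, 3), (0, 4), (1, 4), (1, 5), (2, 5), (2, 3)]

fine = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}  # two groups (low compression)
coarse = {node: 0 for node in range(6)}  # one group (high compression)

print("fine summary score:  ", change_score(before, after, fine))
print("coarse summary score:", change_score(before, after, coarse))
```

In this toy stream the one-group summary keeps only the total edge count, which does not change, so its score stays negative and the rewiring goes unnoticed, whereas the two-group summary yields a clearly positive score. An alarm would be raised when the score exceeds a threshold; choosing that threshold, together with the summary parameter, so that the Type I error probability is bounded is the subject of the methodology itself and is not reproduced here.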