This study addresses the problem of balancing graph summarization against graph change detection. Graph summarization compresses a large-scale graph into a smaller one, which raises the question: to what extent should the original graph be compressed? We answer this question from the perspective of graph change detection, where the aim is to detect statistically significant changes from a stream of summary graphs. If the compression rate is extremely high, important changes can be missed, whereas if it is extremely low, false alarms may increase and more memory is required. This implies a trade-off between the compression rate in graph summarization and the accuracy of change detection. We propose a novel quantitative methodology that balances this trade-off so that reliable graph summarization and change detection are realized simultaneously. We introduce the probabilistic structure of a hierarchical latent variable model into a graph and design a parameterized summary graph on the basis of the minimum description length principle. The parameter specifying the summary graph is then optimized so that the accuracy of change detection is guaranteed, in the sense that the Type I error probability (the probability of raising a false alarm) is kept below a given confidence level. First, we provide a theoretical framework that connects graph summarization with change detection. Then, we empirically demonstrate its effectiveness on synthetic and real datasets.