Analyzing large-scale time-series network data, such as social media and email communications, remains a significant challenge for graph analysis methodology. In particular, limited scalability hinders further progress in large-scale downstream inference. In this paper, we introduce a novel approach called "temporal encoder embedding" that embeds large amounts of graph data efficiently, with linear complexity. We apply this method to an anonymized time-series communication network from a large organization spanning 2019-2020, consisting of over 100 thousand vertices and 80 million edges. Our method embeds the data within 10 seconds on a standard computer and enables the detection of communication pattern shifts for individual vertices, vertex communities, and the overall graph structure. We establish the theoretical soundness of the approach under random graph models and demonstrate its numerical effectiveness through simulation studies.
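The abstract does not spell out the algorithm, so the following is only a minimal sketch of how an encoder-style embedding can run in time linear in the number of edges. It assumes, without confirmation from the text above, that every vertex carries a known class label that stays fixed over time, that edges are undirected and weighted, and that each snapshot is embedded by accumulating class-size-normalized one-hot label weights over its edge list. The names `temporal_encoder_embedding` and `vertex_shift` are hypothetical and are not taken from the authors' code.

```python
from collections import Counter
from math import sqrt


def temporal_encoder_embedding(edges_by_time, labels, num_classes):
    """Embed each snapshot of a temporal graph into num_classes dimensions.

    edges_by_time: list of snapshots, each a list of (src, dst, weight) edges.
    labels: class label (0..num_classes-1) per vertex, assumed fixed over time.
    Returns one n-by-num_classes nested list per snapshot.
    """
    n = len(labels)
    class_sizes = Counter(labels)
    # Assumption: a neighbor in class k contributes 1 / (size of class k).
    weight = [1.0 / class_sizes[k] for k in labels]

    embeddings = []
    for edges in edges_by_time:
        z = [[0.0] * num_classes for _ in range(n)]
        # A single pass over the edge list, so the cost per snapshot is
        # O(n * num_classes) to initialize plus O(number of edges).
        for src, dst, w in edges:
            # Undirected edge: each endpoint accumulates the other's label weight.
            z[src][labels[dst]] += w * weight[dst]
            z[dst][labels[src]] += w * weight[src]
        embeddings.append(z)
    return embeddings


def vertex_shift(z_prev, z_next):
    """Euclidean distance between each vertex's embeddings at two time steps."""
    return [
        sqrt(sum((a - b) ** 2 for a, b in zip(row_prev, row_next)))
        for row_prev, row_next in zip(z_prev, z_next)
    ]


# Toy example: four vertices in two classes whose contacts switch from
# within-class (snapshot 0) to across-class (snapshot 1).
labels = [0, 0, 1, 1]
snapshots = [
    [(0, 1, 1.0), (2, 3, 1.0)],
    [(0, 2, 1.0), (1, 3, 1.0)],
]
Z = temporal_encoder_embedding(snapshots, labels, num_classes=2)
shifts = vertex_shift(Z[0], Z[1])
```

Because every snapshot is projected with the same label-based weights, the embeddings at different time steps share one coordinate system. A large entry in `shifts` therefore flags a vertex whose communication pattern changed between the two snapshots, and averaging the entries over a class or over all vertices gives the analogous community-level or graph-level signal under the same assumptions.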