Data generation is a fundamental research problem in data management due to its diverse use cases, ranging from testing database engines to data-specific applications. However, real-world entities often involve complex interactions that cannot be effectively modeled by traditional tabular data. Therefore, graph data generation has attracted increasing attention recently. Although various graph generators have been proposed in the literature, they share three limitations: i) they cannot capture the co-evolution pattern of graph structure and node attributes; ii) few of them consider edge direction, leading to substantial information loss; iii) current state-of-the-art dynamic graph generators are based on temporal random walks, making the simulation process time-consuming. To fill this research gap, we introduce VRDAG, a novel variational recurrent framework for efficient dynamic attributed graph generation. Specifically, we design a bidirectional message-passing mechanism to encode both the directed structural knowledge and the attribute information of a snapshot. The temporal dependency in the graph sequence is then captured by a recurrence state updater, which generates embeddings that preserve the evolution pattern of earlier graphs. Based on the hidden node embeddings, a conditional variational Bayesian method is developed to sample latent random variables at the neighboring timestep for new snapshot generation. The proposed generation paradigm avoids the time-consuming path sampling and merging process of existing random walk-based methods, significantly reducing the synthesis time. Finally, comprehensive experiments on real-world datasets demonstrate the effectiveness and efficiency of the proposed model.
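The three components named in the abstract (bidirectional message passing over a directed snapshot, a recurrent state update, and sampling latent variables conditioned on the hidden state) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: every function name, dimension, and weight matrix below is a hypothetical choice made for illustration, the weights are random and untrained, a plain tanh recurrence stands in for whatever recurrent cell VRDAG actually uses, and the inference network and training objective are omitted entirely.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(rng, d_attr, d_hid, d_lat, scale=0.1):
    # Hypothetical, untrained weights; a real model would learn these.
    d_s = d_hid + d_lat
    shapes = {
        "W_in": (d_attr, d_hid), "W_out": (d_attr, d_hid), "W_self": (d_attr, d_hid),
        "W_h": (d_hid, d_hid), "W_e": (d_hid, d_hid), "W_z": (d_lat, d_hid),
        "W_mu": (d_hid, d_lat), "W_sigma": (d_hid, d_lat),
        "W_src": (d_s, d_hid), "W_dst": (d_s, d_hid), "W_attr": (d_s, d_attr),
    }
    return {name: rng.normal(scale=scale, size=shape) for name, shape in shapes.items()}

def encode_snapshot(A, X, p):
    # A[i, j] = 1 means a directed edge i -> j.
    # Aggregate separately over in-neighbors and out-neighbors so that
    # edge direction is not discarded.
    in_deg = np.maximum(A.sum(axis=0), 1.0)[:, None]
    out_deg = np.maximum(A.sum(axis=1), 1.0)[:, None]
    from_in = (A.T @ X) / in_deg    # mean attributes of in-neighbors
    from_out = (A @ X) / out_deg    # mean attributes of out-neighbors
    return np.tanh(from_in @ p["W_in"] + from_out @ p["W_out"] + X @ p["W_self"])

def sample_latent(H, p, rng):
    # Gaussian prior whose mean and log-std depend on the hidden state.
    mu = H @ p["W_mu"]
    log_sigma = H @ p["W_sigma"]
    return mu + np.exp(log_sigma) * rng.normal(size=mu.shape)

def decode_snapshot(H, Z, p, rng):
    S = np.concatenate([H, Z], axis=1)
    # Separate source and target projections make edge scores asymmetric.
    probs = sigmoid((S @ p["W_src"]) @ (S @ p["W_dst"]).T)
    A = (rng.random(probs.shape) < probs).astype(float)
    np.fill_diagonal(A, 0.0)        # no self-loops in this sketch
    X = S @ p["W_attr"]
    return A, X

def update_state(H, E, Z, p):
    # Simplified recurrence (plain tanh cell, an assumption of this sketch).
    return np.tanh(H @ p["W_h"] + E @ p["W_e"] + Z @ p["W_z"])

def generate(num_steps, num_nodes, d_attr=4, d_hid=8, d_lat=3, seed=0):
    rng = np.random.default_rng(seed)
    p = init_params(rng, d_attr, d_hid, d_lat)
    H = np.zeros((num_nodes, d_hid))
    snapshots = []
    for _ in range(num_steps):
        Z = sample_latent(H, p, rng)          # sample latent variables
        A, X = decode_snapshot(H, Z, p, rng)  # emit one whole snapshot
        snapshots.append((A, X))
        E = encode_snapshot(A, X, p)          # encode the new snapshot
        H = update_state(H, E, Z, p)          # carry history forward
    return snapshots

snapshots = generate(num_steps=3, num_nodes=5)
```

Each step emits a whole snapshot (adjacency matrix plus node attributes) in one pass from the hidden state. That is the structural reason a recurrent generator of this kind needs no path sampling or merging, in contrast to random walk-based approaches.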