This study addresses graph generation with generative models. In particular, we consider the graph community augmentation problem: generating unseen or unfamiliar graphs that contain a new community and lie outside the probability distribution estimated from a given graph dataset. Graphs with a new community offer a chance to discover unseen but important structures, for example in a social network such as a purchaser network. Graph community augmentation may also help data mining models generalize when it is difficult to collect enough real graph data. There are many ways to generate a new community in an existing graph. It is desirable to discover a graph with a new community that goes beyond the given graphs while preserving the structure of the original graphs to some extent, so that the generated graphs remain realistic. To this end, we propose an algorithm called graph community augmentation (GCA). The key ideas of GCA are (i) to fit a Gaussian mixture model (GMM) to the data points in the latent space into which the nodes of the original graph are embedded, and (ii) to add data points in a new cluster in the latent space, based on the minimum description length (MDL) principle, in order to generate a new community. We empirically demonstrate the effectiveness of GCA for generating graphs with a new community structure on synthetic and real datasets.
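The two key ideas can be illustrated with a toy sketch in Python. It is not the GCA implementation, and everything beyond "fit a GMM in the latent space, then add points in a new cluster" is an assumption made for illustration: the node embeddings are synthetic, the GMM is restricted to spherical covariances and fitted with a hand-written EM loop, and the distance-threshold decoder `decode_graph` is hypothetical. The abstract does not specify how the MDL principle selects the new cluster, so the sketch replaces that step with a fixed placement heuristic (`sample_new_cluster`), which should not be read as the MDL criterion itself.

```python
import numpy as np


def fit_spherical_gmm(Z, k, n_iter=50, seed=0):
    """Toy EM for a k-component GMM with one variance per component."""
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    means = Z[rng.choice(n, size=k, replace=False)].copy()
    variances = np.full(k, Z.var() + 1e-6)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities from per-component log densities.
        sq = ((Z[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_p = (np.log(weights)
                 - 0.5 * d * np.log(2 * np.pi * variances)
                 - 0.5 * sq / variances)
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ Z) / nk[:, None]
        sq = ((Z[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        variances = (resp * sq).sum(axis=0) / (d * nk) + 1e-6
    return weights, means, variances


def sample_new_cluster(means, variances, n_new, shift=3.0, seed=0):
    """Placement heuristic (assumption): stand-in for the MDL-based step."""
    rng = np.random.default_rng(seed)
    d = means.shape[1]
    centroid = means.mean(axis=0)
    # Scale of the existing mixture: farthest mean plus largest std.
    spread = (np.linalg.norm(means - centroid, axis=1).max()
              + np.sqrt(variances.max()))
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    new_mean = centroid + shift * spread * direction
    # Sample the new latent points around the new mean.
    std = np.sqrt(variances.mean())
    new_points = new_mean + std * rng.normal(size=(n_new, d))
    return new_mean, new_points


def decode_graph(Z, radius):
    """Hypothetical decoder: connect nodes closer than `radius`."""
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    A = (dist < radius).astype(int)
    np.fill_diagonal(A, 0)  # no self-loops
    return A


# Synthetic 2-D embeddings with two communities (illustrative data only).
rng = np.random.default_rng(42)
Z = np.vstack([
    rng.normal(loc=(-2.0, 0.0), scale=0.3, size=(30, 2)),
    rng.normal(loc=(2.0, 0.0), scale=0.3, size=(30, 2)),
])

weights, means, variances = fit_spherical_gmm(Z, k=2)
new_mean, Z_new = sample_new_cluster(means, variances, n_new=10)
Z_aug = np.vstack([Z, Z_new])           # original nodes + new community
A_aug = decode_graph(Z_aug, radius=1.0)  # adjacency of the augmented graph
print(Z_aug.shape, A_aug.shape)
```

Keeping the original embeddings unchanged and only appending points is one way to retain the structure of the given graph while introducing a new community. In a fuller treatment, the new cluster's location, size, and spread would presumably be chosen by comparing description lengths rather than by the fixed `shift` used here.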