Heterogeneous information networks (HINs) have been applied extensively in real-world settings such as recommendation systems, social networks, and citation networks. While existing HIN representation learning methods can effectively learn the semantic and structural features of a network, little attention has been paid to the distribution discrepancy among subgraphs within a single HIN. We find that ignoring this discrepancy among subgraphs drawn from multiple sources hinders the effectiveness of graph embedding learning algorithms. This motivates us to propose SUMSHINE (Scalable Unsupervised Multi-Source Heterogeneous Information Network Embedding), a scalable unsupervised framework that aligns the embedding distributions across the multiple sources of an HIN. Experimental results on real-world datasets across a variety of downstream tasks show that our method outperforms state-of-the-art heterogeneous information network embedding algorithms.