Random walk-based node embedding algorithms have attracted considerable attention because they scale well and are easy to implement. Previous research has focused on different walk strategies, optimization objectives, and embedding learning models. Inspired by observations on real data, we take a different approach and propose a new regularization technique. Specifically, the frequencies of node pairs generated by the skip-gram model from random walk node sequences follow a highly skewed distribution, which causes learning to be dominated by a small fraction of the pairs. We address this issue by designing an efficient sampling procedure that generates node pairs according to their {\em smoothed frequency}. Theoretical and experimental results demonstrate the advantages of our approach.
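To make the idea of sampling by smoothed frequency concrete, the following is a minimal Python sketch, not the procedure proposed in the paper. It assumes uniform random walks, a symmetric skip-gram window, and power smoothing in which a pair with raw count $c$ is sampled with probability proportional to $c^{\beta}$ for some $0 < \beta \le 1$. The abstract does not specify the smoothing function, so this choice, the parameter `beta`, and all function names are illustrative assumptions. The sketch also counts every pair explicitly, which a more efficient sampling procedure would presumably avoid.

```python
import random
from collections import Counter


def random_walks(adj, walks_per_node, walk_length, rng):
    """Generate uniform random walks starting from every node (assumed walk strategy)."""
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


def skipgram_pair_counts(walks, window):
    """Count (center, context) pairs within a symmetric window over each walk."""
    counts = Counter()
    for walk in walks:
        for i, center in enumerate(walk):
            lo = max(0, i - window)
            hi = min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(center, walk[j])] += 1
    return counts


def smoothed_distribution(counts, beta):
    """Assumed smoothing: probability proportional to count ** beta.

    beta = 1 reproduces the raw pair frequencies; beta < 1 flattens the skew.
    """
    weights = {pair: c ** beta for pair, c in counts.items()}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def sample_pairs(dist, k, rng):
    """Draw k training pairs according to the given distribution."""
    pairs = list(dist)
    probs = [dist[p] for p in pairs]
    return rng.choices(pairs, weights=probs, k=k)


# Toy graph (hypothetical): node 0 is a hub, so pairs involving it dominate.
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}
rng = random.Random(0)

walks = random_walks(adj, walks_per_node=20, walk_length=10, rng=rng)
counts = skipgram_pair_counts(walks, window=2)

raw = smoothed_distribution(counts, beta=1.0)     # raw frequencies
smooth = smoothed_distribution(counts, beta=0.5)  # flattened frequencies
samples = sample_pairs(smooth, k=5, rng=rng)

print("most likely pair, raw:     ", max(raw.values()))
print("most likely pair, smoothed:", max(smooth.values()))
print("sampled pairs:", samples)
```

With any exponent below 1, the most frequent pair can only lose probability mass relative to the raw distribution, so rare pairs are drawn more often during training than they would be under plain skip-gram pair generation.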