Convergence Guarantees for the DeepWalk Embedding on Block Models

Graph embeddings have emerged as a powerful tool for understanding the structure of graphs. Unlike classical spectral methods, recent methods such as DeepWalk and Node2Vec are based on solving nonlinear optimization problems on the graph, using local information obtained by performing random walks. These techniques have been shown empirically to produce "better" embeddings than their classical counterparts. However, because they rely on solving a nonconvex optimization problem, obtaining theoretical guarantees on the properties of the solution has remained a challenge, even for simple classes of graphs. In this work, we show convergence properties for the DeepWalk algorithm on graphs obtained from the Stochastic Block Model (SBM). Despite its simplicity, the SBM has proved to be a classic model for analyzing the behavior of algorithms on large graphs. Our results mirror the existing ones for spectral embeddings on SBMs, showing that even in the case of one-dimensional embeddings, the output of the DeepWalk algorithm provably recovers the cluster structure with high probability.
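To make the setting concrete, the following is a minimal sketch of sampling a graph from a two-block SBM. The function name `sample_sbm`, the block sizes, and the edge probabilities `p_in` and `p_out` are illustrative assumptions, not values taken from the paper.

```python
import random

def sample_sbm(block_sizes, p_in, p_out, seed=0):
    """Sample an undirected SBM graph (hypothetical helper, not from the paper).

    Returns an adjacency dict {node: set of neighbors} and a list of block labels.
    """
    rng = random.Random(seed)
    # Node v belongs to block labels[v]; blocks are laid out contiguously.
    labels = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            # Edges appear independently, more often inside a block than across.
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj, labels

# Assumed parameters: two blocks of 50 nodes each.
adj, labels = sample_sbm([50, 50], p_in=0.5, p_out=0.05)
within = sum(1 for u in adj for v in adj[u] if u < v and labels[u] == labels[v])
across = sum(1 for u in adj for v in adj[u] if u < v and labels[u] != labels[v])
print(within, across)
```

With `p_in` well above `p_out`, most sampled edges fall inside a block, which is the cluster structure an embedding is expected to recover.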

DeepWalk

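DeepWalk is the earliest proposed node embedding model based on Word2vec. Its main idea is to construct random-walk paths over the nodes of a network, mimicking the way text is generated, so that each walk provides a sequence of nodes. It then uses the Skip-gram model with Hierarchical Softmax to model the probability of the node pairs inside each local window of a walk, maximizes the likelihood of the random-walk sequences, and finally learns the parameters with stochastic gradient descent.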
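The walk-and-window step described above can be sketched as follows. This is an illustrative sketch only: the helper names `random_walk` and `window_pairs`, the toy graph, and the walk length and window size are assumptions, and the Skip-gram training itself (Hierarchical Softmax and stochastic gradient descent) is not implemented here.

```python
import random

def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbors))
    return walk

def window_pairs(walk, window):
    """(center, context) pairs for nodes at most `window` steps apart in the walk."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs

# Toy graph (assumed): two triangles joined by the edge 2-3.
toy_adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
rng = random.Random(0)
# Five walks of length 10 from every node play the role of "sentences".
walks = [random_walk(toy_adj, v, length=10, rng=rng) for v in toy_adj for _ in range(5)]
# These pairs are the training examples a Skip-gram model would consume.
pairs = [p for w in walks for p in window_pairs(w, window=2)]
print(len(walks), len(pairs))
```

The node pairs produced this way are the input to Skip-gram: nodes that co-occur often within a window are pushed toward similar embeddings.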
