Variational inference has been widely used in the machine learning literature to fit various Bayesian models. In network analysis, this method has been successfully applied to solve community detection problems. Although these results are promising, their theoretical support holds only for relatively dense networks, an assumption that may not hold for real networks. In addition, it has been shown recently that the variational loss surface has many saddle points, which may severely affect its performance, especially when applied to sparse networks. This paper proposes a simple way to improve the variational inference method by hard thresholding the posterior of the community assignment after each iteration. Using a random initialization that correlates with the true community assignment, we show that the proposed method converges and can accurately recover the true community labels, even when the average node degree of the network is bounded. Extensive numerical studies further confirm the advantage of the proposed method over classical variational inference and another state-of-the-art algorithm.
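The hard-thresholding step can be sketched as follows (a minimal illustration, not the authors' implementation; here `psi` denotes a hypothetical n-by-K variational posterior matrix whose rows are each node's community membership probabilities, and thresholding projects each row onto the one-hot vector of its most probable community):

```python
import numpy as np

def hard_threshold(psi):
    """Project each row of the variational posterior onto a one-hot vector
    at its argmax, i.e., commit each node to its most probable community."""
    labels = psi.argmax(axis=1)          # most probable community per node
    onehot = np.zeros_like(psi)          # same shape as the posterior
    onehot[np.arange(psi.shape[0]), labels] = 1.0
    return onehot

# Toy example: 3 nodes, 2 communities
psi = np.array([[0.7, 0.3],
                [0.4, 0.6],
                [0.9, 0.1]])
print(hard_threshold(psi))
```

In an iterative scheme, this projection would be applied after each variational update, so the next update starts from a hard community assignment rather than a soft posterior.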