Variational Graph Auto-Encoders (VGAEs) have been widely used for node clustering. However, state-of-the-art methods face several challenges. First, existing VGAEs do not account for the discrepancy between the inference and generative models that arises after incorporating the clustering inductive bias. Second, current models are prone to degenerate solutions in which the latent codes match the prior independently of the input signal (i.e., Posterior Collapse). Third, existing VGAEs overlook the effect of noisy clustering assignments (i.e., Feature Randomness) and the impact of the strong trade-off between clustering and reconstruction (i.e., Feature Drift). To address these problems, we formulate a variational lower bound in a contrastive setting. Our lower bound is a tighter approximation of the log-likelihood function than the corresponding Evidence Lower BOund (ELBO). Thanks to a newly identified term, our lower bound can escape Posterior Collapse and has more flexibility to account for the difference between the inference and generative models. Additionally, our solution has two mechanisms to control the trade-off between Feature Randomness and Feature Drift. Extensive experiments show that the proposed method achieves state-of-the-art clustering results on several datasets. We provide strong evidence that this improvement is attributable to four aspects: integrating contrastive learning and alleviating Feature Randomness, Feature Drift, and Posterior Collapse.
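
For orientation, the ELBO used as the baseline for comparison can be written in the standard form used for a plain VGAE. This is a sketch of that standard bound only, not the contrastive lower bound proposed here, whose form the abstract does not state. The symbols are assumptions introduced for illustration: $A$ is the adjacency matrix, $X$ the node features, $Z$ the latent codes, $q(Z \mid X, A)$ the inference model, and $p(A \mid Z)\,p(Z)$ the generative model.

```latex
\log p(A) \;\ge\; \mathcal{L}_{\mathrm{ELBO}}
  = \mathbb{E}_{q(Z \mid X, A)}\big[\log p(A \mid Z)\big]
  - \mathrm{KL}\big(q(Z \mid X, A) \,\|\, p(Z)\big)
```

In this notation, Posterior Collapse corresponds to the degenerate case $q(Z \mid X, A) \approx p(Z)$ for every input, so the KL term approaches zero and the latent codes carry essentially no information about $X$ or $A$.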