Community detection and graph clustering are essential for unsupervised data exploration and understanding the high-level organisation of networked systems. Recently, graph clustering has received attention as a primary task for graph neural networks. Although hierarchical graph pooling has been shown to improve performance in graph and node classification tasks, it performs poorly in identifying meaningful clusters. Community detection has a long history in network science, but typically relies on optimising objective functions with custom-tailored search algorithms, not leveraging recent advances in deep learning, particularly from graph neural networks. In this paper, we narrow this gap between the deep learning and network science communities. We consider the map equation, an information-theoretic objective function for unsupervised community detection. Expressing it in a fully differentiable tensor form that produces soft cluster assignments, we optimise the map equation with deep learning through gradient descent. More specifically, the reformulated map equation is a loss function compatible with any graph neural network architecture, enabling flexible clustering and graph pooling that clusters both graph structure and data features in an end-to-end way, automatically finding an optimum number of clusters without explicit regularisation by following the minimum description length principle. We evaluate our approach experimentally using different neural network architectures for unsupervised clustering in synthetic and real data. Our results show that our approach achieves competitive performance against baselines, naturally detects overlapping communities, and avoids over-partitioning sparse graphs.
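To make the idea of a differentiable map equation over soft cluster assignments concrete, the following is a minimal sketch, not the authors' implementation. It assumes an undirected graph, the standard two-level map equation, and a pooled flow matrix C = SᵀFS computed from a soft assignment matrix S. The names `soft_map_equation`, `plogp`, `adj`, and `S` are hypothetical and chosen for illustration. It uses plain Python to stay dependency-free; in practice the same expression would be written with tensors in an autodiff framework so that S can be produced by a graph neural network and optimised by gradient descent.

```python
from math import log2


def plogp(x):
    """Return x * log2(x), using the convention 0 * log 0 = 0."""
    return x * log2(x) if x > 0.0 else 0.0


def soft_map_equation(adj, S):
    """Two-level map equation codelength in bits (illustrative sketch).

    adj: symmetric n x n weight matrix of an undirected graph.
    S:   n x k soft assignment matrix; every row sums to 1.
    """
    n, k = len(adj), len(S[0])
    total = sum(sum(row) for row in adj)
    # Flow per link direction and node visit rates of a random walker.
    F = [[adj[i][j] / total for j in range(n)] for i in range(n)]
    p = [sum(F[i]) for i in range(n)]
    # Pooled flow between modules: C = S^T F S.
    C = [[sum(S[i][a] * F[i][j] * S[j][b]
              for i in range(n) for j in range(n))
          for b in range(k)] for a in range(k)]
    # Module exit and entry rates exclude the flow that stays inside a module.
    q_exit = [sum(C[a]) - C[a][a] for a in range(k)]
    q_enter = [sum(C[b][a] for b in range(k)) - C[a][a] for a in range(k)]
    # Total visit rate assigned to each module.
    p_mod = [sum(S[i][a] * p[i] for i in range(n)) for a in range(k)]
    q = sum(q_enter)
    return (plogp(q)
            - sum(plogp(x) for x in q_enter)
            - sum(plogp(x) for x in q_exit)
            - sum(plogp(x) for x in p)
            + sum(plogp(qe + pm) for qe, pm in zip(q_exit, p_mod)))


# Toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0.0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1.0

one_module = [[1.0] for _ in range(6)]
two_modules = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
singletons = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]

for name, S in [("one module", one_module),
                ("two modules", two_modules),
                ("singletons", singletons)]:
    print(name, round(soft_map_equation(adj, S), 4))
```

On this toy graph the two-triangle partition gives a shorter codelength than either a single module or six singleton modules, which illustrates how minimising the codelength can select the number of clusters without an explicit regularisation term.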