Graph Neural Networks (GNNs) can be trained to detect communities within a graph by learning from the duality of feature and connectivity information. Currently, the common approach to optimising GNNs is to compare against ground truth for hyperparameter tuning and model selection. In this work, we show that nodes can be clustered into communities with GNNs by optimising solely for modularity, without any comparison to ground truth. Although modularity is a graph partitioning quality metric computed from connectivity alone, we show that it can be used to optimise GNNs that also encode features, without a drop in performance. We further study whether performance on the unsupervised metric can predict ground-truth performance. To investigate why modularity can be used to optimise GNNs, we design synthetic experiments that show the limitations of this approach. The synthetic graphs are constructed to highlight current capabilities on distinct, random and zero-information space partitions in attributed graphs. We conclude that modularity can be used for hyperparameter optimisation and model selection on real-world datasets, and that it is a suitable proxy for predicting ground-truth performance. However, GNNs fail to balance the information duality when the two spaces contain conflicting signals.
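To make the selection criterion concrete, the sketch below computes the standard (Newman) modularity of a hard partition of an undirected graph, then picks the candidate partition with the highest score without consulting any labels. It is an illustration only, not the authors' implementation: the function name `modularity`, the toy graph, and the `candidates` dictionary (standing in for the cluster assignments produced by differently configured GNNs) are all assumptions made for this example.

```python
import numpy as np


def modularity(adj, labels):
    """Newman modularity Q of a hard partition of an undirected graph.

    Q = (1 / 2m) * sum_ij (A_ij - k_i * k_j / 2m) * [c_i == c_j]
    where A is the adjacency matrix, k the degrees and m the edge count.
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    degrees = adj.sum(axis=1)
    two_m = degrees.sum()  # equals 2m for a symmetric adjacency matrix
    if two_m == 0:
        return 0.0  # no edges: define Q as 0
    # same[i, j] is True when nodes i and j share a community.
    same = labels[:, None] == labels[None, :]
    # Observed edges minus the expectation under the configuration model.
    b = adj - np.outer(degrees, degrees) / two_m
    return float((b * same).sum() / two_m)


# Toy graph (hypothetical): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

# Hypothetical candidate assignments, e.g. from different hyperparameters.
candidates = {
    "config_a": [0, 0, 0, 1, 1, 1],  # one community per triangle
    "config_b": [0, 0, 0, 0, 0, 0],  # everything in one community
    "config_c": [0, 1, 0, 1, 0, 1],  # alternating assignment
}

# Unsupervised model selection: keep the highest-modularity candidate.
scores = {name: modularity(adj, lab) for name, lab in candidates.items()}
best = max(scores, key=scores.get)
```

On this toy graph the partition that places each triangle in its own community scores highest, so `best` is `"config_a"`. In a GNN setting the label vectors would come from the model's cluster assignments, and ground-truth labels would only be needed afterwards to check how well the modularity ranking tracks supervised performance.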