Graph Neural Networks (GNNs) have improved unsupervised community detection because they can encode both the connectivity and the node features of a graph. Identifying latent communities has many practical applications, from social networks to genomics. Current benchmarks of real-world performance are difficult to interpret because many experimental decisions influence how GNNs are evaluated on this task. To address this, we propose a framework that establishes a common evaluation protocol. We motivate and justify it by demonstrating the differences in results obtained with and without the protocol. We also propose the W Randomness Coefficient, a metric that assesses the consistency of algorithm rankings in order to quantify the reliability of results in the presence of randomness. We find that when the same evaluation criteria are followed, performance may differ significantly from that reported for methods at this task, but a more complete evaluation and comparison of methods becomes possible.
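The abstract does not give the formula for the W Randomness Coefficient, so the following is only an illustrative sketch of how ranking consistency across random seeds could be measured. It computes Kendall's coefficient of concordance W, a standard statistic, over a table of scores (one row per seed, one column per algorithm). The helper `ranking_inconsistency`, which reports one minus the mean W over several tests, is a hypothetical name and an assumed aggregation, not necessarily the definition used by the authors.

```python
import numpy as np


def kendalls_w(scores):
    """Kendall's W for a (seeds x algorithms) score table.

    Higher score is treated as better. Assumes no tied scores within a row.
    Returns 1.0 when every seed ranks the algorithms identically.
    """
    scores = np.asarray(scores, dtype=float)
    m, n = scores.shape  # m seeds (raters), n algorithms (items)

    # Convert each row of scores to ranks 1..n (1 = best).
    order = np.argsort(-scores, axis=1)
    ranks = np.empty_like(order)
    rows = np.arange(m)[:, None]
    ranks[rows, order] = np.arange(1, n + 1)

    # Sum of squared deviations of per-algorithm rank totals from their mean.
    rank_sums = ranks.sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()

    return 12.0 * s / (m ** 2 * (n ** 3 - n))


def ranking_inconsistency(tests):
    """Hypothetical aggregate: 1 - mean Kendall's W over several tests.

    Each test is one (seeds x algorithms) score table, for example one
    metric on one dataset. This aggregation is an assumption made for
    illustration only.
    """
    return 1.0 - float(np.mean([kendalls_w(t) for t in tests]))


if __name__ == "__main__":
    # Made-up scores: two seeds, three algorithms.
    stable = [[0.9, 0.5, 0.1], [0.8, 0.6, 0.2]]    # same ranking per seed
    unstable = [[0.9, 0.5, 0.1], [0.1, 0.5, 0.9]]  # reversed ranking
    print(kendalls_w(stable), kendalls_w(unstable))
    print(ranking_inconsistency([stable, unstable]))
```

Under this sketch, a value near 0 means the ranking of algorithms is stable across seeds, and a value near 1 means the ranking is dominated by randomness.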