Graph Neural Networks (GNNs) have achieved state-of-the-art results in node classification tasks on graphs. While these gains have largely been demonstrated in the multi-class setting, the more general and realistic setting in which each node can have multiple labels has so far received little attention. The first challenge in conducting focused studies on multi-label node classification is the limited number of publicly available multi-label graph datasets. Therefore, as our first contribution, we collect and release three real-world biological datasets and develop a multi-label graph generator that produces datasets with tunable properties. While the success of GNNs is usually attributed to high label similarity among connected nodes (high homophily), we argue that the multi-label scenario does not follow the usual semantics of homophily and heterophily defined so far for the multi-class scenario. As our second contribution, besides defining homophily for the multi-label scenario, we develop a new approach that dynamically fuses feature and label correlation information to learn label-informed representations. Finally, we perform a large-scale comparative study with $10$ methods and $9$ datasets, which also showcases the effectiveness of our approach. We release our benchmark at \url{https://anonymous.4open.science/r/LFLF-5D8C/}.
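To make the notion of multi-label homophily concrete, the sketch below shows one plausible way to measure it: the Jaccard similarity between the label sets of the two endpoints of each edge, averaged over all edges. This is an illustrative assumption, not a definition taken from the text above; the function names `jaccard` and `multilabel_edge_homophily` and the toy graph are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity of two label sets; defined as 0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def multilabel_edge_homophily(
    edges: Iterable[Tuple[int, int]],
    labels: Dict[int, Set[int]],
) -> float:
    """Average label-set Jaccard similarity over edges (assumed measure)."""
    edge_list = list(edges)
    if not edge_list:
        return 0.0
    total = sum(jaccard(labels[u], labels[v]) for u, v in edge_list)
    return total / len(edge_list)


# Hypothetical toy graph: a path 0 - 1 - 2 - 3 with a label set per node.
labels = {0: {0, 1}, 1: {1}, 2: {1, 2}, 3: {3}}
edges = [(0, 1), (1, 2), (2, 3)]

# Edge similarities are 1/2, 1/2 and 0, so the average is 1/3.
h = multilabel_edge_homophily(edges, labels)
print(f"multi-label edge homophily: {h:.3f}")
```

When every node carries exactly one label, each edge contributes either 1 (same class) or 0 (different class), so this measure reduces to the fraction of edges joining same-class nodes used in the multi-class setting. With multiple labels per node, partially overlapping label sets yield intermediate values, which illustrates why the binary homophily/heterophily distinction does not carry over directly.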