In node classification, densely connected nodes are intuitively expected to share similar attributes. Acting on this intuition, however, first requires a precise definition of a dense connection and a reliable mathematical tool for assessing node cohesiveness. In this paper, we propose a probability-based objective function for semi-supervised node classification that exploits the structure captured by higher-order networks. The function follows directly from the intuition behind classification in higher-order networks: it is designed to reduce the likelihood that nodes connected through higher-order interactions bear different labels. We evaluate the function on balanced and imbalanced datasets generated by the Planted Partition Model (PPM) and on a real-world political book dataset. The results show that, in challenging classification settings characterized by low homo-connection probability, high hetero-connection probability, and limited prior information about nodes, the objective function performs better with higher-order networks than with pairwise interactions. Notably, on the imbalanced dataset the objective function achieves higher Recall and F1-score than Precision, which suggests potential applicability in domains where avoiding false negatives is critical, even at the expense of some false positives.
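To make the experimental setup concrete, the following is a minimal sketch of how a Planted Partition Model graph might be sampled and how label disagreement could be compared between pairwise edges and a simple higher-order structure. It is not the paper's implementation: the function names (`planted_partition`, `triangles`, `mixed_fraction`), the parameter names `p_in` and `p_out` (standing in for the homo- and hetero-connection probabilities), and the use of triangles as the higher-order interaction are all assumptions made for illustration.

```python
import random
from itertools import combinations


def planted_partition(sizes, p_in, p_out, seed=0):
    """Sample an undirected graph from a planted partition model.

    sizes : community sizes (equal sizes give a balanced dataset,
            unequal sizes an imbalanced one)
    p_in  : edge probability for two nodes in the same community
    p_out : edge probability for two nodes in different communities
    Returns (labels, edges), where labels[i] is the community of node i
    and every edge (u, v) satisfies u < v.
    """
    rng = random.Random(seed)
    labels = [c for c, size in enumerate(sizes) for _ in range(size)]
    edges = []
    for u, v in combinations(range(len(labels)), 2):
        p = p_in if labels[u] == labels[v] else p_out
        if rng.random() < p:
            edges.append((u, v))
    return labels, edges


def triangles(n, edges):
    """List every 3-clique once, as (u, v, w) with u < v < w.

    Assumption: triangles stand in for the higher-order interactions;
    the paper may define them differently.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    found = []
    for u, v in edges:
        for w in adj[u] & adj[v]:
            if w > v:
                found.append((u, v, w))
    return found


def mixed_fraction(groups, labels):
    """Fraction of groups (edges or triangles) containing more than one label."""
    if not groups:
        return 0.0
    mixed = sum(1 for g in groups if len({labels[i] for i in g}) > 1)
    return mixed / len(groups)


if __name__ == "__main__":
    # Imbalanced example: one community of 40 nodes, one of 10.
    labels, edges = planted_partition([40, 10], p_in=0.3, p_out=0.1, seed=1)
    tris = triangles(len(labels), edges)
    print("mixed-label edges:    ", mixed_fraction(edges, labels))
    print("mixed-label triangles:", mixed_fraction(tris, labels))
```

An objective in the spirit described above would penalize label assignments that leave many such groups mixed; the sketch only measures that quantity for the ground-truth labels and does not perform any classification.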