Semantic-aware Node Synthesis for Imbalanced Heterogeneous Information Networks

Heterogeneous graph neural networks (HGNNs) are highly effective at modeling the complex heterogeneity of heterogeneous information networks (HINs). Their critical advantage is the ability to handle the diverse node and edge types in HINs by extracting and exploiting abundant semantic information for representation learning. However, class imbalance, a widespread phenomenon in real-world scenarios, creates a performance bottleneck for existing HGNNs. Beyond the imbalance in node quantity, HINs pose a more crucial and distinctive challenge: semantic imbalance. Minority classes in HINs often lack sufficient and diverse neighbor nodes, which leaves their semantic information biased and incomplete. This semantic imbalance makes minority nodes even harder to classify accurately and degrades the performance of HGNNs. To address the imbalance of minority classes and supplement their inadequate semantics, we present Semantic-aware Node Synthesis (SNS), the first method for the semantic imbalance problem in imbalanced HINs. By assessing the influence on minority classes, SNS adaptively selects heterogeneous neighbor nodes and augments the network with synthetic nodes while preserving the minority semantics. In addition, we introduce two regularization approaches for HGNNs that constrain the representations of synthetic nodes from both the semantic and the class perspective, suppressing the potential noise from synthetic nodes and yielding more expressive embeddings for classification. A comprehensive experimental study demonstrates that SNS consistently outperforms existing methods by a large margin on different benchmark datasets.
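
To make the idea of semantic-aware synthesis concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how influence is measured or how synthetic nodes are generated, so everything here is an assumption for illustration: the function name `synthesize_minority_nodes`, the influence score (the fraction of a neighbor's links that reach minority-class nodes), the SMOTE-style feature interpolation, and the `top_k` cutoff per neighbor type. The two regularization terms are not covered.

```python
import random
from collections import defaultdict


def synthesize_minority_nodes(features, labels, edges, minority, n_new,
                              top_k=2, seed=0):
    """Toy minority-node synthesis on a heterogeneous graph (illustrative only).

    features: {target_node: [float, ...]}
    labels:   {target_node: class_name}
    edges:    [(target_node, neighbor_id, neighbor_type), ...]
    """
    rng = random.Random(seed)
    minority_nodes = sorted(n for n, c in labels.items() if c == minority)

    linked = defaultdict(list)                        # neighbor -> target nodes
    nbrs = defaultdict(lambda: defaultdict(list))     # target -> type -> neighbors
    for target, nb, nb_type in edges:
        linked[nb].append(target)
        nbrs[target][nb_type].append(nb)

    def influence(nb):
        # ASSUMPTION: influence = share of this neighbor's links that go to
        # minority-class nodes. The paper's actual measure may differ.
        targets = linked[nb]
        return sum(labels[t] == minority for t in targets) / len(targets)

    new_features, new_labels, new_edges = {}, {}, []
    for i in range(n_new):
        a = rng.choice(minority_nodes)   # source node whose semantics we copy
        b = rng.choice(minority_nodes)   # interpolation partner, same class
        lam = rng.random()
        node = f"syn_{i}"
        # SMOTE-style interpolation between two minority feature vectors.
        new_features[node] = [x + lam * (y - x)
                              for x, y in zip(features[a], features[b])]
        new_labels[node] = minority
        # For each neighbor type, keep only the most minority-relevant neighbors.
        for nb_type, candidates in nbrs[a].items():
            ranked = sorted(candidates, key=influence, reverse=True)
            for nb in ranked[:top_k]:
                new_edges.append((node, nb, nb_type))
    return new_features, new_labels, new_edges


# Hypothetical toy data: papers (targets) linked to authors and venues.
features = {"p1": [0.0, 0.0], "p2": [1.0, 1.0], "p3": [5.0, 5.0], "p4": [6.0, 5.0]}
labels = {"p1": "rare", "p2": "rare", "p3": "common", "p4": "common"}
edges = [
    ("p1", "a1", "author"), ("p1", "a2", "author"),
    ("p2", "a1", "author"),
    ("p3", "a2", "author"), ("p4", "a2", "author"),
    ("p1", "v1", "venue"), ("p3", "v1", "venue"),
]
feats, labs, syn_edges = synthesize_minority_nodes(
    features, labels, edges, minority="rare", n_new=3, top_k=1)
```

In this toy graph, author `a1` links only to minority papers while `a2` links mostly to majority papers, so with `top_k=1` every synthetic node is attached to `a1`. This is the sense in which neighbor selection can preserve minority semantics rather than copy all of the source node's links.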
