Relational deep learning (RDL) was recently proposed to enable a fully data-driven learning paradigm on relational databases (RDBs): it structures the RDB as a heterogeneous entity graph and adopts a graph neural network (GNN) as the predictive model. However, existing RDL methods neglect the imbalance of relational data in RDBs and risk under-representing minority entities, leading to models that are unusable in practice. In this work, we investigate, for the first time, the class imbalance problem in RDB entity classification and design the relation-centric minority synthetic over-sampling GNN (Rel-MOSS) to fill this gap in the literature. Specifically, to keep minority-related information from being submerged by its majority counterpart, we design a relation-wise gating controller that modulates the neighborhood messages from each individual relation type. Building on the relation-gated representations, we further propose a relation-guided minority synthesizer for over-sampling, which integrates entity relational signatures to maintain relational consistency. Extensive experiments on 12 entity classification datasets show that Rel-MOSS outperforms state-of-the-art RDL methods and classic class-imbalance methods, with average improvements of up to 2.46% in Balanced Accuracy and 4.00% in G-Mean.
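The abstract reports results in Balanced Accuracy and G-Mean without defining them. The sketch below assumes the standard definitions: Balanced Accuracy as the arithmetic mean of per-class recall, and G-Mean as the geometric mean of per-class recall. The function names and the toy labels are illustrative only and are not taken from the paper.

```python
import math
from collections import defaultdict


def per_class_recall(y_true, y_pred):
    """Return {class: recall} for every class that appears in y_true."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {c: correct[c] / total[c] for c in total}


def balanced_accuracy(y_true, y_pred):
    """Arithmetic mean of per-class recall (assumed standard definition)."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed standard definition)."""
    recalls = per_class_recall(y_true, y_pred)
    return math.prod(recalls.values()) ** (1.0 / len(recalls))


# Hypothetical imbalanced toy data: 8 majority entities (0), 2 minority (1).
y_true = [0] * 8 + [1] * 2

# A degenerate classifier that always predicts the majority class.
y_all_majority = [0] * 10

# A classifier that recovers one of the two minority entities.
y_mixed = [0] * 6 + [1] * 2 + [1, 0]

for name, y_pred in [("all-majority", y_all_majority), ("mixed", y_mixed)]:
    plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(name, plain, balanced_accuracy(y_true, y_pred), g_mean(y_true, y_pred))
```

On this toy data the always-majority classifier reaches a plain accuracy of 0.8 but a Balanced Accuracy of 0.5 and a G-Mean of 0.0, which illustrates why these two metrics are preferred over plain accuracy when minority entities matter.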