Viewing relational databases as heterogeneous graphs has recently attracted significant attention, because it allows graph neural networks (GNNs) to be applied to predictive tasks. However, existing GNN methods struggle with the complexity of the heterogeneous graphs induced by databases with numerous tables and relations. Traditional approaches either consider all possible relational meta-paths, and thus fail to scale with the number of relations, or rely on domain experts to identify the relevant meta-paths. A recent solution learns informative meta-paths without expert supervision, but assumes that a node's class depends solely on the existence of a meta-path occurrence. In this work, we present a self-explainable heterogeneous GNN for relational data that supports models in which class membership depends on aggregate information obtained from multiple occurrences of a meta-path. Experimental results show that, in the context of relational databases, our approach effectively identifies informative meta-paths that faithfully capture the model's reasoning mechanisms. It significantly outperforms existing methods in both synthetic and real-world scenarios.
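The distinction between existence-based and aggregate-based class membership can be illustrated with a toy example. The sketch below is not the proposed model: it is a plain Python illustration, and the schema (customers, orders, products), the relation names `placed` and `contains`, and the threshold are all hypothetical. It enumerates the occurrences of a single meta-path and compares a rule that only checks whether an occurrence exists with a rule that counts occurrences.

```python
from collections import defaultdict

# Toy relational data as typed edges: (source, relation, target).
# All node and relation names are hypothetical, chosen for illustration.
EDGES = [
    ("c1", "placed", "o1"), ("c1", "placed", "o2"), ("c1", "placed", "o3"),
    ("c2", "placed", "o4"),
    ("o1", "contains", "p_lux"), ("o2", "contains", "p_lux"),
    ("o3", "contains", "p_std"), ("o4", "contains", "p_lux"),
]


def build_index(edges):
    """Map (source, relation) to the list of target nodes."""
    index = defaultdict(list)
    for src, rel, dst in edges:
        index[(src, rel)].append(dst)
    return index


def meta_path_occurrences(index, start, relations):
    """Return the end node of every occurrence of the meta-path from `start`."""
    frontier = [start]
    for rel in relations:
        frontier = [dst for node in frontier for dst in index.get((node, rel), [])]
    return frontier


def exists_rule(ends, target):
    """Existence-based rule: true if at least one occurrence ends in `target`."""
    return target in ends


def count_rule(ends, target, threshold):
    """Aggregate rule: true if at least `threshold` occurrences end in `target`."""
    return ends.count(target) >= threshold


index = build_index(EDGES)
path = ["placed", "contains"]  # customer -> order -> product

for customer in ("c1", "c2"):
    ends = meta_path_occurrences(index, customer, path)
    print(customer, exists_rule(ends, "p_lux"), count_rule(ends, "p_lux", 2))
```

In this toy data both customers have at least one occurrence of the meta-path ending in `p_lux`, so the existence rule cannot separate them, whereas the counting rule can. A learned model would replace the hand-written count with a trainable aggregation over occurrences.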