Recently, significant attention has been given to the idea of viewing relational databases as heterogeneous graphs, enabling the application of graph neural network (GNN) technology to predictive tasks. However, existing GNN methods struggle with the complexity of the heterogeneous graphs induced by databases with numerous tables and relations. Traditional approaches either consider all possible relational meta-paths, and thus fail to scale with the number of relations, or rely on domain experts to identify relevant meta-paths. A recent solution learns informative meta-paths without expert supervision, but it assumes that a node's class depends solely on the existence of a meta-path occurrence. In this work, we present a self-explainable heterogeneous GNN for relational data that supports models in which class membership depends on aggregate information obtained from multiple occurrences of a meta-path. Experimental results show that, in the context of relational databases, our approach effectively identifies informative meta-paths that faithfully capture the model's reasoning mechanisms. It significantly outperforms existing methods in both synthetic and real-world scenarios.
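The distinction between existence-based and aggregate-based dependence on a meta-path can be illustrated with a toy example. The following is a minimal sketch in plain Python, not the proposed model: the tables, the meta-path customer -> order -> product, the prices, and the threshold are all hypothetical and chosen only for illustration.

```python
# Toy relational data (hypothetical tables, for illustration only).
orders = [  # (order_id, customer_id)
    ("o1", "alice"), ("o2", "alice"), ("o3", "bob"),
]
order_items = [  # (order_id, product_id)
    ("o1", "p1"), ("o1", "p2"), ("o2", "p1"), ("o3", "p1"),
]
product_price = {"p1": 40.0, "p2": 15.0}


def meta_path_occurrences(customer):
    """List all occurrences of the meta-path customer -> order -> product."""
    customer_orders = {o for o, c in orders if c == customer}
    return [(customer, o, p) for o, p in order_items if o in customer_orders]


def exists_label(customer, product="p1"):
    """Existence-based rule: true if at least one occurrence reaches `product`."""
    return any(p == product for _, _, p in meta_path_occurrences(customer))


def aggregate_label(customer, threshold=50.0):
    """Aggregate-based rule: true if the total price summed over all
    occurrences of the meta-path reaches `threshold`."""
    total = sum(product_price[p] for _, _, p in meta_path_occurrences(customer))
    return total >= threshold


for name in ("alice", "bob"):
    print(name, exists_label(name), aggregate_label(name))
```

In this toy data both customers satisfy the existence-based rule, since each has bought `p1` at least once, but only `alice` satisfies the aggregate-based rule, because her three meta-path occurrences sum to 95.0 while `bob`'s single occurrence sums to 40.0. A model that can only test for the existence of an occurrence cannot separate the two.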