Graph data is ubiquitous in real-world applications and is typically large in scale and complex in structure, which makes it necessary to map graphs to low-dimensional embeddings. Graph classification, a critical graph task, relies mainly on identifying the important substructures within a graph. However, some existing graph classification methods do not account for the multi-granularity characteristics of graph data. Without this distinction between granularities, key information becomes conflated with spurious correlations inside the model, which makes it difficult to obtain a credible and interpretable model. To address this challenge, this paper proposes a causal disentangled multi-granularity graph representation learning method (CDM-GNN). The CDM-GNN model disentangles the important substructures from the bias parts of a graph from a multi-granularity perspective. This disentanglement exposes the important and bias parts, which form the basis for both the classification task and the model's interpretations. The CDM-GNN model exhibits strong classification performance and generates explanations that align with human cognitive patterns. To verify the effectiveness of the model, experiments are conducted on three real-world datasets, MUTAG, PTC, and IMDB-M, against six state-of-the-art baselines: GCN, GAT, Top-k, ASAPool, SUGAR, and SAT. A qualitative analysis of the interpretation results is also conducted.