Many real-world datasets, such as citation networks, social networks, and molecular structures, are naturally represented as heterogeneous graphs, where nodes belong to different types and have additional features. For example, in a citation network, nodes representing "Paper" or "Author" may include attributes like keywords or affiliations. A critical machine learning task on these graphs is node classification, which is useful for applications such as fake news detection, corporate risk assessment, and molecular property prediction. Although Heterogeneous Graph Neural Networks (HGNNs) perform well in these contexts, their predictions remain opaque. Existing post-hoc explanation methods lack support for actual node features beyond one-hot encoding of node type and often fail to generate realistic, faithful explanations. To address these gaps, we propose DiGNNExplainer, a model-level explanation approach that synthesizes heterogeneous graphs with realistic node features via discrete denoising diffusion. In particular, we generate realistic discrete features (e.g., bag-of-words features) using diffusion models within a discrete space, whereas previous approaches are limited to continuous spaces. We evaluate our approach on multiple datasets and show that DiGNNExplainer produces explanations that are realistic and faithful to the model's decision-making, outperforming state-of-the-art methods.