Graph homomorphism counts, first explored by Lovász in 1967, have recently garnered interest as a powerful tool in graph-based machine learning. Grohe (PODS 2020) laid the theoretical foundations for using homomorphism counts in machine learning on both graph-level and node-level tasks. By their very nature, these counts capture local structural information, which enables the creation of robust structural embeddings. While Nguyen and Maehara (ICML 2020) proposed a first approach for graph-level tasks, we experimentally show the effectiveness of homomorphism-count-based node embeddings. Enriched with node labels, node weights, and edge weights, these embeddings offer an interpretable representation of graph data, allowing for enhanced explainability of machine learning models. We propose a theoretical framework for isomorphism-invariant, homomorphism-count-based embeddings that lend themselves to a wide variety of downstream tasks. Our approach capitalises on the efficient computability of homomorphism counts from pattern graphs of bounded treewidth, rendering it a practical solution for real-world applications. We demonstrate the expressivity of these embeddings through experiments on benchmark datasets. Although our results do not match the accuracy of state-of-the-art neural architectures, they are comparable to those of other advanced graph learning models. Notably, our approach distinguishes itself by ensuring explainability for each individual feature. By integrating interpretable machine learning algorithms such as SVMs or Random Forests, we establish a seamless, end-to-end explainable pipeline. Our study contributes to the advancement of graph-based techniques that offer both performance and interpretability.
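As a minimal sketch of how homomorphism-count node embeddings can be computed, the Python below assumes the simplest bounded-treewidth case: tree-shaped patterns, with node labels, node weights, and edge weights ignored. For a pattern tree rooted at r and a target vertex v, the number of homomorphisms mapping r to v is the product, over the children of r, of the summed counts at the neighbours of v. This is computed bottom-up. A node's embedding is then its vector of counts over a fixed list of rooted patterns. The function names, the adjacency-list representation, and the three example patterns are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List, Tuple

# Undirected graph as adjacency lists (hypothetical representation).
Graph = Dict[int, List[int]]


def rooted_hom_counts(tree: Graph, root: int, graph: Graph) -> Dict[int, int]:
    """For each vertex v of `graph`, count homomorphisms from the pattern
    `tree` that map `root` to v (tree node ids assumed non-negative)."""

    def solve(t: int, parent: int) -> Dict[int, int]:
        # A pattern leaf can be mapped to v in exactly one way.
        counts = {v: 1 for v in graph}
        for child in tree[t]:
            if child == parent:
                continue
            child_counts = solve(child, t)
            # The child must be mapped to some neighbour u of v.
            for v in graph:
                counts[v] *= sum(child_counts[u] for u in graph[v])
        return counts

    return solve(root, -1)


def node_embedding(graph: Graph, patterns: List[Tuple[Graph, int]]) -> Dict[int, List[int]]:
    """Embed each vertex as its vector of rooted homomorphism counts."""
    per_pattern = [rooted_hom_counts(t, r, graph) for t, r in patterns]
    return {v: [counts[v] for counts in per_pattern] for v in graph}


# Illustrative rooted tree patterns (an assumption, not the paper's choice).
PATTERNS = [
    ({0: [1], 1: [0]}, 0),              # edge rooted at an end: deg(v)
    ({0: [1], 1: [0, 2], 2: [1]}, 0),   # path rooted at an end: walks of length 2
    ({0: [1, 2], 1: [0], 2: [0]}, 0),   # 2-star rooted at its centre: deg(v)^2
]

# A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
example_graph: Graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

print(node_embedding(example_graph, PATTERNS))
# {0: [2, 5, 4], 1: [2, 5, 4], 2: [3, 5, 9], 3: [1, 3, 1]}
```

Vertices 0 and 1 are exchanged by an automorphism of the example graph and receive identical vectors, which illustrates the isomorphism invariance of such embeddings. Patterns of higher but still bounded treewidth would require a dynamic program over a tree decomposition rather than over the pattern tree itself.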