Graph homomorphism counts, first explored by Lovász in 1967, have recently attracted interest as a powerful tool in graph-based machine learning. Grohe (PODS 2020) proposed the theoretical foundations for using homomorphism counts in machine learning on graph-level as well as node-level tasks. By their very nature, homomorphism counts capture local structural information, which enables the creation of robust structural embeddings. While a first approach for graph-level tasks was made by Nguyen and Maehara (ICML 2020), we experimentally show the effectiveness of homomorphism-count-based node embeddings. Enriched with node labels, node weights, and edge weights, these embeddings offer an interpretable representation of graph data, allowing for enhanced explainability of machine learning models. We propose a theoretical framework for isomorphism-invariant homomorphism-count-based embeddings which lend themselves to a wide variety of downstream tasks. Our approach capitalises on the efficient computability of graph homomorphism counts for graph classes of bounded treewidth, rendering it a practical solution for real-world applications. We demonstrate the expressivity of these embeddings through experiments on benchmark datasets. Although our results do not match the accuracy of state-of-the-art neural architectures, they are comparable to other advanced graph learning models. Notably, our approach distinguishes itself by ensuring explainability for each individual feature. By integrating interpretable machine learning algorithms such as SVMs or Random Forests, we establish a seamless, end-to-end explainable pipeline. Our study contributes to the advancement of graph-based techniques that offer both performance and interpretability.
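To make the notion of a homomorphism-count-based node embedding concrete, the following is a minimal illustrative sketch, not the method of the paper. It assumes unlabelled, unweighted graphs and counts rooted homomorphisms by brute-force enumeration, which is exponential in the pattern size; the efficient bounded-treewidth algorithms referred to above are not implemented here. All names (`rooted_hom_count`, `node_embedding`, `PATTERNS`, `G`) are hypothetical and chosen for illustration only.

```python
from itertools import product

def rooted_hom_count(pattern, graph_adj, v):
    """Count maps h: V(F) -> V(G) with h(root) = v that send
    every edge of the pattern F to an edge of the graph G.
    Brute force: tries every assignment of the non-root pattern nodes."""
    nodes, edges, root = pattern
    others = [u for u in nodes if u != root]
    count = 0
    for image in product(list(graph_adj), repeat=len(others)):
        h = dict(zip(others, image))
        h[root] = v
        if all(h[b] in graph_adj[h[a]] for a, b in edges):
            count += 1
    return count

# Assumed pattern family: (nodes, edges, root). Each pattern gives one feature.
PATTERNS = [
    (("a", "b"), [("a", "b")], "a"),                               # edge: degree of v
    (("a", "b", "c"), [("a", "b"), ("b", "c")], "a"),              # path: walks of length 2 from v
    (("a", "b", "c"), [("a", "b"), ("b", "c"), ("c", "a")], "a"),  # triangle: closed walks of length 3 at v
]

def node_embedding(graph_adj, v):
    """One interpretable feature per pattern: its rooted homomorphism count at v."""
    return [rooted_hom_count(p, graph_adj, v) for p in PATTERNS]

# Toy graph: a triangle on nodes 0, 1, 2 with a pendant node 3 attached to node 2.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

for v in G:
    print(v, node_embedding(G, v))
# 0 [2, 5, 2]
# 1 [2, 5, 2]
# 2 [3, 5, 2]
# 3 [1, 3, 0]
```

Each coordinate has a direct reading (degree, number of length-2 walks, twice the number of triangles through the node), which is the sense in which individual features are explainable. Nodes 0 and 1 are related by an automorphism of the toy graph and receive identical vectors, illustrating isomorphism invariance.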