When dealing with tabular data, models based on decision trees are a popular choice due to their high accuracy on such data, their ease of application, and their explainability. However, when it comes to graph-structured data, it is not clear how to apply them effectively, in a way that incorporates the topological information with the tabular data available on the vertices of the graph. To address this challenge, we introduce Decision Trees with Dynamic Graph Features (TREE-G). Rather than using only the predefined features given in the data, TREE-G acts on dynamic features, which are computed as the graph traverses the tree. These dynamic features combine the vertex features with the topological information, as well as the cumulative information learned by the tree. Therefore, the features adapt to the predictive task and the graph at hand. We analyze the theoretical properties of TREE-G and demonstrate its benefits empirically on multiple graph and node prediction benchmarks. In these experiments, TREE-G consistently outperformed other tree-based models and often outperformed other graph-learning algorithms such as Graph Neural Networks (GNNs) and Graph Kernels, sometimes by large margins. Finally, we also provide an explainability mechanism for TREE-G and demonstrate that it can provide informative and intuitive explanations.