When dealing with tabular data, models based on decision trees are a popular choice due to their high accuracy on this type of data, their ease of application, and their explainability. However, for graph-structured data, it is not clear how to apply them effectively in a way that combines the topological information with the tabular data available on the vertices of the graph. To address this challenge, we introduce TREE-G. TREE-G modifies standard decision trees by introducing a novel split function that is specialized for graph data. Not only does this split function incorporate the node features and the topological information, but it also uses a novel pointer mechanism that allows split nodes to use information computed in previous splits. Therefore, the split function adapts to the predictive task and the graph at hand. We analyze the theoretical properties of TREE-G and demonstrate its benefits empirically on multiple graph and vertex prediction benchmarks. In these experiments, TREE-G consistently outperforms other tree-based models and often outperforms other graph-learning algorithms such as Graph Neural Networks (GNNs) and Graph Kernels, sometimes by large margins. Moreover, TREE-G models and their predictions can be explained and visualized.
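To make the idea of a graph-aware split concrete, the following is a minimal sketch and not the exact definition used by TREE-G. It assumes a split that thresholds a node feature after propagating it over the adjacency matrix for a chosen number of hops, restricted to a subset of nodes selected by an earlier split (standing in for the pointer mechanism). The names `propagate`, `graph_split`, `hops`, and `active` are hypothetical and chosen only for illustration.

```python
from typing import List, Set

def propagate(adj: List[List[int]], values: List[float], hops: int) -> List[float]:
    """Multiply `values` by the adjacency matrix `hops` times."""
    for _ in range(hops):
        values = [sum(a * v for a, v in zip(row, values)) for row in adj]
    return values

def graph_split(
    adj: List[List[int]],
    features: List[List[float]],
    feature_idx: int,
    hops: int,
    threshold: float,
    active: Set[int],
) -> Set[int]:
    """Return the nodes whose propagated, masked feature exceeds `threshold`.

    `active` is the node subset produced by an earlier split (an assumed
    stand-in for the pointer mechanism); features of other nodes are zeroed.
    """
    n = len(adj)
    masked = [features[i][feature_idx] if i in active else 0.0 for i in range(n)]
    scores = propagate(adj, masked, hops)
    return {i for i, s in enumerate(scores) if s > threshold}

# Path graph 0 - 1 - 2 - 3 with a single feature per node.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
features = [[1.0], [2.0], [0.0], [3.0]]
all_nodes = {0, 1, 2, 3}

# First split: plain feature threshold (0 hops) over all nodes.
first = graph_split(adj, features, 0, hops=0, threshold=1.5, active=all_nodes)

# Second split: reuses the subset found by the first split, then looks one hop away.
second = graph_split(adj, features, 0, hops=1, threshold=2.5, active=first)
```

In this toy example the second split depends on the outcome of the first: only the nodes that passed the first threshold contribute their feature values to their neighbors, which is how a later split can reuse information computed earlier in the tree.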