Accurate predictions on tabular data rely on capturing complex, dataset-specific feature interactions. Attention-based methods and graph neural networks, collectively referred to as graph-based tabular deep learning (GTDL), aim to improve predictions by modeling these interactions as a graph. In this work, we analyze how these methods model feature interactions. Current GTDL approaches primarily optimize predictive accuracy and often neglect accurate modeling of the underlying graph structure. Using synthetic datasets with known ground-truth graph structures, we find that current GTDL methods fail to recover meaningful feature interactions: their edge recovery is close to random. This suggests that the attention mechanisms and message-passing schemes used in GTDL do not effectively capture feature interactions. Furthermore, when we impose the true interaction structure, predictive accuracy improves. This highlights the need for GTDL methods to prioritize accurate modeling of the graph structure, as it leads to better predictions.
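To make the "edge recovery is close to random" claim concrete, one simple way to score recovered interactions is to treat the model's learned pairwise scores (e.g. attention weights) as edge predictions and compute their ROC AUC against the ground-truth adjacency matrix: a random scorer sits near 0.5, a perfect one at 1.0. The sketch below is illustrative only and not taken from the paper; `edge_recovery_auc` is a hypothetical helper, and the Mann-Whitney rank formula used here ignores tie correction for brevity.

```python
import numpy as np

def edge_recovery_auc(true_adj, scores):
    """ROC AUC of predicted interaction scores against the ground-truth
    adjacency, over the upper-triangular feature pairs (hypothetical helper)."""
    iu = np.triu_indices_from(true_adj, k=1)
    y = true_adj[iu].astype(int)      # 1 = true edge, 0 = non-edge
    s = scores[iu]                    # model's pairwise interaction scores
    # Mann-Whitney rank statistic: P(true edge outranks a non-edge).
    order = s.argsort()
    ranks = np.empty(len(s))
    ranks[order] = np.arange(1, len(s) + 1)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Toy example: 4 features, true interactions (0,1) and (2,3).
true_adj = np.zeros((4, 4))
true_adj[0, 1] = true_adj[2, 3] = 1.0

oracle_scores = true_adj.copy()                      # perfect recovery
rng = np.random.default_rng(0)
random_scores = rng.random((4, 4))                   # random "attention"

print(edge_recovery_auc(true_adj, oracle_scores))    # 1.0
print(edge_recovery_auc(true_adj, random_scores))    # near 0.5 on average
```

A GTDL method whose attention matrix scores near 0.5 under such a metric is, by this measure, not modeling the interaction graph, regardless of its predictive accuracy.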