Feature interactions are crucial in predictive machine learning models, as they capture relationships between features that influence model performance. In this work, we focus on pairwise interactions and investigate their importance in constructing feature graphs for Graph Neural Networks (GNNs). Rather than proposing new methods, we leverage existing GNN models and tools to explore the relationship between feature graph structures and their effectiveness in modeling interactions. Through experiments on synthetic datasets, we find that edges between interacting features are important for enabling GNNs to model feature interactions effectively, and that including non-interaction edges can act as noise, degrading model performance. Furthermore, we provide theoretical support for sparse feature graph selection using the Minimum Description Length (MDL) principle: we prove that feature graphs retaining only the necessary interaction edges yield a more efficient and interpretable representation than complete graphs, in line with Occam's Razor. Our findings offer both theoretical insights and practical guidelines for designing feature graphs that improve the performance and interpretability of GNN models.
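To make the sparse-graph idea concrete, the sketch below builds a feature graph whose nodes are features and whose edges connect only known interacting pairs, as the abstract advocates. This is a minimal illustration, not the paper's implementation; the function name, the feature count, and the interaction pairs are all hypothetical.

```python
import numpy as np

def build_feature_graph(n_features, interactions):
    """Build an undirected feature-graph adjacency matrix: nodes are
    features, and edges connect only the given interacting pairs
    (the sparse construction, as opposed to a complete graph)."""
    adj = np.zeros((n_features, n_features), dtype=int)
    for i, j in interactions:
        adj[i, j] = adj[j, i] = 1  # undirected edge between features i and j
    return adj

# Illustrative example: 5 features where only (0, 1) and (2, 3) interact.
adj = build_feature_graph(5, [(0, 1), (2, 3)])
# A complete graph on 5 features would have 5*4/2 = 10 edges; this sparse
# graph keeps just the 2 interaction edges, omitting potential noise edges.
```

Such an adjacency matrix (or its edge-list equivalent) can then be handed to any standard GNN library as the message-passing structure over features.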