We propose an adaptive node feature selection approach for graph neural networks (GNNs) that identifies and removes unnecessary features during training. Measuring how features contribute to model output is key both to interpreting decisions and to reducing dimensionality by eliminating unhelpful variables. However, graph-structured data introduces complex dependencies for which classical feature importance metrics may be ill-suited. Motivated by this, we present a data-, model-, and task-agnostic method that determines relevant features during training based on changes in validation performance upon permuting feature values. We theoretically motivate our approach by characterizing how the relationships between node data and graph structure influence GNN performance. Empirically, we show that (i) our highly general approach rivals the performance of tailored feature selection approaches that exploit prior assumptions; (ii) it returns meaningful feature importance scores well before the GNN is fully trained; and (iii) its scores demonstrably extract relevant properties that inform feature importance for various graph learning settings.
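The permutation-based scoring described above can be illustrated with a minimal sketch. This is not the authors' algorithm: it shows only the scoring step for a fixed model, whereas the abstract describes scoring and removing features adaptively during training. Everything here is an assumption made for illustration, including the function name `permutation_importance`, the toy ring graph, the single mean-aggregation step standing in for a GNN, and the weight vector `w`.

```python
import numpy as np

def permutation_importance(predict, X, y, val_idx, n_repeats=5, seed=0):
    """Score each feature by the mean drop in validation accuracy
    observed when that feature's column is shuffled across nodes."""
    rng = np.random.default_rng(seed)
    base = np.mean(predict(X)[val_idx] == y[val_idx])
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(X[:, j])  # break the feature/label link
            acc = np.mean(predict(Xp)[val_idx] == y[val_idx])
            drops.append(base - acc)
        scores[j] = np.mean(drops)
    return scores

# Hypothetical toy setup: a ring graph with self-loops and 3 node features.
n = 60
rng = np.random.default_rng(1)
X = rng.normal(size=(n, 3))
I = np.eye(n)
A = I + np.roll(I, 1, axis=1) + np.roll(I, -1, axis=1)
A_hat = A / A.sum(axis=1, keepdims=True)  # row-normalized adjacency

# Stand-in for a trained GNN: one mean-aggregation step, then a linear
# readout that uses only feature 0 (an assumption of this sketch).
w = np.array([1.0, 0.0, 0.0])

def predict(features):
    return (A_hat @ (features @ w) > 0).astype(int)

y = predict(X)               # labels generated by the same rule
val_idx = np.arange(30, 60)  # hypothetical validation nodes

scores = permutation_importance(predict, X, y, val_idx)
print(scores)
```

In this toy setting the readout ignores features 1 and 2, so shuffling them leaves the predictions unchanged and their scores are zero, while shuffling feature 0 lowers validation accuracy. Features whose scores stay near zero would be the candidates for removal.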