We propose an adaptive node feature selection approach for graph neural networks (GNNs) that identifies and removes unnecessary features during training. Measuring how features contribute to model output is key for interpreting decisions and for reducing dimensionality by eliminating unhelpful variables. However, graph-structured data introduces complex dependencies for which classical feature importance metrics may be unsuited. Motivated by this, we present a data-, model-, and task-agnostic method that determines relevant features during training based on changes in validation performance upon permuting feature values. We theoretically motivate our approach by characterizing how the relationships between node data and graph structure influence GNN performance. Empirically, we show that (i) our highly general approach rivals the performance of tailored feature selection approaches that exploit prior assumptions; (ii) our method returns meaningful feature importance scores well before the GNN is fully trained; and (iii) our scores capture properties that inform feature importance across various graph learning settings.
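To make the permutation idea concrete, the following is a minimal sketch of permutation-based feature scoring on validation nodes. It is not the authors' algorithm: the function name `permutation_importance`, the fixed-weight one-hop mean-aggregation "model" standing in for a trained GNN, the random toy graph, and the `0.01` pruning threshold are all assumptions made for illustration. The sketch also scores features once, whereas the abstract describes doing so adaptively during training.

```python
import numpy as np


def permutation_importance(predict_fn, X, y, val_idx, n_repeats=10, seed=0):
    """Score each feature by the mean drop in validation accuracy
    observed when that feature's column is shuffled across nodes."""
    rng = np.random.default_rng(seed)

    def val_accuracy(features):
        preds = predict_fn(features)
        return float(np.mean(preds[val_idx] == y[val_idx]))

    baseline = val_accuracy(X)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle feature j across all nodes; the graph is left untouched.
            X_perm[:, j] = rng.permutation(X[:, j])
            drops.append(baseline - val_accuracy(X_perm))
        scores[j] = np.mean(drops)
    return scores


# Hypothetical toy setup: a random undirected graph with self-loops.
rng = np.random.default_rng(42)
n_nodes, n_features = 200, 3
A = (rng.random((n_nodes, n_nodes)) < 0.05).astype(float)
A = np.maximum(A, A.T)                      # symmetrize
np.fill_diagonal(A, 1.0)                    # add self-loops
A_norm = A / A.sum(axis=1, keepdims=True)   # row-normalized (mean aggregation)

X = rng.normal(size=(n_nodes, n_features))
w = np.array([1.0, 0.0, 0.0])               # assumed weights: only feature 0 is used


def predict_fn(features):
    # Stand-in for a trained GNN: one round of neighbor averaging, then a linear rule.
    return (A_norm @ features @ w > 0).astype(int)


y = predict_fn(X)                           # labels depend only on aggregated feature 0
val_idx = np.arange(150, 200)               # assumed validation split

scores = permutation_importance(predict_fn, X, y, val_idx)
keep = scores > 0.01                        # assumed threshold for retaining a feature
print(scores, keep)
```

In this toy setting the stand-in model ignores features 1 and 2, so shuffling them leaves validation accuracy unchanged, while shuffling feature 0 degrades it. Shuffling a column across all nodes, rather than only the validation nodes, is a design choice of this sketch: it also perturbs the values that validation nodes aggregate from their neighbors.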