Graph neural networks (GNNs) achieve strong performance on graph learning tasks, but training on large-scale networks remains computationally challenging. Transferability results show that GNNs with fixed weights can generalize from smaller graphs to larger ones drawn from the same family, motivating the use of sampled subgraphs to boost training efficiency. Yet most existing sampling strategies rely on reliable access to the target graph structure, which in practice may be noisy, incomplete, or unavailable prior to training. In lieu of precise connectivity information, we study feature-driven subgraph sampling for transferable GNNs, with the goal of preserving spectral properties of graph operators that control GNN expressivity. We adopt an alignment-based perspective linking node feature statistics to graph spectral structure and develop two complementary notions of feature-graph alignment. For coarse alignment, we formalize feature homophily through a Laplacian-based measure quantifying the alignment of feature principal components with graph eigenvectors, and establish a lower bound on the Laplacian trace in terms of feature statistics. This motivates a simple, non-sequential sampling algorithm that operates on the feature matrix and preserves a trace-based proxy for operator rank. For fine alignment, we assume a stationary model where the feature covariance and Laplacian share an eigenbasis, and establish that diagonal covariance entries reflect node-degree ordering under monotone filters. We empirically validate that filter monotonicity dictates the relationship between feature variance and spectral energy. On real-world benchmarks, selecting the retention rule that maximizes the Laplacian trace consistently yields GNNs with superior transferability and reduced generalization gaps.
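The fine-alignment claim can be illustrated on a toy graph. The sketch below is not the authors' algorithm. It assumes a stationary model with the monotone increasing filter h(λ) = λ, so features are generated as X = LZ with white noise Z and have covariance L². It also assumes a hypothetical retention rule that keeps the k nodes with the largest empirical feature variance. The graph, the filter, the sample count, and all function names are illustrative choices. For an unweighted simple graph the diagonal of L² equals d² + d for degree d, so under this assumed filter the variance ordering follows the degree ordering.

```python
import numpy as np


def laplacian(adj):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    return np.diag(adj.sum(axis=1)) - adj


def induced_trace(adj, nodes):
    """Trace of the Laplacian of the subgraph induced by `nodes`."""
    sub = adj[np.ix_(nodes, nodes)]
    return float(np.trace(laplacian(sub)))


# Toy graph (illustrative): a 4-clique on nodes 0..3 with a path 3-4-5-6-7.
n = 8
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (3, 4), (4, 5), (5, 6), (6, 7)]
adj = np.zeros((n, n))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

L = laplacian(adj)
degrees = adj.sum(axis=1)

# Assumed stationary features: X = L @ Z with white noise Z, so Cov(X) = L @ L.
rng = np.random.default_rng(0)
X = L @ rng.standard_normal((n, 5000))

# Feature-only statistic: per-node empirical variance (no graph access needed).
feature_var = X.var(axis=1)

# Hypothetical retention rules: keep the k highest- or lowest-variance nodes.
k = 4
order = np.argsort(feature_var, kind="stable")
keep_high = np.sort(order[-k:])
keep_low = np.sort(order[:k])

# The graph is used here only to evaluate the two rules after the fact.
trace_high = induced_trace(adj, keep_high)
trace_low = induced_trace(adj, keep_low)
print("high-variance nodes:", keep_high, "induced trace:", trace_high)
print("low-variance nodes: ", keep_low, "induced trace:", trace_low)
```

On this toy graph the high-variance rule retains the clique and the low-variance rule retains the path, so the former yields the larger induced Laplacian trace. With a monotone decreasing filter the variance ordering would be expected to reverse, which is why the retention rule has to be chosen to match the filter.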