Supervised learning on graphs is challenging because the data are high-dimensional and carry inherent structural dependencies: each edge depends on a pair of vertices. Conventional methods designed for Euclidean data do not account for this dependency structure. To address this issue, this paper proposes an iterative vertex screening method that identifies the signal subgraph most informative for the given graph attributes. The method screens the rows and columns of the adjacency matrix concurrently and stops when the resulting distance correlation is maximized. We establish the theoretical foundation of the method by proving that it estimates the true signal subgraph with high probability. We also derive the convergence rate of the classification error under the Erdős-Rényi random graph model and prove that the subsequent classification can be asymptotically optimal, outperforming classification based on the entire graph under high-dimensional conditions. The method is evaluated on various simulated datasets and on real-world human and murine graphs derived from functional and structural magnetic resonance images. The results show that it accurately estimates the ground-truth signal subgraph and achieves superior classification accuracy.
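The screening loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `drop_fraction` and `min_size` parameters, the fixed-fraction removal schedule, and the use of the biased sample distance correlation are all assumptions made for illustration. Each vertex is scored by the distance correlation between its row and column of the adjacency matrix (restricted to the surviving vertices) and the labels; the lowest-scoring vertices are dropped, and the vertex set whose induced subgraph has the highest distance correlation with the labels is returned.

```python
import numpy as np


def _centered_dist(x):
    """Double-centered pairwise Euclidean distance matrix of the samples in x."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Biased sample distance correlation between x and y (rows are samples)."""
    a, b = _centered_dist(x), _centered_dist(y)
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0:
        return 0.0  # at least one input is constant across samples
    return float(np.sqrt(max((a * b).mean(), 0.0) / denom))


def iterative_vertex_screening(graphs, labels, drop_fraction=0.5, min_size=2):
    """Hypothetical sketch of iterative vertex screening.

    graphs: array of shape (m, n, n), one adjacency matrix per sample.
    labels: array of shape (m,), the graph attribute to predict.
    drop_fraction, min_size: illustrative parameters, not from the paper.
    Returns (vertices, score) for the best-scoring vertex subset visited.
    """
    graphs = np.asarray(graphs, dtype=float)
    m, n, _ = graphs.shape
    current = np.arange(n)
    best_vertices, best_score = current.copy(), -np.inf
    while True:
        # Score the subgraph induced by the surviving vertices.
        sub = graphs[:, current][:, :, current].reshape(m, -1)
        score = distance_correlation(sub, labels)
        if score > best_score:
            best_score, best_vertices = score, current.copy()
        if len(current) <= min_size:
            break
        # Score each vertex using its row and column together.
        vertex_scores = np.array([
            distance_correlation(
                np.concatenate(
                    [graphs[:, v, current], graphs[:, current, v]], axis=1
                ),
                labels,
            )
            for v in current
        ])
        # Keep the top-scoring vertices; always drop at least one.
        keep = int(np.ceil(len(current) * (1 - drop_fraction)))
        keep = max(min_size, min(len(current) - 1, keep))
        order = np.argsort(vertex_scores)[::-1][:keep]
        current = np.sort(current[order])
    return best_vertices, best_score
```

Calling `iterative_vertex_screening(graphs, labels)` on a stack of adjacency matrices returns the selected vertex indices and their distance correlation score. The pairwise distance computation uses memory proportional to m² times the number of features, so this sketch is only suitable for small graphs and sample sizes.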