We study the node classification problem on feature-decorated graphs in the sparse setting, i.e., when the expected degree of a node is $O(1)$ in the number of nodes, and in the fixed-dimensional asymptotic regime, i.e., when the dimension of the node features is fixed while the number of nodes grows. Such graphs are known to be locally tree-like. We introduce a notion of Bayes optimality for node classification tasks, called asymptotic local Bayes optimality, and compute the optimal classifier under this criterion for a fairly general statistical data model with arbitrary distributions of the node features and edge connectivity. This optimal classifier can be implemented with a message-passing graph neural network architecture. We then compute the generalization error of this classifier and compare its performance theoretically against existing learning methods on a well-studied statistical model with naturally identifiable signal-to-noise ratios (SNRs) in the data. We find that the optimal message-passing architecture interpolates between a standard MLP in the regime of low graph signal and a typical graph convolution in the regime of high graph signal. Furthermore, we prove a corresponding non-asymptotic result.
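To make the MLP-to-convolution interpolation concrete, here is a minimal sketch under assumptions that this abstract does not state: two balanced classes $\sigma \in \{\pm 1\}$ with equal priors, Gaussian features $x_u \sim \mathcal{N}(\sigma_u \mu, I)$, and sparse block-model edges with intra-class probability $a/n$ and inter-class probability $b/n$, restricted to a node's 1-hop neighbourhood in the large-$n$ limit. Under these assumptions the 1-hop log-likelihood ratio is $\mathrm{LLR}(u) = 2\langle x_u, \mu\rangle + \sum_{v \in N(u)} \psi(2\langle x_v, \mu\rangle)$ with $\psi(t) = \log\frac{a e^t + b}{b e^t + a}$. When $a = b$, $\psi \equiv 0$ and only the node's own feature matters (MLP-like); as $b/a \to 0$, $\psi(t) \to t$ and the rule becomes a plain sum over neighbours (convolution-like). The names `psi`, `feature_llr`, `node_llr`, and `classify` are hypothetical, and this is not necessarily the classifier or model used in the paper.

```python
import math
from typing import Dict, List, Sequence


def _logaddexp(x: float, y: float) -> float:
    """Numerically stable log(exp(x) + exp(y))."""
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))


def feature_llr(x: Sequence[float], mu: Sequence[float]) -> float:
    """Feature log-likelihood ratio: log N(x; mu, I) - log N(x; -mu, I) = 2 <x, mu>."""
    return 2.0 * sum(xi * mi for xi, mi in zip(x, mu))


def psi(t: float, a: float, b: float) -> float:
    """Hypothetical neighbour message log[(a e^t + b) / (b e^t + a)], assuming a, b > 0."""
    la, lb = math.log(a), math.log(b)
    return _logaddexp(la + t, lb) - _logaddexp(lb + t, la)


def node_llr(u: int, features: List[List[float]], adj: Dict[int, List[int]],
             mu: Sequence[float], a: float, b: float) -> float:
    """1-hop log-likelihood ratio of sigma_u = +1 versus sigma_u = -1."""
    own = feature_llr(features[u], mu)  # MLP-like term from the node's own feature
    # Each neighbour contributes a bounded, nonlinear "vote" through psi.
    msgs = sum(psi(feature_llr(features[v], mu), a, b) for v in adj.get(u, []))
    return own + msgs


def classify(u: int, features: List[List[float]], adj: Dict[int, List[int]],
             mu: Sequence[float], a: float, b: float) -> int:
    """Predict +1 if the log-likelihood ratio is non-negative, else -1."""
    return 1 if node_llr(u, features, adj, mu, a, b) >= 0 else -1


if __name__ == "__main__":
    mu = [1.0, 0.0]
    features = [[-0.5, 0.0], [2.0, 0.0], [1.5, 0.0]]
    adj = {0: [1, 2]}
    print(classify(0, features, adj, mu, a=3.0, b=3.0))  # -1: no graph signal, own feature decides
    print(classify(0, features, adj, mu, a=5.0, b=1.0))  # 1: neighbours outweigh the weak own feature
```

For $a > b$, $\psi$ is bounded in $(-\log(a/b), \log(a/b))$, so each neighbour's vote saturates instead of growing linearly; as $a/b$ grows, the saturation level rises and the rule approaches plain summation over the neighbourhood.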