Although there has been significant interest in applying machine learning techniques to structured data, the expressivity of such techniques (i.e., a description of what can be learned) is still poorly understood. In this paper, we study data transformations based on graph neural networks (GNNs). First, we note that the choice of how a dataset is encoded into a numeric form processable by a GNN can obscure the characterisation of a model's expressivity, and we argue that a canonical encoding provides an appropriate basis. Second, we study the expressivity of monotonic max-sum GNNs, which cover a subclass of GNNs with max and sum aggregation functions. We show that, for each such GNN, one can compute a Datalog program such that applying the GNN to any dataset produces the same facts as a single round of application of the program's rules to the dataset. Monotonic max-sum GNNs can sum an unbounded number of feature vectors, which can result in arbitrarily large feature values, whereas rule application requires only a bounded number of constants. Hence, our result shows that the unbounded summation of monotonic max-sum GNNs does not increase their expressive power. Third, we sharpen our result to the subclass of monotonic max GNNs, which use only the max aggregation function, and identify a corresponding class of Datalog programs.
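To make the phrase "a single round of application of the program's rules" concrete, the following is a minimal sketch and not the construction from the paper. It assumes a hypothetical fact encoding (tuples such as `("A", "b")` and `("E", "a", "b")`), a hypothetical one-rule program, and a hand-built toy GNN with a single max-aggregation layer, weight 1 and threshold 1. All names (`apply_rules_once`, `toy_max_gnn`, `RULES`, `FACTS`) are illustrative assumptions.

```python
# Facts are tuples: ("A", "b") is unary, ("E", "a", "b") is binary.
# Terms starting with "?" are variables. Everything here is a toy assumption.

def match(atom, fact, subst):
    """Extend subst so that atom equals fact; return None if impossible."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    subst = dict(subst)
    for term, const in zip(atom[1:], fact[1:]):
        if term.startswith("?"):
            if subst.setdefault(term, const) != const:
                return None
        elif term != const:
            return None
    return subst

def apply_rules_once(rules, facts):
    """One round of rule application: no iteration to a fixpoint."""
    derived = set()
    for head, body in rules:
        substs = [{}]
        for atom in body:
            extended = []
            for s in substs:
                for fact in facts:
                    s2 = match(atom, fact, s)
                    if s2 is not None:
                        extended.append(s2)
            substs = extended
        for s in substs:
            derived.add((head[0],) + tuple(s.get(t, t) for t in head[1:]))
    return derived

def toy_max_gnn(facts):
    """Hypothetical one-layer GNN: max over out-neighbours of the A feature,
    weight 1 (non-negative), then a threshold at 1 to emit B facts."""
    nodes = {c for f in facts for c in f[1:]}
    a_feat = {v: 1.0 if ("A", v) in facts else 0.0 for v in nodes}
    out = set()
    for v in nodes:
        nbr = [a_feat[f[2]] for f in facts if f[0] == "E" and f[1] == v]
        if max(nbr, default=0.0) >= 1.0:
            out.add(("B", v))
    return out

# Hypothetical program with one rule: B(x) <- E(x, y), A(y)
RULES = [(("B", "?x"), [("E", "?x", "?y"), ("A", "?y")])]
FACTS = {("A", "b"), ("E", "a", "b"), ("E", "c", "a")}

print(apply_rules_once(RULES, FACTS) == toy_max_gnn(FACTS))
```

On this dataset both functions derive only the fact B(a). The toy agrees with the rule because max aggregation followed by a threshold only needs one witnessing neighbour, which is what a single existentially quantified body atom expresses. The sketch does not cover sum aggregation or the canonical encoding discussed in the abstract.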