Homophily is a graph property describing the tendency of edges to connect similar nodes; the opposite is called heterophily. It is often believed that heterophilous graphs are challenging for standard message-passing graph neural networks (GNNs), and much effort has been put into developing efficient methods for this setting. However, there is no universally agreed-upon measure of homophily in the literature. In this work, we show that commonly used homophily measures have critical drawbacks that prevent the comparison of homophily levels across different datasets. To this end, we formalize desirable properties for a proper homophily measure and verify which measures satisfy which properties. In particular, we show that a measure that we call adjusted homophily satisfies more desirable properties than other popular homophily measures, yet it is rarely used in the graph machine learning literature. Then, we go beyond the homophily-heterophily dichotomy and propose a new characteristic that allows one to further distinguish different types of heterophily. The proposed label informativeness (LI) characterizes how much information a neighbor's label provides about a node's label. We prove that this measure satisfies important desirable properties. We also observe empirically that LI agrees better with GNN performance than homophily measures do, which confirms that it is a useful characteristic of the graph structure.
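The abstract names these quantities without giving formulas, so the following is only a minimal sketch under stated assumptions. It assumes an undirected graph given as an edge list with integer node ids, edge homophily defined as the fraction of edges joining same-label nodes, adjusted homophily defined as edge homophily corrected by its expected value under degree-weighted class proportions, and LI defined as the mutual information between the labels at the two ends of a uniformly sampled edge, normalized by the entropy of the degree-weighted label distribution. The function names are hypothetical and are not taken from any released code.

```python
import math
from collections import Counter


def edge_homophily(edges, labels):
    """Fraction of edges whose endpoints share a label."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def adjusted_homophily(edges, labels):
    """Edge homophily corrected for class imbalance (assumed definition).

    Subtracts sum_k (D_k / 2|E|)^2, where D_k is the total degree of
    class k, and rescales so the maximum is 1. Undefined (division by
    zero) when every edge endpoint has the same label.
    """
    two_m = 2 * len(edges)
    class_degree = Counter()
    for u, v in edges:
        class_degree[labels[u]] += 1
        class_degree[labels[v]] += 1
    expected = sum((d / two_m) ** 2 for d in class_degree.values())
    return (edge_homophily(edges, labels) - expected) / (1 - expected)


def label_informativeness(edges, labels):
    """LI = I(y_u; y_v) / H(y_u) over a uniformly sampled edge (assumed).

    Each undirected edge is counted in both directions, so the joint
    distribution is symmetric and both marginals coincide. Uses the
    identity I / H = 2 - H_joint / H_marginal. Undefined when only one
    label appears on edge endpoints.
    """
    two_m = 2 * len(edges)
    joint, marginal = Counter(), Counter()
    for u, v in edges:
        a, b = labels[u], labels[v]
        joint[(a, b)] += 1
        joint[(b, a)] += 1
        marginal[a] += 1
        marginal[b] += 1
    h_marginal = -sum((c / two_m) * math.log(c / two_m) for c in marginal.values())
    h_joint = -sum((c / two_m) * math.log(c / two_m) for c in joint.values())
    return 2 - h_joint / h_marginal


# A 4-cycle with alternating labels: every edge joins different classes.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
cycle_labels = [0, 1, 0, 1]
print(edge_homophily(cycle_edges, cycle_labels))
print(adjusted_homophily(cycle_edges, cycle_labels))
print(label_informativeness(cycle_edges, cycle_labels))
```

On this toy graph the homophily values are as low as they can be, yet a neighbor's label determines the node's label exactly, which is the kind of heterophily that LI is meant to separate from uninformative edge patterns.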