Homophily is a graph property describing the tendency of edges to connect similar nodes; the opposite is called heterophily. It is often believed that heterophilous graphs are challenging for standard message-passing graph neural networks (GNNs), and much effort has been put into developing efficient methods for this setting. However, there is no universally agreed-upon measure of homophily in the literature. In this work, we show that commonly used homophily measures have critical drawbacks preventing the comparison of homophily levels across different datasets. For this, we formalize desirable properties for a proper homophily measure and verify which measures satisfy which properties. In particular, we show that a measure that we call adjusted homophily satisfies more desirable properties than other popular homophily measures while being rarely used in graph machine learning literature. Then, we go beyond the homophily-heterophily dichotomy and propose a new characteristic that allows one to further distinguish different sorts of heterophily. The proposed label informativeness (LI) characterizes how much information a neighbor's label provides about a node's label. We prove that this measure satisfies important desirable properties. We also observe empirically that LI better agrees with GNN performance compared to homophily measures, which confirms that it is a useful characteristic of the graph structure.
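The abstract names the measures but gives no formulas, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes that edge homophily is the fraction of edges joining same-class nodes, that adjusted homophily corrects this fraction by its expected value under degree-weighted class proportions, and that LI is the mutual information between the labels at the two ends of a uniformly sampled edge, normalized by the entropy of one end's label. The function names and the tiny example graph are hypothetical.

```python
import math
from collections import Counter


def edge_homophily(edges, labels):
    """Fraction of edges whose endpoints share a label."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def adjusted_homophily(edges, labels):
    """Edge homophily corrected for degree-weighted class proportions.

    Assumed form: (h_edge - s) / (1 - s), where s is the sum over classes
    of (D_k / 2|E|)^2 and D_k is the total degree of class k.
    Undefined (division by zero) when every node has the same label.
    """
    two_m = 2 * len(edges)
    degree_mass = Counter()
    for u, v in edges:
        degree_mass[labels[u]] += 1
        degree_mass[labels[v]] += 1
    expected = sum((d / two_m) ** 2 for d in degree_mass.values())
    return (edge_homophily(edges, labels) - expected) / (1 - expected)


def label_informativeness(edges, labels):
    """Assumed form: I(y_u; y_v) / H(y_u) for a uniformly sampled edge (u, v).

    Each undirected edge is counted in both directions so the joint
    distribution over label pairs is symmetric.
    Undefined (division by zero) when every node has the same label.
    """
    two_m = 2 * len(edges)
    joint = Counter()
    for u, v in edges:
        joint[(labels[u], labels[v])] += 1
        joint[(labels[v], labels[u])] += 1
    marginal = Counter()
    for (a, _), count in joint.items():
        marginal[a] += count
    mutual_info = sum(
        (count / two_m)
        * math.log((count / two_m) / ((marginal[a] / two_m) * (marginal[b] / two_m)))
        for (a, b), count in joint.items()
    )
    entropy = -sum((c / two_m) * math.log(c / two_m) for c in marginal.values())
    return mutual_info / entropy


# Hypothetical toy graph: a 4-cycle whose labels alternate, so every edge
# joins nodes of different classes.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
cycle_labels = [0, 1, 0, 1]

print(edge_homophily(cycle_edges, cycle_labels))         # 0.0
print(adjusted_homophily(cycle_edges, cycle_labels))     # -1.0
print(label_informativeness(cycle_edges, cycle_labels))  # approximately 1.0
```

On this toy graph the homophily values sit at their minimum while LI is at its maximum: a neighbor's label fully determines a node's label even though no edge connects same-class nodes. This is the kind of heterophily that a homophily score alone cannot distinguish from an uninformative one.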