Homophily is a graph property describing the tendency of edges to connect similar nodes. Several measures are used to assess homophily, but all are known to have drawbacks. In particular, they cannot be reliably used to compare datasets that differ in the number of classes and in class size balance. To show this, previous works on graph homophily proposed several properties desirable for a good homophily measure and noted that no existing measure has all of them. Our paper addresses this issue by introducing a new homophily measure, unbiased homophily, which has all the desirable properties and can thus be reliably used across datasets with different label distributions. The proposed measure is suitable for undirected (and possibly weighted) graphs. We show, both theoretically and via empirical examples, that the existing homophily measures have serious drawbacks, while unbiased homophily behaves desirably in the considered scenarios. Finally, for directed graphs, we prove that some of the desirable properties contradict each other, so a measure satisfying all of them cannot exist.
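The sketch below makes the dependence on the number of classes and on class balance concrete. It computes edge homophily, a common existing measure defined as the fraction of edges whose endpoints share a label. It does this on complete graphs, which by construction show no preference for connecting similar nodes. This is an illustrative sketch only: the helper names are hypothetical, and the code does not implement the unbiased homophily measure proposed in this paper.

```python
from itertools import combinations


def edge_homophily(edges, labels):
    """Fraction of (undirected, unweighted) edges whose endpoints share a label."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def complete_graph(n_nodes):
    """Hypothetical helper: all unordered node pairs, i.e. no label preference at all."""
    return list(combinations(range(n_nodes), 2))


def labels_from_sizes(sizes):
    """Hypothetical helper: assign class 0 to the first sizes[0] nodes, class 1 to the next, etc."""
    labels = []
    for cls, size in enumerate(sizes):
        labels.extend([cls] * size)
    return labels


if __name__ == "__main__":
    # Same "no preference" graph structure, different label distributions.
    for sizes in ([10, 10], [10, 10, 10, 10], [18, 2]):
        labels = labels_from_sizes(sizes)
        edges = complete_graph(len(labels))
        print(sizes, round(edge_homophily(edges, labels), 4))
```

None of these graphs favors same-label edges, yet the score differs. It is 90/190 ≈ 0.4737 with two balanced classes, 180/780 ≈ 0.2308 with four balanced classes, and 154/190 ≈ 0.8105 with two imbalanced classes of 18 and 2 nodes. A single threshold on such a measure therefore cannot be compared across these datasets.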