Adjusted similarity measures, such as Cohen's kappa for inter-rater reliability and the adjusted Rand index for comparing clustering algorithms, are vital tools for comparing discrete labellings. These measures are intended to have zero expectation under a null distribution and a maximum value of 1 under maximal similarity, which aids interpretation. For historical and analytic reasons, measures are frequently adjusted with respect to the permutation distribution. There is currently renewed interest in other null models that are more appropriate to the context, such as clustering ensembles that permit a random number of identified clusters. The purpose of this work is twofold: (1) to generalize the study of the adjustment operator to general null models and to a more general procedure that includes statistical standardization as a special case, and (2) to identify sufficient conditions for the adjustment operator to produce the intended properties, where these conditions concern whether and how observed data are incorporated into null distributions. We demonstrate how violations of the sufficient conditions can lead to substantial breakdown, for example by producing a measure that is non-positive rather than mean-zero under traditional adjustment, or a measure that is deterministically 0 under statistical standardization.
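To make the two procedures concrete, here is a minimal sketch, assuming the common textbook form of the adjustment operator, A(S) = (S − E[S]) / (max S − E[S]), and of standardization, (S − E[S]) / SD[S]. It is not the general operator studied in this work. The raw similarity is taken to be simple agreement between two labellings, and the null is the permutation distribution (one labelling randomly reordered, label counts fixed); under these assumptions the adjusted value coincides with Cohen's kappa. All function names (`agreement`, `adjust`, `permutation_expected_agreement`, `cohens_kappa`, `permutation_null_samples`, `standardize`) are hypothetical and chosen for illustration only.

```python
import random
import statistics
from collections import Counter
from typing import Callable, Hashable, List, Sequence

Labels = Sequence[Hashable]


def agreement(a: Labels, b: Labels) -> float:
    """Raw similarity S: fraction of items given the same label by both labellings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def adjust(s_obs: float, s_expected: float, s_max: float = 1.0) -> float:
    """Hypothetical helper for A(S) = (S - E[S]) / (max S - E[S]).

    Equals 0 when S matches its null expectation and 1 when S attains s_max.
    """
    return (s_obs - s_expected) / (s_max - s_expected)


def permutation_expected_agreement(a: Labels, b: Labels) -> float:
    """Exact E[S] when b is randomly permuted against a (label counts held fixed)."""
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    # Each item of a with label k matches with probability cb[k] / n.
    return sum(ca[k] * cb[k] for k in ca) / n ** 2


def cohens_kappa(a: Labels, b: Labels) -> float:
    """Cohen's kappa as raw agreement adjusted under the permutation null."""
    return adjust(agreement(a, b), permutation_expected_agreement(a, b))


def permutation_null_samples(
    a: Labels,
    b: Labels,
    similarity: Callable[[Labels, Labels], float],
    draws: int = 2000,
    seed: int = 0,
) -> List[float]:
    """Monte Carlo draws of the similarity under the permutation null.

    Swapping this function for another sampler is where a different null model
    (assumption: any sampler of null labellings) would plug in.
    """
    rng = random.Random(seed)
    shuffled = list(b)
    samples = []
    for _ in range(draws):
        rng.shuffle(shuffled)  # a fresh uniform permutation each draw
        samples.append(similarity(a, shuffled))
    return samples


def standardize(s_obs: float, null_samples: Sequence[float]) -> float:
    """Statistical standardization (S - E[S]) / SD[S], moments estimated from null draws.

    Undefined (division by zero) if the null samples have no spread.
    """
    mu = statistics.fmean(null_samples)
    sd = statistics.pstdev(null_samples)
    return (s_obs - mu) / sd
```

For example, `cohens_kappa(["x", "x", "y", "y", "y"], ["x", "y", "y", "y", "x"])` evaluates to about 0.167: observed agreement is 0.6 and the permutation expectation is 13/25 = 0.52. Replacing the permutation null with another null model changes only how E[S] (and SD[S]) are computed; whether the resulting measure keeps its intended properties is exactly what the sufficient conditions above address.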