The Adjusted Rand Index (ARI) is a widely used method for comparing hard clusterings, but it requires a choice of random model that is often left implicit. Several recent works have extended the Rand Index to fuzzy clusterings, but the assumptions of the most common random model are difficult to justify in fuzzy settings. We propose a single framework for computing the ARI with three random models that are intuitive and explainable for both hard and fuzzy clusterings, with the added benefit of lower computational complexity. The theory and assumptions of the proposed models are contrasted with those of the existing permutation model. Computations on synthetic and benchmark data show that each model behaves distinctly, so accurate model selection is important for the reliability of results.
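To make the role of the random model concrete, the following is a minimal sketch of the standard pair-counting ARI for hard clusterings under the existing permutation model, where the expected index is computed with both sets of cluster sizes held fixed. It does not implement the three proposed models or the fuzzy extension; the function names `pair_counts` and `ari_permutation` are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from math import comb


def pair_counts(labels_a, labels_b):
    """Count co-clustered pairs for two hard clusterings of the same points.

    Returns (pairs together in both, pairs together in A,
             pairs together in B, total number of pairs).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must label the same points")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two points to form a pair")
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    return both, in_a, in_b, comb(n, 2)


def ari_permutation(labels_a, labels_b):
    """ARI with the expectation taken under the permutation model.

    The random model enters only through `expected`; a different random
    model would replace this single term.
    """
    both, in_a, in_b, total = pair_counts(labels_a, labels_b)
    expected = in_a * in_b / total      # expected agreement, cluster sizes fixed
    maximum = (in_a + in_b) / 2         # normalising upper bound
    if maximum == expected:             # degenerate case, e.g. one cluster each
        return 1.0
    return (both - expected) / (maximum - expected)


if __name__ == "__main__":
    print(ari_permutation([0, 0, 1, 1], [1, 1, 0, 0]))  # identical up to relabelling
    print(ari_permutation([0, 0, 1, 1], [0, 1, 0, 1]))  # no shared co-clustered pairs
```

Isolating the expectation in one line shows where an alternative random model would change the computation: the observed pair counts stay the same, and only the baseline they are adjusted against differs.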