This paper introduces the Stable Matching Based Pairing (SMBP) algorithm, a high-performance external validity index for evaluating clusterings of large-scale datasets with many clusters. SMBP uses the stable matching framework to pair clusters across different clustering methods, reducing the computational complexity from the $O(N^3)$ of traditional Maximum Weighted Matching (MWM) to $O(N^2)$. In comprehensive evaluations on real-world and synthetic datasets, SMBP achieves accuracy comparable to MWM with superior computational efficiency. It handles both balanced and unbalanced datasets and remains effective on large-scale datasets with a large number of clusters, making it a scalable and practical solution for modern clustering tasks. SMBP is also straightforward to implement in machine learning frameworks such as PyTorch and TensorFlow, which makes it a robust tool for big data applications. The experiments indicate that SMBP is a strong alternative to existing methods such as Maximum Match Measure (MMM) and Centroid Ratio (CR).
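To make the pairing idea concrete, the following is a minimal sketch of stable-matching-based cluster pairing in plain Python. It is not the paper's implementation: the abstract does not specify how preferences or the final score are defined, so the sketch assumes that each cluster ranks the clusters of the other partition by the number of shared points, and that the index is the fraction of points covered by the matched pairs. The function names `contingency`, `stable_pairing`, and `pairing_score` are hypothetical.

```python
from collections import Counter


def contingency(labels_a, labels_b):
    """Count shared points for every (cluster in A, cluster in B) pair."""
    a_ids = sorted(set(labels_a))
    b_ids = sorted(set(labels_b))
    counts = Counter(zip(labels_a, labels_b))
    return [[counts[(a, b)] for b in b_ids] for a in a_ids]


def stable_pairing(overlap):
    """Gale-Shapley: rows propose to columns in decreasing order of overlap.

    Assumption: preference = overlap count (not confirmed by the abstract).
    Returns a dict mapping row index -> matched column index.
    """
    n_rows, n_cols = len(overlap), len(overlap[0])
    # Each row's preference list: columns sorted by descending overlap.
    prefs = [sorted(range(n_cols), key=lambda j: -overlap[i][j])
             for i in range(n_rows)]
    next_choice = [0] * n_rows   # next position in prefs[i] to propose to
    holder = {}                  # column -> row currently holding it
    free = list(range(n_rows))
    while free:
        i = free.pop()
        if next_choice[i] >= n_cols:
            continue             # row exhausted its list and stays unmatched
        j = prefs[i][next_choice[i]]
        next_choice[i] += 1
        k = holder.get(j)
        if k is None:
            holder[j] = i        # column was free: accept the proposal
        elif overlap[i][j] > overlap[k][j]:
            holder[j] = i        # column prefers the new proposer
            free.append(k)
        else:
            free.append(i)       # proposal rejected: try the next column
    return {i: j for j, i in holder.items()}


def pairing_score(labels_a, labels_b):
    """Fraction of points that fall in matched cluster pairs (assumed score)."""
    overlap = contingency(labels_a, labels_b)
    pairs = stable_pairing(overlap)
    matched = sum(overlap[i][j] for i, j in pairs.items())
    return matched / len(labels_a)
```

For example, `pairing_score([0, 0, 0, 1, 1, 2, 2, 2], [1, 1, 0, 0, 0, 2, 2, 2])` pairs the clusters so that 7 of the 8 points lie in matched pairs. In this sketch the proposal loop makes at most one proposal per (row, column) pair, which is quadratic in the number of clusters; the explicit sorting of preference lists adds a logarithmic factor that a tuned implementation may avoid.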