Because its membership function is strictly positive, the conventional fuzzy c-means clustering method can suffer from imbalanced influence when clusters differ vastly in size: an exceptionally large cluster drags all the other clusters toward its center, however far away they are. To solve this problem, we propose a hybrid fuzzy-crisp clustering algorithm based on a target function that combines linear and quadratic terms of the membership function. In this algorithm, the membership of a data point in a cluster is automatically set to exactly zero if the data point is ``sufficiently'' far from the cluster center. We present the algorithm along with its geometric interpretation. The algorithm is tested on twenty simulated datasets and five real-world datasets from the UCI repository and compared with conventional fuzzy and crisp clustering methods. The proposed algorithm is demonstrated to outperform the conventional methods on imbalanced datasets and can be competitive on more balanced datasets.
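To illustrate how combining linear and quadratic membership terms can yield memberships that are exactly zero, the following is a minimal sketch and not the algorithm of this paper. It assumes a per-point objective of the form sum_k (u_k * d_k + 0.5 * lam * u_k**2) with the constraints sum_k u_k = 1 and u_k >= 0, where d_k is a dissimilarity (for example, a squared distance) to cluster center k. The function name `hybrid_memberships`, the parameter `lam`, and the exact form of the objective are assumptions introduced for illustration only; the abstract does not specify them.

```python
import numpy as np

def hybrid_memberships(d, lam):
    """Memberships of one data point under an ASSUMED objective:
    minimize sum_k (u_k * d_k + 0.5 * lam * u_k**2)
    subject to sum_k u_k = 1 and u_k >= 0, with lam > 0.

    The optimality conditions give u_k = max(0, (nu - d_k) / lam),
    where the threshold nu is chosen so that the memberships sum to 1.
    """
    d = np.asarray(d, dtype=float)
    ordered = np.sort(d)                # dissimilarities, nearest cluster first
    csum = np.cumsum(ordered)
    m = np.arange(1, d.size + 1)
    # Candidate threshold if only the m nearest clusters have nonzero membership.
    nu = (lam + csum) / m
    # Keep the largest m whose m-th nearest cluster still lies below its threshold.
    last = np.nonzero(ordered < nu)[0][-1]
    # Clusters at or beyond the threshold get membership exactly zero.
    return np.maximum(0.0, (nu[last] - d) / lam)

# Two nearby clusters share the membership; the distant one gets exactly zero.
u = hybrid_memberships([1.0, 1.5, 10.0], lam=2.0)
print(u)
```

Under this assumed objective the solution is a Euclidean projection onto the probability simplex, which suggests one geometric reading of the cutoff: a cluster whose dissimilarity is at least the threshold `nu` contributes nothing to the point. A larger `lam` spreads membership over more clusters, while a small `lam` approaches a crisp assignment to the nearest cluster.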