Although numerous clustering algorithms have been developed, many existing methods still rely on the k-means technique to detect clusters of data points. However, the performance of k-means depends heavily on the estimation of cluster centers, for which an optimal solution is very difficult to obtain. Another major drawback is its sensitivity to noise and outliers. In this paper, we rethink k-means from a manifold learning perspective and present a new clustering algorithm that detects clusters directly, without mean estimation. Specifically, we construct the distance matrix between data points with a Butterworth filter, such that the distance between any two data points in the same cluster equals a small constant, while the distance between data points from different clusters is increased. To fully exploit the complementary information embedded in different views, we apply tensor Schatten p-norm regularization to the third-order tensor formed by the indicator matrices of the different views. Finally, we derive an efficient alternating algorithm to optimize our model and prove that the sequence it constructs converges to a stationary KKT point. Extensive experimental results indicate the superiority of the proposed method.
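The Butterworth-filtered distance described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact filter expression, so the form used here, 1 - 1 / (1 + (d / cutoff)^(2 * order)), is an assumption borrowed from the standard Butterworth magnitude response, and the names `pairwise_euclidean`, `butterworth_distance`, `cutoff`, and `order` are hypothetical. The sketch covers a single view only and does not include the tensor Schatten p-norm regularization or the alternating optimization.

```python
import numpy as np


def pairwise_euclidean(X):
    """Return the (n, n) matrix of Euclidean distances between rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def butterworth_distance(D, cutoff, order=4):
    """Map raw distances through an assumed Butterworth-shaped response.

    Distances well below `cutoff` are pushed toward 0 (a small constant),
    and distances well above `cutoff` are pushed toward 1.
    This exact form is an illustrative assumption, not the paper's formula.
    """
    ratio = (D / cutoff) ** (2 * order)
    return 1.0 - 1.0 / (1.0 + ratio)


# Two small, well-separated groups of 2-D points (toy data).
X = np.array([
    [0.0, 0.0], [0.5, 0.0], [0.0, 0.5],
    [10.0, 10.0], [10.5, 10.0], [10.0, 10.5],
])

D = pairwise_euclidean(X)
F = butterworth_distance(D, cutoff=3.0, order=4)

print(np.round(F, 3))
```

With this toy data, the filtered distances within each group collapse to values near zero, while distances across groups saturate near one, which is the qualitative behavior the abstract describes. The choice of `cutoff` and `order` controls how sharp the transition is; how the paper selects them is not stated in the abstract.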