Support vector clustering is an important clustering method. However, it scales poorly because its cluster assignment step is computationally expensive. In this paper, we accelerate support vector clustering via spectrum-preserving data compression. Specifically, we first compress the original data set into a small number of spectrally representative aggregated data points. Then, we perform standard support vector clustering on the compressed data set. Finally, we map the clustering results of the compressed data set back to discover the clusters in the original data set. Our extensive experimental results on real-world data sets demonstrate dramatic speedups over standard support vector clustering without sacrificing clustering quality.
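The three-step workflow described above (compress, cluster the compressed set, map back) can be illustrated with a minimal sketch. It shows only the control flow and is not the paper's method: the greedy radius-based aggregation in `compress` is a simple stand-in for spectrum-preserving compression, and the threshold-linkage routine in `cluster_representatives` is a stand-in for support vector clustering. All function names and the parameters `radius` and `link_dist` are hypothetical and chosen for illustration.

```python
from math import dist


def compress(points, radius):
    """Aggregate points into representatives (stand-in, NOT spectrum-preserving).

    Each point joins the first seed point within `radius`; otherwise it
    becomes a new seed. Each representative is the mean of its members.
    Returns (representatives, assignment), where assignment[i] is the
    index of the representative that original point i belongs to.
    """
    seeds, members, assignment = [], [], []
    for i, p in enumerate(points):
        for k, s in enumerate(seeds):
            if dist(p, s) <= radius:
                members[k].append(i)
                assignment.append(k)
                break
        else:
            seeds.append(p)
            members.append([i])
            assignment.append(len(seeds) - 1)
    dim = len(points[0])
    reps = [
        tuple(sum(points[i][d] for i in m) / len(m) for d in range(dim))
        for m in members
    ]
    return reps, assignment


def cluster_representatives(reps, link_dist):
    """Label representatives by connected components (stand-in, NOT SVC).

    Two representatives are linked when they lie within `link_dist`.
    """
    labels = [-1] * len(reps)
    next_label = 0
    for start in range(len(reps)):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            a = stack.pop()
            for b in range(len(reps)):
                if labels[b] == -1 and dist(reps[a], reps[b]) <= link_dist:
                    labels[b] = next_label
                    stack.append(b)
        next_label += 1
    return labels


def map_back(assignment, rep_labels):
    """Give every original point the label of its representative."""
    return [rep_labels[k] for k in assignment]


def accelerated_clustering(points, radius, link_dist):
    """Compress, cluster the compressed set, then map labels back."""
    reps, assignment = compress(points, radius)
    rep_labels = cluster_representatives(reps, link_dist)
    return map_back(assignment, rep_labels), reps


# Toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.2, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.2)]
labels, reps = accelerated_clustering(points, radius=0.15, link_dist=1.0)
print(labels, len(reps))
```

The point of the design is that the expensive clustering step runs only on the representatives, so its cost depends on the size of the compressed set rather than on the size of the original data set. The map-back step is a single lookup per original point.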