This paper proposes a novel framework for accelerating support vector clustering (SVC). The proposed method first uses a novel spectral data compression approach to compute much smaller compressed data sets that preserve the key cluster properties of the original data sets. The resulting spectrally compressed data sets are then used to develop a fast, high-quality algorithm for support vector clustering. Extensive experiments on real-world data sets yield very promising results: the proposed method achieves 100X and 115X speedups over the state-of-the-art SVC method on the Pendigits and USPS data sets, respectively, while achieving even better clustering quality. To the best of our knowledge, this is the first practical method for fast, high-quality SVC on large-scale real-world data sets.
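The abstract does not spell out how the compression is carried out, so the following is only an illustrative sketch of one generic way to build a spectrally compressed data set, not the paper's algorithm. It assumes a symmetric k-nearest-neighbor graph, an unnormalized graph Laplacian, and a plain Lloyd-style k-means in the spectral embedding. The function name `spectral_compress` and all of its parameters are hypothetical.

```python
import numpy as np

def spectral_compress(X, n_neighbors=10, n_eigvecs=4, n_groups=20,
                      n_iter=25, seed=0):
    """Hypothetical sketch: group points that are close in a spectral
    embedding and replace each group by the mean of its members."""
    n = X.shape[0]

    # Pairwise squared Euclidean distances (self-distances masked out).
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)

    # Symmetric k-nearest-neighbor adjacency matrix (assumed graph model).
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), n_neighbors), idx.ravel()] = 1.0
    A = np.maximum(A, A.T)

    # Unnormalized graph Laplacian L = D - A.
    L = np.diag(A.sum(axis=1)) - A

    # Spectral embedding: eigenvectors of the smallest eigenvalues.
    # (eigh returns eigenvalues in ascending order.)
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :n_eigvecs]

    # Lloyd-style k-means in the embedding; empty groups keep their center.
    rng = np.random.default_rng(seed)
    centers = U[rng.choice(n, size=n_groups, replace=False)]
    for _ in range(n_iter):
        dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for g in range(n_groups):
            members = U[assign == g]
            if len(members) > 0:
                centers[g] = members.mean(axis=0)

    # One representative per non-empty group, in the original feature space.
    used = np.unique(assign)
    reps = np.stack([X[assign == g].mean(axis=0) for g in used])
    return reps, assign
```

Under these assumptions, a clustering method such as SVC would be run on `reps` rather than on `X`, and each original point would inherit the label of its group via `assign`. The dense distance matrix and full eigendecomposition keep the sketch short but scale poorly; a practical implementation would likely need sparse graphs and a sparse eigensolver.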