This paper addresses data clustering in the hyperdimensional computing (HDC) domain. Prior work proposed an HDC-based clustering framework, referred to as HDCluster. However, the performance of HDCluster is not robust, because the cluster hypervectors are chosen at random during the initialization step. To overcome this bottleneck, we assign the initial cluster hypervectors by exploring the similarity of the encoded data, referred to as \textit{query} hypervectors. Hypervectors within the same cluster are more similar to one another than hypervectors from different clusters. Harnessing the similarity results among query hypervectors, this paper proposes four HDC-based clustering algorithms: similarity-based k-means, equal bin-width histogram, equal bin-height histogram, and similarity-based affinity propagation. Experimental results show that: (i) Compared to the existing HDCluster, the proposed HDC-based clustering algorithms achieve better accuracy, more robust performance, fewer iterations, and less execution time. Similarity-based affinity propagation outperforms the other three HDC-based clustering algorithms on eight datasets by 2% to 38% in clustering accuracy. (ii) Even for one-pass clustering, i.e., without any iterative update of the cluster hypervectors, the proposed algorithms provide more robust clustering accuracy than HDCluster. (iii) Of the eight datasets, five achieve higher or comparable accuracy when projected onto the hyperdimensional space. Traditional clustering is preferable to HDC when the number of clusters, $k$, is large.
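The idea of seeding cluster hypervectors from query similarity, rather than at random, can be illustrated with a minimal Python sketch. It is not one of the four proposed algorithms. The random-projection encoder, the farthest-first seeding rule, and all function names (`encode`, `similarity`, `seed_clusters`, `one_pass_cluster`) are assumptions made for illustration only. The sketch encodes each sample as a bipolar query hypervector, picks $k$ mutually dissimilar queries as initial cluster hypervectors, and then performs one-pass clustering without any iterative update.

```python
import random


def encode(x, projection):
    """Map a real-valued feature vector to a bipolar (+1/-1) hypervector.

    Assumption: sign of a random projection; the paper's encoder may differ.
    """
    return [1 if sum(w * v for w, v in zip(row, x)) >= 0 else -1
            for row in projection]


def similarity(a, b):
    """Cosine similarity of two bipolar hypervectors (dot product / dimension)."""
    return sum(p * q for p, q in zip(a, b)) / len(a)


def seed_clusters(queries, k):
    """Pick k mutually dissimilar queries as initial cluster hypervectors.

    Assumption: farthest-first selection, starting from query 0.
    """
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(len(queries)) if i not in seeds]
        # Choose the query whose closest existing seed is least similar to it.
        nxt = min(candidates,
                  key=lambda i: max(similarity(queries[i], queries[s])
                                    for s in seeds))
        seeds.append(nxt)
    return seeds


def one_pass_cluster(data, k, dim=2000, seed=0):
    """Encode the data, seed k clusters by similarity, and assign once."""
    rng = random.Random(seed)
    projection = [[rng.gauss(0, 1) for _ in data[0]] for _ in range(dim)]
    queries = [encode(x, projection) for x in data]
    seeds = seed_clusters(queries, k)
    # One pass: each query joins the most similar seed hypervector.
    labels = [max(range(k), key=lambda c: similarity(q, queries[seeds[c]]))
              for q in queries]
    return labels, seeds


if __name__ == "__main__":
    data = [[5.0, 0.2], [4.8, -0.1], [5.1, 0.1],
            [-0.1, 5.0], [0.2, 4.9], [0.0, 5.2]]
    labels, seeds = one_pass_cluster(data, k=2)
    print(labels, seeds)
```

Because the seeds are drawn from dissimilar regions of the encoded data, the first assignment pass already separates well-spaced groups, which is the intuition behind the robustness of one-pass clustering claimed in (ii).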