Conventional clustering methods based on pairwise affinity usually suffer from the concentration effect when processing data with high-dimensional features but small sample sizes. As a result, they encode sample proximity inaccurately and cluster suboptimally. To address this issue, we propose a unified tensor clustering method (UTC) that characterizes sample proximity using the affinity among multiple samples, thereby supplementing rich spatial sample distributions to boost clustering. Specifically, we find that the triadic (third-order) tensor affinity can be constructed via the Khatri-Rao product of two affinity matrices. Furthermore, our earlier work shows that the fourth-order tensor affinity is defined by the Kronecker product. We therefore use these arithmetical products, namely the Khatri-Rao and Kronecker products, to mathematically integrate affinities of different orders into a unified tensor clustering framework. UTC thus learns a joint low-dimensional embedding that combines the various orders. Finally, a numerical scheme is designed to solve the resulting problem. Experiments on synthetic and real-world datasets demonstrate that 1) high-order tensor affinity can provide a characterization of sample proximity that supplements the widely used affinity matrix; and 2) UTC enhances clustering of high-dimensional data by exploiting affinities of different orders.
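To make the two product constructions concrete, the following is a minimal NumPy sketch, not the UTC implementation. It assumes a Gaussian-kernel affinity matrix S. It also assumes that the triadic affinity is the Khatri-Rao (column-wise Kronecker) product S ⊙ S reshaped to an n×n×n tensor, and that the fourth-order affinity is the Kronecker product S ⊗ S reshaped to an n×n×n×n tensor. The function names `gaussian_affinity`, `khatri_rao`, `third_order_affinity`, and `fourth_order_affinity`, the kernel choice, and the index layout are all hypothetical. The paper's exact definitions and normalizations may differ.

```python
import numpy as np


def gaussian_affinity(X: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Pairwise affinity S[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    Assumption: a Gaussian kernel; the paper may use a different affinity.
    """
    sq_norms = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b, clipped at 0
    d2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def khatri_rao(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Column-wise Kronecker product: column j is kron(A[:, j], B[:, j])."""
    if A.shape[1] != B.shape[1]:
        raise ValueError("A and B must have the same number of columns")
    # Row (i * B.shape[0] + k), column j holds A[i, j] * B[k, j]
    return np.einsum("ij,kj->ikj", A, B).reshape(A.shape[0] * B.shape[0], A.shape[1])


def third_order_affinity(S: np.ndarray) -> np.ndarray:
    """Hypothetical triadic affinity T[i, k, j] = S[i, j] * S[k, j] (from S ⊙ S)."""
    n = S.shape[0]
    return khatri_rao(S, S).reshape(n, n, n)


def fourth_order_affinity(S: np.ndarray) -> np.ndarray:
    """Hypothetical fourth-order affinity Q[i, k, j, l] = S[i, j] * S[k, l] (from S ⊗ S)."""
    n = S.shape[0]
    return np.kron(S, S).reshape(n, n, n, n)


# Toy high-dimensional, low-sample-size data: 5 samples with 100 features each
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 100))
S = gaussian_affinity(X, sigma=10.0)
T = third_order_affinity(S)
Q = fourth_order_affinity(S)
print(S.shape, T.shape, Q.shape)  # (5, 5) (5, 5, 5) (5, 5, 5, 5)
```

Under these assumptions the third-order tensor couples two samples through a shared third sample j, while the fourth-order tensor couples two independent pairs. Both are built only from the pairwise matrix S, which is why arithmetical products can place the different orders in one framework.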