The Multiple Kernel k-Means (MKKM) algorithm aims to extract non-linear information and achieve optimal clustering by optimizing base kernel matrices. Current methods enhance information diversity and reduce redundancy by exploiting interdependencies among multiple kernels, measured by either correlation or dissimilarity. However, defining kernel relationships with a single metric introduces bias and yields an incomplete characterization. This limitation hinders efficient information extraction and ultimately compromises clustering performance. To address it, we introduce a novel method that systematically integrates both kernel correlation and kernel dissimilarity. The approach captures kernel relationships more comprehensively, which facilitates more efficient extraction of classification information and improves clustering performance. By emphasizing the coherence between kernel correlation and dissimilarity, our method offers a more objective and transparent strategy for extracting non-linear information and, supported by theoretical rationale, significantly improves clustering precision. We evaluate the algorithm on 13 challenging benchmark datasets, where it outperforms contemporary state-of-the-art MKKM techniques.
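To make the two notions of kernel relationship concrete, the following is a minimal sketch and not the method proposed here. The abstract does not specify how correlation or dissimilarity is measured, so the sketch assumes two common choices: the Frobenius inner product trace(K_p K_q) as the correlation between base kernels, and the squared Frobenius distance between them as the dissimilarity. The helper names (`rbf_kernel`, `kernel_correlation`, `kernel_dissimilarity`) and the toy data are hypothetical.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for the rows of X (illustrative base kernel)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def kernel_correlation(kernels):
    """Assumed correlation measure: M[p, q] = trace(K_p K_q) = <K_p, K_q>_F."""
    m = len(kernels)
    M = np.empty((m, m))
    for p in range(m):
        for q in range(m):
            # For symmetric matrices, trace(K_p K_q) equals the elementwise sum.
            M[p, q] = np.sum(kernels[p] * kernels[q])
    return M


def kernel_dissimilarity(kernels):
    """Assumed dissimilarity measure: D[p, q] = ||K_p - K_q||_F^2."""
    m = len(kernels)
    D = np.empty((m, m))
    for p in range(m):
        for q in range(m):
            D[p, q] = np.linalg.norm(kernels[p] - kernels[q], "fro") ** 2
    return D


# Toy data: three RBF base kernels with different bandwidths.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
kernels = [rbf_kernel(X, g) for g in (0.1, 1.0, 10.0)]

M = kernel_correlation(kernels)
D = kernel_dissimilarity(kernels)
```

Under these particular assumptions the two matrices are algebraically linked, since ||K_p - K_q||_F^2 = M[p, p] + M[q, q] - 2 M[p, q]. That link is a property of the measures chosen for this sketch only; other definitions of correlation or dissimilarity need not satisfy it.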