Spectral Embedding (SE) is often used to map data points from non-linear manifolds to linear subspaces for classification and clustering. Despite its significant advantages, SE does not preserve the subspace structure of the data in the original space when mapping to the embedding space. To address this issue, subspace clustering has been proposed, which replaces the SE graph affinity with a self-expression matrix. This works well if the data lies in a union of linear subspaces; however, performance may degrade in real-world applications, where data often spans non-linear manifolds. To address this problem, we propose a novel structure-aware deep spectral embedding that combines a spectral embedding loss with a structure preservation loss. To this end, we propose a deep neural network architecture that simultaneously encodes both types of information and aims to generate a structure-aware spectral embedding. The subspace structure of the input data is encoded using attention-based self-expression learning. The proposed algorithm is evaluated on six publicly available real-world datasets. The results demonstrate excellent clustering performance compared to existing state-of-the-art methods. The proposed algorithm also exhibits better generalization to unseen data points and scales to larger datasets without requiring significant computational resources.
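To make the background concrete, the following is a minimal sketch of the classical pipeline the abstract refers to: a self-expression matrix is used as the graph affinity, and a spectral embedding is computed from it. It is not the proposed deep, attention-based method. It assumes a ridge-regularised self-expression objective with a closed-form solution, and the function names, the `lam` parameter, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def self_expression(X, lam=0.1):
    """Assumed objective: min_C ||X - C X||_F^2 + lam * ||C||_F^2.

    X holds one data point per row. Setting the gradient to zero gives
    C (G + lam I) = G with G = X X^T; G and (G + lam I)^{-1} commute,
    so solving (G + lam I) C = G yields the same C.
    """
    G = X @ X.T
    return np.linalg.solve(G + lam * np.eye(len(X)), G)


def spectral_embedding(W, k):
    """Eigenvectors of the normalised Laplacian for the k smallest eigenvalues."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    return vecs[:, :k], L


# Toy data (illustrative only): three points on each of two orthogonal lines.
X = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0],
              [0.0, 1.0], [0.0, 2.0], [0.0, 3.0]])

C = self_expression(X)
W = np.abs(C) + np.abs(C).T           # symmetric affinity built from C
Y, L = spectral_embedding(W, k=2)
se_loss = np.trace(Y.T @ L @ Y)       # a common form of the spectral embedding loss
```

On this toy input the two subspaces are orthogonal, so `C` is block diagonal and the affinity graph splits into two disconnected components. Data on non-linear manifolds does not generally admit such a linear self-expression, which is the limitation the abstract points to.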