Motivated by the growing interest in representation learning approaches that uncover the latent structure of high-dimensional data, this work proposes new algorithms for reconstruction-based manifold learning within Reproducing Kernel Hilbert Spaces (RKHS). Each observation is first reconstructed in the RKHS as a linear combination of the other samples, exploiting a vector-valued form of the Representer Theorem to enforce the auto-representation property. A separable operator-valued kernel extends the formulation to vector-valued data while retaining the simplicity of a single scalar similarity function. A subsequent kernel-alignment task projects the data into a lower-dimensional latent space whose Gram matrix is driven to match the high-dimensional reconstruction kernel, thereby transferring the auto-reconstruction geometry of the RKHS to the embedding. The proposed algorithms thus extend the auto-representation property, exhibited by many natural datasets, by adapting well-known results from Kernel Learning Theory. Numerical experiments on both simulated (concentric circles and Swiss roll) and real (cancer molecular activity and IoT network intrusion) datasets provide empirical evidence of the practical effectiveness of the proposed approach.
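To make the described pipeline concrete, the following is a minimal sketch in Python/NumPy, assuming an RBF scalar kernel, a ridge-regularized auto-representation step, and a kernel-alignment embedding obtained from the top eigenvectors of the symmetrized reconstruction affinities. The function names, the regularization parameter `lam`, and the eigendecomposition-based alignment step are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch of kernel auto-representation + kernel alignment.
# Assumptions (not from the paper): RBF kernel, ridge penalty `lam`,
# alignment solved by eigendecomposition of the symmetrized coefficients.
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def self_representation(K, lam=1e-2):
    """Reconstruct each sample from the others in the RKHS.

    For each i, minimizes ||phi(x_i) - sum_{j != i} C[j, i] phi(x_j)||^2
    + lam * ||C[:, i]||^2, with C[i, i] = 0 so no sample explains itself.
    The closed form only needs kernel evaluations (Representer Theorem).
    """
    n = K.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        idx = np.delete(np.arange(n), i)          # exclude sample i
        A = K[np.ix_(idx, idx)] + lam * np.eye(n - 1)
        C[idx, i] = np.linalg.solve(A, K[idx, i])  # normal equations
    return C

def kernel_alignment_embedding(C, d=2):
    """Embed so the latent Gram matrix approximates the reconstruction kernel.

    Symmetrizes the coefficients into W = (C + C^T) / 2 and scales the
    top-d eigenvectors by sqrt of the eigenvalues, so that Y @ Y.T is the
    best rank-d approximation of W in Frobenius norm.
    """
    W = 0.5 * (C + C.T)
    vals, vecs = np.linalg.eigh(W)
    top = np.argsort(vals)[::-1][:d]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

# Toy usage on two concentric circles (one of the simulated datasets).
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
r = np.repeat([1.0, 3.0], 100)
X = np.c_[r * np.cos(theta), r * np.sin(theta)] + 0.05 * rng.normal(size=(200, 2))
Y = kernel_alignment_embedding(self_representation(rbf_kernel(X, gamma=0.5)), d=2)
print(Y.shape)  # (200, 2)
```

The eigendecomposition in the alignment step is one common way to match a low-rank Gram matrix to a target affinity; the paper's alignment objective may be optimized differently.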