A novel method, Curvature-Augmented Manifold Embedding and Learning (CAMEL), is proposed for high-dimensional data classification, dimension reduction, and visualization. CAMEL uses a topology metric defined on a Riemannian manifold, together with a unique Riemannian metric that accounts for both distance and curvature, to enhance its expressibility. The method also employs a smooth partition-of-unity operator on the Riemannian manifold to convert localized orthogonal projections into a global embedding, which captures the overall topological structure and local similarity simultaneously. The local orthogonal vectors provide a physical interpretation of the significant characteristics of clusters, so CAMEL not only produces a low-dimensional embedding but also interprets the physics behind it. CAMEL has been evaluated on various benchmark datasets and has been shown to outperform state-of-the-art methods, especially on high-dimensional datasets. Its distinct benefits are high expressibility, interpretability, and scalability. The paper discusses Riemannian distance and curvature metrics, physical interpretability, hyperparameter effects, manifold stability, and computational efficiency in detail to give a holistic understanding of CAMEL. Finally, it presents the limitations of CAMEL, directions for future work, and key conclusions.
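To make the idea of blending localized orthogonal projections into a global embedding more concrete, the following is a minimal sketch and not the CAMEL algorithm itself. It rests on several assumptions that do not come from the paper: chart centers are supplied by the caller, the partition of unity is built from normalized Gaussian weights (smooth and summing to one, though not compactly supported), local frames come from a local PCA, and each frame is aligned to a global PCA frame by orthogonal Procrustes. The names `top_directions`, `blended_embedding`, `centers`, `k`, and `sigma` are hypothetical, and no curvature term is modeled.

```python
import numpy as np

def top_directions(points, d):
    """Return the mean and a (D, d) orthonormal basis of the top-d principal directions."""
    mean = points.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by singular value.
    _, _, vt = np.linalg.svd(points - mean, full_matrices=False)
    return mean, vt[:d].T

def blended_embedding(X, centers, d=2, k=15, sigma=1.0):
    """Blend local orthogonal projections into one global map (illustrative only)."""
    global_mean, G = top_directions(X, d)  # global reference frame (assumption)
    # Squared Euclidean distance from every point to every chart center: (n, m).
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

    # Smooth, strictly positive weights, normalized to sum to one for each point.
    logits = -sq_dist / (2.0 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    W = np.exp(logits)
    W /= W.sum(axis=1, keepdims=True)

    Y = np.zeros((X.shape[0], d))
    for j in range(centers.shape[0]):
        nbrs = X[np.argsort(sq_dist[:, j])[:k]]  # k nearest points to center j
        mu, P = top_directions(nbrs, d)          # local orthonormal basis
        # Orthogonal Procrustes: rotate the local frame to best match the global one.
        U, _, Vt = np.linalg.svd(P.T @ G)
        R = U @ Vt
        # Local chart coordinates, shifted to the chart's position in the global frame.
        local = (X - mu) @ P @ R + (mu - global_mean) @ G
        Y += W[:, [j]] * local                   # weighted blend across charts
    return Y, W
```

For data lying exactly on a flat `d`-dimensional subspace, every aligned local chart coincides with the global PCA projection, so the blend reproduces that projection; on curved data the charts differ and the weights interpolate smoothly between them. A call such as `Y, W = blended_embedding(X, X[:5], d=2)` returns the embedding together with the weight matrix, whose rows sum to one.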