Manifold learning (ML), also known as non-linear dimension reduction, is a set of methods for finding the low-dimensional structure of data. Dimension reduction for large, high-dimensional data is not merely a way to reduce the data. The new representations and descriptors obtained by ML reveal the geometric shape of high-dimensional point clouds and allow one to visualize, de-noise, and interpret them. This survey presents the principles underlying ML, the representative methods, and their statistical foundations from a practicing statistician's perspective. It describes the trade-offs between methods and what theory tells us about the parameter and algorithmic choices we must make to obtain reliable conclusions.