Diffusion Map is a spectral dimensionality reduction technique that can uncover nonlinear submanifolds in high-dimensional data. It is increasingly applied across a wide range of scientific disciplines, such as biology, engineering, and the social sciences. However, data preprocessing, parameter settings, and component selection significantly influence the resulting manifold, an issue that has not been comprehensively discussed in the literature so far. We provide a practice-oriented review of the Diffusion Map technique, illustrate pitfalls, and showcase a recently introduced technique for identifying the most relevant components. Our results show that the first components are not necessarily the most relevant ones.
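To make the parameters mentioned above concrete, the following is a minimal NumPy sketch of a common Diffusion Map construction. It is not taken from this review: the function name `diffusion_map`, the Gaussian kernel, and the parameters `epsilon` (kernel bandwidth), `alpha` (density normalization), and `t` (diffusion time) are assumptions chosen for illustration.

```python
import numpy as np


def diffusion_map(X, epsilon, n_components=2, alpha=1.0, t=1):
    """Illustrative Diffusion Map embedding (hypothetical helper, not the paper's code).

    X is an (n_samples, n_features) array. Returns the leading non-trivial
    eigenvalues and the corresponding diffusion coordinates.
    """
    # Pairwise squared Euclidean distances, clipped at zero against round-off.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # Gaussian kernel; epsilon is the bandwidth parameter (an assumed choice).
    K = np.exp(-d2 / epsilon)

    # Density normalization; alpha = 0 skips it, alpha = 1 reduces the
    # influence of the sampling density.
    q = K.sum(axis=1)
    K = K / np.outer(q ** alpha, q ** alpha)

    # Symmetric conjugate S = D^{-1/2} K D^{-1/2} of the Markov matrix P = D^{-1} K.
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))

    # eigh returns eigenvalues in ascending order; reorder to descending.
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Right eigenvectors of P are recovered as D^{-1/2} v.
    psi = vecs / np.sqrt(d)[:, None]

    # Drop the trivial constant component (eigenvalue 1) and scale by lambda^t.
    lam = vals[1:n_components + 1]
    return lam, psi[:, 1:n_components + 1] * lam ** t
```

In this sketch, rescaling the columns of `X` or changing `epsilon` alters the kernel matrix and therefore the embedding, which is one way the sensitivity to preprocessing and parameter settings can be explored. The components are returned in order of decreasing eigenvalue only; that ordering by itself does not indicate which components are most relevant.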