Dimensionality reduction (DR) is a key tool for visually exploring high-dimensional data and uncovering its cluster structure in two- or three-dimensional spaces. The vast majority of DR methods in the literature do not take into account any prior knowledge a practitioner may have about the dataset under consideration. We propose a novel method for generating informative embeddings that not only factor out the structure associated with different kinds of prior knowledge but also aim to reveal any remaining underlying structure. To achieve this, we optimize a linear combination of two objectives: contrastive PCA, which discounts the structure associated with the prior information, and kurtosis projection pursuit, which ensures meaningful data separation in the resulting embeddings. We formulate this task as a manifold optimization problem and validate the method empirically on a variety of datasets, considering three distinct types of prior knowledge. Lastly, we provide an automated framework for iterative visual exploration of high-dimensional data.
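To make the combined objective concrete, the following is a minimal sketch and not the implementation described above. It assumes that the prior knowledge is given as a background dataset `B`, that the contrastive PCA term is the trace of the projected difference between the data covariance and `alpha` times the background covariance, that kurtosis is minimized (low kurtosis tends to indicate multimodal projections), and that the projection matrix `W` has orthonormal columns. The names `alpha`, `lam`, and `fit_embedding`, as well as the simple gradient ascent with a QR retraction and step halving, are illustrative choices.

```python
import numpy as np

def kurtosis_and_grad(X, W):
    """Sum of per-column kurtosis of X @ W and its Euclidean gradient w.r.t. W."""
    n = X.shape[0]
    Z = X @ W                          # (n, k) projected data
    m2 = (Z ** 2).mean(axis=0)         # second moment of each projected column
    m4 = (Z ** 4).mean(axis=0)         # fourth moment of each projected column
    g_m2 = 2.0 * X.T @ Z / n           # gradient of m2, shape (d, k)
    g_m4 = 4.0 * X.T @ (Z ** 3) / n    # gradient of m4, shape (d, k)
    grad = g_m4 / m2 ** 2 - 2.0 * m4 * g_m2 / m2 ** 3
    return (m4 / m2 ** 2).sum(), grad

def objective_and_grad(X, A, W, lam):
    """Assumed objective: (1 - lam) * tr(W^T A W) - lam * sum of kurtoses."""
    kurt, g_kurt = kurtosis_and_grad(X, W)
    value = (1.0 - lam) * np.trace(W.T @ A @ W) - lam * kurt
    grad = (1.0 - lam) * 2.0 * A @ W - lam * g_kurt
    return value, grad

def fit_embedding(X, B, k=2, alpha=1.0, lam=0.5, n_iter=200, seed=0):
    """Gradient ascent over matrices with orthonormal columns (Stiefel manifold)."""
    X = X - X.mean(axis=0)             # center data and background
    B = B - B.mean(axis=0)
    # Contrastive covariance: data covariance minus scaled background covariance.
    A = np.cov(X, rowvar=False) - alpha * np.cov(B, rowvar=False)
    rng = np.random.default_rng(seed)
    W, _ = np.linalg.qr(rng.standard_normal((X.shape[1], k)))
    value, grad = objective_and_grad(X, A, W, lam)
    history = [value]
    for _ in range(n_iter):
        # Project the Euclidean gradient onto the tangent space at W.
        rgrad = grad - W @ (W.T @ grad + grad.T @ W) / 2.0
        step = 1.0
        while step > 1e-10:
            # QR retraction maps the updated point back to orthonormal columns.
            W_new, _ = np.linalg.qr(W + step * rgrad)
            new_value, new_grad = objective_and_grad(X, A, W_new, lam)
            if new_value > value:
                break
            step /= 2.0                # halve the step until the objective improves
        else:
            break                      # no improving step found: stop early
        W, value, grad = W_new, new_value, new_grad
        history.append(value)
    return W, history
```

Calling `W, history = fit_embedding(X, B)` returns a `d x k` projection matrix, and `(X - X.mean(axis=0)) @ W` gives the low-dimensional embedding. In this sketch, `lam` trades off the two terms: `lam = 0` keeps only the contrastive PCA term and `lam = 1` keeps only the kurtosis term.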