Dimensionality reduction (DR) is a key tool for visually exploring high-dimensional data and uncovering its cluster structure in two- or three-dimensional spaces. The vast majority of DR methods in the literature do not take into account any prior knowledge a practitioner may have about the dataset under consideration. We propose a novel method for generating informative embeddings that not only factor out the structure associated with different kinds of prior knowledge but also aim to reveal any remaining underlying structure. To achieve this, we employ a linear combination of two objectives: first, contrastive PCA, which discounts the structure associated with the prior information, and second, kurtosis projection pursuit, which ensures meaningful data separation in the obtained embeddings. We formulate this task as a manifold optimization problem and validate the method empirically across a variety of datasets, considering three distinct types of prior knowledge. Lastly, we provide an automated framework for iterative visual exploration of high-dimensional data.
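To make the combined objective concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the prior knowledge is available as a background dataset `B` whose covariance is discounted (the usual contrastive PCA setup), that low kurtosis of the projected data is what signals cluster separation (so kurtosis is subtracted), and that the orthonormal projection matrix `W` is optimized on the Stiefel manifold by plain Riemannian gradient ascent with a QR retraction. The names `fit_embedding`, `alpha`, and `lam` are hypothetical and chosen only for illustration.

```python
import numpy as np


def kurtosis_and_grad(X, W):
    """Sum over columns of m4 / m2^2 for the projections X @ W, plus its gradient in W."""
    n = X.shape[0]
    Y = X @ W                          # projected data, shape (n, k)
    m2 = np.mean(Y ** 2, axis=0)       # second moment per embedding axis
    m4 = np.mean(Y ** 4, axis=0)       # fourth moment per embedding axis
    value = np.sum(m4 / m2 ** 2)
    # d/dw of m4/m2^2 for each column, using dm4/dw = 4 X^T y^3 / n, dm2/dw = 2 X^T y / n
    grad = 4.0 * (X.T @ Y ** 3) / (n * m2 ** 2) - 4.0 * m4 * (X.T @ Y) / (n * m2 ** 3)
    return value, grad


def objective_and_grad(W, X, C_diff, lam):
    """Assumed objective: contrastive PCA term minus lam times total kurtosis."""
    k_val, k_grad = kurtosis_and_grad(X, W)
    value = np.trace(W.T @ C_diff @ W) - lam * k_val
    grad = 2.0 * C_diff @ W - lam * k_grad   # Euclidean gradient (C_diff is symmetric)
    return value, grad


def fit_embedding(X, B, k=2, alpha=1.0, lam=1.0, steps=200, seed=0):
    """Maximize the assumed objective over orthonormal W (d x k); returns W and objective history."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)
    B = B - B.mean(axis=0)
    # Contrastive covariance: target covariance minus alpha times background covariance
    C_diff = X.T @ X / len(X) - alpha * (B.T @ B / len(B))
    W, _ = np.linalg.qr(rng.standard_normal((X.shape[1], k)))
    value, grad = objective_and_grad(W, X, C_diff, lam)
    history = [value]
    step = 1.0
    for _ in range(steps):
        # Project the Euclidean gradient onto the tangent space of the Stiefel manifold
        sym = (W.T @ grad + grad.T @ W) / 2.0
        rgrad = grad - W @ sym
        # Backtracking: halve the step until the objective improves
        while step > 1e-10:
            Q, R = np.linalg.qr(W + step * rgrad)            # QR retraction
            W_new = Q * np.where(np.diag(R) < 0, -1.0, 1.0)  # fix column signs
            new_value, new_grad = objective_and_grad(W_new, X, C_diff, lam)
            if new_value > value:
                break
            step *= 0.5
        else:
            break  # no improving step found; stop early
        W, value, grad = W_new, new_value, new_grad
        history.append(value)
        step *= 2.0
    return W, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((300, 5))
    X[:, 0] += np.where(rng.random(300) < 0.5, -3.0, 3.0)   # two clusters along axis 0
    B = rng.standard_normal((300, 5)) * np.array([1.0, 4.0, 1.0, 1.0, 1.0])
    W, history = fit_embedding(X, B, k=2)
    embedding = (X - X.mean(axis=0)) @ W                    # 2-D coordinates for plotting
    print(embedding.shape, len(history))
```

In this sketch `alpha` controls how strongly the background structure is discounted and `lam` weights the kurtosis term against the contrastive term; both would need tuning per dataset. A dedicated manifold optimization library could replace the hand-written ascent loop.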