Dimensionality reduction (DR) is one of the key tools for visually exploring high-dimensional data and uncovering its cluster structure in two- or three-dimensional spaces. The vast majority of DR methods in the literature do not take into account any prior knowledge a practitioner may have about the dataset under consideration. We propose a novel method to generate informative embeddings that not only factor out the structure associated with different kinds of prior knowledge but also aim to reveal any remaining underlying structure. To achieve this, we optimize a linear combination of two objectives: first, contrastive PCA, which discounts the structure associated with the prior information, and second, kurtosis projection pursuit, which ensures meaningful data separation in the obtained embeddings. We formulate this task as a manifold optimization problem and validate the method empirically across a variety of datasets, considering three distinct types of prior knowledge. Lastly, we provide an automated framework for iterative visual exploration of high-dimensional data.
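To make the combined objective concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the exact objective, so several choices here are assumptions: the prior knowledge is represented by a background dataset `B`, the contrastive PCA term is the usual trace of the target covariance minus `alpha` times the background covariance, the projection pursuit term minimizes the summed kurtosis of the embedding coordinates (low kurtosis tends to indicate multimodal, cluster-like projections), and the manifold is taken to be the set of orthonormal projection matrices (the Stiefel manifold), handled with a QR retraction. All function names and the weights `alpha` and `lam` are hypothetical.

```python
import numpy as np

def contrastive_cov(X, B, alpha):
    """Target covariance minus alpha times background covariance (assumed cPCA form)."""
    Xc = X - X.mean(axis=0)
    Bc = B - B.mean(axis=0)
    return Xc.T @ Xc / len(X) - alpha * (Bc.T @ Bc) / len(B)

def objective_and_grad(W, Xc, C, lam):
    """Score to maximize: trace(W^T C W) - lam * sum of per-column kurtosis of Xc @ W."""
    n = Xc.shape[0]
    Z = Xc @ W                      # embedding; centered because Xc is centered
    m2 = (Z ** 2).mean(axis=0)      # second moment per embedding dimension
    m4 = (Z ** 4).mean(axis=0)      # fourth moment per embedding dimension
    value = np.trace(W.T @ C @ W) - lam * np.sum(m4 / m2 ** 2)
    # Gradient of sum_j m4_j / m2_j^2 with respect to W (column-wise quotient rule).
    kurt_grad = (4.0 / n) * ((Xc.T @ (Z ** 3)) / m2 ** 2 - (Xc.T @ Z) * m4 / m2 ** 3)
    return float(value), 2.0 * C @ W - lam * kurt_grad

def retract(Y):
    """Map a full-rank matrix back to orthonormal columns via a sign-fixed QR."""
    Q, R = np.linalg.qr(Y)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs

def fit_embedding(X, B, k=2, alpha=1.0, lam=1.0, iters=200, seed=0):
    """Riemannian gradient ascent with backtracking; returns (W, objective history)."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    C = contrastive_cov(X, B, alpha)
    W = retract(rng.standard_normal((X.shape[1], k)))
    value, grad = objective_and_grad(W, Xc, C, lam)
    history = [value]
    step = 1.0
    for _ in range(iters):
        # Project the Euclidean gradient onto the tangent space at W.
        sym = (W.T @ grad + grad.T @ W) / 2.0
        rgrad = grad - W @ sym
        # Halve the step until the objective strictly improves.
        while step > 1e-10:
            W_new = retract(W + step * rgrad)
            new_value, new_grad = objective_and_grad(W_new, Xc, C, lam)
            if new_value > value:
                break
            step *= 0.5
        else:
            break  # no improving step found; treat as converged
        W, value, grad = W_new, new_value, new_grad
        history.append(value)
        step *= 2.0  # allow the step to grow again after a success
    return W, history
```

Calling `W, history = fit_embedding(X, B)` and plotting `(X - X.mean(axis=0)) @ W` gives a two-dimensional view. Raising `alpha` discounts more of the background structure, while raising `lam` puts more weight on separation. In practice the two terms can differ greatly in scale, so the weights would need tuning per dataset.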