Dimensionality reduction (DR) algorithms compress high-dimensional data into a lower-dimensional representation while preserving important features of the data. DR is a critical step in many analysis pipelines, as it enables visualisation, noise reduction, and efficient downstream processing of the data. In this work, we introduce the ProbDR variational framework, which interprets a wide range of classical DR algorithms as probabilistic inference algorithms. ProbDR encompasses PCA, CMDS, LLE, LE, MVU, diffusion maps, kPCA, Isomap, (t-)SNE, and UMAP. In our framework, a low-dimensional latent variable is used to construct a covariance, precision, or graph Laplacian matrix, which can be used as part of a generative model for the data. Inference is performed by optimizing an evidence lower bound. We demonstrate the internal consistency of our framework and show that it enables the use of probabilistic programming languages (PPLs) for DR. Additionally, we illustrate that the framework facilitates reasoning about unseen data and argue that our generative models approximate Gaussian processes (GPs) on manifolds. By providing a unified view of DR, our framework facilitates communication, reasoning about uncertainties, model composition, and extensions, particularly when domain knowledge is present.
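As a minimal illustration of the idea that a low-dimensional latent variable can define a covariance matrix inside a generative model, the sketch below uses the well-known dual probabilistic PCA construction: each feature column of the centered data matrix Y is modelled as a draw from N(0, XX^T + sigma^2 I), where X holds the latent coordinates. It is not the ProbDR implementation, and it uses plain maximum likelihood with a fixed noise variance rather than an evidence lower bound. The function names, the synthetic data, and the value of `sigma2` are all assumptions made for illustration.

```python
import numpy as np


def dual_ppca_log_lik(X, Y, sigma2):
    """Log-likelihood of Y (n x d) when every column is N(0, X X^T + sigma2 I)."""
    n, d = Y.shape
    K = X @ X.T + sigma2 * np.eye(n)           # covariance built from the latents
    _, logdet = np.linalg.slogdet(K)
    quad = np.trace(np.linalg.solve(K, Y @ Y.T))  # sum_j y_j^T K^{-1} y_j
    return -0.5 * (d * n * np.log(2.0 * np.pi) + d * logdet + quad)


def dual_ppca_ml_latents(Y, q, sigma2):
    """Closed-form maximiser over X for fixed sigma2 (defined up to rotation)."""
    n, d = Y.shape
    S = Y @ Y.T / d                             # n x n inner-product matrix
    evals, evecs = np.linalg.eigh(S)
    top = np.argsort(evals)[::-1][:q]           # indices of the q largest eigenvalues
    scale = np.sqrt(np.maximum(evals[top] - sigma2, 0.0))
    return evecs[:, top] * scale


# Hypothetical synthetic data: 30 points, 50 features, 2 latent dimensions.
rng = np.random.default_rng(0)
n, d, q = 30, 50, 2
sigma2 = 0.01                                   # assumed, fixed noise variance
Y = rng.normal(size=(n, q)) @ rng.normal(size=(q, d))
Y += np.sqrt(sigma2) * rng.normal(size=(n, d))
Y -= Y.mean(axis=0)                             # center each feature

X_ml = dual_ppca_ml_latents(Y, q, sigma2)

# Classical PCA scores from the SVD of the centered data, for comparison.
U, s, _ = np.linalg.svd(Y, full_matrices=False)
pca_scores = U[:, :q] * s[:q]

# Perturbing the closed-form latents should not increase the likelihood.
X_perturbed = X_ml + 0.1 * rng.normal(size=X_ml.shape)
ll_ml = dual_ppca_log_lik(X_ml, Y, sigma2)
ll_perturbed = dual_ppca_log_lik(X_perturbed, Y, sigma2)

print("log-likelihood at closed-form latents:", ll_ml)
print("log-likelihood at perturbed latents:  ", ll_perturbed)
```

In this sketch the maximum-likelihood latents coincide with the classical PCA scores up to a per-column sign and rescaling, which is the simplest instance of reading a classical DR algorithm as inference in a generative model.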