Dimensionality reduction (DR) algorithms compress high-dimensional data into a lower-dimensional representation while preserving important features of the data. DR is a critical step in many analysis pipelines because it enables visualisation, noise reduction, and efficient downstream processing. In this work, we introduce ProbDR, a variational framework that interprets a wide range of classical DR algorithms as probabilistic inference algorithms. ProbDR encompasses PCA, CMDS, LLE, LE, MVU, diffusion maps, kPCA, Isomap, (t-)SNE, and UMAP. In our framework, a low-dimensional latent variable is used to construct a covariance, precision, or graph Laplacian matrix, which then serves as part of a generative model for the data. Inference is performed by optimizing an evidence lower bound (ELBO). We demonstrate the internal consistency of our framework and show that it enables the use of probabilistic programming languages (PPLs) for DR. Additionally, we illustrate that the framework facilitates reasoning about unseen data, and we argue that our generative models approximate Gaussian processes (GPs) on manifolds. By providing a unified view of DR, our framework facilitates communication, reasoning about uncertainties, model composition, and extensions, particularly when domain knowledge is present.
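To make the idea of "a latent variable constructs a covariance for a generative model" concrete, the following is a minimal sketch using classical dual probabilistic PCA, not the ProbDR formulation itself (which the abstract does not spell out). It assumes each of the d feature columns of the data matrix Y (n x d) is drawn independently from N(0, X Xᵀ + σ² I), where X (n x q) is the latent embedding and σ² is a fixed noise variance. Under that assumption, the likelihood-maximising X is given by the top eigenvectors of Y Yᵀ / d, which is the PCA solution up to rotation. The function names `log_likelihood` and `dual_ppca_embedding` are hypothetical and chosen for illustration only.

```python
import numpy as np


def log_likelihood(X, Y, sigma2):
    """Log-density of Y (n x d) when every column is i.i.d. N(0, X X^T + sigma2 * I)."""
    n, d = Y.shape
    K = X @ X.T + sigma2 * np.eye(n)           # covariance built from the latents
    _, logdet = np.linalg.slogdet(K)           # K is positive definite, so sign is +1
    quad = np.trace(np.linalg.solve(K, Y @ Y.T))
    return -0.5 * (d * n * np.log(2.0 * np.pi) + d * logdet + quad)


def dual_ppca_embedding(Y, q, sigma2):
    """Closed-form maximiser of log_likelihood over X for a fixed sigma2 (assumed setup)."""
    n, d = Y.shape
    S = Y @ Y.T / d                            # n x n sample covariance across features
    evals, evecs = np.linalg.eigh(S)           # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:q]          # indices of the q largest eigenvalues
    # Directions whose eigenvalue falls below the noise level are set to zero.
    scale = np.sqrt(np.maximum(evals[top] - sigma2, 0.0))
    return evecs[:, top] * scale


# Illustrative synthetic data: 30 points, 50 features, 2 underlying latent dimensions.
rng = np.random.default_rng(0)
Y = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 50)) + 0.1 * rng.normal(size=(30, 50))
X_hat = dual_ppca_embedding(Y, q=2, sigma2=0.1)
```

In this special case the optimum is available in closed form. For a covariance, precision, or graph Laplacian that depends on the latents in a more complicated way, the same objective would instead be optimised numerically, which is where a gradient-based optimiser or a PPL would come in.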