This paper shows that dimensionality reduction methods such as UMAP and t-SNE can be approximately recast as MAP inference methods for a model introduced in ProbDR, which describes the graph Laplacian (an estimate of the data precision matrix) using a Wishart distribution whose mean is given by a non-linear covariance function evaluated on the latents. This interpretation offers deeper theoretical and semantic insight into such algorithms in two ways: it shows that the variances corresponding to these covariances are low (potentially misspecified), and it forges a connection to Gaussian process latent variable models by showing that well-known kernels can be used to describe the covariances implied by graph Laplacians. We also introduce tools with which similar dimensionality reduction methods can be studied.