This paper shows that dimensionality reduction methods such as UMAP and t-SNE can be approximately recast as MAP inference methods corresponding to a model introduced in ProbDR, which describes the graph Laplacian (an estimate of the precision, or inverse covariance, matrix) using a Wishart distribution whose mean is given by a non-linear covariance function evaluated on the latents. This interpretation offers deeper theoretical and semantic insights into such algorithms by showing that the variances corresponding to these covariances are low (and misspecified), and it forges a connection to Gaussian process latent variable models by showing that well-known kernels can be used to describe the covariances implied by graph Laplacians. We also introduce tools with which similar dimensionality reduction methods can be studied, and pose two areas of research arising from these interpretations.
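To make the shape of such a model concrete, the following is a minimal sketch of a Wishart-based MAP objective over latents, not the paper's exact formulation. Several pieces are assumptions made only for illustration: the symmetrised k-nearest-neighbour graph, the Cauchy (Student-t-like) kernel with added jitter, the degrees of freedom `nu`, and the choice to use the kernel matrix itself as the Wishart mean. The function names `knn_laplacian`, `cauchy_kernel`, and `wishart_objective` are hypothetical. Because a graph Laplacian is singular, the log-determinant term in L is dropped; it does not depend on the latents, so it does not affect the maximiser.

```python
import numpy as np

def knn_laplacian(Y, k=5):
    """Unnormalised Laplacian L = D - A of a symmetrised kNN graph (assumed graph construction)."""
    n = Y.shape[0]
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)              # exclude self-neighbours
    idx = np.argsort(sq, axis=1)[:, :k]       # indices of the k nearest points
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    A = np.maximum(A, A.T)                    # symmetrise the adjacency matrix
    return np.diag(A.sum(axis=1)) - A

def cauchy_kernel(X, jitter=1e-3):
    """Assumed non-linear covariance function on the latents: 1 / (1 + squared distance)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return 1.0 / (1.0 + sq) + jitter * np.eye(X.shape[0])

def wishart_objective(X, L, nu):
    """X-dependent part of log Wishart(L | scale = S / nu, dof = nu), with S = K(X) as the mean."""
    S = cauchy_kernel(X)
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * nu * (np.trace(np.linalg.solve(S, L)) + logdet)

# Toy data: 30 points in 10 dimensions, with randomly initialised 2-D latents.
rng = np.random.default_rng(0)
Y = rng.normal(size=(30, 10))
X = rng.normal(size=(30, 2))
L = knn_laplacian(Y, k=5)
value = wishart_objective(X, L, nu=30)
```

Under these assumptions, maximising `wishart_objective` over `X` (for example with a gradient-based optimiser and a prior on `X`) would give a MAP estimate of the latents; the sketch only evaluates the objective at one point.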