tSNE and UMAP are popular dimensionality reduction algorithms due to their speed and interpretable low-dimensional embeddings. Despite their popularity, however, little work has been done to study the full span of their differences. We theoretically and experimentally evaluate the space of parameters in both tSNE and UMAP and observe that a single one, the normalization, is responsible for switching between them. This, in turn, implies that a majority of the algorithmic differences can be toggled without affecting the embeddings. We discuss the implications this has for several theoretical claims behind UMAP, as well as how to reconcile them with existing tSNE interpretations. Based on our analysis, we provide a method (\ourmethod) that combines previously incompatible techniques from tSNE and UMAP and can replicate the results of either algorithm. This allows our method to incorporate further improvements, such as an acceleration that obtains either method's outputs faster than UMAP. We release improved versions of tSNE, UMAP, and \ourmethod that are fully plug-and-play with the traditional libraries at https://github.com/Andrew-Draganov/GiDR-DUN.
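To make the role of the normalization concrete, the following is a minimal sketch and not the released implementation. It assumes the standard Student-t kernel $(1 + \|y_i - y_j\|^2)^{-1}$ for the low-dimensional similarities, which corresponds to UMAP's kernel $1/(1 + a\,d^{2b})$ under the simplifying assumption $a = b = 1$. The function names and the `normalized` flag are hypothetical and chosen only for illustration.

```python
import numpy as np


def pairwise_sq_dists(Y):
    """Squared Euclidean distances between all rows of Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def low_dim_affinities(Y, normalized):
    """Low-dimensional similarities under a Student-t kernel.

    normalized=True  -> tSNE-style: kernel values divided by their global sum.
    normalized=False -> UMAP-style (assuming a = b = 1): raw kernel values.
    """
    kernel = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(kernel, 0.0)  # a point is not compared with itself
    if normalized:
        return kernel / kernel.sum()
    return kernel


# Hypothetical toy embedding: 5 points in 2 dimensions.
rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 2))

Q_tsne = low_dim_affinities(Y, normalized=True)
Q_umap = low_dim_affinities(Y, normalized=False)
```

In this sketch the two similarity matrices differ only by the global factor `Q_umap.sum()`. That factor depends on the whole embedding, so it changes the gradients even though the kernel itself is shared.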