Dimensionality reduction (DR) techniques inherently distort the structure of high-dimensional input data, producing imperfect low-dimensional embeddings. Diverse distortion measures have therefore been proposed to evaluate the reliability of DR embeddings. In practice, however, implementing and executing these measures has been time-consuming and tedious. To address this issue, we present ZADU, a Python library that provides distortion measures. ZADU is easy to install and execute, and it enables comprehensive evaluation of DR embeddings through three key features. First, the library covers a wide range of distortion measures. Second, it automatically optimizes the execution of distortion measures, substantially reducing the running time required to execute multiple measures. Third, the library reports how individual points contribute to the overall distortion, facilitating detailed analysis of DR embeddings. By simulating a real-world scenario of optimizing DR embeddings, we verify that our optimization scheme substantially reduces the time required to execute distortion measures. Finally, as an application of ZADU, we present another library, ZADUVis, which allows users to easily create distortion visualizations that depict the extent to which each region of an embedding suffers from distortions.
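To make the second and third features concrete, the following is a minimal NumPy sketch, not ZADU's API. It computes two standard distortion measures, Trustworthiness and Continuity, from neighbor-rank matrices that are built once and reused for both measures, and it also returns each point's contribution to the total penalty. The function names (`rank_matrix`, `trustworthiness_continuity`) and the result keys are hypothetical and chosen for illustration. Reusing the rank matrices is only one plausible way that running several measures together can save time; the abstract does not state which mechanism ZADU uses.

```python
import numpy as np


def rank_matrix(X):
    """Return ranks[i, j]: the rank of point j among the neighbors of point i.

    Point i itself gets rank 0 and its nearest neighbor gets rank 1.
    (Hypothetical helper for this sketch; O(n^2) memory, small data only.)
    """
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, -1.0)  # force each point to sort first in its own row
    order = np.argsort(dist, axis=1, kind="stable")
    ranks = np.empty_like(order)
    ranks[np.arange(n)[:, None], order] = np.arange(n)[None, :]
    return ranks


def _penalty_per_point(rank_ref, rank_other, k):
    """Rank-based penalty of each point.

    A neighbor j of point i is penalized when it is among the k nearest
    neighbors in the `other` space but not in the `ref` space.
    """
    intruders = (rank_other >= 1) & (rank_other <= k) & (rank_ref > k)
    return np.where(intruders, rank_ref - k, 0).sum(axis=1)


def trustworthiness_continuity(hd, ld, k):
    """Compute both measures from shared rank matrices (requires k < n / 2)."""
    n = hd.shape[0]
    rank_hd = rank_matrix(hd)  # computed once, used by both measures
    rank_ld = rank_matrix(ld)  # computed once, used by both measures
    norm = 2.0 / (n * k * (2 * n - 3 * k - 1))
    trust_pen = _penalty_per_point(rank_hd, rank_ld, k)  # false neighbors
    cont_pen = _penalty_per_point(rank_ld, rank_hd, k)   # missing neighbors
    return {
        "trustworthiness": 1.0 - norm * trust_pen.sum(),
        "continuity": 1.0 - norm * cont_pen.sum(),
        "trust_penalty_per_point": trust_pen,
        "cont_penalty_per_point": cont_pen,
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hd = rng.normal(size=(200, 10))  # synthetic high-dimensional data
    ld = hd[:, :2]                   # crude "embedding": keep two coordinates
    result = trustworthiness_continuity(hd, ld, k=10)
    print(result["trustworthiness"], result["continuity"])
    # Points with the largest penalties are the ones distorted the most.
    print(np.argsort(result["trust_penalty_per_point"])[-5:])
```

The per-point penalty arrays are the kind of information a distortion visualization could map onto an embedding, for example by coloring each point according to its penalty.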