Dimensionality reduction (DR) techniques inherently distort the original structure of high-dimensional input data, producing imperfect low-dimensional embeddings. Diverse distortion measures have thus been proposed to evaluate the reliability of DR embeddings. However, implementing and executing these measures in practice has so far been time-consuming and tedious. To address this issue, we present ZADU, a Python library that provides distortion measures. ZADU is not only easy to install and execute but also enables comprehensive evaluation of DR embeddings through three key features. First, the library covers a wide range of distortion measures. Second, it automatically optimizes the execution of distortion measures, substantially reducing the running time required to execute multiple measures. Last, the library reports how individual points contribute to the overall distortions, facilitating detailed analysis of DR embeddings. By simulating a real-world scenario of optimizing DR embeddings, we verify that our optimization scheme substantially reduces the time required to execute distortion measures. Finally, as an application of ZADU, we present another library, ZADUVis, that allows users to easily create distortion visualizations depicting the extent to which each region of an embedding suffers from distortions.
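To make the second and third features concrete, the sketch below computes two classic neighborhood-based distortion measures, trustworthiness and continuity (Venna and Kaski), in plain NumPy/SciPy. It is not ZADU's API: the function names `rank_matrix` and `trustworthiness_continuity` are hypothetical. The abstract does not say how ZADU shares work across measures; here we assume one plausible strategy, computing each space's neighbor-rank matrix once and reusing it for both measures. Each measure is returned both as a global score and as per-point (local) values whose mean equals the global score, illustrating how individual points contribute to the overall distortion.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def rank_matrix(X):
    """ranks[i, j] = neighbor rank of point j w.r.t. point i (1 = nearest).

    The point itself gets rank n (it is pushed last via an infinite distance).
    """
    n = X.shape[0]
    D = squareform(pdist(X))
    np.fill_diagonal(D, np.inf)
    order = np.argsort(D, axis=1, kind="stable")
    ranks = np.empty_like(order)
    ranks[np.arange(n)[:, None], order] = np.arange(1, n + 1)
    return ranks


def _pointwise_penalty(ranks_ref, ranks_other, k):
    # Points that are among the k nearest in `ranks_other` but not in
    # `ranks_ref`, penalized by how far beyond k they lie in `ranks_ref`.
    intruders = (ranks_other <= k) & (ranks_ref > k)
    return np.where(intruders, ranks_ref - k, 0).sum(axis=1)


def trustworthiness_continuity(hd, ld, k):
    """Hypothetical helper: global and per-point trustworthiness/continuity."""
    n = hd.shape[0]
    if not 0 < k < (2 * n - 1) / 3:
        raise ValueError("k must satisfy 0 < k < (2n - 1) / 3")
    # Assumed optimization: rank matrices are computed once and shared.
    r_hd, r_ld = rank_matrix(hd), rank_matrix(ld)
    norm = 2.0 / (k * (2 * n - 3 * k - 1))
    local = {
        # Trustworthiness: false neighbors in the embedding, ranked in HD space.
        "trustworthiness": 1.0 - norm * _pointwise_penalty(r_hd, r_ld, k),
        # Continuity: missing neighbors from HD space, ranked in the embedding.
        "continuity": 1.0 - norm * _pointwise_penalty(r_ld, r_hd, k),
    }
    scores = {name: float(vals.mean()) for name, vals in local.items()}
    return scores, local
```

A perfect embedding (for instance, the data compared with itself) scores 1.0 on both measures, while sorting the per-point values in `local` highlights the points that suffer most, which is the kind of information a distortion visualization such as ZADUVis could map onto the embedding.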