t-Distributed Stochastic Neighbor Embedding (t-SNE) has proven to be a popular approach for the visualization of multidimensional data, with successful applications in a wide range of domains. Despite their usefulness, t-SNE projections can be hard to interpret or even misleading, which undermines the trustworthiness of the results. Understanding the details of t-SNE itself and the reasons behind specific patterns in its output can be a daunting task, especially for non-experts in dimensionality reduction. In this work, we present t-viSNE, an interactive tool for the visual exploration of t-SNE projections that enables analysts to inspect different aspects of their accuracy and meaning, such as the effects of hyper-parameters, distance and neighborhood preservation, densities and costs of specific neighborhoods, and the correlations between dimensions and visual patterns. We propose a coherent, accessible, and well-integrated collection of different views for the visualization of t-SNE projections. The applicability and usability of t-viSNE are demonstrated through hypothetical usage scenarios with real data sets. Finally, we present the results of a user study in which the tool's effectiveness was evaluated. By bringing to light information that would normally be lost after running t-SNE, we hope to support analysts in using t-SNE and to make its results easier to understand.