Two-dimensional embeddings obtained from dimensionality reduction techniques, such as MDS, t-SNE, and UMAP, are widely used across disciplines to visualize high-dimensional data. These visualizations are a valuable tool for exploratory data analysis, allowing researchers to visually identify clusters, outliers, and other interesting patterns in the data. However, interpreting them can be challenging, as it often requires additional manual inspection to understand how data points in different regions of the embedding space differ. To address this issue, we propose Visual Explanations via Region Annotation (VERA), an automatic embedding-annotation approach that generates visual explanations for any two-dimensional embedding. VERA produces informative explanations that characterize distinct regions in the embedding space, allowing users to gain an overview of the embedding landscape at a glance. Unlike most existing approaches, which require some degree of manual user intervention, VERA produces static explanations, automatically identifying and selecting the most informative visual explanations to show to the user. We illustrate the usage of VERA on a real-world data set and validate the utility of our approach with a comparative user study. Our results demonstrate that the explanations generated by VERA are as useful as full-fledged interactive tools on typical exploratory data analysis tasks but require significantly less time and effort from the user.