Dimensionality Reduction (DR) scatterplot layouts have become a ubiquitous visualization tool for analyzing multidimensional datasets. Despite their popularity, such scatterplots suffer from occlusion, especially when informative glyphs are used to represent data instances, potentially obscuring information critical to the analysis at hand. Different strategies have been devised to address this issue: either producing overlap-free layouts directly, which lack the power of contemporary DR techniques to uncover interesting data patterns, or eliminating overlaps as a post-processing step. Although post-processing techniques give good results, most of the best methods expand or distort the scatterplot area, shrinking glyphs, sometimes to unreadable sizes, and thereby defeating the purpose of removing overlaps. This paper presents Distance Grid (DGrid), a novel post-processing strategy that removes overlaps from DR layouts while faithfully preserving the original layout's characteristics and bounding the minimum glyph size. An extensive comparative evaluation considering multiple metrics shows that DGrid surpasses the state of the art in overlap removal while also being one of the fastest techniques, especially for large datasets. A user study with 51 participants also shows that DGrid is consistently ranked among the top techniques for preserving the original scatterplots' visual characteristics and for the aesthetics of the final results.