SmartIterator: Visual Analytics Workflows for Supervising Unsupervised Data Grouping

Unsupervised learning methods (topic modeling, partition-based clustering, and density-based clustering) produce data groupings without human guidance, yet choosing and evaluating those groupings should not itself be unsupervised. We present SmartIterator (SI), a visual analytics approach that treats the full sequence of grouping results across a parameter sweep as a first-class analytical object. For each method family, SI provides a structured six-phase workflow that guides the analyst through systematic exploration of grouping results: quality-metric overview, transition-stability assessment, membership-confidence evaluation, content and context inspection, recurrent-archetype verification, and finally an informed decision, building cumulative understanding of the data structure along the way. The workflows are operationalized through IteraScope (IS), a coordinated visual display that combines four components: quality-metric charts with semantic color encoding; a 1D group embedding with Sankey-style transition flows and violin plots of membership confidence; a 2D group embedding with HDBSCAN-detected recurrent archetypes that highlights iterations capturing all persistent patterns; and domain-specific linked views for contextualized interpretation. We demonstrate the three workflows on (1) simulated social-media messages from the VAST Challenge 2011 (density-based clustering, validated against ground truth), (2) EU population statistics across about 1,500 NUTS-3 regions (partition-based clustering), and (3) 30 years of IEEE VIS papers (NMF topic modeling). The workflows constitute the main contribution: they provide actionable, method-specific guidance for navigating parameter spaces, studying how data structure evolves across configurations, and grounding analytical understanding in domain context, yielding knowledge about the data that no single "best" result can provide.
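To make the idea of treating a parameter sweep as a sequence of linked grouping results more concrete, here is a minimal Python sketch. It is not the authors' implementation. The label lists are hypothetical, and the `stability` score (the share of items that follow the dominant flow out of their source group) is an assumption chosen for illustration, not a metric taken from the paper. The flow counts correspond to what a Sankey-style transition view between two consecutive results would draw.

```python
from collections import Counter


def transition_flows(labels_a, labels_b):
    """Count items moving from each group in one result to each group in the next."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both results must label the same items")
    return Counter(zip(labels_a, labels_b))


def stability(labels_a, labels_b):
    """Share of items that follow the dominant flow out of their source group.

    Illustrative measure only (an assumption, not the paper's definition).
    """
    flows = transition_flows(labels_a, labels_b)
    best = {}
    for (src, _dst), n in flows.items():
        best[src] = max(best.get(src, 0), n)
    return sum(best.values()) / len(labels_a)


# Hypothetical sweep: group labels for the same six items at three parameter values.
sweep = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 2, 1, 1, 1],
    4: [0, 3, 2, 1, 1, 1],
}

params = sorted(sweep)
for p, q in zip(params, params[1:]):
    flows = dict(transition_flows(sweep[p], sweep[q]))
    print(p, "->", q, flows, round(stability(sweep[p], sweep[q]), 2))
```

A score near 1 means groups mostly persist or split cleanly between two consecutive configurations. Lower values indicate that members are being reshuffled, which is the kind of transition an analyst would want to inspect before settling on a result.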

