Unsupervised learning methods -- topic modeling, partition-based and density-based clustering -- produce data groupings without human guidance, yet choosing and evaluating those groupings should not itself be unsupervised. We present \emph{SmartIterator}~(SI), a visual analytics approach that treats the full sequence of grouping results across a parameter sweep as a first-class analytical object. For each method family, SI provides a structured six-phase workflow that guides the analyst through systematic exploration of grouping results -- from quality-metric overview through transition-stability assessment, membership-confidence evaluation, content and context inspection, and recurrent-archetype verification to an informed decision -- building cumulative understanding of data structure along the way. The workflows are operationalized through \emph{IteraScope}~(IS), a coordinated visual display combining quality-metric charts with semantic color encoding, a 1D group embedding with Sankey-style transition flows and violin plots of membership confidence, a 2D group embedding with HDBSCAN-detected recurrent archetypes that highlights iterations capturing all persistent patterns, and domain-specific linked views for contextualized interpretation. We demonstrate the three workflows on: (1)~simulated social-media messages from the VAST Challenge 2011 (density-based clustering, validated against ground truth), (2)~EU population statistics across ${\sim}1\,500$ NUTS-3 regions (partition-based clustering), and (3)~30 years of IEEE VIS papers (NMF topic modeling). The workflows constitute the main contribution: they provide actionable, method-specific guidance for navigating parameter spaces, studying how data structure evolves across configurations, and grounding analytical understanding in domain context -- yielding knowledge about the data that no single ``best'' result can provide.