Preliminary Guidelines For Combining Data Integration and Visual Data Analysis

From arXiv. Accepted to IEEE TVCG. 13 pages, 5 figures. For a study breakdown video, see https://youtu.be/NzVxHn-OpqQ. The source code, data, and analysis are available at https://github.com/AdamCoscia/Integration-Guidelines-VA

Data integration is often performed to consolidate information from multiple disparate data sources during visual data analysis. However, in both interface design and empirical research, integration operations are usually treated separately from visual analytics operations such as encode and filter. We conducted a preliminary user study to investigate whether and how data integration should be incorporated directly into the visual analytics process. We used two interface alternatives featuring contrasting approaches to the data preparation and analysis workflow: manual file-based ex-situ integration as a separate step from visual analytics operations, and automatic UI-based in-situ integration merged with visual analytics operations. Participants were asked to complete specific and free-form tasks with each interface, browsing for patterns, generating insights, and summarizing relationships between attributes distributed across multiple files. Analyzing participants' interactions and feedback, we found that task completion time and total interactions were similar across interfaces and tasks. We also observed integration strategies that differed between interfaces, as well as emergent behaviors related to satisficing and cognitive bias. Participants' time spent and interactions revealed that in-situ integration enabled users to spend more time on analysis tasks compared with ex-situ integration. Participants' integration strategies and analytical behaviors revealed differences in how the interfaces were used to generate and track hypotheses and insights. With these results, we synthesized preliminary guidelines for designing future visual analytics interfaces that can support integrating attributes throughout an active analysis process.
