Exploratory Data Analysis (EDA) is a routine task for data analysts, often conducted in flexible computational notebooks. During EDA, data workers process, visualize, and interpret data tables to decide on subsequent analysis steps. However, although cell-by-cell programming is flexible, it can produce disorganized code, which makes it difficult to trace the state of data tables across cells and increases data workers' cognitive load. This paper introduces NoteFlow, a notebook library that recommends charts as ``sight glasses'' for data tables, allowing users to monitor how the tables are dynamically updated throughout the EDA process. To keep these charts visually consistent and effective, NoteFlow adapts their encodings in response to data transformations, maintaining a coherent and insightful representation of the data. NoteFlow was evaluated through user studies, which demonstrated its ability to provide an overview of the EDA process and to convey critical insights in the data tables.