Data Point Selection for Line Chart Visualization: Methodological Assessment and Evidence-Based Guidelines

Time series visualization plays a crucial role in identifying patterns and extracting insights across various domains. However, as datasets continue to grow in size, visualizing them effectively becomes challenging. Downsampling, which involves data aggregation or selection, is a well-established approach to overcome this challenge. This work focuses on data selection algorithms, which accomplish downsampling by selecting values from the original time series. Despite their widespread adoption in visualization platforms and time series databases, there is limited literature on the evaluation of these techniques. To address this, we propose an extensive metrics-based evaluation methodology. Our methodology analyzes visual representativeness by assessing how well a downsampled time series line chart visually approximates the original data. Moreover, our methodology includes a novel concept called "visual stability", which captures visual changes when updating (streaming) or interacting with the visualization (panning and zooming). We evaluated four data point selection algorithms across three open-source visualization toolkits using our proposed methodology, considering various figure-drawing properties. Following the analysis of our findings, we formulated a set of evidence-based guidelines for line chart visualization at scale with downsampling. To promote reproducibility and enable the qualitative evaluation of new advancements in time series data point selection, we have made our methodology and results openly accessible. The proposed evaluation methodology, along with the obtained insights from this study, establishes a foundation for future research in this domain.
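As a rough illustration of what "data point selection" means above, the sketch below keeps only the minimum and maximum sample of each equal-width index bin, so every retained value is an original sample rather than an aggregate. The abstract does not name the four evaluated algorithms; the min-max scheme, the function name `minmax_select`, and the `n_bins` parameter are assumptions chosen for illustration and are not presented as the authors' implementation.

```python
from typing import List, Sequence


def minmax_select(values: Sequence[float], n_bins: int) -> List[int]:
    """Return indices of the min and max sample in each of n_bins index bins.

    Hypothetical helper for illustration: it selects existing samples
    (no averaging), which is what distinguishes selection from aggregation.
    """
    n = len(values)
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    # Nothing to gain from downsampling if the series is already small enough.
    if n <= 2 * n_bins:
        return list(range(n))

    selected: List[int] = []
    for b in range(n_bins):
        # Equal-width bins over the index range; each holds at least 2 samples here.
        start = b * n // n_bins
        end = (b + 1) * n // n_bins
        idx = range(start, end)
        lo = min(idx, key=lambda i: values[i])  # first index of the bin minimum
        hi = max(idx, key=lambda i: values[i])  # first index of the bin maximum
        # Keep time order within the bin; a constant bin yields a single index.
        selected.extend(sorted({lo, hi}))
    return selected


if __name__ == "__main__":
    series = [0, 5, 1, 2, 9, 3, 4, -1, 7, 6]
    kept = minmax_select(series, n_bins=2)
    print(kept)                       # indices of the retained samples
    print([series[i] for i in kept])  # the retained original values
```

Returning indices rather than values keeps the original timestamps recoverable, which matters when the downsampled series is drawn on a time axis.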
