Visualizing the results of aggregation queries is one of the most common techniques in data exploration and data science, because it helps identify correlations and patterns in the data. We propose DIVAN, a system that automatically normalizes each one-dimensional axis by frequency in order to generate large numbers of two-dimensional visualizations. DIVAN normalizes the input data via binning, allocating more pixels to data values that appear more frequently in the dataset. To support these visualizations, DIVAN can use either CPUs or Processing-in-Memory (PIM) architectures to compute the underlying aggregates quickly. On real-world datasets, we show that DIVAN generates visualizations that highlight patterns and correlations, some expected and some unexpected. Using PIM, DIVAN computes aggregates 45%-64% faster than modern CPUs on large datasets. For a use case with 100 million rows and 32 columns, our system computes 4,960 aggregates (each of size 128x128x128) in about a minute.
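To make the idea of frequency-based axis normalization concrete, the following is a minimal sketch and not DIVAN's actual implementation. It assumes that "normalizing by frequency via binning" can be approximated with equal-frequency (quantile) binning, so that dense value ranges receive more bins, and therefore more pixels, than sparse ones. The function names (`equal_frequency_edges`, `bin_index`, `count_aggregate_2d`) and the choice of a simple count aggregate are hypothetical and chosen only for illustration. The sketch also checks one piece of arithmetic: 32 columns yield C(32, 3) = 4,960 distinct column triples, which matches the number of 128x128x128 aggregates reported above. Whether DIVAN builds exactly one aggregate per column triple is an assumption here.

```python
from bisect import bisect_right
from math import comb


def equal_frequency_edges(values, num_bins):
    """Return num_bins - 1 interior edges so that each bin holds roughly
    the same number of values (illustrative quantile binning)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // num_bins] for i in range(1, num_bins)]


def bin_index(value, edges):
    """Map a value to its bin; values equal to an edge go to the upper bin."""
    return bisect_right(edges, value)


def count_aggregate_2d(xs, ys, num_bins):
    """Build a num_bins x num_bins grid of row counts over two columns,
    with each axis binned independently by frequency."""
    x_edges = equal_frequency_edges(xs, num_bins)
    y_edges = equal_frequency_edges(ys, num_bins)
    grid = [[0] * num_bins for _ in range(num_bins)]
    for x, y in zip(xs, ys):
        grid[bin_index(x, x_edges)][bin_index(y, y_edges)] += 1
    return grid


# A skewed column: equal-width bins would place seven of the eight values
# in the first bin, while equal-frequency bins spread them evenly.
skewed = [1, 2, 3, 4, 5, 6, 100, 1000]
edges = equal_frequency_edges(skewed, 4)
bins = [bin_index(v, edges) for v in skewed]

# Number of distinct column triples for a 32-column table.
num_triples = comb(32, 3)
```

In this sketch, duplicate-heavy columns can still produce uneven bins, because identical values always land in the same bin. A production system would need a policy for such ties, which the abstract does not describe.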