Managing and storing large-scale time-varying datasets is increasingly difficult. As supercomputing capabilities grow, so does the volume of data produced, which intensifies storage and I/O overheads. To address this issue, we propose a dynamic spatio-temporal data summarization technique that identifies informative features in key timesteps and fuses the less informative timesteps. This approach reduces storage requirements while preserving data dynamics. Unlike existing methods, ours retains both raw and summarized timesteps, giving a comprehensive view of how information changes over time. Information-theoretic measures guide the fusion process, yielding a visual representation that captures essential data patterns. We demonstrate the versatility of the technique on diverse datasets: particle-based flow simulations, security and surveillance applications, and biological cell interactions within the immune system. The approach offers a streamlined way to handle massive datasets across disciplines and applies to both in situ and post hoc analysis. By lowering storage and I/O overheads, it lets researchers and domain experts explore essential temporal dynamics, supporting informed decision-making and a more intuitive understanding of complex data behaviors.
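The idea of keeping informative timesteps raw and fusing the rest can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify which information-theoretic measure or fusion rule is used, so the sketch assumes Shannon entropy of a value histogram as the measure, a fixed entropy threshold to pick key timesteps, and an element-wise mean as the fusion rule. All function names and parameters (`shannon_entropy`, `fuse`, `summarize`, `threshold`, `bins`) are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values, bins=8, lo=0.0, hi=1.0):
    """Shannon entropy (in bits) of a histogram of `values`, assumed to lie in [lo, hi]."""
    width = (hi - lo) / bins
    # Assign each value to a bin; values equal to `hi` fall into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def fuse(frames):
    """Element-wise mean of equally sized frames (an assumed stand-in fusion rule)."""
    return [sum(column) / len(frames) for column in zip(*frames)]


def summarize(timesteps, threshold):
    """Keep timesteps with entropy >= threshold as raw; fuse each run of the others.

    Returns a list of (kind, (first_index, last_index), data) tuples.
    """
    summary, run, run_start = [], [], None
    for t, frame in enumerate(timesteps):
        if shannon_entropy(frame) >= threshold:
            if run:  # close the pending run of less informative timesteps
                summary.append(("fused", (run_start, t - 1), fuse(run)))
                run = []
            summary.append(("raw", (t, t), frame))
        else:
            if not run:
                run_start = t
            run.append(frame)
    if run:  # flush a trailing run
        summary.append(("fused", (run_start, len(timesteps) - 1), fuse(run)))
    return summary


# Toy data: four timesteps of a flattened scalar field with values in [0, 1].
timesteps = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.2],
    [0.1, 0.4, 0.6, 0.9],
    [0.0, 0.0, 0.1, 0.1],
]
summary = summarize(timesteps, threshold=1.0)
for kind, span, _ in summary:
    print(kind, span)
```

In this toy run, only the third timestep spreads its values across several histogram bins, so it is kept raw while its neighbours are fused. The output therefore contains both raw and summarized timesteps, matching the abstract's description. A real pipeline would likely replace the threshold with a measure of information change between timesteps.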