This paper introduces EXaCTz, a parallel algorithm that simultaneously preserves extremum graphs and contour trees in lossy-compressed scalar field data. While error-bounded lossy compression is essential for large-scale scientific simulations and workflows, existing topology-preserving methods suffer from (1) a significant throughput disparity, where topology correction runs at speeds on the order of MB/s, orders of magnitude slower than compression at speeds on the order of GB/s, (2) limited support for diverse topological descriptors, and (3) a lack of theoretical convergence bounds. To address these challenges, EXaCTz introduces a high-performance, bounded-iteration algorithm that enforces topological consistency by deriving targeted edits for decompressed data. Unlike prior methods that rely on explicit topology reconstruction, EXaCTz enforces consistent min/max neighbors for all vertices, along with a consistent global ordering among critical points. As a result, the algorithm enforces consistent critical-point classification and saddle-extremum connectivity, and preserves merge/split events. We theoretically prove the convergence of our algorithm, with the number of iterations bounded by the longest path in a vulnerability graph that characterizes potential cascading effects during correction. Experiments on real-world datasets show that EXaCTz achieves a single-GPU throughput of up to 4.52 GB/s, outperforming the state-of-the-art contour-tree-preserving method (Gorski et al.) by up to 213x (with a single-core CPU implementation for fair comparison) and 3,285x (with a single-GPU version). In distributed environments, EXaCTz scales to 128 GPUs with 55.6% efficiency (compared with 6.4% for a naive parallelization), processing datasets of up to 512 GB in under 48 seconds and achieving an aggregate correction throughput of up to 32.69 GB/s.
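The min/max-neighbor condition mentioned above can be illustrated with a minimal sketch. It is not the EXaCTz implementation: the 2D grid, the 4-connected neighborhood, the index-based tie-breaking, and the function names (`extreme_neighbors`, `inconsistent_vertices`) are all assumptions made for illustration. The sketch only detects vertices whose largest or smallest neighbor differs between the original and the decompressed field; it does not derive the corrective edits.

```python
from typing import Iterator, List, Tuple

Grid = List[List[float]]
Vertex = Tuple[int, int]


def neighbors(r: int, c: int, rows: int, cols: int) -> Iterator[Vertex]:
    """Yield the 4-connected neighbors of (r, c); this connectivity is an assumption."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            yield rr, cc


def order_key(grid: Grid, v: Vertex) -> Tuple[float, int, int]:
    """Total order on vertices: value first, then position to break ties."""
    r, c = v
    return (grid[r][c], r, c)


def extreme_neighbors(grid: Grid, r: int, c: int) -> Tuple[Vertex, Vertex]:
    """Return (max vertex, min vertex) over the vertex and its neighbors.

    Including the vertex itself means a local maximum (minimum) reports
    itself as its own max (min) neighbor.
    """
    rows, cols = len(grid), len(grid[0])
    candidates = list(neighbors(r, c, rows, cols)) + [(r, c)]
    hi = max(candidates, key=lambda v: order_key(grid, v))
    lo = min(candidates, key=lambda v: order_key(grid, v))
    return hi, lo


def inconsistent_vertices(original: Grid, decompressed: Grid) -> List[Vertex]:
    """List vertices (row-major) whose max or min neighbor changed after compression."""
    rows, cols = len(original), len(original[0])
    return [
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if extreme_neighbors(original, r, c) != extreme_neighbors(decompressed, r, c)
    ]
```

For example, lowering the central maximum of a 3x3 field from 9 to 6.5 changes the max neighbor of the center and of two adjacent vertices, so a correction step in this spirit would need to edit values until the returned list is empty.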