Lossy compression is widely used by scientists to reduce data from simulations, experiments, and observations, but it can distort features of interest even when the error is bounded. Such distortions may compromise downstream analyses and lead to incorrect scientific conclusions in applications such as combustion and cosmology. This paper presents a distributed, parallel algorithm for correcting one class of topological features: piecewise linear Morse-Smale segmentations (PLMSS), which decompose the domain into monotone regions labeled by their corresponding local minima and maxima. Although a single-GPU algorithm (MSz) exists for correcting PLMSS after compression, no existing method scales beyond a single GPU to extreme-scale data. We identify the parallel computation of integral paths, a communication-intensive step that is notoriously difficult to scale, as the key bottleneck in scaling PLMSS correction. Instead of explicitly computing and correcting integral paths, our algorithm simplifies MSz by preserving the steepest ascending and descending directions at every location, which minimizes interprocess communication while adding negligible storage overhead. With this simplified algorithm and relaxed synchronization, our method achieves over 90% parallel efficiency on 128 GPUs on the Perlmutter supercomputer for real-world datasets.
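To make the idea of preserving steepest directions concrete, the following is a minimal serial sketch, not the MSz or distributed implementation. It assumes a small 2D regular grid with a 4-neighborhood and index-based tie-breaking; the function names (`steepest_neighbor`, `direction_mismatches`, `segmentation_label`) and the sample data are hypothetical. It shows that the check at each vertex uses only that vertex's immediate neighbors, whereas computing a segmentation label requires following a path that may cross the whole domain.

```python
from typing import List, Tuple

Grid = List[List[float]]
Vertex = Tuple[int, int]


def steepest_neighbor(f: Grid, v: Vertex, ascending: bool) -> Vertex:
    """Return the 4-neighbor of v with the largest (ascending) or smallest
    (descending) value, or v itself if v is a local extremum.
    Ties are broken by row-major vertex index (an assumption of this sketch)."""
    rows, cols = len(f), len(f[0])
    i, j = v
    sign = 1.0 if ascending else -1.0
    best = v
    best_key = (sign * f[i][j], sign * (i * cols + j))
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < rows and 0 <= nj < cols:
            key = (sign * f[ni][nj], sign * (ni * cols + nj))
            if key > best_key:
                best, best_key = (ni, nj), key
    return best


def direction_mismatches(orig: Grid, decomp: Grid) -> List[Vertex]:
    """List vertices whose steepest ascending or descending direction
    differs between the original and the decompressed field."""
    bad = []
    for i in range(len(orig)):
        for j in range(len(orig[0])):
            v = (i, j)
            if any(
                steepest_neighbor(orig, v, asc) != steepest_neighbor(decomp, v, asc)
                for asc in (True, False)
            ):
                bad.append(v)
    return bad


def segmentation_label(f: Grid, v: Vertex) -> Tuple[Vertex, Vertex]:
    """Label v by the (minimum, maximum) reached by following steepest
    descent and steepest ascent; this is the non-local path computation."""
    def follow(start: Vertex, ascending: bool) -> Vertex:
        cur = start
        while True:
            nxt = steepest_neighbor(f, cur, ascending)
            if nxt == cur:
                return cur
            cur = nxt
    return follow(v, False), follow(v, True)


# Hypothetical data: DECOMP differs from ORIG by at most 0.3 per vertex.
ORIG = [[0.0, 1.0, 2.5],
        [1.2, 2.0, 3.0],
        [2.2, 3.3, 4.0]]
DECOMP = [[0.0, 1.3, 2.5],
          [1.1, 2.0, 3.0],
          [2.2, 3.3, 4.0]]

if __name__ == "__main__":
    print(direction_mismatches(ORIG, DECOMP))
    print(segmentation_label(ORIG, (1, 1)))
```

If every vertex passes the local check, following the stored directions reaches the same extrema in both fields, so the labels agree. In a domain-decomposed setting, such a check would plausibly need only a one-layer halo of neighbor values per subdomain, though the sketch does not model communication.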