Optimizing Scientific Data Transfer on Globus with Error-bounded Lossy Compression

The increasing volume and velocity of scientific data necessitate the frequent movement of enormous data volumes as part of routine research activities. As a result, limited wide-area bandwidth often becomes a bottleneck for research progress. In many cases, however, consuming applications (e.g., for analysis, visualization, and machine learning) can achieve acceptable performance on reduced-precision data, and researchers may therefore wish to trade data precision for lower transfer and storage costs. Error-bounded lossy compression is a promising approach because it can significantly reduce data volumes while keeping the distortion of the data within user-specified error bounds. In this paper, we propose a novel data transfer framework called Ocelot that integrates error-bounded lossy compression into the Globus data transfer infrastructure. We note four key contributions: (1) Ocelot is the first integration of lossy compression in Globus to significantly improve scientific data transfer performance over wide-area networks (WANs). (2) We propose an effective machine-learning-based quality estimation model that can predict the quality of error-bounded lossy compressors, which is fundamental to ensuring that transferred data are acceptable to users. (3) We develop optimized strategies to reduce the compression time overhead, counter the compute-node waiting time, and improve transfer speed for compressed files. (4) We perform evaluations using many real-world scientific applications across different domains and distributed Globus endpoints. Our experiments show that Ocelot can improve dataset transfer performance substantially, and that the quality of lossy compression (compression time, compression ratio, and data distortion) can be predicted accurately for the purpose of quality assurance.
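
To make the notion of a user-specified error bound concrete, the following is a minimal sketch of an absolute-error-bounded compressor. It is an illustration only and not the compressor used by Ocelot: it assumes plain uniform quantization followed by zlib, and the function names `compress` and `decompress` are hypothetical. Each value is rounded to the nearest multiple of twice the error bound, so every reconstructed value differs from the original by at most the error bound (up to floating-point rounding).

```python
import math
import struct
import zlib


def compress(values, error_bound):
    """Quantize values to multiples of 2*error_bound, then deflate the integer codes.

    Hypothetical helper for illustration; assumes the codes fit in 32-bit integers.
    """
    step = 2.0 * error_bound
    codes = [round(v / step) for v in values]
    raw = struct.pack(f"<{len(codes)}i", *codes)
    return zlib.compress(raw)


def decompress(blob, error_bound):
    """Invert compress(): inflate the codes and map each back to the center of its bin."""
    step = 2.0 * error_bound
    raw = zlib.decompress(blob)
    codes = struct.unpack(f"<{len(raw) // 4}i", raw)
    return [c * step for c in codes]


# Smooth synthetic signal standing in for a scientific field (assumed data).
values = [math.sin(0.001 * i) for i in range(10_000)]
error_bound = 1e-3

blob = compress(values, error_bound)
restored = decompress(blob, error_bound)

# Largest pointwise distortion introduced by the lossy round trip.
max_error = max(abs(a - b) for a, b in zip(values, restored))
print(f"max error: {max_error:.2e} (bound {error_bound:.0e})")
print(f"compressed bytes: {len(blob)} vs {8 * len(values)} as float64")
```

A larger error bound yields coarser quantization and therefore smaller output, which is the precision-for-volume trade-off described above.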
