Both the database and high-performance computing (HPC) communities use lossless compression to reduce the size of floating-point data, yet the two communities remain largely disconnected. Each designs and evaluates its methods in a domain-specific manner, so it is unclear whether HPC compression techniques can benefit database applications, or vice versa. As the HPC community increasingly turns to in-situ analysis and visualization, more floating-point data from scientific simulations are being stored in databases such as key-value stores and queried through in-memory retrieval paradigms. This trend underscores the need for a joint study of the strengths and limitations of these compression methods, based not only on how well they compress data from various domains but also on their runtime characteristics. Our study extensively evaluates eight CPU-based and five GPU-based compression methods developed by both communities on 33 real-world datasets assembled in the Floating-point Compressor Benchmark (FCBench). We also use the roofline model to profile their runtime bottlenecks. Our goal is to offer insights that help researchers select existing compression methods, or develop new ones, for integrated database and HPC applications.
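The roofline model mentioned above bounds a kernel's attainable throughput by the lower of two ceilings: the processor's peak compute rate and the product of the kernel's arithmetic intensity (operations per byte moved) with peak memory bandwidth. The minimal sketch below assumes this standard formulation; the function names and the hardware figures in the usage example are hypothetical and are not taken from FCBench.

```python
def arithmetic_intensity(ops: float, bytes_moved: float) -> float:
    """Operations performed per byte transferred to/from memory."""
    return ops / bytes_moved


def attainable_gops(ai: float, peak_gops: float, peak_bw_gbs: float) -> float:
    """Roofline bound: min(compute ceiling, bandwidth ceiling).

    ai          -- arithmetic intensity in ops/byte
    peak_gops   -- peak compute rate in Gop/s (hypothetical machine)
    peak_bw_gbs -- peak memory bandwidth in GB/s (hypothetical machine)
    """
    return min(peak_gops, ai * peak_bw_gbs)


def ridge_point(peak_gops: float, peak_bw_gbs: float) -> float:
    """Intensity at which the bandwidth and compute ceilings meet."""
    return peak_gops / peak_bw_gbs


def classify(ai: float, peak_gops: float, peak_bw_gbs: float) -> str:
    """Label a kernel by which ceiling limits it."""
    if ai < ridge_point(peak_gops, peak_bw_gbs):
        return "memory-bound"
    return "compute-bound"


if __name__ == "__main__":
    PEAK_GOPS, PEAK_BW = 1000.0, 100.0  # hypothetical: ridge at 10 ops/byte
    # Hypothetical kernel: 2e9 operations over 4e9 bytes -> 0.5 ops/byte
    ai = arithmetic_intensity(2e9, 4e9)
    bound = attainable_gops(ai, PEAK_GOPS, PEAK_BW)
    # prints "0.50 ops/byte -> 50.0 Gop/s (memory-bound)"
    print(f"{ai:.2f} ops/byte -> {bound:.1f} Gop/s "
          f"({classify(ai, PEAK_GOPS, PEAK_BW)})")
```

A kernel whose measured throughput sits near the sloped (bandwidth) part of the roof is limited by data movement, while one near the flat part is limited by computation. Lossless floating-point compressors often rely on integer and bitwise operations rather than floating-point arithmetic, so "operations" here is used loosely.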