Both the database and high-performance computing (HPC) communities use lossless compression to reduce the size of floating-point data, yet the two remain disconnected. Each community designs and evaluates its methods in a domain-specific manner, so it is unclear whether HPC compression techniques can benefit database applications, or vice versa. As the HPC community increasingly adopts in-situ analysis and visualization, more floating-point data from scientific simulations are being stored in databases such as key-value stores and queried using in-memory retrieval paradigms. This trend underscores the urgent need for a joint study of these compression methods' strengths and limitations, based not only on how well they compress data from various domains but also on their runtime characteristics. Our study extensively evaluates eight CPU-based and five GPU-based compression methods developed by both communities, using 33 real-world datasets assembled in the Floating-point Compressor Benchmark (FCBench). Additionally, we use the roofline model to profile their runtime bottlenecks. Our goal is to offer insights that could help researchers select existing methods or develop new ones for integrated database and HPC applications.
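For readers unfamiliar with the roofline model mentioned above, the following is a minimal sketch of its standard form, not the profiling setup used in the study. Attainable throughput is bounded by the smaller of the compute peak and the memory bandwidth multiplied by the arithmetic intensity (operations per byte moved). The function names and the hardware numbers are hypothetical, chosen only for illustration.

```python
def attainable_gops(peak_gops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline bound in Gop/s: min(compute roof, memory roof).

    peak_gops     -- peak compute throughput (Gop/s), hypothetical value
    bandwidth_gbs -- peak memory bandwidth (GB/s), hypothetical value
    intensity     -- arithmetic intensity of the kernel (operations per byte)
    """
    return min(peak_gops, bandwidth_gbs * intensity)


def bottleneck(peak_gops: float, bandwidth_gbs: float, intensity: float) -> str:
    """Classify a kernel by comparing its intensity with the ridge point."""
    ridge = peak_gops / bandwidth_gbs  # intensity at which the two roofs meet
    return "memory-bound" if intensity < ridge else "compute-bound"


# Hypothetical machine: 1000 Gop/s peak compute, 100 GB/s memory bandwidth.
PEAK, BW = 1000.0, 100.0

# A kernel doing 0.5 operations per byte sits under the memory roof.
low = attainable_gops(PEAK, BW, 0.5)
# A kernel doing 20 operations per byte is capped by the compute roof.
high = attainable_gops(PEAK, BW, 20.0)

print(low, bottleneck(PEAK, BW, 0.5))
print(high, bottleneck(PEAK, BW, 20.0))
```

Under these assumed numbers the ridge point is 10 operations per byte: a compressor whose kernels fall below it is limited by memory traffic, and one above it is limited by compute.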