We present BLITZCRANK, a high-speed semantic compressor designed for OLTP databases. Previous solutions are inadequate for compressing row-stores: they suffer from either a low compression factor due to coarse compression granularity or suboptimal performance due to inefficient handling of dynamic data sets. To solve these problems, we first propose novel semantic models that support fast inference and dynamic value sets for both discrete and continuous data types. We then introduce a new entropy encoding algorithm, called delayed coding, that achieves a significant improvement in decoding speed compared to modern arithmetic coding implementations. We evaluate BLITZCRANK in both standalone microbenchmarks and a multicore in-memory row-store using the TPC-C benchmark. Our results show that BLITZCRANK achieves sub-microsecond latency for decompressing a random tuple while obtaining high compression factors. This leads to an 85% memory reduction in the TPC-C evaluation with a moderate (19%) throughput degradation. For data sets larger than the available physical memory, BLITZCRANK helps the database sustain a high throughput for more transactions before the I/O overhead dominates.
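To relate the reported figures to each other, the following minimal sketch converts a fractional memory reduction into a compression factor and a throughput degradation into retained throughput. It assumes the compression factor is defined as uncompressed size divided by compressed size and that the whole reduction comes from compression; the function names are hypothetical and are not taken from BLITZCRANK.

```python
def compression_factor(memory_reduction: float) -> float:
    """Uncompressed/compressed size ratio implied by a fractional memory reduction.

    Assumption: the reduction applies to the whole memory footprint being compared.
    """
    if not 0.0 <= memory_reduction < 1.0:
        raise ValueError("memory_reduction must be in [0, 1)")
    return 1.0 / (1.0 - memory_reduction)


def retained_throughput(degradation: float) -> float:
    """Fraction of baseline throughput left after a fractional degradation."""
    if not 0.0 <= degradation <= 1.0:
        raise ValueError("degradation must be in [0, 1]")
    return 1.0 - degradation


# Figures reported for the TPC-C evaluation: 85% memory reduction, 19% degradation.
factor = compression_factor(0.85)
retained = retained_throughput(0.19)
print(f"{factor:.2f}x smaller, {retained:.0%} of baseline throughput")
```

Under these assumptions, an 85% reduction corresponds to a footprint roughly 6.7 times smaller, at about 81% of the uncompressed throughput.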