Gzip is a ubiquitously used file compression format. Although a multitude of gzip implementations exist, only pugz can fully utilize current multi-core processor architectures for decompression. Yet, pugz cannot decompress arbitrary gzip files: it requires the decompressed stream to contain only byte values 9–126. In this work, we present a generalization of the parallelization scheme used by pugz that can be reliably applied to arbitrary gzip-compressed data without compromising performance. We show that the requirements pugz poses on the file contents can be dropped by implementing an architecture based on a cache and a parallelized prefetcher. This architecture can safely handle faulty decompression results, which can arise when threads use trial and error to start decompressing in the middle of a gzip file. Using 128 cores, our implementation reaches a decompression bandwidth of 8.7 GB/s for gzip-compressed base64-encoded data, a speedup of 55 over single-threaded GNU gzip, and 5.6 GB/s for the Silesia corpus, a speedup of 33 over GNU gzip.
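To make the cache-and-prefetcher idea concrete, here is a minimal Python sketch. It is a toy model, not the authors' implementation. It swaps deflate for a hypothetical stored-block format (`MAGIC` byte, length, payload, checksum), which reduces "starting in the middle" to a byte-offset search. Worker threads speculatively decode fixed-size ranges by trial and error and can return faulty results when a false header happens to pass the checks. The consumer accepts a prefetched chunk only if it starts exactly at the already-verified position. Otherwise it decodes from that known-good offset itself. All names (`MAGIC`, `encode`, `try_block`, `speculative_chunk`, `parallel_decode`) are illustrative assumptions. Real deflate blocks start at bit offsets and may reference a 32 KiB window of preceding data, and this model omits both. Because of CPython's GIL, the threads here illustrate control flow, not speedup.

```python
from concurrent.futures import ThreadPoolExecutor

MAGIC = 0xB7  # hypothetical block marker of this toy format (not deflate)


def encode(data: bytes, block_size: int = 200) -> bytes:
    """Toy stream of stored blocks: MAGIC, length (1..255), payload, checksum."""
    assert 1 <= block_size <= 255
    out = bytearray()
    for i in range(0, len(data), block_size):
        payload = data[i:i + block_size]
        out += bytes([MAGIC, len(payload)]) + payload + bytes([sum(payload) % 256])
    return bytes(out)


def try_block(stream: bytes, i: int):
    """Return (next_offset, payload) if a plausible block starts at i, else None."""
    if i + 2 > len(stream) or stream[i] != MAGIC or stream[i + 1] == 0:
        return None
    end = i + 2 + stream[i + 1]  # index of the checksum byte
    if end >= len(stream):
        return None
    payload = stream[i + 2:end]
    if stream[end] != sum(payload) % 256:
        return None
    return end + 1, payload


def speculative_chunk(stream: bytes, a: int, b: int):
    """Worker: trial-and-error search for the first offset in [a, b) from which
    blocks parse consecutively up to an offset >= b. The result may be faulty
    if a false header inside some payload happened to pass the checks."""
    for start in range(a, b):
        pos, parts = start, []
        while pos < b:
            block = try_block(stream, pos)
            if block is None:
                break  # this candidate failed; try the next offset
            pos, payload = block
            parts.append(payload)
        else:
            return start, pos, b"".join(parts)
    return None


def parallel_decode(stream: bytes, chunk_size: int = 1000, workers: int = 4) -> bytes:
    ranges = [(a, min(a + chunk_size, len(stream)))
              for a in range(0, len(stream), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Prefetcher: speculative decodes for all ranges are queued up front
        # (a real prefetcher would only run a bounded number of chunks ahead).
        # The futures double as the cache of finished chunk results.
        futures = [pool.submit(speculative_chunk, stream, a, b) for a, b in ranges]
        out, pos = bytearray(), 0
        while pos < len(stream):
            # Only the worker whose range contains pos can have started at pos.
            result = futures[pos // chunk_size].result()
            if result is not None and result[0] == pos:
                # pos is a verified block boundary, so this parse is exact.
                _, pos, data = result
                out += data
            else:
                # Missing or faulty speculative result: decode one block
                # serially from the known-good offset.
                pos, payload = try_block(stream, pos)
                out += payload
    return bytes(out)
```

For example, `parallel_decode(encode(data), chunk_size=37)` still reproduces `data`, even though most 37-byte ranges contain no true block boundary. Correctness rests on one invariant: `pos` is always a true boundary, and a speculative result is used only when it began at `pos`. Faulty results therefore cost time but can never corrupt the output.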