We have recently witnessed that ``Intelligence'' and ``Compression'' are two sides of the same coin, where a large language model (LLM) with unprecedented intelligence is a general-purpose lossless compressor for various data modalities. This attribute particularly appeals to the lossless image compression community, given the increasing demand to compress high-resolution images in the current streaming-media era. Consequently, a natural question arises: can the compression performance of the LLM elevate lossless image compression to new heights? However, our findings indicate that the naive application of LLM-based lossless image compressors suffers from a considerable performance gap compared with existing state-of-the-art (SOTA) codecs on common benchmark datasets. In light of this, we are dedicated to realizing the unprecedented intelligence (compression) capacity of the LLM for lossless image compression tasks, thereby bridging the gap between theoretical and practical compression performance. Specifically, we propose P$^{2}$-LLM, a next-pixel-prediction-based LLM, which integrates various elaborated insights and methodologies, \textit{e.g.,} pixel-level priors, the in-context ability of the LLM, and a pixel-level semantic preservation strategy, to enhance its understanding of pixel sequences for better next-pixel prediction. Extensive experiments on benchmark datasets demonstrate that P$^{2}$-LLM can beat SOTA classical and learned codecs.
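For intuition on why prediction and compression are two sides of the same coin, consider that any autoregressive model assigning a probability to each next pixel lets an arithmetic coder store a pixel sequence in roughly $-\sum_i \log_2 p(x_i \mid x_{<i})$ bits; stronger predictors therefore yield smaller files. The following minimal Python sketch illustrates this link, where a hypothetical Laplace-smoothed adaptive frequency model stands in for the LLM's next-pixel predictor (it is not the P$^{2}$-LLM pipeline itself):
\begin{verbatim}
import numpy as np

def codelength_bits(pixels):
    """Ideal code length (in bits) of a flattened 8-bit pixel sequence
    under a simple adaptive next-pixel predictor."""
    counts = np.ones(256)                  # Laplace-smoothed counts per value
    total_bits = 0.0
    for p in pixels:
        prob = counts[p] / counts.sum()    # model's next-pixel probability
        total_bits += -np.log2(prob)       # bits an arithmetic coder spends
        counts[p] += 1                     # adapt the predictor online
    return total_bits

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=64 * 64)   # stand-in 64x64 grayscale image
bpp = codelength_bits(img) / img.size
print(f"{bpp:.3f} bits per pixel")         # better predictions -> fewer bits
\end{verbatim}
Replacing this toy frequency model with an LLM's next-pixel distribution is precisely what makes the model's ``intelligence'' directly measurable as compression performance.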